Bayesian reasoning

From The Jolly Contrarian
Lies, Damn Lies and Statistics

I could go on and on about the failings of Shakespeare ... but really I shouldn’t need to: the Bayesian priors are pretty damning. About half of the people born since 1600 have been born in the past 100 years, but it gets much worse than that. When Shakespeare wrote, almost all of Europeans were busy farming, and very few people attended university; few people were even literate—probably as low as about ten million people. By contrast, there are now upwards of a billion literate people in the Western sphere. What are the odds that the greatest writer would have been born in 1564?

Chauncey Gardiner’s “sophomore college blog”, quoted in Michael Lewis’ Going Infinite

You ever seen the dude from FTX? The one that went to prison? That dude shouldn’t be talking about Shakespeare.

—Mike Tyson

Bayesian prior
beɪzˈiːən ˈpraɪə (n.)
The initial probability of an event before any new evidence is taken into account. The prior probability can be informed by the base rate — in the absence of other information it should be the same — but it can also include other sources of local information.

The “base rate” is the general prevalence of a characteristic or event within a given population. For example, if 1% of a population has a certain disease, the “base rate” of the disease is 1%.

While the base rate may also be a Bayesian prior probability it will not necessarily be, if you have additional information about the subject (for example knowledge of a genetic history of the disease in a given patient).

“Sampling” versus “inferential” probabilities

When calculating probabilities you have a “population” (the complete data set about which you want to draw a conclusion) and a “sample” (those items you actually test). There is a two-way relationship between sample and population. If we know the population but have not yet taken a sample, we can calculate the probability of drawing a specific sample from it: this is the “sampling” probability. If we have drawn a sample but do not yet know the makeup of the population, we can calculate the probability of the whole population having a certain characteristic based on the sample: this is an “inferential” probability.

Sampling probability: “What is the likelihood of choosing a 6-foot person from a population of 1000 people whose average height is five foot eight with a standard deviation of 5 inches?”

Inferential probability: “What is the likelihood that, having selected a 6-foot person from a population of 1000 people, the population's average height is five foot eight with a standard deviation of 5 inches?”
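
The sampling question can be put into numbers. The following is a minimal sketch, assuming that heights are normally distributed and that “a 6-foot person” means someone at least 72 inches tall; neither assumption is stated in the question itself.

```python
from statistics import NormalDist

# Population from the text: mean 5'8" (68 in), standard deviation 5 in.
# Assumption: heights follow a normal distribution.
heights = NormalDist(mu=68, sigma=5)

# Assumption: "a 6-foot person" means at least 72 inches tall.
p_six_foot_or_taller = 1 - heights.cdf(72)

print(f"P(height >= 72 in) = {p_six_foot_or_taller:.3f}")
```

The inferential question runs the other way and cannot be answered from these figures alone: it also needs a prior view of which populations were plausible before the six-footer turned up.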

Now, to healthcare serial murderise that:

Sampling probability: “What is the likelihood of randomly selecting a nurse from a population of fifty nurses at a given hospital who was on duty for ten suspicious events at the hospital in a given 12-month period?”

Inferential probability: “What is the likelihood that, given there are ten suspicious events at a hospital in a 12-month period, one nurse out of fifty was on duty for all of them?”
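
A rough sketch of why the two questions give such different answers. Every figure here beyond the fifty nurses and ten events is an illustrative assumption: in particular, the 50% chance of a nurse being on duty for any one event, and the independence of shifts from one another, are invented for the example and describe no real hospital.

```python
# All numbers below are illustrative assumptions, not data from any real case.
P_ON_DUTY = 0.5   # assumed chance a given nurse is on duty for any one event
N_EVENTS = 10     # suspicious events in the period (from the text)
N_NURSES = 50     # nurses at the hospital (from the text)

# Chance that one nurse, picked in advance, was on duty for all ten events,
# assuming shifts are independent of each other and of the events.
p_named_nurse = P_ON_DUTY ** N_EVENTS

# Chance that at least one of the fifty nurses was on duty for all ten,
# assuming the nurses' rosters are independent of one another.
p_some_nurse = 1 - (1 - p_named_nurse) ** N_NURSES

print(f"one nurse picked in advance: {p_named_nurse:.6f}")
print(f"at least one of fifty:       {p_some_nurse:.4f}")
```

On these made-up numbers the second figure is dozens of times larger than the first. Finding some nurse who was present throughout is far less surprising than finding that a nurse you had named beforehand was.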

Chauncey dunk

A Bayesian prior is a way to incorporate existing knowledge or beliefs about a parameter into statistical analysis.

For example, if you believe that:

  1. All playwrights can be objectively ranked according to independent, observable criteria;
  2. The quality of those playwrights in a given sample will be normally distributed; and
  3. The best way of assessing the quality of dramas is by statistical analysis

Then:

  1. You have already made several category errors, should not be talking about art, and if you are, no-one should be listening; but
  2. If nonetheless you still are, and they still are, and you are trying to estimate the statistical likelihood of a specific Elizabethan playwright being the best in history, then your knowledge that there were vastly fewer playwrights active in the Elizabethan period than have existed in all of history until now — which is a Bayesian “prior distribution” — might help you conclude that the odds of that Elizabethan playwright really being the best are vanishingly low.

At the same time, everyone else will conclude that you have no idea about literature and a shaky grasp even of Bayesian statistics.

But JC digresses.

Bayesian statistics have, in our dystopian techno-determinist age, a lot to answer for.

In their place they can unravel surprising odds in a game of chance that human brains intuitively misapprehend — this will help should you be asked to choose wisely between goats and cars — but outside the tight swim lanes of statistical experiment, they can be easily misapplied and may get badly lost in weighing up the risks of the market, the merits of Shakespeare, our debt to distant future generations, and the prospect of onrushing apocalypse, courtesy of which, some theorists tell us, there won’t be many future generations to worry about anyway.

Goats and sportscars

The neatest illustration of how “Bayesian priors” work is the “Monty Hall” problem, named for the host of the gameshow Let’s Make a Deal:

You are a game-show contestant. The host shows you three doors and tells you: “Behind one of those doors is a Ferrari. Behind the other two are goats.[1] You may choose one door.”

Knowing you have a ⅓ chance, you choose a door at random.

Now the host theatrically opens one of the doors you didn’t choose, revealing a goat.

Two closed doors remain. She offers you the chance to reconsider your choice.

Do you stick with your original choice, switch, or does it not make a difference?

Intuition suggests it makes no difference. At the beginning, each door carries an equal probability: ⅓. After the reveal, the remaining doors still do: ½.

So, while your odds have improved, the odds remain equal for each unopened door. So, it still doesn’t matter which you choose: Right?

Wrong. The best odds are if you switch: there remains a ⅓ chance the car is behind the first door you picked; there is now a ⅔ chance the Ferrari is behind the other door. Staying put is to commit to a choice you made when the odds were worse.

We know this thanks to Bayesian inference. There are two categories of door; ones you chose, and ones you didn’t. There’s only one door in the “chosen” category and two doors in the “unchosen” category. At the start you knew each was equally likely to hold the car. This was the “prior probability”. There was a ⅓ chance per door or, if we categorise the doors, a ⅓ chance it was behind a chosen door and a ⅔ chance it was behind an unchosen door.

Then you got some updated information, but only about the “unchosen door” category: One of those doors definitely doesn’t hold the car. You have no new information about the “chosen door” category, however.

You can update your prior probability estimates about the unchosen doors. One now has a zero chance of holding the car. Therefore, it follows the other door has a ⅔ chance. All the odds of the unchosen category now sit behind its single unopened door.

Therefore you have a better chance of winning the car (though not a certainty — one time in three you’ll lose) if you switch.
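
For anyone still unconvinced, the game can simply be played many times. This is a minimal simulation sketch, assuming the host rule described in this article (the host always opens an unchosen door that hides a goat); the function names, the number of rounds and the fixed seed are choices made for the example.

```python
import random

def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host rule: open a door that is neither the contestant's pick
    # nor the one concealing the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch: bool, rounds: int = 100_000, seed: int = 0) -> float:
    """Fraction of rounds won under a fixed stay-or-switch policy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

The two rates should land close to ⅓ and ⅔ respectively, with a little noise from the random draws.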

A person who now arrives, with two doors remaining, who is given the choice without your prior knowledge, would calculate the probabilities at 50:50. But she is ignorant of your original choice and the decision the host made, based on your original choice (remember the door opened by the host depended on your choice: the rule was “open a door that the contestant did not choose and that does not conceal the Ferrari”). Without that information, from the newcomer’s perspective, the odds really are 50:50.

So you should switch doors. This proposal outrages some people, at first. Especially when explained to them at a pub, it outrages them later. But it is true.

It is easier to see if instead there are one thousand doors, not three, and after your first pick the host opens 998 of the other doors.

Here you know you were almost certainly wrong first time, so if every possible wrong answer but one is revealed to you, it stands more obviously to reason that the other door, which accounts for 999/1000 of the original options, is the one holding the car.
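
The same reasoning gives an exact figure for any number of doors. A short sketch, assuming the host opens every unchosen door but one and never reveals the car; the function name is made up for the example.

```python
from fractions import Fraction

def switch_win_probability(n_doors: int) -> Fraction:
    """Exact chance of winning by switching when the host opens
    every unchosen door except one, never revealing the car."""
    if n_doors < 3:
        raise ValueError("need at least three doors")
    # The first pick is right 1 time in n_doors. Switching wins exactly
    # when the first pick was wrong, because the host is then forced to
    # leave the car's door closed.
    return 1 - Fraction(1, n_doors)

print(switch_win_probability(3))     # 2/3
print(switch_win_probability(1000))  # 999/1000
```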

Lesson: use what you already know about history, and your place in it, to update your choices. This ought not to be such a revelation. Count cards. Update your predictions and become a “super forecaster”.

Bayesian probabilities are models

Now, all of this is well and good and unimpeachable if the conditions in which probabilities hold are present: a static, finite “sample space” — 3, 10 or 1000 doors — a finite and known number of discrete outcomes — goat or car — and a lack of intervening causes like moral (immoral?) agents who can capriciously affect the random outcomes.

It works well for carefully controlled games of chance involving flipped coins, thrown dice, randomly drawn playing cards and, of course, Let’s Make a Deal. They are all simple systems, easily reduced to “nomological machines”.

When you apply it to unbounded complex systems involving, well, people, it works less well.

The doomsday problem

Bayesian probabilities, if misused, can lead statistics professors to the a priori deduction that we are all screwed.

A priori
(adj.)
Following logically from existing premises. Necessarily so. Not dependent on observation or falsifiable evidence.

Where it is not possible to gather the necessary evidence, philosophers have a weakness for a priori arguments. They are prevalent in metaphysical enquiries: Pascal’s wager, cogito, ergo sum, the argument from design. Any argument based purely on probabilities is a priori: the general principle is extrapolated to predict a factual answer. If you find yourself at or near the beginning of something, such as Civilisation, a Bayesian model will tell you it will almost certainly end soon.

It works on elementary probability and can be illustrated simply.

Imagine there are two opaque barrels. One contains ten pool balls and the other contains ten thousand, in each case sequentially numbered from 1. You cannot tell which barrel is which.

A magician draws a ball with a seven on it from one barrel.

What are the odds that this came from the barrel with just ten balls?

Naive probability says that since both barrels contain a 7 ball, it is 50:50. Bayesian probability takes the additional fact we know about each barrel: the odds of drawing a seven from one barrel are 1 in 10, and from the other are 1 in 10,000, and concludes it is 1,000 times more likely that the 7 came from the barrel with just ten balls.

The proof of this intuition is if you drew ball 235, there would be no chance it came from the ten-ball barrel.
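
The barrel arithmetic can be written out using Bayes’ rule. This is a small sketch, assuming each barrel was equally likely to be chosen before the draw (the 50:50 prior); the function and parameter names are made up for the example.

```python
from fractions import Fraction

def posterior_small_barrel(ball: int, small: int = 10, large: int = 10_000) -> Fraction:
    """Chance the ball came from the small barrel, given a 50:50 prior."""
    if not 1 <= ball <= large:
        raise ValueError("no barrel contains that ball")
    prior = Fraction(1, 2)  # assumption: either barrel equally likely
    # Likelihood of drawing this numbered ball from each barrel.
    like_small = Fraction(1, small) if ball <= small else Fraction(0)
    like_large = Fraction(1, large)
    evidence = prior * like_small + prior * like_large
    return prior * like_small / evidence

print(posterior_small_barrel(7))    # 1000/1001
print(posterior_small_barrel(235))  # 0
```

Odds of 1,000 to 1 in favour of the small barrel translate to a probability of 1000/1001, and ball 235 rules the small barrel out altogether.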

This logical reasoning is, obviously, sound. It is the same logic that lies behind the “three door choice” problem.

How do we get from this to the imminence of the apocalypse?

Well, the start of your life is, across the cosmic stretch of human existence, like a random draw with a sequentially numbered birth year on each ball.

Now imagine an array of a million hypothetical barrels containing balls engraved with sequentially numbered years, beginning at the dawn of civilisation which, for argument’s sake, we shall call the fall of Troy.

The first barrel has just one ball, numbered 1; the next has two, numbered 1 and 2; and so on, up to a barrel holding a million balls, one for each of the first million years after the fall of Troy.

Let’s say your birth year was the 6001st after Troy. What are the odds that your birth year would be drawn at random from each of the million barrels? We know the odds for the first 6,000: zero. None of them have a ball 6001. Across the remaining 994,000 the probabilities fall from 1/6001 to 1/1,000,000. Using the same principle as above we can see that the probability is clustered somewhere nearer the “short end” (near 6001) than the “long end” (1,000,000).
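
To see how far toward the “short end” the probability clusters, one can find the barrel at which half of it has been used up. This is a sketch under two assumptions not spelled out above: barrel k holds balls numbered 1 to k, and every barrel is equally likely before the draw. The function name is invented for the example.

```python
def median_barrel(birth_year: int = 6001, max_barrel: int = 1_000_000) -> int:
    """Barrel size at which the posterior probability reaches one half.

    Assumptions: barrel k holds balls 1..k, and every barrel is
    equally likely before the draw (a uniform prior).
    """
    # Barrels smaller than birth_year cannot contain the ball, so their
    # likelihood is zero. For barrel k >= birth_year it is 1/k.
    sizes = range(birth_year, max_barrel + 1)
    total = sum(1 / k for k in sizes)
    running = 0.0
    for k in sizes:
        running += 1 / k
        if running >= total / 2:
            return k
    return max_barrel

median = median_barrel()
print(median)
```

Under those assumptions the halfway point falls somewhere around barrel 77,000, less than a tenth of the way along the array, which is the doomsday argument in miniature.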

If we assume your birthdate is drawn randomly from all the birthdates available to you then this sort of implies everything is likely to go patas arriba sooner rather than later.

This is rather like a malign inversion of the Lindy effect.


Assessing the probability that your ball came from a given barrel is somewhat complicated, but clearly we can rule out barrels 1-6,000, and the higher your birth year, the more probability there is that it resides in a higher barrel.

See also

  1. Why goats? — Ed