Bayesian reasoning

Lies, Damn Lies and Statistics


I could go on and on about the failings of Shakespeare ... but really I shouldn’t need to: the Bayesian priors are pretty damning. About half of the people born since 1600 have been born in the past 100 years, but it gets much worse than that. When Shakespeare wrote, almost all of Europeans were busy farming, and very few people attended university; few people were even literate—probably as low as about ten million people. By contrast, there are now upwards of a billion literate people in the Western sphere. What are the odds that the greatest writer would have been born in 1564?

—Chauncey Gardiner’s “sophomore college blog”, quoted in Michael Lewis’ Going Infinite

You ever seen the dude from FTX? The one that went to prison? That dude shouldn’t be talking about Shakespeare.

—Mike Tyson

Bayesian reasoning
Bayesian prior
beɪzˈiːən ˈpraɪə (n.)
The initial probability of an event before any new evidence is taken into account. The prior probability can be informed by the base rate — in the absence of other information it should be the same — but it can also include other sources of local information.

The “base rate” is the general prevalence of a characteristic or event within a given population. For example, if 1% of a population has a certain disease, the “base rate” of the disease is 1%.

The base rate may serve as the Bayesian prior probability, but it need not: if you have additional information about the subject (for example, knowledge of a genetic history of the disease in a given patient), the prior should reflect it.

“Sampling” versus “inferential” probabilities

When calculating probabilities you have a “population” — the complete data set of all events, and the “sample” — those items you actually test. There is a two-way relationship between population and sample.

If I know the general characteristics of the population, I can calculate the probability that a specific sample from that population has those characteristics. If I know this bag contains 30 red marbles and 70 blue ones, I can deduce that the probability of randomly drawing a red marble is 30%. This is the “sampling” probability: do know population, don’t know sample:

“What is the likelihood of randomly selecting a 6-foot person from a population of 1,000 people whose average height is five foot eight, with a standard deviation of 5 inches?”

If I don’t know the general characteristics of the population, then by drawing a sample I can infer the probable characteristics of the whole population from the make-up of the sample. This is an “inferential” probability: don’t know population, do know sample:

“Given we sampled 50 people and found their average height was 6 foot, what can we infer about the true population mean height?”
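A minimal Python sketch of the two directions, using the marble bag. The sampling half uses only the 30/70 split given in the text. The inferential half rests on assumptions added purely for illustration: a bag of 100 marbles whose red count is unknown, every count from 0 to 100 treated as equally likely beforehand, and a hypothetical sample of 3 reds in 10 draws made with replacement.

```python
from fractions import Fraction
from math import comb

# Sampling probability: population known, sample unknown.
RED, BLUE = 30, 70  # bag composition given in the text
p_red = Fraction(RED, RED + BLUE)

# Inferential probability: sample known, population unknown.
# Assumptions (not from the text): 100 marbles, every red count 0..100
# equally likely a priori, and draws made with replacement.
TOTAL = 100

def posterior_red_count(reds_seen, draws):
    """Posterior over the number of red marbles in the bag, given the sample."""
    weights = []
    for r in range(TOTAL + 1):
        p = Fraction(r, TOTAL)
        # Binomial likelihood of the observed sample if the bag held r reds.
        likelihood = comb(draws, reds_seen) * p**reds_seen * (1 - p)**(draws - reds_seen)
        weights.append(likelihood)  # uniform prior, so posterior is proportional to likelihood
    total = sum(weights)
    return [w / total for w in weights]

posterior = posterior_red_count(reds_seen=3, draws=10)
most_likely = max(range(TOTAL + 1), key=lambda r: posterior[r])

print(f"P(draw red | 30 red, 70 blue) = {float(p_red):.2f}")
print(f"Most likely red count after 3 reds in 10 draws: {most_likely}")
```

The first number is a deduction from a known bag. The second is only a best guess about an unknown bag, and it would shift with a different prior or a different sample.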

Now, to healthcare serial murderise that:

Sampling probability:

“What is the likelihood of randomly choosing a nurse from a population of fifty nurses working equal shifts at a given hospital who was on duty for ten days in a given 12-month period?”

This we can work out pretty quickly, though the assumption that nurses worked equal shifts is not reasonable: some joined or left the hospital during the 12-month period; some worked part-time; and some volunteered for extra shifts or came in at short notice to cover for absences. A nurse who lived near the hospital and was generally up for extra shifts would be likely to work a greater number of shifts.

Inferential probability:

“Given we observe that one nurse was present for all ten suspicious events, what is the probability that these events occurred randomly (and had nothing to do with the presence of the nurse)?”

“These events occurring randomly” is the “null hypothesis”. The alternative hypothesis is: “These events occurred because of this nurse’s presence”. But to fairly evaluate the null hypothesis, we would need a lot more information:

What was the baseline rate of “suspicious events”, whether or not the nurse was present? This would need to be measured over a longer period even than the 12-month population sample.
What even is a “suspicious event”, and how reliably were they captured? Would it include collapses and deaths with a satisfactory explanation? What are the criteria for unexplained or unexpected? What of recoveries?
How were shifts distributed across all nurses?
Were there similar events when this nurse wasn't present? Were there shifts where this nurse was present and no events occurred?

By presenting only “Nurse X was present for all these specific events”, we can’t establish the baseline rate of such events, how unusual this pattern is, what other potential explanations there are and how plausible they are, or whether the correlation is meaningful.
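To see why the distribution of shifts matters, here is a rough sketch of the chance, under the null hypothesis, that some nurse happens to be on duty for all ten events. Every number in it is hypothetical: the `shift_shares` rosters, the function names, and the assumption that events fall on shifts independently of who is working are illustrative only and are not data from any real case.

```python
def p_present_at_all(share, n_events=10):
    """Chance a nurse working `share` of all shifts is on duty for every
    one of n_events, if events fall on shifts purely at random."""
    return share ** n_events

def p_any_nurse_at_all(shift_shares, n_events=10):
    """Approximate chance that at least one nurse is present at every event.
    Treats nurses as independent of one another, which is a simplification."""
    p_none = 1.0
    for share in shift_shares:
        p_none *= 1.0 - p_present_at_all(share, n_events)
    return 1.0 - p_none

# Hypothetical rosters of fifty nurses (illustrative numbers only).
equal_roster = [0.2] * 50                 # everyone works 20% of shifts
uneven_roster = [0.6] * 2 + [0.18] * 48   # two nurses cover far more shifts

print(f"equal shifts:  {p_any_nurse_at_all(equal_roster):.6f}")
print(f"uneven shifts: {p_any_nurse_at_all(uneven_roster):.6f}")
```

This settles nothing about a real roster. It only shows that the answer swings by orders of magnitude depending on inputs that the bare statement leaves out.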

Chauncey dunk

A Bayesian prior is a way to incorporate existing knowledge or beliefs about a parameter into statistical analysis.

For example, if you believe that:

All playwrights can be objectively ranked according to independent, observable criteria;
The quality of those playwrights in a given sample will be normally distributed; and
The best way of assessing the quality of dramas is by statistical analysis

Then:

You have already made several category errors, should not be talking about art, and if you are, no-one should be listening; but
If nonetheless you still are, and they still are, and you are trying to estimate the statistical likelihood of a specific Elizabethan playwright being the best in history, then your knowledge that there were vastly fewer playwrights active in the Elizabethan period than have existed in all of history until now — which is a Bayesian “prior distribution” — might help you conclude that the odds of that Elizabethan playwright really being the best are vanishingly low.

At the same time, everyone else will conclude that you have no idea about literature and a shaky grasp even of Bayesian statistics.

But JC digresses.

Bayesian statistics have, in our dystopian techno-determinist age, a lot to answer for.

In their place they can unravel surprising odds in a game of chance that human brains intuitively misapprehend — this will help should you be asked to choose wisely between goats and cars — but outside the tight swim lanes of statistical experiment, they can be easily misapplied and may get badly lost in weighing up the risks of the market, the merits of Shakespeare, our debt to distant future generations, and the prospect of onrushing apocalypse, courtesy of which, some theorists tell us, there won’t be many future generations to worry about anyway.

Goats and sportscars

The neatest illustration of how “Bayesian priors” work is the “Monty Hall” problem, named for the host of the gameshow Let’s Make a Deal:

You are a game-show contestant. The host shows you three doors and tells you: “Behind one of those doors is a Ferrari. Behind the other two are goats. You may choose one door.”
Knowing you have a ⅓ chance, you choose a door at random.
Now the host theatrically opens one of the doors you didn’t choose, revealing a goat.
Two closed doors remain. She offers you the chance to reconsider your choice.
Do you stick with your original choice, switch, or does it not make a difference?

Intuition suggests it makes no difference. At the beginning, each door carries an equal probability: ⅓. After the reveal, the remaining doors still do: ½.

So, while your odds have improved, the odds remain equal for each unopened door. So, it still doesn’t matter which you choose: Right?

Wrong. The best odds are if you switch: there remains a ⅓ chance the car is behind the first door you picked; there is now a ⅔ chance the Ferrari is behind the other door. Staying put is to commit to a choice you made when the odds were worse.

We know this thanks to Bayesian inference. There are two categories of door; ones you chose, and ones you didn’t. There’s only one door in the “chosen” category and two doors in the “unchosen” category. At the start you knew each was equally likely to hold the car. This was the “prior probability”. There was a ⅓ chance per door or, if we categorise the doors, a ⅓ chance it was behind a chosen door and a ⅔ chance it was behind an unchosen door.

Then you got some updated information, but only about the “unchosen door” category: One of those doors definitely doesn’t hold the car. You have no new information about the “chosen door” category, however.

You can update your prior probability estimates about the unchosen doors. One now has a zero chance of holding the car. Therefore, it follows the other door has a ⅔ chance. All the odds of the unchosen category now sit behind its single unopened door.

Therefore you have a better chance of winning the car (though not a certainty — one time in three you’ll lose) if you switch.
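The update can be checked with exact fractions. This is a sketch rather than the only way to set it up: the function name and door numbering are illustrative, and it assumes the host never opens your door or the car’s door and picks at random when two doors qualify (the last point is an assumption the problem statement leaves open).

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def posterior_after_reveal(chosen, opened):
    """Posterior P(car behind each door) after the host opens `opened`.
    Assumes the host never opens the chosen door or the car's door, and
    picks at random when two doors qualify."""
    prior = {d: Fraction(1, 3) for d in DOORS}
    likelihood = {}
    for car in DOORS:
        # Doors the host is allowed to open if the car is behind `car`.
        allowed = [d for d in DOORS if d != chosen and d != car]
        likelihood[car] = Fraction(1, len(allowed)) if opened in allowed else Fraction(0)
    evidence = sum(prior[d] * likelihood[d] for d in DOORS)
    return {d: prior[d] * likelihood[d] / evidence for d in DOORS}

posterior = posterior_after_reveal(chosen=1, opened=3)
for door, p in posterior.items():
    print(f"door {door}: {p}")
```

With door 1 chosen and door 3 opened, the chosen door keeps its ⅓ and the other closed door carries ⅔.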

A person who now arrives, with two doors remaining, and who is given the choice without your prior knowledge, would calculate the probabilities at 50:50. But she is ignorant of your original choice and of the decision the host made based on it (remember the door opened by the host depended on your choice: the rule was “open a door that the contestant did not choose and that does not conceal the Ferrari”). Without that information, from the newcomer’s perspective, the odds really are 50:50.

So you should switch doors. This proposal outrages some people, at first. Especially when explained to them at a pub, it outrages them later. But it is true.

It is easier to see if instead there are one thousand doors, not three, and after your first pick the host opens 998 of the other doors.

Here you know you were almost certainly wrong first time, so if every possible wrong answer but one is revealed to you, it stands more obviously to reason that the other door, which accounts for 999/1000 of the original options, is the one holding the car.
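For the sceptic at the pub, a simulation may persuade where argument does not. The sketch below plays the game many times with any number of doors. The function names, the round count and the fixed seed are arbitrary choices for illustration.

```python
import random

def play(n_doors, switch, rng):
    """One round: host opens every unchosen door but one, never the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    # The door the host leaves closed: the car if you missed it,
    # otherwise an arbitrary other door.
    if pick == car:
        other = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        other = car
    final = other if switch else pick
    return final == car

def win_rate(n_doors, switch, rounds=20_000, seed=0):
    """Fraction of rounds won under a fixed stay-or-switch policy."""
    rng = random.Random(seed)
    return sum(play(n_doors, switch, rng) for _ in range(rounds)) / rounds

for n in (3, 1000):
    print(n, "doors: stay", win_rate(n, False), "switch", win_rate(n, True))
```

With three doors, switching should win about two times in three. With a thousand, it should win almost every time.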

Lesson: use what you already know about history, and your place in it, to update your choices. This ought not to be such a revelation. Count cards. Update your predictions and become a “super forecaster”.

Bayesian probabilities are models

Now, all of this is well and good and unimpeachable if the conditions in which probabilities hold are present: a static, finite “sample space” — 3, 10 or 1000 doors — a finite and known number of discrete outcomes — goat or car — and a lack of intervening causes like moral (immoral?) agents who can capriciously affect the random outcomes.

It works well for carefully controlled games of chance involving flipped coins, thrown dice, randomly drawn playing cards and, of course, Let’s Make a Deal. They are all simple systems, easily reduced to “nomological machines”.

When you apply it to unbounded complex systems involving, well, people, it works less well.

The doomsday problem

Bayesian probabilities, if misused, can lead statistics professors to the a priori deduction that we are all screwed.

A priori
(adj.)
Following logically from existing premises. Necessarily so. Not dependent on observation or falsifiable evidence.

Where it is not possible to gather the necessary evidence, philosophers have a weakness for a priori arguments. They are prevalent in metaphysical enquiries: Pascal’s wager, cogito, ergo sum, the argument from design. Any argument based purely on probabilities is a priori: the general principle is extrapolated to predict a factual answer. For example, if you find yourself at or near the beginning of something, such as Civilisation, a Bayesian model will tell you it will almost certainly end soon.

It works on elementary probability and can be illustrated simply.

Imagine there are two opaque barrels. One contains ten pool balls and the other contains ten thousand, in each case sequentially numbered from 1. You cannot tell which barrel is which.

A magician draws a ball with a seven on it from one barrel.

What are the odds that this came from the barrel with just ten balls?

Naive probability says that since both barrels contain a 7 ball, it is 50:50. Bayesian probability takes the additional fact we know about each barrel, that the odds of drawing a seven are 1 in 10 from one barrel and 1 in 10,000 from the other, and concludes it is 1,000 times more likely that the 7 came from the barrel with just ten balls.

To test this intuition, suppose you had drawn ball 235 instead: there would be no chance it came from the ten-ball barrel.
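The barrel arithmetic is short enough to do exactly. A minimal sketch, assuming each barrel is equally likely to be the one drawn from (the 50:50 starting point); the function name is illustrative, and it expects a ball number that at least one barrel could contain.

```python
from fractions import Fraction

def posterior_small_barrel(ball, small=10, large=10_000):
    """P(ball came from the small barrel), with each barrel equally
    likely to be the one drawn from (the 50:50 prior)."""
    prior = Fraction(1, 2)
    # Chance of drawing this particular ball from each barrel.
    like_small = Fraction(1, small) if ball <= small else Fraction(0)
    like_large = Fraction(1, large) if ball <= large else Fraction(0)
    evidence = prior * like_small + prior * like_large
    return prior * like_small / evidence

print(posterior_small_barrel(7))    # 1000/1001
print(posterior_small_barrel(235))  # 0
```

The 1,000-to-1 ratio translates to a probability of 1000/1001, a shade under certainty, while ball 235 rules the small barrel out entirely.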

This reasoning is, obviously, sound. It is the same logic that lies behind the “three door choice problem”.

How do we get from this to the imminence of the apocalypse?

Well, the start of your life is, across the cosmic stretch of human existence, like a random draw with a sequentially numbered birth year on each ball.

Now imagine an array of a million hypothetical barrels containing balls engraved with sequentially numbered years, beginning at the dawn of civilisation which, for argument’s sake, we shall call the fall of Troy.

The first barrel has just one ball, with year 1 on it; the next has two, years 1 and 2; and so on, up to the millionth barrel, whose balls run to one million years after the fall of Troy.

Let's say your birth year was the 6001st after Troy. What are the odds that your birth year would be drawn at random from each of the million barrels? We know the odds for the first 6,000: zero. None of them has a ball 6001. Across the remaining 994,000 the probabilities fall from 1/6001 to 1/1,000,000. Using the same principle as above we can see that the probability is clustered somewhere nearer the “short end” (near 6001) than the “long end” (1,000,000).

If we assume your birthdate is drawn randomly from all the birthdates available to you, then this sort of implies everything is likely to go belly up sooner rather than later.

This is rather like a malign inversion of the Lindy effect.

Assessing the probability that your ball came from a given barrel is somewhat complicated, but clearly we can rule out barrels 1 to 6,000, and the higher your birth year, the more probability there is that it resides in a higher barrel.
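The complication can be handed to a computer. The sketch below assumes barrel k holds balls 1 to k and, as a prior, that every one of the million barrels is equally likely. That prior is an assumption made here for illustration, and the conclusion depends on it. The function name is likewise illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

def barrel_posterior_cdf(birth_year=6001, n_barrels=1_000_000):
    """Cumulative posterior over barrel sizes, from birth_year upwards.
    Assumes barrel k holds balls 1..k and, as a prior, that every
    barrel is equally likely; the text does not fix a prior."""
    sizes = range(birth_year, n_barrels + 1)
    weights = [1.0 / k for k in sizes]   # chance of drawing your ball from barrel k
    total = sum(weights)
    return list(accumulate(w / total for w in weights))

cdf = barrel_posterior_cdf()
median_barrel = 6001 + bisect_left(cdf, 0.5)
print("median barrel size:", median_barrel)
print("P(barrel <= 100,000):", round(cdf[100_000 - 6001], 3))
```

On these assumptions more than half the posterior weight sits on barrels of 100,000 balls or fewer, which is the clustering at the “short end” in numbers.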
