Bayesian reasoning

Revision as of 10:15, 4 August 2024 by Amwelladmin
Lies, Damn Lies and Statistics

I could go on and on about the failings of Shakespeare ... but really I shouldn’t need to: the Bayesian priors are pretty damning. About half of the people born since 1600 have been born in the past 100 years, but it gets much worse than that. When Shakespeare wrote, almost all of Europeans were busy farming, and very few people attended university; few people were even literate—probably as low as about ten million people. By contrast, there are now upwards of a billion literate people in the Western sphere. What are the odds that the greatest writer would have been born in 1564?

Chauncey Gardiner’s “sophomore college blog”, quoted in Michael Lewis’ Going Infinite

You ever seen the dude from FTX? The one that went to prison? That dude shouldn’t be talking about Shakespeare.

—Mike Tyson

Bayesian prior
beɪzˈiːən ˈpraɪə (n.)
The initial probability of an event before any new evidence is taken into account. The prior probability can be informed by the base rate — in the absence of other information it should be the same — but it can also include other sources of local information.

The “base rate” is the general prevalence of a characteristic or event within a given population. For example, if 1% of a population has a certain disease, the “base rate” of the disease is 1%.

The base rate may serve as the Bayesian prior probability, but it need not: if you have additional information about the subject (for example, knowledge of a genetic history of the disease in a given patient), the prior should reflect that too.
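The difference between a base rate and a better-informed prior can be sketched in a few lines of Python. Only the 1% base rate comes from the text. The test's sensitivity and false-positive rate, and the 10% prior for a patient with a family history, are invented for illustration, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem: P(H | E) from a prior P(H) and the two likelihoods."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

base_rate = Fraction(1, 100)      # from the text: 1% of the population has the disease
sensitivity = Fraction(9, 10)     # ASSUMED: P(positive test | disease)
false_positive = Fraction(1, 20)  # ASSUMED: P(positive test | no disease)
family_prior = Fraction(1, 10)    # ASSUMED: prior for a patient with a family history

# Same positive test result, two different priors.
from_base_rate = posterior(base_rate, sensitivity, false_positive)
from_family_prior = posterior(family_prior, sensitivity, false_positive)
print(from_base_rate, from_family_prior)
```

Under these assumed numbers the same positive test moves the base-rate patient to 2/13 and the family-history patient to 2/3: the evidence is identical, but the starting point is not.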

“Sampling” versus “inferential” probabilities

When calculating probabilities you have a “population”, the complete data set about which you want to draw a conclusion, and a “sample”, the items you actually test. There is a two-way relationship between sample and population. If we know the population but have not yet taken a sample, we can calculate the probability of drawing a specific sample from it: this is the “sampling” probability. If we have drawn a sample but do not yet know the makeup of the population, we can calculate the probability of the whole population having a certain characteristic based on the sample: this is an “inferential” probability.

Sampling probability: “What is the likelihood of choosing a 6-foot person from a population of 1000 people whose average height is five foot eight with a standard deviation of 5 inches?”

Inferential probability: “What is the likelihood that, having selected a 6-foot person from a population of 1000 people, the population's average height is five foot eight with a standard deviation of 5 inches?”
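A rough sketch of both directions, assuming heights are normally distributed. Two liberties are taken and should be read as assumptions: “a 6-foot person” is treated as “someone at least 72 inches tall” (the chance of any exact height is zero on a continuous scale), and the inferential question is reduced to a choice between just two candidate populations, with means of 68 and 72 inches, given equal prior weight.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

SD = 5.0         # from the text: standard deviation of 5 inches
SIX_FOOT = 72.0  # six feet, in inches

# Sampling direction: population known (mean 68 inches), sample not yet drawn.
p_six_foot_or_taller = 1.0 - normal_cdf(SIX_FOOT, 68.0, SD)

# Inferential direction: sample drawn (one 72-inch person), population unknown.
# ASSUMED: only two candidate populations, equally likely beforehand.
like_68 = normal_pdf(SIX_FOOT, 68.0, SD)
like_72 = normal_pdf(SIX_FOOT, 72.0, SD)  # ASSUMED rival population
posterior_mean_68 = like_68 / (like_68 + like_72)

print(round(p_six_foot_or_taller, 3), round(posterior_mean_68, 3))
```

The first number answers “how likely is this sample, given the population?”; the second answers “how likely is this population, given the sample?”. They are different questions with different answers.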

Now, to healthcare serial murderise that:

Sampling probability: “What is the likelihood of randomly selecting a nurse from a population of fifty nurses at a given hospital who was on duty for ten suspicious events at the hospital in a given 12-month period?”

Inferential probability: “What is the likelihood that, given there are ten suspicious events at a hospital in a 12-month period, one nurse out of fifty was on duty for all of them?”
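One way to read the two nurse questions, sketched with invented numbers. The fifty nurses and ten events come from the text. The assumptions are that every nurse is on duty for any given event with the same probability of one in three, and that shifts are independent of each other and of the events. Real rosters are nothing like this tidy.

```python
N_NURSES = 50      # from the text
N_EVENTS = 10      # from the text
P_ON_DUTY = 1 / 3  # ASSUMED: chance a given nurse is on duty for a given event

# Pick one nurse at random: chance she was on duty for all ten events.
p_specific_nurse = P_ON_DUTY ** N_EVENTS

# Chance that at least one of the fifty was on duty for all ten events.
p_any_nurse = 1 - (1 - p_specific_nurse) ** N_NURSES

print(p_specific_nurse, p_any_nurse, p_any_nurse / p_specific_nurse)
```

Both numbers are small under these assumptions, but the second is roughly fifty times the first. Asking about a named nurse and asking about any nurse at all are not the same question.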

Chauncey dunk

A Bayesian prior is a way to incorporate existing knowledge or beliefs about a parameter into statistical analysis.

For example, if you believe that:

  1. All playwrights can be objectively ranked according to independent, observable criteria;
  2. The quality of those playwrights in a given sample will be normally distributed; and
  3. The best way of assessing the quality of dramas is by statistical analysis

Then:

  1. You have already made several category errors, should not be talking about art, and if you are, no-one should be listening; but
  2. If nonetheless you still are, and they still are, and you are trying to estimate the statistical likelihood of a specific Elizabethan playwright being the best in history, then your knowledge that there were vastly fewer playwrights active in the Elizabethan period than have existed in all of history until now — which is a Bayesian “prior distribution” — might help you conclude that the odds of that Elizabethan playwright really being the best are vanishingly low.

At the same time, everyone else will conclude that you have no idea about literature and a shaky grasp even of Bayesian statistics.

But JC digresses.

Bayesian statistics have, in our dystopian techno-determinist age, a lot to answer for.

In their place they can unravel surprising odds in a game of chance that human brains intuitively misapprehend — this will help should you be asked to choose wisely between goats and cars — but outside the tight swim lanes of statistical experiment, they can be easily misapplied and may get badly lost in weighing up the risks of the market, the merits of Shakespeare, our debt to distant future generations, and the prospect of onrushing apocalypse, courtesy of which, some theorists tell us, there won’t be many future generations to worry about anyway.

Goats and sportscars

The neatest illustration of how “Bayesian priors” work is the “Monty Hall” problem, named for the host of the game show Let’s Make a Deal:

A game show contestant is asked to choose a prize from behind one of three doors. She is told one door conceals a sports car and the other two conceal goats. [Why goats? Ed.]

When the contestant has chosen, the host theatrically opens one of the doors she did not choose, to reveal a goat.

“Knowing what you know now, would you reconsider?”

If you have not seen it before, intuitively you may say, well, at the beginning each door carried an equal probability (1/3) and the remaining doors still do after the reveal (1/2), so while the player’s odds have improved, either choice remains even. It doesn’t matter whether she sticks or twists, so she should be indifferent.

Bayesian probability theory shows this intuition to be wrong.

To stay put is to commit to the choice you made when the odds were worse, so its odds remain the same. You have no more information about your original choice: you already knew it might or might not conceal the car. You do, however, know something new about one of the doors you didn’t choose. The odds as between the other two doors change from 1/3 each to 0/3 for the open door (it definitely doesn’t hold the car) and 2/3 for the closed one, which still might.

The probabilities for the remaining options are therefore 1/3, for your original choice, and 2/3 for the other remaining door.
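The 1/3 and 2/3 figures can be checked exactly with Bayes’ theorem. This is a minimal sketch: the door numbering and the helper name `p_host_opens` are illustrative. It assumes the usual rules: the host never opens the contestant’s door, never reveals the car, and picks at random when both other doors hide goats.

```python
from fractions import Fraction

doors = (1, 2, 3)
picked, opened = 1, 3  # illustrative: contestant picks door 1, host opens door 3

# Before anything is opened, the car is equally likely to be behind any door.
prior = {car: Fraction(1, 3) for car in doors}

def p_host_opens(opened, car, picked):
    """Likelihood that the host opens `opened`, given where the car really is."""
    if opened == car or opened == picked:
        return Fraction(0)     # host never reveals the car or the chosen door
    if car == picked:
        return Fraction(1, 2)  # ASSUMED: host picks either goat door at random
    return Fraction(1)         # only one door is left for the host to open

joint = {car: prior[car] * p_host_opens(opened, car, picked) for car in doors}
total = sum(joint.values())
posterior = {car: joint[car] / total for car in doors}
print(posterior)
```

The posterior comes out at 1/3 for the original pick, 2/3 for the other closed door, and 0 for the open one.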

Oddly, a new person who arrives now and is presented with the choice without that prior information would calculate the probability at 50:50. Probabilities are a calculation based upon what you know. The newcomer’s calculation would be wrong because an important assumption behind it (that the car and the goat were randomly and evenly distributed between the two doors) is false: a third door has been non-randomly eliminated.

So you should switch doors. You exchange a 1/3 chance of being right for a 1/3 risk of being wrong. This proposal outrages some people, at first. Apparently, even statisticians. But it is true.

It is easier to see if instead there are one thousand doors, not three, and after your first pick the host opens 998 of the other doors.

Here you know you were almost certainly wrong the first time, so if every possible wrong answer but one is revealed to you, it stands more obviously to reason that the other door, which accounts for 999/1000 of the original options, is the one holding the car.
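For anyone still outraged, a simulation may persuade where argument does not. This is a sketch with hypothetical function names (`play`, `win_rate`). It assumes the host opens every unchosen door but one and never reveals the car, so the door left closed is the car whenever the first pick was wrong.

```python
import random

def play(n_doors, switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    if pick == car:
        # First pick was right: the host leaves one random goat door closed.
        other = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        # First pick was wrong: the only door the host can leave closed is the car.
        other = car
    final = other if switch else pick
    return final == car

def win_rate(n_doors, switch, trials=20000, seed=0):
    """Fraction of games won over `trials` games, with a fixed seed."""
    rng = random.Random(seed)
    return sum(play(n_doors, switch, rng) for _ in range(trials)) / trials

for n in (3, 1000):
    print(n, "stick:", win_rate(n, False), "switch:", win_rate(n, True))
```

With three doors, switching should win about two games in three. With a thousand, it should win nearly every time.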

Lesson: use what you already know about history, and your place in it, to update your choices. This ought not to be such a revelation. Count cards. Update your predictions and become a “super forecaster”.

Bayesian probabilities are models

Now, all of this is well and good and unimpeachable if the conditions in which probabilities hold are present: a static, finite “sample space” — 3, 10 or 1000 doors — a finite and known number of discrete outcomes — goat or car — and a lack of intervening causes like moral (immoral?) agents who can capriciously affect the random outcomes.

It works well for carefully controlled games of chance involving flipped coins, thrown dice, randomly drawn playing cards and, of course, Let’s Make a Deal. They are all simple systems, easily reduced to “nomological machines”.

When you apply it to unbounded complex systems involving, well, people, it works less well.

The doomsday problem

Bayesian probabilities, if misused, can lead statistics professors to the a priori deduction that we are all screwed.

A priori
(adj.)
Following logically from existing premises. Necessarily so. Not dependent on observation or falsifiable evidence.

Where it is not possible to gather the necessary evidence, philosophers have a weakness for a priori arguments. They are prevalent in metaphysical enquiries: Pascal’s wager, cogito, ergo sum, the argument from design. Any argument based purely on probabilities is a priori: the general principle is extrapolated to predict a factual answer. For example, if you find yourself at or near the beginning of something, such as civilisation, a Bayesian model will tell you it will almost certainly end soon.

It works on elementary probability and can be illustrated simply.

Imagine there are two opaque barrels. One contains ten pool balls and the other contains ten thousand, in each case sequentially numbered from 1. You cannot tell which barrel is which.

A magician draws a ball with a seven on it from one barrel.

What are the odds that this came from the barrel with just ten balls?

Naive probability says that since both barrels contain a 7 ball, it is 50:50. Bayesian probability takes the additional fact we know about each barrel (the odds of drawing a seven from one barrel are 1 in 10, and from the other 1 in 10,000) and concludes it is 1,000 times more likely that the 7 came from the barrel with just ten balls.

To test this intuition, suppose instead that you drew ball 235: there would be no chance it came from the ten-ball barrel.
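The barrel arithmetic, sketched with exact fractions. The assumptions are that each barrel is equally likely to be the one drawn from, and that every ball in a barrel is equally likely to be drawn. `barrel_posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def barrel_posterior(ball, sizes):
    """Posterior probability of each barrel size, given the number on the drawn ball."""
    prior = Fraction(1, len(sizes))  # ASSUMED: barrels equally likely beforehand
    joint = {}
    for n in sizes:
        # A barrel of n balls yields any particular ball 1..n with chance 1/n.
        likelihood = Fraction(1, n) if 1 <= ball <= n else Fraction(0)
        joint[n] = prior * likelihood
    total = sum(joint.values())
    return {n: joint[n] / total for n in sizes}

seven = barrel_posterior(7, (10, 10_000))
print(seven[10], seven[10] / seven[10_000])  # probability, then odds ratio

ball_235 = barrel_posterior(235, (10, 10_000))
print(ball_235[10])  # the ten-ball barrel is ruled out
```

Odds of 1,000 to 1 in favour of the small barrel translate to a probability of 1000/1001. Drawing ball 235 sends that probability to zero.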

This logical reasoning is, obviously, sound. It is the same logic that lies behind the “three door choice” problem.

How do we get from this to the imminence of the apocalypse?

Well, the start of your life is, across the cosmic stretch of human existence, like a random draw with a sequentially numbered birth year on each ball.

Now imagine an array of a million hypothetical barrels containing balls engraved with sequentially numbered years, beginning at the dawn of civilisation which, for argument’s sake, we shall call the fall of Troy.

The first barrel has just one ball, with 0 on it; the next has two, 0 and 1; and so on, up to one million years after the fall of Troy.

Let’s say your birth year was the 6001st after Troy. What are the odds that your birth year would be drawn at random from each of the million barrels? We know the odds for the first 6,000: zero. None of them has a ball 6001. Across the remaining 994,000 the probabilities fall from 1/6001 to 1/1,000,000. Using the same principle as above, we can see that the probability is clustered nearer the “short end” (near 6001) than the “long end” (1,000,000).

If we assume your birthdate is drawn randomly from all the birthdates available to you, then this sort of implies everything is likely to go patas arriba sooner rather than later.

This is rather like a malign inversion of the Lindy effect.


Assessing the probability that your ball came from a given barrel is somewhat complicated, but clearly we can rule out barrels 1 to 6,000, and the higher your birth year, the more probable it is that it resides in a higher barrel.
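The “somewhat complicated” calculation can be roughed out numerically. This is a sketch under stated assumptions. Every one of the million barrels is taken as equally likely beforehand, which is the doomsday argument’s own contestable premise. Barrel k holds balls 1 to k, to match the arithmetic above. `posterior_median_barrel` is a hypothetical helper name.

```python
def posterior_median_barrel(ball, max_barrel):
    """Smallest barrel k such that barrels up to k hold at least half the posterior."""
    # With a flat prior, barrel k gets weight 1/k if it contains the ball
    # (k >= ball) and weight 0 otherwise, so only barrels ball..max_barrel count.
    candidates = range(ball, max_barrel + 1)
    weights = [1.0 / k for k in candidates]
    half = sum(weights) / 2
    running = 0.0
    for k, w in zip(candidates, weights):
        running += w
        if running >= half:
            return k

median = posterior_median_barrel(6001, 1_000_000)
print(median)
```

On these assumptions, half the posterior weight sits in barrels a little under 80,000, less than a tenth of the way to the million-year barrel. That is the “short end” clustering in numerical form, and it is only as good as the flat prior that produced it.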

See also