Correlation

Lies, Damn Lies and Statistics

A quincunx, yesterday


Triago: If substance is family then form is the state
A contrivance, precariously stack’d
Bids yield our resilient bonds
To th’escapements of voguish clockery
To rudely declare in the interests of nation
A final victory of correlation over causation
Nuncle: But the cleverest contraption rusts
Upon immersion in snot.

—Büchstein, The Victory of Form over Substance

The idea, following from Sir Francis Galton’s experiments with a quincunx and first articulated by statistician Karl Pearson,^[1] that a relationship between two variables can be characterised by its statistical strength and expressed in numbers, regardless of any perceived causal connection between them.

If one can derive significance from a purely statistical correlation without a deeper mechanical theory of the universe that might tell us why, we are well on our way to an artificially intelligent future where robots can wipe elderly arses, all bankers are redundant (good, right?), so is everyone else (not so good?) and it is only a matter of time before Skynet becomes self-aware and starts hunting down random skater kids from the 1990s.

If.

In some cases you can derive a significance and in some cases you can’t,^[2] but (irony upcoming) without a sophisticated theory of causality, it will be hard to tell them apart. That is to say, a bare correlation won’t tell you whether there is a causal arrow at all, much less, if there is, which way it flows.
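To make the point about the missing arrow concrete, here is a minimal sketch in plain Python. The function name `pearson_r` and the two toy series are assumptions invented for illustration, not data from any real study. It shows only that Pearson's coefficient is symmetric: swap the variables and you get the same number, so the number alone cannot say which way any causation runs.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up figures, purely for illustration.
ice_cream_sales = [12.0, 15.0, 21.0, 30.0, 34.0, 41.0]
sunburn_cases = [3.0, 4.0, 7.0, 9.0, 12.0, 13.0]

forwards = pearson_r(ice_cream_sales, sunburn_cases)
backwards = pearson_r(sunburn_cases, ice_cream_sales)

# Both directions give the same coefficient: no arrow in sight.
print(forwards, backwards)
```

Whether ice cream causes sunburn, sunburn causes ice cream, or the sun causes both is a question the coefficient is not equipped to answer.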

“Correlation”^[3] ought to be a synonym for “mere coincidence”,^[4] though in its more fashionable usages, especially among big data freaks, this tends to get (well) buried in the noise. There may be something profound, reflexive and ironic about this, but it’s too early in the morning to figure it out. At any rate, the more data you have, the worse your signal, and the more that chanting “correlation does not imply causation” in a sing-song voice whenever anyone cites a correlation will annoy the hell out of big data freaks, which is all the more reason to do it.

Correlation and causation

Now it is true that correlation doesn’t imply causation, but it doesn’t rule it out either. And it is easy to infer from a lack of correlation that there is no causation.

But hold your horses.

“All other things being equal, a correlation is more likely to evidence a causation than a lack of correlation”, is one of those logical canards. As Monty Python put it, “universal affirmatives can only be partially converted: all of Alma Cogan is dead, but only some of the class of dead people are Alma Cogan.”

So here’s the thing, and I am straining to avoid distracting myself onto my pet subjects of transcendent truth and causal skepticism, so bear with me:

Even if you accept some objectivist model where, whether we can know it or not, there is a true, unique, single cause for every effect (and down that rabbit hole are a bunch of consequences you really wouldn’t like, but let’s say), it follows that an event must have but one cause (or consistent matrix of causes) to the absolute exclusion of any other explanation. There cannot be alternative, mutually exclusive, causal explanations of the same event, for that would imply ghastly relativism.^[5]

That is to say, for every single “true” correlation, there are multiple spurious correlations — events that serendipitously seem, by their statistical regularity, to have causal significance to each other but, in transcendent fact, don’t.

How many is “multiple”? Depends on how much data, and how much imagination, you’ve got. Seeing as the portion of all possible data that we have actually collected is effectively nil, the best answer is that there are infinite spurious correlations and only one true correlation between a cause and its effect. The likelihood, without better evidence,^[6] that a given correlation is the true one is therefore 1/∞, or zero.
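The "depends on how much data you've got" point can be sketched with nothing but noise. The following is an illustration under stated assumptions: the series count, the ten observations per series, the seed and the helper names (`pearson_r`, `strongest_chance_correlation`) are all arbitrary choices made here, not anything from the sources cited. Every series is independent random noise, so any correlation found between two of them is spurious by construction.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(series):
    """Largest |r| found among all pairs of the given series."""
    return max(abs(pearson_r(a, b)) for a, b in combinations(series, 2))


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
n_observations = 10     # assumption: a short sample per series

# 200 series of pure noise: by construction, none causes any other.
noise = [[rng.gauss(0.0, 1.0) for _ in range(n_observations)]
         for _ in range(200)]

# The more series you trawl through, the better the best "finding" looks.
for k in (5, 20, 50, 200):
    print(k, round(strongest_chance_correlation(noise[:k]), 3))
```

Because each larger set contains the smaller one, the strongest correlation found can only stay the same or grow as more series are added. More data to compare means more impressive-looking coincidences, none of which has a causal story behind it.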

So it is true to say a lack of any correlation may not increase the likelihood of events being causally related, but nor, without other evidence, does the presence of one. Especially seeing as there may be some data, as yet uncollected or unnarratised, that could explain how apparently uncorrelated events are, in fact, causally related.
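The converse case, events that are causally related yet apparently uncorrelated, can also be sketched. In this toy example (the numbers and the name `pearson_r` are assumptions chosen for illustration) `ys` is completely determined by `xs`, yet Pearson's coefficient comes out at zero, because it measures only linear association.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data: inputs placed symmetrically around zero.
xs = list(range(-5, 6))

# ys depends on xs and on nothing else: as "causal" as a toy can be.
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(r)  # the positive and negative halves cancel out
```

So a flat correlation coefficient is no more a certificate of causal innocence than a steep one is proof of guilt.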

Where does this leave us? Here: Any correlation, in the absence of better evidence of causation, is meaningless.

“Better evidence of causation”

Glomming on to a satisfying correlation dodges the hard question, which is: what possible better evidence of true causation, of a “necessary connexion” between cause and effect, could there be?

This is not a new conundrum. It was first posed by David Hume, in 1739 — “necessary connexion” was his phrase — and he answered it in the negative. There is no better evidence of causation.

But, fortunately for the interests of narrow-minded righteousness and determinism, Hume allegedly once met someone who was racist, so we can entirely ignore him and the quarter of a millennium of epistemology that he spurred. Plus, he was a Scot.^[7]

References

[1] So Slate Magazine argues, at any rate.
[2] There are whole websites devoted to spurious correlations. Like, well, http://www.spuriouscorrelations.com.
[3] “A mutual relationship or connection between two or more things.”
[4] “A remarkable concurrence of events or circumstances without apparent causal connection”.
[5] Not ghastly.
[6] You are right. This qualification is doing a lot of work.
[7] Disclosure for humourless libtards: deliberate irony, intended as a joke.
