Big data

From The Jolly Contrarian
Revision as of 07:28, 8 November 2022

The JC’s amateur guide to systems theory

As at the time of its analysis, all data is from the past.

Roger Martin

Turkey: “I have transformed myself into a data-driven business. All my data — and I’ve got reams of the stuff — tells me that every morning I shall be fed at 9 am on the dot. Aha! Here comes the farmer, right on time! I wonder if I’ll get a special treat because it is Christmas!”

Charlotte (spinning web): Ummm


Beware an over-commitment to data analytics:

It is premium mediocre

Firstly, it expresses a preference for the aggregate over the specific, and the average over the outlier, the individual, the unique or the extraordinary. It is to prefer the mediocre, for its weight of numbers, over the isolated vision of a genius or the depravity of the ugliest man.

As surely as the ugliest man killed God, so did data kill the superman. The will to power is defeated by the million-strong dull blades of the will to entropy. It is the will to premium mediocre.

It is historical

All data is from the past, as Roger Martin has it.

It is bad

Not only (per our careful argument at signal-to-noise-ratio) is the overall body of data we have skewed in time and practically nil in quantity; the quality of the data in our tawdry collection is poor. And not just in its profusion of cat videos and hot takes on Twitter, either. For, like the evolutionary record, it contains all the errors as well as the one successful trial: all the abandoned drafts, all the false starts, all the typos, split infinitives, tendentious arguments, feeble caveats and needless avoidances of doubt. The data we have is, even on our own rationalised terms, mainly noise.

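And even so, we self-select to eschew good noise as being worthless: scientific journals do not publish accounts of research programmes that turn out badly, and of those which turn out well, little mention remains of the failed hypotheses leading up to the Eureka moment. Who needs to know about all the experiments that didn’t work? Yet that is good noise, with real information content: do not try this again.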

Noise

There are two kinds of noise. There is good noise: random, valid information that is just not the information you are looking for, but is broadcast on the same frequency (background radiation, cosmic chatter, cocktail party hubbub — hundreds of other perfectly meaningful exchanges; just not the one you are interested in) and there is bad noise: errors, mistakes, hot takes on Twitter and so on.

It is illiberal

Secondly, in its reductionism, in its funnelling of a dispersed population into an essential homogeneity, it speaks to an underlying belief in a grand unifying theory of everything: a transcendent truth. This, in the JC’s view, is a profoundly illiberal idea: to be unable to accommodate pluralism is to deny pluralism.

It is noisy

Thirdly, to embrace all the data you can find is to degrade the signal-to-noise ratio. Even if you buy into the incoherent reductionist idea that the “signal” is some kind of transcendent truth, by industrialising your data you risk burying it. And if you don’t (if, like us pluralists, you see any signal as no more than a suitable narrative for your present purposes) then the more data you gather, the more possible narratives (conflicting narratives; incommensurable narratives) you will have. Now this, for a pluralist, is a good thing: every narrative is a tool in your workshop, and the more you have, the better you are equipped to deal with the unknown unknowns our complex world will surely throw at us. But that tends not to be what big data disciples are after.
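For the numerically inclined, here is a rough sketch of how quickly candidate “narratives” multiply as data piles up. It generates series of pure random noise, unrelated by construction, and counts how many pairs nonetheless look correlated. The function names, the 0.5 threshold and the ten-point series length are all assumptions made for illustration; none of them comes from the JC.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_chance_correlations(n_series, n_points=10, threshold=0.5, seed=0):
    """Count pairs of unrelated random series whose |r| exceeds the threshold.

    Hypothetical helper: every series is independent Gaussian noise, so any
    "correlation" found here is spurious. Returns (strong_pairs, total_pairs).
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    pairs = list(combinations(series, 2))
    strong = sum(1 for a, b in pairs if abs(pearson(a, b)) > threshold)
    return strong, len(pairs)


if __name__ == "__main__":
    for n in (5, 20, 100):
        strong, total = count_chance_correlations(n)
        print(f"{n} series: {strong} of {total} pairs look 'correlated'")
```

The number of pairs grows roughly with the square of the number of series, so the pool of chance correlations grows far faster than the data itself. Whether that is a treasure chest or a swamp depends, as above, on whether you are a pluralist.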

It is not a universal affirmative

Even if, from pure data, you could establish the causal relationship between data you have observed and an event that drives it (it is axiomatic that you can’t, by the way: you can only derive a correlation, and we know how spurious those can be) you still can’t conclude that the cause propelling the general is the same one that compelled any particular.

Averages are crappy things to aspire to, or to configure your business to, for a number of reasons.

Because the machinations of statistics can, in certain contexts, inflame the passions of the righteous, the JC has devised the parable of the squirrels to tease this out.

See also