Big data
Latest revision as of 06:12, 24 April 2023
The JC’s amateur guide to systems theory™
As at the time of its analysis, all data is from the past.
Turkey: “I have transformed myself into a data-driven business. All my data — and I’ve got reams of the stuff — tells me that every morning I shall be fed at 9 am on the dot. Aha! Here comes the farmer, right on time! I wonder if I’ll get a special treat because it is Christmas!”
Charlotte (spinning web): Ummm
The final triumph of correlation over causation.
- — Büchstein (attrib.)
Beware an over-commitment to data analytics:
It is premium mediocre
Firstly it expresses a preference for the aggregate over the specific, and the average over the outlier, the individual, the unique or extraordinary. It is to prefer the mediocre, for its weight of numbers, over the isolated vision of a genius or the depravity of the ugliest man.
As surely as the ugliest man killed God, so did data kill the superman. The will to power is defeated by the million-strong dull blades of the will to entropy. It is the will to premium mediocre.
It is historical
All data is from the past, as Roger Martin has it. That means all data is skewed in time: it can tell you what has happened, but nothing about what is yet to come.
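The turkey’s predicament can be put in arithmetic. The sketch below is illustrative only. It assumes a hypothetical `forecast_fed` function built on Laplace’s rule of succession, one standard way of turning a run of past observations into a probability for the next one. However many mornings of feeding it is shown, its estimate only climbs towards certainty, because nothing in the data it has can anticipate Christmas.

```python
from fractions import Fraction
from typing import Sequence


def forecast_fed(history: Sequence[bool]) -> Fraction:
    """Hypothetical forecaster using Laplace's rule of succession.

    Estimates the probability that tomorrow resembles the past as
    (successes + 1) / (observations + 2). It can only look backwards.
    """
    fed_days = sum(history)
    return Fraction(fed_days + 1, len(history) + 2)


# Illustrative data: the turkey has been fed on each of its first 1,000 mornings.
history = [True] * 1000
for days in (10, 100, 1000):
    p = forecast_fed(history[:days])
    print(f"after {days:>4} mornings: P(fed tomorrow) = {float(p):.4f}")

# The next morning is Christmas: nothing in `history` could have warned of it.
```

The more history the turkey gathers, the more confident it becomes. Its confidence grows in exactly the wrong direction.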
It is bad
Not only (per our careful argument at signal-to-noise ratio) is the data we have skewed in time (all from the past, none from the future) and place (only what we’ve been looking at, none of what we haven’t), and practically nil in quantity, but the quality of the data in our tawdry collection is poor, too. And not just in its profusion of cat videos and hot takes on Twitter, either. For, as an evolutionary record, it contains all the errors and the one successful trial; all the abandoned drafts, all the false starts, all the typos, split infinitives, tendentious arguments, feeble caveats and needless avoidances of doubt. The data we have, that is, even on our own rationalised terms, is mainly noise. Bad noise.
Noise
There are two kinds of noise.
There is good noise: random, valid information that is just not the information you are looking for, but is broadcast on the same frequency (background radiation, cosmic chatter, cocktail party hubbub — hundreds of other perfectly meaningful exchanges; just not the one you are interested in) and there is bad noise: errors, mistakes, hot takes on Twitter and so on.
And we self-select for bad noise, eschewing good noise as worthless: scientific journals do not publish accounts of research programmes that turn out badly. Little evidence remains in the fossil record of the failed experiments, the unsuccessful iterations and the occasions where, under laboratory conditions, the data did not do what the experimenter hoped they would, for no-one publicises non-results. Even for the good experiments, the published research that makes it past peer review is filtered and cleansed of all the false starts and misconceptions that led to the Eureka moment. Meanwhile, the hot takes on Twitter, conspiracy theories and cat videos continue to mushroom.
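The filter described above is easy to model. In this minimal sketch, every parameter is a hypothetical choice and none is drawn from any real literature. A thousand honest experiments measure an effect that is truly zero. Their results are spread evenly over the quantiles of a standard normal distribution, which gives a deterministic stand-in for sampling noise. Only results that clear a conventional threshold of 1.96 get "published". The full record averages to nothing, yet the published record reports a consistent, substantial effect.

```python
from statistics import NormalDist, fmean


def simulate_literature(n_experiments: int = 1000, threshold: float = 1.96):
    """Hypothetical model: every experiment measures an effect that is truly zero.

    Results sit at evenly spaced quantiles of a standard normal distribution.
    Only results above `threshold` are "published"; the rest are non-results.
    """
    noise = NormalDist(mu=0.0, sigma=1.0)
    results = [noise.inv_cdf((i + 0.5) / n_experiments) for i in range(n_experiments)]
    published = [r for r in results if r > threshold]
    return results, published


results, published = simulate_literature()
print(f"all experiments: mean {fmean(results):+.3f} over {len(results)}")
print(f"published only:  mean {fmean(published):+.3f} over {len(published)}")
```

No experimenter here did anything wrong. The bias comes entirely from what was kept.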
It is illiberal
Secondly, in its reductionism, in its funnelling of a dispersed population of ideas into an essential homogeneity, it speaks to the underlying belief in a grand unifying theory of everything: a simple set of organising rules, based upon a transcendent truth. If that is so, one is justified in suppressing any description that is at variance with the one true path, purely on grounds of efficiency. Why waste time and energy, and divert our people from the chosen path, by humouring false explanations? This, in the JC’s view, is a profoundly illiberal idea: to be unable to accommodate pluralism is to deny pluralism.
It may be “true” that the richness of the universe boils down to a single simple algorithm (perhaps not Conway’s Game of Life, but something winsomely similar), but if so, it is also true that we are in and of and part of that grand machine. Our trajectory through universal design-space is just as ineffably preordained: we are but a deterministic subroutine, which means we cannot control, change or even know what we do not know. We are as assuredly in the hands of cruel mechanical fate as flies are in the hands of wanton boys. Either we will attain certain knowledge of that algorithm, and wake up into a glorious Singularity of cosmic consciousness, or we won’t; either way there is nothing to be done, so we might as well enjoy the illusion that there is control. And if, against expectation, we do turn out to have control, then we get to tell our stories, pluralistically plural. If we don’t, well, no harm in trying: we weren’t to know any better, and we were doomed to do it anyway.
We can see here that reductionism is not just illiberal but nosy: if you are right, why are you even having this argument (except that you can’t help yourself)? Why do you care? You’ve won anyway. What difference does it make, either way, who is right and who is wrong? This is a bad, fatalistic, negative, zero-sum disposition and, since you’re bothering to argue about it, it sounds like you don’t even buy it yourself.
You can’t have it both ways. You are either strapped to your rail, a chimpanzee in a rocket ship, in which case shut up already, or you aren't.
But while you move through It’s A Small World, bound and gagged on your rails of destiny, let me sing.
Anyone who believes in conspiracy theories has obviously never tried to organise a surprise party.
- —Anon
Doesn’t the very idea that all this freedom, variability and randomness we apprehend throughout creation is an illusion seem rather neat?
There are plenty of things (most things, in fact) in creation that seem deterministic and play no such tricks on us: the way a shadow falls and light reflects as you pass under streetlights; all the causal regularities of the physical world that allow ships to sail, planes to fly and satellites to orbit the world, and that prevent sperm whales and petunias from materialising in orbital space. Most of the world really is as deterministic as it seems. It would certainly be odd if something that wasn’t deterministic nonetheless behaved as if it were. But is it any less odd that something that is deterministic behaves as if it is not, and does so by dint of the same regularity that casts shadows and propels cellular mitosis?
What an exceedingly clever trick to trace every possible voluntary movement to make it feel willed, when in fact every molecule is strung upon a causal wire.
Isn’t that the greatest conspiracy there ever was? Isn’t that as fantastical as God?
Does not Occam’s razor come down on the pragmatists’ side?
It is noisy
Thirdly, to embrace all the data you can find is to degrade the signal-to-noise ratio. Even if you buy into the incoherent reductionist idea that the “signal” is some kind of transcendent truth, by industrialising your data you risk burying it. And if you don’t buy that idea, if, like us pluralists, you see any signal as nothing more than a narrative suited to your present purposes, then the more data you gather, the more possible narratives (conflicting narratives; incommensurable narratives) you will have. Now this, for a pluralist, is a good thing: every narrative is a tool in your workshop, and the more you have, the better equipped you are to deal with the unknown unknowns our complex world will surely throw at us. But that tends not to be what big data disciples are after.
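One way to see why more data means more narratives is to run an experiment on pure noise. Collect more and more candidate variables that are, by construction, meaningless, and ask which of them best "explains" an outcome that is also meaningless. The sketch below is an illustration under those assumptions, and the helper names and sizes are hypothetical. Each larger search includes every variable of the smaller one, so the best spurious correlation it finds can only stay level or grow.

```python
import random
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_spurious_correlate(n_obs: int, n_vars: int, seed: int = 0) -> float:
    """Largest |r| between a noise "outcome" and n_vars unrelated noise variables.

    Nothing is causally linked to anything else: every value is an independent
    Gaussian draw. Reusing the seed means a larger n_vars re-examines the same
    first variables and then keeps looking.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_vars):
        candidate = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = max(best, abs(pearson(outcome, candidate)))
    return best


for n_vars in (10, 100, 1000):
    r = best_spurious_correlate(n_obs=20, n_vars=n_vars)
    print(f"{n_vars:>5} noise variables: best |r| = {r:.2f}")
```

There is no signal to find here at all. The search still returns an ever more persuasive candidate, and each one is a ready-made narrative.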
It is not a universal affirmative
Even if, from pure data, you could establish the causal relationship between data you have observed and an event that drives it (it is axiomatic that you can’t, by the way: you can only derive a correlation, and we know how spurious those can be) you still can’t conclude that the cause propelling the general is the same one that compelled any particular.
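A general pattern need not hold for any particular group, let alone any particular case. A standard illustration is Simpson’s paradox, which is brought in here from statistics rather than taken from the text. In the illustrative counts below, treatment A has the better success rate within each of two groups. Yet treatment B looks better once the groups are pooled, because B was mostly given to the easier cases.

```python
from fractions import Fraction
from typing import Mapping, Tuple

# Illustrative (successes, trials) counts by treatment and case difficulty.
outcomes = {
    "A": {"easy": (81, 87), "hard": (192, 263)},
    "B": {"easy": (234, 270), "hard": (55, 80)},
}


def rate(successes: int, trials: int) -> Fraction:
    """Exact success rate for one group."""
    return Fraction(successes, trials)


def pooled_rate(groups: Mapping[str, Tuple[int, int]]) -> Fraction:
    """Success rate after discarding the group labels."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return Fraction(successes, trials)


for group in ("easy", "hard"):
    a, b = rate(*outcomes["A"][group]), rate(*outcomes["B"][group])
    print(f"{group:>6}: A {float(a):.0%}  B {float(b):.0%}")

pooled_a, pooled_b = pooled_rate(outcomes["A"]), pooled_rate(outcomes["B"])
print(f"pooled: A {float(pooled_a):.0%}  B {float(pooled_b):.0%}")
```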
Averages are crappy things to aspire to, or configure your business to, for a number of reasons.
Because the machinations of statistics can, in certain contexts, inflame the passions of the righteous, the JC has devised the parable of the squirrels to tease this out.
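The trouble with aspiring to the average can also be shown with a toy population whose numbers are made up for illustration. When a population is made of distinct kinds, the mean can land where nobody actually is. In the example below, a business configured for the "average guest" would be configured for no one.

```python
from statistics import mean

# Hypothetical ages of the guests at a children's party: the children and their parents.
ages = [4, 5, 5, 6, 6, 34, 36, 38]

average = mean(ages)
nearest = min(abs(age - average) for age in ages)
print(f"average age: {average}")
print(f"closest guest is {nearest} years away from it")
```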