Big data: Difference between revisions

659 bytes added ,  8 November 2022
All data is from the past, as Roger Martin has it.
===It is bad===
Not only (per our careful argument at [[signal-to-noise ratio]]) is the data we have skewed in time (all from the past, none from the future) and place (only what we’ve been looking at, none of what we haven’t), and practically ''nil'' in quantity, but the ''quality'' of the data in our tawdry collection is ''poor''. And not just in its profusion of cat videos and [[hot takes on Twitter]], either. For, as an evolutionary record, it contains ''all'' the errors and the one successful trial: all the abandoned drafts, all the false starts, all the typos, [[split infinitive]]s, tendentious arguments, feeble caveats and needless [[for the avoidance of doubt|avoidances of doubt]]. That is, the data we have is, even on our own rationalised terms, mainly noise. ''Bad'' noise.


===Noise===
There are two kinds of noise.  
 
There is ''good'' noise: random, valid information that is just not the information you are looking for, but is broadcast on the same frequency (background radiation, cosmic chatter, cocktail party [[hubbub]] — hundreds of other perfectly meaningful exchanges; just not the one you are interested in) and there is ''bad'' noise: errors, mistakes, [[hot takes on Twitter]] and so on.
 
And we ''self-select'' for bad noise. Little evidence remains in the fossil record of the failed experiments, the unsuccessful iterations, the occasions where, under laboratory conditions, the data did not do what the experimenter hoped they would, for no-one publicises failure. Even for the good experiments, the published research, if it makes it past peer review, is filtered and cleansed of all those false starts and misconceptions.
 
Despite them being, by dint of their known status as false starts, ''good'' noise.
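
The filtering effect can be sketched with a toy simulation. This is an illustration under loudly stated assumptions, not anyone’s actual research pipeline: every experiment measures an effect whose true size is ''zero'', and only results above an invented cut-off of 0.3 get “published”. The function names, the cut-off and the sample sizes are all hypothetical.

```python
import random
import statistics


def run_experiments(n_experiments=1000, n_samples=20, seed=0):
    """Simulate experiments measuring an effect whose true size is zero.

    Each experiment reports the mean of n_samples draws from a standard
    normal distribution, so any apparent effect is pure chance.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
        for _ in range(n_experiments)
    ]


def publish(results, threshold=0.3):
    """Keep only results that look like a success (hypothetical cut-off)."""
    return [r for r in results if r > threshold]


results = run_experiments()
published = publish(results)

print(f"all experiments: mean effect {statistics.fmean(results):+.3f}")
print(
    f"published only:  mean effect {statistics.fmean(published):+.3f} "
    f"({len(published)} of {len(results)} experiments)"
)
```

Across the full record the mean effect sits close to zero, as it should. The published subset, by construction, reports an effect above the cut-off, because everything that would have told you otherwise has been quietly thrown away.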
===It is illiberal===
Second, in its [[reductionism]], in its funnelling of a dispersed population into an essential homogeneity, it speaks to the underlying belief in a grand unifying theory of everything: a transcendent ''truth''. This, in the [[JC]]’s view, is a profoundly illiberal idea: to be unable to accommodate pluralism is to ''deny'' pluralism.
Even if, from pure data, you ''could'' establish the causal relationship between data you have observed and an event that drives it (it is axiomatic that you ''can’t'', by the way: you can only derive a [[correlation]], and we know how spurious ''those'' can be) you still can’t conclude that the cause propelling the ''general'' is the same one that compelled any ''particular''.  


[[Averages]] are crappy things to aspire to, or configure your business to, for a number of reasons.
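
One of those reasons can be put in a few lines of code. The figures below are invented purely for illustration (a hypothetical population of customers who spend either a little or a lot), and the function name is an assumption, not anything from a real library.

```python
import statistics

# Hypothetical customer spend, invented for illustration:
# one cluster of small spenders and one of big spenders.
SPEND = [9, 10, 11, 12, 88, 89, 90, 91]


def distance_from_average(values):
    """Return the mean, the value nearest to it, and the gap between them."""
    mean = statistics.fmean(values)
    nearest = min(values, key=lambda v: abs(v - mean))
    return mean, nearest, abs(nearest - mean)


mean, nearest, gap = distance_from_average(SPEND)
print(f"average spend: {mean}")              # average spend: 50.0
print(f"nearest real customer: {nearest}")   # nearest real customer: 12
print(f"distance from average: {gap}")       # distance from average: 38.0
```

The average customer spends 50. No actual customer spends anything like it: the closest is 38 away. Configure your business for the average and you have configured it for nobody.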


Because the machinations of statistics can, in certain contexts, inflame the passions of the righteous, the JC has devised the [[parable of the squirrels]] to tease this out.