You know, we have real problems to solve. I can’t dick around, frankly, thinking about other things like causality right now.
We find ourselves in a new world, argue Viktor Mayer-Schönberger and Kenneth Cukier. No longer need we grapple with the world by spinning theories and using them to make predictions. We now have Big Data and Big Data will speak to us, gifting us with insights that were never before accessible.
By Big Data, Mayer-Schönberger and Cukier refer to the vastly greater amount of collected and stored data around us. Big Data also reflects a new economics - one in which the costs to acquire, store and manipulate data are increasingly negligible. Big Data is often collected mindlessly and incessantly: our continuous GPS coordinates, our Google searches.
Big Data presents new opportunities for prediction. Old prediction involved the collection of precise sample data, which would then be fitted into a theory. Theory was developed along causal lines - data confirmed theory and reflected a link between cause and effect. If we collect data showing a large number of people diagnosed with the flu, we may infer the presence of an epidemic.
Big Data permits a relaxation of the linkages between the data and the inference. If identifiable Google searches are associated with flu outbreaks (to use one of the central illustrations of Big Data), mere correlation may be sufficient to permit reliable prediction. The Google searches may be causally linked to the flu (a search for flu remedies stimulated by the onset of flu symptoms) or may not be (a search for last night’s baseball scores that correlates with the flu for no known or even knowable reason); mere correlation is sufficient to unlock Big Data’s power. We learn from Big Data that orange cars are more likely to be well-maintained; this link is demonstrated by the data. We do not know why this is so - indeed, we need not know. A large part of the argument presented in Big Data is that we can abandon the insistence on causality and still make predictions.
And, Mayer-Schönberger and Cukier argue, given its power, Big Data can be “messier” than old data. More and more, Big Data is exhaustive (it approaches “N=all,” as the authors style it), and so sampling error is no longer a concern. Big Data need not be pristine to work; the signal will cut through the noise.
But there is still a historicity about the inferences we reach with Big Data. We continue to use the past to predict the future - and this can be dangerous. The Financial Crisis clearly demonstrated the weakness of any presumption of continuity. Big Data may teach us that people buy Strawberry Pop-Tarts when a hurricane approaches (another prominent example in Big Data), but our prediction that they will buy Pop-Tarts in the future remains blinkered. (The authors hesitate to suggest that the National Hurricane Center use Pop-Tart sales data in its forecasting.)
Further, while the use of Big Data for predictive inference may no longer be (strictly speaking) causal, there is still an inherent chronology to the inference: the collection of Big Data precedes in time its application. Big Data about falling leaves correlates with spring as well as autumn - what differs is the temporal relationship of the two ‘predictions’.
Mayer-Schönberger and Cukier invite us to let data ‘speak’ - as if all that is required of us is attentiveness. But what was true at Delphi remains true with Big Data: the data responds only to the questions asked of it. And there is an innately human element in the formation of the question (even if the ‘question’ asked is an open search for correlations). The universe of data still does not approach the real world, as Mayer-Schönberger and Cukier concede. Only the tiniest bit of our existence is datafied - and so the direction for the collection of more Big Data is still largely arbitrary and path-dependent, even for Google.
Mayer-Schönberger and Cukier are generally cheerful in their assessment of Big Data. The privacy concerns that alarm many of us they find tractable. They are concerned, almost to the point of obsession given the repetitiveness of the discussion, with the predicament presented in the film Minority Report. Here an individual is arrested for a crime he is about to commit, fingered (as it were) by Big Data. While this scenario is scary, the philosophical questions it raises are hardly new; criminal liability for attempts and other so-called inchoate offenses has long been a difficult challenge in criminal theory.