When data speaks
The effects of big data are large on a practical level, as the technology is applied to find solutions for vexing everyday problems. But that is just the start. Big data is poised to reshape the way we live, work, and think. The change we face is in some ways even greater than those sparked by earlier epochal innovations that dramatically expanded the scope and scale of information in society. The ground beneath our feet is shifting. Old certainties are being questioned. Big data requires fresh discussion of the nature of decision-making, destiny, justice. A worldview we thought was made of causes is being challenged by a preponderance of correlations. The possession of knowledge, which once meant an understanding of the past, is coming to mean an ability to predict the future.
These issues are much more significant than the ones that presented themselves when we prepared to exploit e-commerce, live with the Internet, enter the computer age, or take up the abacus. The idea that our quest to understand causes may be overrated—that in many cases it may be more advantageous to eschew why in favor of what—suggests that the matters are fundamental to our society and our existence. The challenges posed by big data may not have set answers, either. Rather, they are part of the timeless debate over man’s place in the universe and his search for meaning amid the hurly-burly of a chaotic, incomprehensible world.
Ultimately, big data marks the moment when the “information society” finally fulfills the promise implied by its name. The data takes center stage. All those digital bits that we have gathered can now be harnessed in novel ways to serve new purposes and unlock new forms of value. But this requires a new way of thinking and will challenge our institutions and even our sense of identity. The one certainty is that the amount of data will continue to grow, as will the power to process it all. But where most people have considered big data as a technological matter, focusing on the hardware or the software, we believe the emphasis needs to shift to what happens when the data speaks.
We can capture and analyze more information than ever before. The scarcity of data is no longer the characteristic that defines our efforts to interpret the world. We can harness vastly more data and, in some instances, get close to all of it. But doing so forces us to operate in untraditional ways and, in particular, changes our idea of what constitutes useful information.
Instead of obsessing about the accuracy, exactitude, cleanliness, and rigor of the data, we can let some slack creep in. We shouldn’t accept data that is outright wrong or false, but some messiness may become acceptable in return for capturing a far more comprehensive set of data. In fact, in some cases big and messy can even be beneficial, since when we tried to use just a small, exact portion of the data, we ended up failing to capture the breadth of detail where so much knowledge lies.
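The trade-off described here can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the population, the sample sizes, and the noise level are all invented, and the messiness is modeled as unbiased random measurement error, which is the favorable case. Systematically wrong data would not average out this way.

```python
import random
import statistics


def mean_abs_error(sample_size, noise_sd, trials, rng,
                   true_mean=50.0, spread_sd=10.0):
    """Average absolute error of the sample mean over repeated trials.

    All parameters are hypothetical; they are not taken from the text.
    """
    errors = []
    for _ in range(trials):
        # Each reading is a real value drawn from the population
        # plus zero-mean measurement noise (the "messiness").
        readings = [
            rng.gauss(true_mean, spread_sd) + rng.gauss(0.0, noise_sd)
            for _ in range(sample_size)
        ]
        errors.append(abs(statistics.fmean(readings) - true_mean))
    return statistics.fmean(errors)


rng = random.Random(42)  # fixed seed so the run is repeatable

# A small sample measured exactly versus a large sample measured sloppily.
small_exact = mean_abs_error(sample_size=20, noise_sd=0.0, trials=100, rng=rng)
big_messy = mean_abs_error(sample_size=2000, noise_sd=5.0, trials=100, rng=rng)

print(f"small and exact: average error {small_exact:.2f}")
print(f"big and messy:   average error {big_messy:.2f}")
```

Under these assumptions the large, noisy sample estimates the true average more closely than the small, exact one, because the sheer number of readings outweighs the error in each of them.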
Because correlations can be found far faster and cheaper than causation, they’re often preferable. We will still need causal studies and controlled experiments with carefully curated data in certain cases, such as designing a critical airplane part. But for many everyday needs, knowing what, not why, is good enough. And big-data correlations can point the way toward promising areas in which to explore causal relationships.
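To see why a correlation is so cheap to find, consider a minimal sketch that computes the standard Pearson correlation coefficient for two series. The weekly figures are made up for illustration, and the variable names are hypothetical. The calculation says how closely the two series move together, and nothing about why.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-movements around the means, and each series' own spread.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly counts, invented for this example only.
flu_searches = [120, 150, 170, 210, 260, 300]
clinic_visits = [30, 34, 41, 50, 63, 72]

r = pearson(flu_searches, clinic_visits)
print(f"correlation: {r:.3f}")
```

A value near 1 means the series rise and fall together, which is enough to make a prediction. Establishing that one drives the other would take the kind of controlled study the text reserves for critical cases.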
These quick correlations let us save money on plane tickets, predict flu outbreaks, and know which manholes or overcrowded buildings to inspect in a resource-constrained world. They may enable health insurance firms to provide coverage without a physical exam and lower the cost of reminding the sick to take their medication. Languages are translated and cars drive themselves on the basis of predictions made through big-data correlations. Walmart can learn which flavor Pop-Tarts to stock at the front of the store before a hurricane. (Answer: strawberry.) Of course, causality is nice when you can get it. The problem is that it’s often hard to get, and when we think we’ve found it we’re often deluding ourselves.
New tools, from faster processors and more memory to smarter software and algorithms, are only part of the reason we can do all this. While the tools are important, a more fundamental reason is that we have more data, since more aspects of the world are being datafied. To be sure, the human ambition to quantify the world long predated the computer revolution. But digital tools facilitate datafication greatly. Not only can mobile phones track whom we call and where we go, but the data they collect can be used to detect whether we’re falling ill. Soon big data may be able to tell whether we’re falling in love.
Our ability to do new, do more, do better, and do faster has the potential to unleash enormous value, creating new winners and losers. Much of the value of data will come from its secondary uses, its option value, not simply its primary use, as we’re accustomed to thinking about it. As a result, for most types of data, it seems sensible to collect as much as one can and hold it as long as it adds value, and let others analyze it if they’re better suited to extract its value (provided one can share in the lucre the analysis unleashes).
Companies that can situate themselves in the middle of information flows and can collect data will thrive. Harnessing big data effectively requires technical skills and a lot of imagination—a big-data mindset. But the crux of the value may go to those who hold the data. And sometimes an important asset will not be just the plainly visible information but the data exhaust created by people’s interactions with information, which a clever company can use to improve an existing service or launch an entirely new one.
At the same time, big data presents us with huge risks. It renders ineffective the core technical and legal mechanisms through which we currently try to protect privacy. In the past what constituted personally identifiable information was well known—names, Social Security numbers, tax records, and so on—and hence relatively easy to protect. Today, even the most innocuous data can reveal someone’s identity if a data collector has amassed enough of it. Anonymization or hiding in plain sight no longer works. Moreover, targeting an individual for surveillance now entails a more extensive invasion of privacy than ever before, since authorities not only want to see as much information about a person as possible, but also the widest range of relationships, connections, and interactions.
In addition to challenging privacy, these uses of big data raise another unique and troubling concern: the risk that we may judge people not just for their actual behavior but for propensities the data suggests they have. As big-data predictions become more accurate, society may use them to punish people for predicted behavior—acts they have not yet committed. Such predictions are axiomatically impossible to disprove; hence the people they accuse can never exculpate themselves. Punishment on this basis negates the concept of free will and denies the possibility, however small, that a person may choose a different path. As society assigns individual responsibility (and metes out punishment), human volition must be considered inviolable. The future must remain something that we can shape to our own design. If it does not, big data will have perverted the very essence of humanity: rational thought and free choice.
There are no foolproof ways to fully prepare for the world of big data; it will require that we establish new principles by which we govern ourselves. A series of important changes to our practices can help society as it becomes more familiar with big data’s character and shortcomings. We must protect privacy by shifting responsibility away from individuals and toward the data users—that is, to accountable use. In a world of predictions, it’s vital we ensure that human volition is held sacrosanct and we preserve not only people’s capacity for moral choice but individual responsibility for individual acts. And society must design safeguards to allow a new professional class of “algorithmists” to assess big-data analytics—so that a world which has become less random by dint of big data does not turn into a black box, simply replacing one form of the unknowable with another.
Big data will become integral to understanding and addressing many of our pressing global problems. Tackling climate change requires analyzing pollution data to understand where best to focus our efforts and find ways to mitigate problems. The sensors being placed all over the world, including those embedded in smartphones, provide a cornucopia of data that will let us model global warming at a better level of detail. Meanwhile, improving and lowering the cost of healthcare, especially for the world’s poor, will be in large part about automating tasks that currently seem to need human judgment but could be done by computer, such as examining biopsies for cancerous cells or detecting infections before symptoms fully emerge.
Big data has already been used for economic development and for conflict prevention. It has revealed areas of African slums that are vibrant communities of economic activity by analyzing the movements of cellphone users. It has uncovered areas that are ripe for ethnic clashes and indicated how refugee crises might unfold. And its uses will only multiply as the technology is applied to more aspects of life.
Big data helps us do what we already do better, and it allows us to do new things altogether. Yet it is no magic wand. It won’t bring about world peace, eradicate poverty, or produce the next Picasso. Big data can’t make a baby—but it can save premature ones. In time, we will come to expect it to be used in almost every facet of life, and perhaps we’ll be slightly alarmed when it’s absent, in the same way that we expect a doctor to order an X-ray to uncover problems that couldn’t possibly be gleaned from a physical exam.
As big data becomes commonplace, it may well affect how we think about the future. Around five hundred years ago, humanity went through a profound shift in its perception of time, as part of the move toward a more secular, science-based, and enlightened Europe. Before that, time was experienced as cyclical, and so was life. Every day (and year) was much like the one before, and even the end of life resembled its start, as adults again became childlike. Later, time came to be seen as linear—an unfolding sequence of days in which the world could be shaped and life’s trajectory influenced. If earlier, the past, present, and future had all been fused together, now humanity had a past to look back upon, and a future to look forward to, as it shaped its present.
While the present could be molded, the future turned from something perfectly predictable into something open, pristine—a vast, empty canvas that individuals could fill according to their own values and efforts. One of the defining features of modern times is our sense of ourselves as masters of our fate; this attitude sets us apart from our ancestors, for whom determinism of some form was the norm. Yet big-data predictions render the future less open and untouched. Rather than being a blank canvas, our future seems already sketched in faint traces that are discernible to those with the technology to make them apparent. This seems to diminish our capacity to shape our destiny. Potentiality is slaughtered on the altar of probability.
At the same time, big data may mean that we are forever prisoners of our previous actions, which can be used against us in systems that presume to predict our future behavior: we can never escape what has come before. “What’s past is prologue,” wrote Shakespeare. Big data enshrines this algorithmically, for ill as well as good. Will a world of predictions dampen our enthusiasm to greet the sunrise, our desire to put our own human imprint on the world?
The opposite is actually more likely. Knowing how actions may play out in the future will allow us to take remedial steps to prevent problems or improve outcomes. We will spot students who are starting to slip long before the final exam. We will detect tiny cancers and treat them before the full-blown disease has a chance to emerge. We will see the likelihood of unwanted teenage pregnancy or a life of crime and intervene to change, as much as we can, that predicted outcome. We will prevent deadly fires from consuming overcrowded New York tenements by knowing which buildings to inspect first.
Nothing is preordained, because we can always respond and react to the information we receive. Big data’s predictions are not set in stone—they are only likely outcomes, and that means that if we want to change them we can do so. We may identify how to best greet the future and be its master, just as Maury found natural pathways within the vast, open space of wind and waves. And to accomplish this we won’t need to comprehend the nature of the cosmos or prove the existence of the gods—big data will be good enough.
Even bigger data
As big data transforms our lives—optimizing, improving, making more efficient, and capturing benefits—what role is left for intuition, faith, uncertainty, and originality?
If big data teaches us anything, it is that just acting better, making improvements—without deeper understanding—is often good enough. Continually doing so is virtuous. Even if you don’t know why your efforts work as they do, you’re generating better outcomes than you would by not making such efforts. Flowers and his “kids” in New York may not embody the enlightenment of the sages, but they do save lives.
Big data is not an ice-cold world of algorithms and automatons. There is an essential role for people, with all our foibles, misperceptions and mistakes, since these traits walk hand in hand with human creativity, instinct, and genius. The same messy mental processes that lead to our occasional humiliation or wrongheadedness also give rise to successes and stumbling upon our greatness. This suggests that, just as we’re learning to embrace messy data because it serves a larger purpose, we ought to welcome the inexactitude that is part of what it means to be human. After all, messiness is an essential property of both the world and our minds; in both cases, we only benefit by accepting it and applying it.
In a world in which data informs decisions, what purpose remains for people, or for intuition and going against the facts? If everyone appeals to the data and harnesses big-data tools, perhaps what becomes the central point of differentiation is unpredictability: the human element of instinct, risk-taking, accident, and error.
If so, then there will be a special need to carve out a place for the human: to reserve space for intuition, common sense, and serendipity to ensure that they are not crowded out by data and machine-made answers. What is greatest about human beings is precisely what the algorithms and silicon chips don’t reveal, what they can’t reveal because it can’t be captured in data. It is not the “what is,” but the “what is not”: the empty space, the cracks in the sidewalk, the unspoken and the not-yet-thought.
This has important implications for the notion of progress in society. Big data enables us to experiment faster and explore more leads. These advantages should produce more innovation. But the spark of invention becomes what the data does not say. That is something that no amount of data can ever confirm or corroborate, since it has yet to exist. If Henry Ford had queried big-data algorithms for what his customers wanted, they would have replied “a faster horse” (to rephrase his famous saying). In a world of big data, it is our most human traits that will need to be fostered—our creativity, intuition, and intellectual ambition—since our ingenuity is the source of our progress.
Big data is a resource and a tool. It is meant to inform, rather than explain; it points us toward understanding, but it can still lead to misunderstanding, depending on how well or poorly it is wielded. And however dazzling we find the power of big data to be, we must never let its seductive glimmer blind us to its inherent imperfections.
The totality of information in the world—the ultimate N=all—can never be gathered, stored, or processed by our technologies. For example, the CERN particle-physics laboratory in Switzerland collects less than 0.1 percent of the information that is generated during its experiments—the rest, seemingly of no use, is left to dissipate into the ether. But this is hardly a new truth. Society has always been hobbled by the limitations of the tools we use to measure and know reality, from compass and sextant to telescope and radar to today’s GPS. Our tools may be twice or ten times or a thousand times as powerful tomorrow as they are today, making what we know now seem minuscule then. Our current big-data world will, before long, look as quaint as the four kilobytes of writeable memory in Apollo 11’s guidance control computer does now.
What we are able to collect and process will always be just a tiny fraction of the information that exists in the world. It can only be a simulacrum of reality, like the shadows on the wall of Plato’s cave. Because we can never have perfect information, our predictions are inherently fallible. This doesn’t mean they’re wrong, only that they are always incomplete. It doesn’t negate the insights that big data offers, but it puts big data in its place—as a tool that doesn’t offer ultimate answers, just good-enough ones to help us now until better methods and hence better answers come along. It also suggests that we must use this tool with a generous degree of humility . . . and humanity.
Notes
1. Now
[>] Google Flu Trends—Jeremy Ginsburg et al., “Detecting Influenza Epidemics Using Search Engine Query Data,” Nature 457 (2009), pp. 1012–14 (http://www.nature.com/nature/journal/v457/n7232/full/nature07634.html).
[>] Follow-on study of Google Flu Trends—A. F. Dugas et al., “Google Flu Trends: Correlation with Emergency Department Influenza Rates and Crowding Metrics,” CID Advanced Access (January 8, 2012); DOI 10.1093/cid/cir883.
[>] Buying airplane tickets, Farecast—The information comes from Kenneth Cukier, “Data, Data Everywhere,” The Economist special report, February 27, 2010, pp. 1–14, and from interviews with Etzioni between 2010 and 2012. Etzioni’s Hamlet project—Oren Etzioni, C. A. Knoblock, R. Tuchinda, and A. Yates, “To Buy or Not to Buy: Mining Airfare Data to Minimize Ticket Purchase Price,” SIGKDD ’03, August 24–27, 2003 (http://knight.cis.temple.edu/~yates//papers/hamlet-kdd03.pdf).
Big Data: A Revolution That Will Transform How We Live, Work, and Think Page 22