Big Data: A Revolution That Will Transform How We Live, Work, and Think
This negates the very idea of the presumption of innocence, the principle upon which our legal system, as well as our sense of fairness, is based. And if we hold people responsible for predicted future acts, ones they may never commit, we also deny that humans have a capacity for moral choice.
The important point here is not simply one of policing. The danger is much broader than criminal justice; it covers all areas of society, all instances of human judgment in which big-data predictions are used to decide whether people are culpable for future acts or not. Those include everything from a company’s decision to dismiss an employee, to a doctor denying a patient surgery, to a spouse filing for divorce.
Perhaps with such a system society would be safer or more efficient, but an essential part of what makes us human—our ability to choose the actions we take and be held accountable for them—would be destroyed. Big data would have become a tool to collectivize human choice and abandon free will in our society.
Of course, big data offers numerous benefits. What turns it into a weapon of dehumanization is a shortcoming, not of big data itself, but of the ways we use its predictions. The crux is that holding people culpable for predicted acts before they can commit them uses big-data predictions based on correlations to make causal decisions about individual responsibility.
Big data is useful to understand present and future risk, and to adjust our actions accordingly. Its predictions help patients and insurers, lenders and consumers. But big data does not tell us anything about causality. In contrast, assigning “guilt”—individual culpability—requires that people we judge have chosen a particular action. Their decision must have caused the action that followed. Precisely because big data is based on correlations, it is an utterly unsuitable tool to help us judge causality and thus assign individual culpability.
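The gap between correlation and causation is easy to demonstrate. Here is a minimal Python sketch, with entirely invented variables and numbers, in which a hidden confounder drives two quantities that never influence each other, yet leaves them strongly correlated:

```python
import random

random.seed(42)

# Hypothetical illustration: a hidden confounder ("flu-season intensity")
# drives both cough-medicine sales and school absences. Neither causes
# the other, yet the two come out strongly correlated.
n = 1000
flu_season = [random.gauss(0, 1) for _ in range(n)]         # confounder
med_sales = [f + random.gauss(0, 0.5) for f in flu_season]  # effect 1
absences = [f + random.gauss(0, 0.5) for f in flu_season]   # effect 2

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(f"correlation: {pearson(med_sales, absences):.2f}")
# Prints roughly 0.8, yet banning cough medicine would do nothing for
# school attendance: there is no causal link to act on.
```

Acting on such a correlation as if it were causal, punishing one variable to change the other, is precisely the error of judging culpability from predictions.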
The trouble is that humans are primed to see the world through the lens of cause and effect. Thus big data is under constant threat of being abused for causal purposes, of being tied to rosy visions of how much more effective our judgment, our human decision-making of assigning culpability, could be if we only were armed with big-data predictions.
It is the quintessential slippery slope—leading straight to the society portrayed in Minority Report, a world in which individual choice and free will have been eliminated, in which our individual moral compass has been replaced by predictive algorithms and individuals are exposed to the unencumbered brunt of collective fiat. If so employed, big data threatens to imprison us—perhaps literally—in probabilities.
The dictatorship of data
Big data erodes privacy and threatens freedom. But big data also exacerbates a very old problem: relying on the numbers when they are far more fallible than we think. Nothing underscores the consequences of data analysis gone awry more than the story of Robert McNamara.
McNamara was a numbers guy. Appointed the U.S. secretary of defense when tensions in Vietnam started in the early 1960s, he insisted on getting data on everything he could. Only by applying statistical rigor, he believed, could decision-makers understand a complex situation and make the right choices. The world in his view was a mass of unruly information that, if delineated, denoted, demarcated, and quantified, could be tamed by human hand and would fall under human will. McNamara sought Truth, and that Truth could be found in data. Among the numbers that came back to him was the “body count.”
McNamara developed his love of numbers as a student at Harvard Business School and then its youngest assistant professor at age 24. He applied this rigor during the Second World War as part of an elite Pentagon team called Statistical Control, which brought data-driven decision-making to one of the world’s largest bureaucracies. Prior to this, the military was blind. It didn’t know, for instance, the type, quantity, or location of spare airplane parts. Data came to the rescue. Just making armament procurement more efficient saved $3.6 billion in 1943. Modern war was about the efficient allocation of resources; the team’s work was a stunning success.
At war’s end, the group decided to stick together and offer their skills to corporate America. The Ford Motor Company was floundering, and a desperate Henry Ford II handed them the reins. Just as they knew nothing about the military when they helped win the war, so too were they clueless about car making. Still, the so-called “Whiz Kids” turned the company around.
McNamara rose swiftly up the ranks, trotting out a data point for every situation. Harried factory managers produced the figures he demanded—whether they were correct or not. When an edict came down that all inventory from one car model must be used before a new model could begin production, exasperated line managers simply dumped excess parts into a nearby river. The brass at headquarters nodded approvingly when the foremen sent back numbers confirming that the order had been obeyed. But the joke at the factory was that a fellow could walk on water—atop rusted pieces of 1950 and 1951 cars.
McNamara epitomized the mid-twentieth-century manager, the hyper-rational executive who relied on numbers rather than sentiments, and who could apply his quantitative skills to any industry he turned them to. In 1960 he was named president of Ford, a position he held for only a few weeks before President Kennedy appointed him secretary of defense.
As the Vietnam conflict escalated and the United States sent more troops, it became clear that this was a war of wills, not of territory. America’s strategy was to pound the Viet Cong to the negotiation table. The way to measure progress, therefore, was by the number of enemy killed. The body count was published daily in the newspapers. To the war’s supporters it was proof of progress; to critics, evidence of its immorality. The body count was the data point that defined an era.
In 1977, two years after the last helicopter lifted off the rooftop of the U.S. embassy in Saigon, a retired Army general, Douglas Kinnard, published a landmark survey of the generals’ views. Called The War Managers, the book revealed the quagmire of quantification. A mere 2 percent of America’s generals considered the body count a valid way to measure progress. Around two-thirds said it was often inflated. “A fake—totally worthless,” wrote one general in his comments. “Often blatant lies,” wrote another. “They were grossly exaggerated by many units primarily because of the incredible interest shown by people like McNamara,” said a third.
Like the factory men at Ford who dumped engine parts into the river, junior officers sometimes gave their superiors impressive numbers to keep their commands or boost their careers—telling the higher-ups what they wanted to hear. McNamara and the men around him relied on the figures, fetishized them. With his perfectly combed-back hair and his flawlessly knotted tie, McNamara felt he could only comprehend what was happening on the ground by staring at a spreadsheet—at all those orderly rows and columns, calculations and charts, whose mastery seemed to bring him one standard deviation closer to God.
The use, abuse, and misuse of data by the U.S. military during the Vietnam War is a troubling lesson about the limitations of information in an age of small data, a lesson that must be heeded as the world hurtles toward the big-data era. The quality of the underlying data can be poor. It can be biased. It can be mis-analyzed or used misleadingly. And even more damningly, data can fail to capture what it purports to quantify.
We are more susceptible than we may think to the “dictatorship of data”—that is, to letting the data govern us in ways that may do as much harm as good. The threat is that we will let ourselves be mindlessly bound by the output of our analyses even when we have reasonable grounds for suspecting something is amiss. Or that we will become obsessed with collecting facts and figures for data’s sake. Or that we will attribute a degree of truth to the data which it does not deserve.
As more aspects of life become datafied, the solution that policymakers and businesspeople are starting to reach for first is to get more data. “In God we trust—all others bring data,” is the mantra of the modern manager, heard echoing in Silicon Valley cubicles, on factory floors, and along the corridors of government agencies. The sentiment is sound, but one can easily be deluded by data.
Education seems on the skids? Push standardized tests to measure performance and penalize teachers or schools that by this measure aren’t up to snuff. Whether the tests actually capture the abilities of schoolchildren, the quality of teaching, or the needs of a creative, adaptable modern workforce is an open question—but one that the data does not admit.
Want to prevent terrorism? Create layers of watch lists and no-fly lists in order to police the skies. But whether such datasets offer the protection they promise is in doubt. In one famous incident, the late Senator Ted Kennedy of Massachusetts was ensnared by the no-fly list, stopped, and questioned, simply for having the same name as a person in the database.
People who work with data have an expression for some of these problems: “garbage in, garbage out.” In certain cases, the reason is the quality of the underlying information. Often, though, it is the misuse of the analysis that is produced. With big data, these problems may arise more frequently or have larger consequences.
Google, as we’ve shown in many examples, runs everything according to data. That strategy has obviously led to much of its success. But it also trips up the company from time to time. Its co-founders, Larry Page and Sergey Brin, long insisted on knowing all job candidates’ SAT scores and their grade point averages when they graduated from college. In their thinking, the first number measured potential and the second measured achievement. Accomplished managers in their forties who were being recruited were hounded for the scores, to their outright bafflement. The company even continued to demand the numbers long after its internal studies showed no correlation between the scores and job performance.
Google ought to know better, to resist being seduced by data’s false charms. The measure leaves little room for change in a person’s life. It counts book-smarts rather than accumulated knowledge. And it may not reflect the qualifications of people from the humanities, where know-how may be less quantifiable than in science and engineering. Google’s obsession with such data for HR purposes is especially odd considering that the company’s founders are products of Montessori schools, which emphasize learning, not grades. And it repeats the mistakes of past technology powerhouses that vaunted people’s résumés above their actual abilities. Would Larry and Sergey, as PhD dropouts, have stood a chance of becoming managers at the legendary Bell Labs? By Google’s standards, neither Bill Gates nor Mark Zuckerberg nor Steve Jobs would have been hired, since none of them holds a college degree.
The firm’s reliance on data sometimes seems overblown. Marissa Mayer, when she was one of its top executives, once ordered staff to test 41 gradations of blue to see which one people clicked on more, in order to determine the color of a toolbar on the site. Google’s deference to data has been taken to extremes. It even sparked revolt.
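The 41-shades exercise is, at bottom, an A/B test. A rough sketch of how such a comparison is typically scored follows; the shade labels, impression counts, and click counts are invented for illustration, not Google’s actual data:

```python
import math

# Hypothetical A/B test in the spirit of the 41-shades-of-blue story;
# the shade labels and click counts below are invented for illustration.
impressions = 100_000                    # users shown each variant
clicks = {"shade_A": 1_230, "shade_B": 1_310}

p_a = clicks["shade_A"] / impressions    # click-through rate, variant A
p_b = clicks["shade_B"] / impressions    # click-through rate, variant B

# Two-proportion z-test: is the difference larger than chance would allow?
p_pool = (clicks["shade_A"] + clicks["shade_B"]) / (2 * impressions)
se = math.sqrt(2 * p_pool * (1 - p_pool) / impressions)
z = (p_b - p_a) / se

print(f"CTR A={p_a:.4f}  CTR B={p_b:.4f}  z={z:.2f}")
# |z| > 1.96 would suggest a real difference at the 5% level; with 41
# variants one must also correct for multiple comparisons, or some
# "winner" will look significant purely by luck.
```

The mechanics are sound; the revolt described next was over whether every design question deserves this treatment.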
In 2009 Google’s top designer, Douglas Bowman, quit in a huff because he couldn’t stand the constant quantification of everything. “I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that,” he wrote on a blog announcing his resignation. “When a company is filled with engineers, it turns to engineering to solve problems. Reduce each decision to a simple logic problem. That data eventually becomes a crutch for every decision, paralyzing the company.”
Brilliance doesn’t depend on data. Steve Jobs may have continually improved the Mac laptop over years on the basis of field reports, but he used his intuition, not data, to launch the iPod, iPhone, and iPad. He relied on his sixth sense. “It isn’t the consumers’ job to know what they want,” he famously said, when telling a reporter that Apple did no market research before releasing the iPad.
In the book Seeing Like a State, the anthropologist James Scott of Yale University documents the ways in which governments, in their fetish for quantification and data, end up making people’s lives miserable rather than better. They use maps to determine how to reorganize communities rather than learn anything about the people on the ground. They use long tables of data about harvests to decide to collectivize agriculture without knowing a whit about farming. They take all the imperfect, organic ways in which people have interacted over time and bend them to their needs, sometimes just to satisfy a desire for quantifiable order. The use of data, in Scott’s view, often serves to empower the powerful.
This is the dictatorship of data writ large. And it was a similar hubris that led the United States to escalate the Vietnam War partly on the basis of body counts, rather than to base decisions on more meaningful metrics. “It is true enough that not every conceivable complex human situation can be fully reduced to the lines on a graph, or to percentage points on a chart, or to figures on a balance sheet,” said McNamara in a speech in 1967, as domestic protests were growing. “But all reality can be reasoned about. And not to quantify what can be quantified is only to be content with something less than the full range of reason.” If only the right data had been used in the right way, rather than revered for data’s sake.
Robert Strange McNamara went on to run the World Bank throughout the 1970s, then painted himself as a dove in the 1980s. He became an outspoken critic of nuclear weapons and a proponent of environmental protection. Later in life he underwent an intellectual conversion and produced a memoir, In Retrospect, that criticized the thinking behind the war and his own decisions as secretary of defense. “We were wrong, terribly wrong,” he wrote. But he was referring to the war’s broad strategy. On the question of data, and of body counts in particular, he remained unrepentant. He admitted many of the statistics were “misleading or erroneous.” “But things you can count, you ought to count. Loss of life is one. . . .” McNamara died in 2009 at age 93, a man of intelligence but not of wisdom.
Big data may lure us to commit the sin of McNamara: to become so fixated on the data, and so obsessed with the power and promise it offers, that we fail to appreciate its limitations. To catch a glimpse of the big-data equivalent of the body count, we need only look back at Google Flu Trends. Consider a situation, not entirely implausible, in which a deadly strain of influenza rages across the country. Medical professionals would be grateful for the ability to forecast in real time the biggest hotspots by dint of search queries. They’d know where to intervene with help.
But suppose that in a moment of crisis political leaders argue that simply knowing where the disease is likely to get worse and trying to head it off is not enough. So they call for a general quarantine—not for all people in those regions, which would be unnecessary and overbroad. Big data allows us to be more particular. So the quarantine applies only to the individual Internet users whose searches were most highly correlated with having the flu. Here we have the data on whom to pick up. Federal agents, armed with lists of Internet Protocol addresses and mobile GPS information, herd the individual web searchers into quarantine centers.
But as reasonable as this scenario might sound to some, it is just plain wrong. Correlations do not imply causation. These people may or may not have the flu. They’d have to be tested. They’d be prisoners of a prediction, but more important, they’d be victims of a view of data that lacks an appreciation for what the information actually means. The point of the actual Google Flu Trends study is that certain search terms are correlated with the outbreak—but the correlation may exist because of circumstances like healthy co-workers hearing sneezes in the office and going online to learn how to protect themselves, not because the searchers are ill themselves.
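A small simulation makes the point concrete. Under invented assumptions about flu prevalence and search behavior (none of these rates come from the actual Google Flu Trends study), aggregate search volume tracks the outbreak well, yet most individual searchers are not sick:

```python
import random

random.seed(7)

# Hypothetical simulation of the flu-search scenario. All rates are
# invented: 2% of people have the flu; the sick search 60% of the time,
# but the healthy also search 5% of the time (sneezing co-workers, news).
n = 100_000
flu_rate = 0.02
p_search_sick, p_search_healthy = 0.60, 0.05

searchers_sick = searchers_healthy = 0
for _ in range(n):
    sick = random.random() < flu_rate
    searched = random.random() < (p_search_sick if sick else p_search_healthy)
    if searched:
        if sick:
            searchers_sick += 1
        else:
            searchers_healthy += 1

total = searchers_sick + searchers_healthy
print(f"share of searchers who are actually sick: {searchers_sick / total:.0%}")
# Roughly 20%: search volume rises and falls with the outbreak, which is
# useful for forecasting hotspots, yet quarantining individual searchers
# would sweep up about four healthy people for every sick one.
```

The correlation is real and useful at the level of regions; read at the level of individuals, the same signal would condemn several healthy people for every sick one.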
The dark side of big data
As we have seen, big data allows for more surveillance of our lives while it makes some of the legal means for protecting privacy largely obsolete. It also renders ineffective the core technical method of preserving anonymity. Just as unsettling, big-data predictions about individuals may be used to, in effect, punish people for their propensities, not their actions. This denies free will and erodes human dignity.
At the same time, there is a real risk that the benefits of big data will lure people into applying the techniques where they don’t perfectly fit, or into feeling overly confident in the results of the analyses. As big-data predictions improve, using them will only become more appealing, fueling an obsession over data since it can do so much. That was the curse of McNamara and is the lesson his story holds.
We must guard against overreliance on data rather than repeat the error of Icarus, who adored his technical power of flight but used it improperly and tumbled into the sea. In the next chapter, we’ll consider ways that we can control big data, lest we be controlled by it.
9
CONTROL
Changes in the way we produce and interact with information lead to changes in the rules we use to govern ourselves, and in the values society needs to protect. Consider an example from a previous data deluge, the one unleashed by the printing press.
Before Johannes Gutenberg invented moveable type around 1450, the spread of ideas in the West was largely limited to personal connections. Books were mostly confined to monastic libraries, tightly guarded by monks acting for the Catholic Church to protect and preserve its dominance. Outside the Church, books were extremely rare. A few universities had collected only dozens or perhaps a couple of hundred books. Cambridge University began the fifteenth century with a mere 122 tomes.
Within a few decades after Gutenberg’s invention, his printing press had been replicated across Europe, making possible the mass production of books and pamphlets. When Martin Luther translated the Latin Bible into everyday German, people suddenly had a reason to become literate: reading the Bible themselves, they could bypass priests to learn the word of God. The Bible became a best seller. And once literate, people continued to read. Some even decided to write. In less than a person’s life span, the flow of information had changed from a trickle to a torrent.