Big Data: A Revolution That Will Transform How We Live, Work, and Think
These are not the sorts of things that a game designer toiling in a studio might have known, but the data spoke. “We are an analytics company masquerading as a gaming company. Everything is run by the numbers,” explained Ken Rudin, then Zynga’s analytics chief, before jumping ship to head analytics at Facebook. Harnessing data is no guarantee of business success but shows what is possible.
The shift to data-driven decisions is profound. Most people base their decisions on a combination of facts and reflection, plus a heavy dose of guesswork. “A riot of subjective visions—feelings in the solar plexus,” in the poet W. H. Auden’s memorable words. Thomas Davenport, a business professor at Babson College in Massachusetts and the author of numerous books on analytics, calls it “the golden gut.” Executives are just sure of themselves from gut instinct, so they go with that. But this is starting to change as managerial decisions are made or at least confirmed by predictive modeling and big-data analysis.
For instance, The-Numbers.com uses lots of data and mathematics to tell independent Hollywood producers how much income a film is likely to earn long before the first scene is shot. The company’s database crunches around 30 million records covering every commercial U.S. film going back decades. It includes each film’s budget, genre, cast, crew, and awards, as well as revenue (from U.S. and international box office, overseas rights, video sales and rentals, and so on), and much more. The database also contains a ganglion of human connections, such as “this screenwriter worked with this director; this director worked with this actor,” explains its founder and president, Bruce Nash.
The-Numbers.com is able to find intricate correlations that predict the income of film projects. Producers take that information to studios or investors to get financial backing. The firm can even tinker with variables to tell clients how to increase their haul (or minimize the risk of losses). In one instance, its analysis found that a project would have a far better chance of success if the male lead was an A-list actor: specifically, an Oscar-nominated one paid in the $5 million range. In another case, Nash informed the IMAX studio that a sailing documentary would probably be profitable only if its $12 million budget was reduced to $8 million. “It made the producer happy—the director less so,” says Nash.
From whether to make a movie to what shortstop to sign, the shift in corporate decision-making is beginning to show up on bottom lines. Erik Brynjolfsson, a business professor at MIT’s Sloan School of Management, and his colleagues studied the performance of companies that excel at data-driven decision-making and compared it with the performance of other firms. They found that productivity levels were as much as 6 percent higher at such firms than at companies that did not emphasize using data to make decisions. This gives the data-guided firms a significant leg up—though like the advantage of mindset and skills, it may be short-lived as more companies adopt big-data approaches to their business.
A question of utility
As big data becomes a source of competitive advantage for many companies, the structure of entire industries will be reshaped. The rewards, however, will accrue unequally. And the winners will be found among large and small firms, squeezing out the mass in the middle.
The largest players like Amazon and Google will continue to soar. Unlike the situation in the industrial age, however, their competitive advantage will not rest on physical scale. The vast technical infrastructure of data centers that they command is important but not their most essential quality. With abundant digital storage and processing available to lease inexpensively and add to within minutes, firms can adjust their amount of computing horsepower and storage to fit actual demand. By turning what had been a fixed cost into a variable one, this change erodes the advantages of scale based on technical infrastructure that large companies have long enjoyed.
Scale still matters, but it has shifted. What counts is scale in data. This means holding large pools of data and being able to capture ever more of it with ease. Thus large data holders will flourish as they gather and store more of the raw material of their business, which they can reuse to create additional value.
The challenge for the victors of a small-data world and for offline champions (companies like Walmart, Procter & Gamble, GE, Nestlé, and Boeing) is to appreciate the power of big data and collect and use data more strategically. The aircraft engine-maker Rolls-Royce completely transformed its business over the past decade by analyzing the data from its products, not just building them. From its operations center in Britain, the company continuously monitors the performance of more than 3,700 jet engines worldwide to spot problems before breakdowns occur. It used data to help turn a manufacturing business into a razor-and-blades one: Rolls-Royce sells the engines but also offers to monitor them, charging customers based on usage time (and repairs or replaces them in case of problems). Services now account for around 70 percent of the civil-aircraft engine division’s annual revenue.
Startups as well as old stalwarts in new business areas are positioning themselves to capture vast streams of data. Apple’s foray into mobile phones is a case in point. Before the iPhone, mobile operators amassed potentially valuable usage data from subscribers but failed to capitalize on it. Apple, in contrast, demanded in its contracts with operators that it would receive much of the most useful information. By obtaining data from scores of operators around the world, Apple gets a far richer picture of cellphone use than any mobile carrier alone can see.
Big data offers exciting opportunities at the other end of the size spectrum as well. Smart and nimble small players can enjoy “scale without mass,” in the celebrated phrase of Professor Brynjolfsson. That is, they can have a large virtual presence without hefty physical resources, and can diffuse innovations broadly at little cost. Importantly, because some of the best big-data services are based primarily on innovative ideas, they may not require large initial investments. Small firms can license the data rather than own it, run their analysis on inexpensive cloud computing platforms, and pay the licensing fees with a percentage of income earned.
There’s a good chance that these advantages at both ends of the spectrum will not be limited to data users but will accrue to data holders as well. Large data holders have strong incentives to add to their hoards of data, since doing so provides greater benefits at only marginal cost. First, they already have the infrastructure in place, in terms of storage and processing. Second, there is a special value in combining datasets. And third, a one-stop shop to obtain data simplifies life for data users.
Yet more intriguingly, a new breed of data holders may also emerge at the other extreme: individuals. As the value of data becomes increasingly apparent, people may want to flex their muscles as holders of information that pertains to them—for example, their shopping preferences, media-viewing habits, and perhaps health data too.
Personal-data ownership may empower individual consumers in ways that haven’t been considered before. People may wish to decide for themselves whom to license their data to, and for how much. Of course, not everyone will want to flog his bits to the highest bidder; many will be content to see it reused for free in return for better service like accurate Amazon book recommendations and a better user experience on Pinterest, the digital pinboard and content sharing service. But for a significant number of digitally savvy consumers, the idea of marketing and selling their personal information may become as natural as blogging, tweeting, or editing a Wikipedia entry.
For this to work, however, more is needed than just a shift in consumer sophistication and preferences. Today it would be much too complicated and costly for people to license their personal data and for companies to transact with each individual to obtain it. More likely, we’ll see the advent of new firms that pool data from many consumers, provide an easy way to license it, and automate the transactions. If their costs are low enough, and if enough people trust them, it is conceivable that a market for personal data could be established. Businesses such as Mydex in Britain and groups such as ID3, co-founded by Sandy Pentland, the personal-data analytics guru at MIT, are already working to make this vision a reality.
Until these intermediaries are up and running and data users have begun to use them, however, people desiring to become their own data holders have extremely limited options at their disposal. In the interim, to retain their options for a time when the infrastructure and intermediaries are in place, individuals may consider disclosing less rather than more.
For mid-sized companies, however, big data is less helpful. There are scale advantages to the very large, and cost and innovation advantages to the small, argues Philip Evans of the Boston Consulting Group, a prescient thinker on technology and business. In traditional sectors, medium-sized firms exist because they combine a certain minimum size to reap the benefits of scale with a certain flexibility that large players lack. But in a big-data world, there is no minimum scale that a company must reach to pay for its investments in production infrastructure. Big-data users wanting to remain flexible yet successful will find they no longer need to attain a threshold in size. Instead, they can remain small and still flourish (or be acquired by a big-data giant).
Big data squeezes the middle of an industry, pushing firms to be very large, or small and quick, or dead. Many traditional sectors will eventually be recast as big-data ones, from financial services to pharmaceuticals to manufacturing. Big data will not eliminate all mid-sized firms in all sectors, but it will certainly place pressure on companies in industries that are vulnerable to being shaken up by the power of big data.
Big data is poised to disrupt the competitive advantages of states as well. At a time when manufacturing has been largely lost to developing countries and innovation seems to be up for grabs, industrialized nations retain an advantage in that they hold the data and know how to use it. The bad news is that this advantage is not sustainable. As happened with computing and the Internet, the West’s early lead in big data will diminish as other parts of the world adopt the technology. The good news for today’s powerhouse firms from developed countries, however, is that big data will probably exacerbate corporate strengths and weaknesses. So if a company masters big data, it stands a chance of not only outperforming its peers but widening its lead.
The race is on. Just as Google’s search algorithm needs users’ data exhaust to work well, and just as the German car-parts supplier saw the importance of data to improve its components, so too all firms can gain by tapping data in clever ways.
Despite the rosy benefits, however, there are also reasons to worry. As big data makes increasingly accurate predictions about the world and our place in it, we may not be ready for its impact on our privacy and our sense of freedom. Our perceptions and institutions were constructed for a world of information scarcity, not surfeit. We explore the dark side of big data in the next chapter.
8
RISKS
FOR ALMOST FORTY YEARS, until the Berlin Wall came down in 1989, the East German state security agency known as the Stasi spied on millions of people. Employing around a hundred thousand full-time staff, the Stasi watched from cars and streets. It opened letters and peeked into bank accounts, bugged apartments and wiretapped phone lines. And it induced lovers and couples, parents and children, to spy on each other, betraying the most basic trust humans have in each other. The resulting files—including at least 39 million index cards and 70 miles of documents—recorded and detailed the most intimate aspects of the lives of ordinary people. East Germany was one of the most comprehensive surveillance states ever seen.
Twenty years after East Germany’s demise, more data is being collected and stored about each one of us than ever before. We’re under constant surveillance: when we use our credit cards to pay, our cellphones to communicate, or our Social Security numbers to identify ourselves. In 2007 the British media relished the irony that there were more than 30 surveillance cameras within 200 yards of the London apartment where George Orwell wrote 1984. Well before the advent of the Internet, specialized companies like Equifax, Experian, and Acxiom collected, tabulated, and provided access to personal information for hundreds of millions of people worldwide. The Internet has made tracking easier, cheaper, and more useful. And clandestine three-letter government agencies are not the only ones spying on us. Amazon monitors our shopping preferences and Google our browsing habits, while Twitter knows what’s on our minds. Facebook seems to catch all that information too, along with our social relationships. Mobile operators know not only whom we talk to, but who is nearby.
With big data promising valuable insights to those who analyze it, all signs seem to point to a further surge in others’ gathering, storing, and reusing our personal data. The size and scale of data collections will increase by leaps and bounds as storage costs continue to plummet and analytic tools become ever more powerful. If the Internet age threatened privacy, does big data endanger it even more? Is that the dark side of big data?
Yes, and it is not the only one. Here, too, the essential point about big data is that a change of scale leads to a change of state. As we’ll explain, this transformation not only makes protecting privacy much harder, but also presents an entirely new menace: penalties based on propensities. That is the possibility of using big-data predictions about people to judge and punish them even before they’ve acted. Doing this negates ideas of fairness, justice, and free will.
In addition to privacy and propensity, there is a third danger. We risk falling victim to a dictatorship of data, whereby we fetishize the information, the output of our analyses, and end up misusing it. Handled responsibly, big data is a useful tool of rational decision-making. Wielded unwisely, it can become an instrument of the powerful, who may turn it into a source of repression, either by simply frustrating customers and employees or, worse, by harming citizens.
The stakes are higher than is typically acknowledged. The dangers of failing to govern big data in respect to privacy and prediction, or of being deluded about the data’s meaning, go far beyond trifles like targeted online ads. The history of the twentieth century is blood-soaked with situations in which data abetted ugly ends. In 1943 the U.S. Census Bureau handed over block addresses (but not street names and numbers, to maintain the fiction of protecting privacy) of Japanese-Americans to facilitate their internment. The Netherlands’ famously comprehensive civil records were used by the invading Nazis to round up Jews. The five-digit numbers tattooed into the forearms of Nazi concentration-camp prisoners initially corresponded to IBM Hollerith punch-card numbers; data processing facilitated murder on an industrial scale.
Despite its informational prowess, there was much that the Stasi could not do. It could not know where everyone moved at all times or whom they talked to without great effort. Today, though, much of this information is collected by mobile phone carriers. The East German state could not predict which people would become dissidents, nor can we—but police forces are starting to use algorithmic models to decide where and when to patrol, which gives a hint of things to come. These trends make the risks inherent in big data as large as the datasets themselves.
Paralyzing privacy
It is tempting to extrapolate the danger to privacy from the growth in digital data and see parallels to Orwell’s surveillance dystopia 1984. And yet the situation is more complex. To start, not all big data contains personal information. Sensor data from refineries does not, nor does machine data from factory floors or data on manhole explosions or airport weather. BP and Con Edison do not need (or want) personal information in order to gain value from the analytics they perform. Big-data analyses of those types of information pose practically no risk to privacy.
Still, much of the data that’s now being generated does include personal information. And companies have a welter of incentives to capture more, keep it longer, and reuse it often. The data may not even explicitly seem like personal information, but with big-data processes it can easily be traced back to the individual it refers to. Or intimate details about a person’s life can be deduced.
For instance, utilities are rolling out “smart” electrical meters in the United States and Europe that collect data throughout the day, perhaps as frequently as every six seconds—far more than the trickle of information on overall energy use that traditional meters gathered. Importantly, the way electrical devices draw power creates a “load signature” that is unique to the appliance. So a hot-water heater is different from a computer, which differs from marijuana grow-lights. Thus a household’s energy use discloses private information, be it the residents’ daily behavior, health conditions or illegal activities.
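To make the idea of a load signature concrete, here is a minimal sketch in Python of how a meter reading might be matched to an appliance. It is an illustration under stated assumptions, not a description of how any utility actually analyzes its data: the reference signatures, the wattage values, and the function names are all hypothetical, and real systems would work with far noisier readings from many devices running at once.

```python
import math

# Hypothetical reference load signatures: power draw in watts, sampled
# once every six seconds. All values are invented for illustration.
REFERENCE_SIGNATURES = {
    "water_heater": [4500, 4500, 4500, 4500, 0, 0],
    "computer": [120, 135, 150, 140, 125, 130],
    "grow_lights": [1000, 1000, 1000, 1000, 1000, 1000],
}


def distance(a, b):
    """Euclidean distance between two equally long series of readings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_appliance(observed, references=REFERENCE_SIGNATURES):
    """Return the name of the reference signature closest to the readings."""
    return min(references, key=lambda name: distance(observed, references[name]))


# A hypothetical run of six meter readings from one household circuit.
observed = [980, 1010, 995, 1005, 990, 1002]
print(identify_appliance(observed))
```

Even this crude nearest-match comparison shows why fine-grained readings are revealing: once readings arrive every few seconds rather than once a month, the shape of the power draw points to a particular kind of device, and from there to what the residents are doing.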
The important question, however, is not whether big data increases the risk to privacy (it does), but whether it changes the character of the risk. If the threat is simply larger, then the laws and rules that protect privacy may still work in the big-data age; all we need to do is redouble our existing efforts. On the other hand, if the problem changes, we may need new solutions.
Unfortunately, the problem has been transformed. With big data, the value of information no longer resides solely in its primary purpose. As we’ve argued, it is now in secondary uses.
This change undermines the central role assigned to individuals in current privacy laws. Today they are told at the time of collection which information is being gathered and for what purpose; then they have an opportunity to agree, so that collection can commence. While this concept of “notice and consent” is not the only lawful way to gather and process personal data, according to Fred Cate, a privacy expert at Indiana University, it has been transmogrified into a cornerstone of privacy principles around the world. (In practice, it has led to super-sized privacy notices that are rarely read, let alone understood—but that is another story.)