Big Data: A Revolution That Will Transform How We Live, Work, and Think

by Viktor Mayer-Schönberger


  Some firms have shrewdly positioned themselves in the center of information flows so they can achieve scale and capture value from data. That’s been the case in the credit card industry in the United States. For years, the high cost of fighting fraud led many small and midsized banks to avoid issuing their own credit cards and to turn their card operations over to larger financial institutions, which had the size and scale to invest in the technology. Firms like Capital One and Bank of America’s MBNA lapped up the business. But the smaller banks now regret that decision, because having shed the card operations deprives them of data on spending patterns that would let them know more about their customers so they could sell them tailored services.

  Instead, the larger banks and the card issuers like Visa and MasterCard seem to be in the sweet spot of the information value chain. By serving many banks and merchants, they can see more transactions over their networks and use them to make inferences about consumer behavior. Their business model shifts from simply processing payments to collecting data. The question then is what they do with it.

  MasterCard could license the data to third parties who would extract the value, as ITA did, but the company prefers to do the analysis itself. A division called MasterCard Advisors aggregates and analyzes 65 billion transactions from 1.5 billion cardholders in 210 countries in order to divine business and consumer trends. Then it sells that information to others. It discovered, among other things, that if people fill up their gas tanks at around four o’clock in the afternoon, they’re quite likely to spend between $35 and $50 in the next hour at a grocery store or restaurant. A marketer might use that insight to print out coupons for a nearby supermarket on the back of gas-station receipts around that time of day.
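
  At its simplest, the gas-station insight is a conditional-probability estimate over a stream of card transactions. The short Python sketch below illustrates the idea on a few invented records with made-up field names; it is not MasterCard Advisors' actual method.

    from datetime import datetime, timedelta

    # Invented toy transactions: (cardholder, timestamp, merchant category, amount)
    transactions = [
        ("c1", datetime(2012, 5, 1, 16, 5), "gas", 48.00),
        ("c1", datetime(2012, 5, 1, 16, 40), "grocery", 42.50),
        ("c2", datetime(2012, 5, 1, 16, 10), "gas", 51.00),
        ("c2", datetime(2012, 5, 1, 19, 0), "restaurant", 27.00),
    ]

    def follow_on_rate(txns, window=timedelta(hours=1)):
        """Share of late-afternoon fill-ups followed by a $35-$50 food purchase."""
        fills = [t for t in txns if t[2] == "gas" and 15 <= t[1].hour <= 17]
        hits = 0
        for card, when, _, _ in fills:
            hits += any(
                t[0] == card
                and t[2] in ("grocery", "restaurant")
                and when < t[1] <= when + window
                and 35 <= t[3] <= 50
                for t in txns
            )
        return hits / len(fills) if fills else 0.0

    print(f"follow-on purchase rate: {follow_on_rate(transactions):.0%}")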

  As a middleman to information flows, MasterCard is in a prime position to collect data and capture its value. One can imagine a future when card companies forgo their commissions on transactions, processing them for free in return for access to more data, and earn income from selling highly sophisticated analytics based on it.

  The second category consists of data specialists: companies with the expertise or technologies to carry out complex analysis. MasterCard chose to do this in house, and some firms migrate between categories. But lots of others turn to specialists. For example, the consultancy Accenture works with firms in many industries to deploy advanced wireless-sensor technologies and to analyze the data the sensors collect. In a pilot project with the city of St. Louis, Missouri, Accenture installed wireless sensors in a score of public buses to monitor their engines, predict breakdowns, and determine the optimal time for regular maintenance. The pilot lowered the cost of ownership by as much as 10 percent. Just one finding—that the city could delay a scheduled part change from every 200,000–250,000 miles to 280,000 miles—saved more than a thousand dollars per vehicle. The client, not the consultancy, reaped the value of the data.
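
  The mechanics behind such a pilot can be as plain as watching a rolling average of engine readings drift past a service threshold. The fragment below is a minimal Python sketch with invented temperature readings and an assumed limit; the actual St. Louis system was certainly more elaborate.

    from statistics import mean

    # Invented engine-temperature readings (°C) from one bus, newest last
    readings = [92, 94, 93, 97, 101, 104, 108, 111]

    def needs_service(temps, window=5, limit=100.0):
        """Flag the bus when its recent average temperature exceeds the limit."""
        return mean(temps[-window:]) > limit

    print("schedule maintenance" if needs_service(readings) else "engine nominal")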

  In the realm of medical data, we see another striking example of how outside technology firms can provide useful services. The MedStar Washington Hospital Center in Washington, D.C., working with Microsoft Research and using Microsoft’s Amalga software, analyzed several years of its anonymized medical records—patient demographics, tests, diagnoses, treatments, and more—for ways to reduce readmission rates and infections. These are some of the costliest parts of healthcare, so anything that can lower the rates means huge savings.

  The technique uncovered some surprising correlations. One result was a list of all conditions that increased the chances that a discharged patient would return within a month. Some are well known and have no easy solution. A patient with congestive heart failure is likely to be back: it’s a hard condition to treat. But the system also spotted an unexpected top predictor: the patient’s mental state. The probability that a person would be readmitted within a month of discharge increased markedly if the initial complaint contained words that suggested mental distress, such as “depression.”
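
  A crude version of that signal can be reproduced by flagging intake notes that mention distress-related words and comparing readmission rates between the two groups. The Python sketch below uses invented records and an assumed word list; MedStar's Amalga-based analysis was far more sophisticated.

    import re

    DISTRESS_WORDS = {"depression", "depressed", "anxious", "hopeless"}

    # Invented records: (initial complaint text, readmitted within 30 days?)
    records = [
        ("chest pain and shortness of breath", True),
        ("feeling depressed, chest pain", True),
        ("routine hip replacement follow-up", False),
        ("anxious, trouble sleeping, palpitations", True),
        ("knee swelling after fall", False),
    ]

    def mentions_distress(text):
        return bool(DISTRESS_WORDS & set(re.findall(r"[a-z]+", text.lower())))

    def readmission_rate(group):
        return sum(readmit for _, readmit in group) / len(group) if group else 0.0

    flagged = [r for r in records if mentions_distress(r[0])]
    others = [r for r in records if not mentions_distress(r[0])]
    print(f"distress mentioned: {readmission_rate(flagged):.0%} readmitted")
    print(f"no mention:         {readmission_rate(others):.0%} readmitted")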

  Although this correlation says nothing to establish causality, it nevertheless suggests that a post-discharge intervention that addresses patients’ mental health might improve their physical health too, reducing readmissions and lowering medical costs. This finding, which a machine sifted out of a vast trove of data, is something a person studying the data might never have spotted. Microsoft didn’t control the data, which belonged to the hospital. And it didn’t have an astonishing idea; that wasn’t what was required here. Instead, it supplied the tool, the Amalga software, that helped spot the insight.

  The firms that are big-data holders rely on specialists to extract value from the data. But despite the high praise and chic job titles like “data ninja,” the life of technical experts is not always as glamorous as it may seem. They toil in the diamond mines of big data, taking home a pleasant paycheck, but they hand over the gems they unearth to those who have the data.

  The third group is made up of companies and individuals with a big-data mindset. Their strength is that they see opportunities before others do—even if they lack the data or the skills to act upon those opportunities. Indeed, perhaps it is precisely because, as outsiders, they lack these things that their minds are free of imaginary prison bars: they see what is possible rather than being limited by a sense of what is feasible.

  Bradford Cross personifies what it means to have a big-data mindset. In August 2009, when he was in his mid-twenties, he and some friends created FlightCaster.com. Like FlyOnTime.us, FlightCaster predicted whether a flight in the United States was likely to be delayed. To make the predictions, it analyzed every flight over the previous ten years, matched against historic and current weather data.
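
  Stripped to its core, that kind of prediction means joining flight history with weather and estimating delay odds for each combination. The Python sketch below does this with a handful of invented records; FlightCaster's real models, built on ten years of flights, were far richer.

    from collections import defaultdict

    # Invented history: (origin airport, weather at departure, was it delayed?)
    history = [
        ("EWR", "storm", True), ("EWR", "storm", True), ("EWR", "clear", False),
        ("SFO", "fog", True), ("SFO", "clear", False), ("SFO", "fog", False),
    ]

    counts = defaultdict(lambda: [0, 0])      # (origin, weather) -> [delays, flights]
    for origin, weather, delayed in history:
        counts[(origin, weather)][0] += delayed
        counts[(origin, weather)][1] += 1

    def delay_probability(origin, weather):
        delays, flights = counts[(origin, weather)]
        return delays / flights if flights else None

    print(delay_probability("EWR", "storm"))  # 1.0 on this toy history
    print(delay_probability("SFO", "fog"))    # 0.5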

  Interestingly, the data holders themselves couldn’t do that. None had the incentive—or the regulatory mandate—to use the data in this way. In fact, if the data sources—the U.S. Bureau of Transportation Statistics, the Federal Aviation Administration, and the National Weather Service—had dared to predict commercial flight delays, Congress would have probably held hearings and bureaucrats’ heads would have rolled. And the airlines couldn’t do it—or wouldn’t. They benefit from keeping their middling performance as obscure as possible. Instead, achieving it took a bunch of engineers in hoodies. In fact, FlightCaster’s predictions were so uncannily accurate that even airline employees started using them: airlines don’t want to announce delays until the very last minute, so although they’re the ultimate source of the information, they aren’t the most timely source.

  Because of its big-data mindset—its inspired realization that publicly available data could be processed in a way that offered answers that millions of people would crave—Cross’s FlightCaster was a first mover, but just barely. In the same month that FlightCaster was launched, the geeks behind FlyOnTime.us began cobbling together open data to build their site. The advantage that FlightCaster enjoyed would soon ebb. In January 2011 Cross and his partners sold the firm to Next Jump, a company that manages corporate-discount programs using big-data techniques.

  Then Cross turned his sights on another aging industry where he spotted a niche that an outside innovator could enter: the news media. His startup company Prismatic aggregates and ranks content from across the Web on the basis of text analysis, user preferences, social-network-related popularity, and big-data analytics. Importantly, the system does not make a big distinction between a teenager’s blog post, a corporate website, and an article in the Washington Post: if the content is deemed relevant and popular (by how widely it is viewed and how much it is shared), it appears at the top of the screen.
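
  A toy version of such a ranking might simply blend a relevance score with a popularity signal and sort, paying no attention to who published the piece. The weights, fields, and numbers in the Python sketch below are assumptions for illustration, not Prismatic's algorithm.

    # Invented items; "relevance" stands in for the output of text analysis
    # and user-preference matching.
    articles = [
        {"title": "Teenager's blog post", "relevance": 0.9, "shares": 1200, "views": 8000},
        {"title": "Washington Post article", "relevance": 0.6, "shares": 300, "views": 50000},
        {"title": "Corporate site update", "relevance": 0.2, "shares": 15, "views": 900},
    ]

    def score(item, w_relevance=0.7, w_popularity=0.3):
        popularity = min((item["shares"] + item["views"] / 100) / 2000, 1.0)  # crude normalization
        return w_relevance * item["relevance"] + w_popularity * popularity

    for item in sorted(articles, key=score, reverse=True):
        print(f"{score(item):.2f}  {item['title']}")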

  As a service, Prismatic is a recognition of the ways a younger generation is interacting with the media. For them the source of information has lost its primal importance. This is a humbling reminder to the high priests of mainstream media that the public is in aggregate more knowledgeable than they are, and that cufflinked journalists must compete against bloggers in their bathrobes. Yet the key point is that it is hard to imagine that Prismatic would have emerged from within the media industry itself, even though it collects lots of information. The regulars around the bar of the National Press Club never thought to reuse online data about media consumption. Nor might the analytics specialists in Armonk, New York, or Bangalore, India, have harnessed the information in this way. It took Cross, a louche outsider with disheveled hair and a slacker’s drawl, to presume that by using data he could tell the world what it ought to pay attention to better than the editors of the New York Times.

  The notion of the big-data mindset, and the role of a creative outsider with a brilliant idea, are not unlike what happened at the dawn of e-commerce in the mid-1990s, when the pioneers were unencumbered by the entrenched thinking or institutional restraints of older industries. Thus a hedge-fund quant, not Barnes & Noble, founded an online bookstore (Amazon’s Jeff Bezos). A software developer, not Sotheby’s, built an auction site (eBay’s Pierre Omidyar). Today the entrepreneurs with the big-data mindset often don’t have the data when they start. But because of this, they also don’t have the vested interests or financial disincentives that might prevent them from unleashing their ideas.

  As we’ve seen, there are cases where one firm combines many of these big-data characteristics. Etzioni and Cross may have had their killer ideas before others did, but they had the skills as well. The factory hands at Teradata and Accenture don’t just punch a clock; they too are known to have a great notion from time to time. Still, the archetypes are helpful as a way to appreciate the roles that different firms play. Today’s pioneers of big data often come from disparate backgrounds and cross-apply their data skills in a wide variety of areas. A new generation of angel investors and entrepreneurs is emerging, notably from among ex-Googlers and the so-called PayPal Mafia (the firm’s former leaders like Peter Thiel, Reid Hoffman, and Max Levchin). They, along with a handful of academic computer scientists, are some of the biggest backers of today’s data-infused startups.

  The creative vision of individuals and firms in the big data food-chain helps us reassess the worth of companies. For instance, Salesforce.com may not simply be a useful platform for firms to host their corporate applications: it is also well placed to unleash value from the data that flows atop its infrastructure. Mobile phone companies, as we saw in the previous chapter, collect a gargantuan amount of data but are often culturally blinded to its worth. They could, however, license it to others who are able to extract novel value from it—just as Twitter decided to grant the rights to license its data to two outside companies.

  Some fortunate enterprises straddle the different domains as a matter of conscious strategy. Google collects data like search-query typos, has the bright idea to use it to create a spell checker, and enjoys the in-house skills to execute the idea brilliantly. With many of its other activities, too, Google benefits from vertical integration in the big-data value chain, where it occupies all three positions at once. At the same time, Google also makes some of its data available to others via application programming interfaces (APIs) so it can be reused and further value can be added. One example is Google’s maps, which are used throughout the Web by everyone from real estate agencies to government websites for free (though heavily visited websites have to pay).

  Amazon, too, has the mindset, the expertise, and the data. In fact, the company approached its business model in that order, which is the inverse of the norm. It initially only had the idea for its celebrated recommendation system. Its stock market prospectus in 1997 described “collaborative filtering” before Amazon knew how it would work in practice or had enough data to make it useful.

  Both Google and Amazon span the categories, but their strategies differ. When Google first sets out to collect any sort of data, it has secondary uses in mind. Its Street View cars, as we have seen, collected GPS information not just for its map service but also to train self-driving cars. By contrast, Amazon is more focused on the primary use of data and only taps the secondary uses as a marginal bonus. Its recommendation system, for example, relies on clickstream data as a signal, but the company hasn’t used the information to do extraordinary things like predict the state of the economy or flu outbreaks.

  Despite Amazon’s Kindle e-book readers’ being capable of showing whether a certain page has been heavily annotated and underlined by users, the firm does not sell that information to authors and publishers. Marketers would love to learn which passages are most popular and use that knowledge to sell books better. Authors might like to know where in their lofty tomes most readers give up, and could use that information to improve their work. Publishers might spot themes that herald the next big book. But Amazon seems to leave the field of data to lie fallow.

  Harnessed shrewdly, big data can transform companies’ business models and the ways that long-standing partners interact. In one stunning case, a large European carmaker reshaped its commercial relationship with a parts supplier by harnessing usage data that the component manufacturer lacked. (Because we learned this example on a background basis from one of the principal firms that crunched the data, we regrettably cannot disclose the company names.)

  Cars today are stuffed with chips, sensors, and software that upload performance data to the carmakers’ computers when the vehicle is serviced. Typical mid-tier vehicles now have some 40 microprocessors; all of a car’s electronics account for one-third of its costs. This makes the cars fitting successors to the ships Maury called “floating observatories.” The ability to gather data about how car parts are actually used on the road—and to reincorporate this data to improve them—is turning out to be a big competitive advantage for the firms that can get hold of the information.

  Working with an outside analytics firm, the carmaker was able to spot that a sensor in the fuel tank made by a German supplier was doing terribly, producing a score of erroneous alarms for every valid one. The company could have handed that information to the supplier and requested the adjustment. In a more gentlemanly era of business it might have done just that. But the manufacturer had been spending a fortune on its analytics program. It wanted to use this information to recoup some of its investment.

  The company pondered its options. Should it sell the data? How would the info be valued? What if the supplier balked, and the carmaker was stuck with a poorly functioning part? And it knew that if it handed over the information, similar parts that went into its competitors’ vehicles would also be improved. Ensuring that the improvement would only benefit its own vehicles seemed a shrewder move. In the end, the auto manufacturer came up with a novel idea. It found a way to improve the part with modified software, received a patent on the technique, then sold the patent to the supplier—and earned a pretty penny in the process.

  The new data intermediaries

  Who holds the most value in the big-data value chain? Today the answer would appear to be those who have the mindset, the innovative ideas. As we saw from the dotcom era, those with a first-mover advantage can really prosper. But this advantage may not hold for very long. As the era of big data moves forward, others will adopt the mindset and the advantage of the early pioneers will diminish, relatively speaking.

  Perhaps, then, the crux of the value is really in the skills? After all, a gold mine isn’t worth anything if you can’t extract the gold. Yet the history of computing suggests otherwise. Today expertise in database management, data science, analytics, machine-learning algorithms, and the like is in hot demand. But over time, as big data becomes more a part of everyday life, as the tools get better and easier to use, and as more people acquire the expertise, the value of the skills will also diminish in relative terms. Similarly, computer programming ability became more common between the 1960s and 1980s. Today, offshore outsourcing firms have reduced the value of programming even more; what was once the paragon of technical acumen is now an engine of development for the world’s poor. This isn’t to say that big-data expertise is unimportant. But it isn’t the most crucial source of value, since one can bring it in from the outside.

  Today, in big data’s early stages, the ideas and the skills seem to hold the greatest worth. But eventually most value will be in the data itself. This is because we’ll be able to do more with the information, and also because data holders will better appreciate the potential value of the asset they possess. As a result, they’ll probably hold it more tightly than ever, and charge outsiders a high price for access. To continue with the metaphor of the gold mine: the gold itself will matter most.

  However, there is an important dimension to data holders’ long-term rise that deserves noting. In some cases, “data intermediaries” will emerge that are able to collect data from multiple sources, aggregate it, and do innovative things with it. The data holders will let these intermediaries perform this role because some of the data’s value can only be reaped through them.

  An example is Inrix, a traffic-analysis firm based outside Seattle. It compiles real-time geo-location data from 100 million vehicles in North America and Europe. The data comes from cars by BMW, Ford, and Toyota, among others, as well as from commercial fleets like taxis and delivery vans. It also obtains data from individual drivers’ mobile phones (its free smartphone apps are important here: users get traffic info, Inrix gets their coordinates in return). Inrix combines this information with data on historical traffic patterns, weather, and other things like local events to predict how traffic will flow. The product from its data assembly line is relayed to cars’ navigation systems, and is used by governments and commercial fleets.

  Inrix is the quintessential independent data intermediary. It collects its information from numerous rival car companies and thereby generates a product more valuable than any of them could have achieved on its own. Each carmaker may have a few million data points from its vehicles on the road. Though it could use the data to predict traffic flows, those predictions wouldn’t be very accurate or complete. The predictive quality improves as the amount of data increases. Also, the car companies may not have the skills: their competence is mostly bending metal, not pondering Poisson distributions. So they all have an incentive to turn to a third party to do the job. Besides, though traffic prediction is important to drivers, it hardly influences whether or not someone buys a particular car. So the competitors don’t mind joining forces in this way.
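
  The pooling argument can be made concrete in a few lines of code: each carmaker's handful of probe readings yields a noisy speed estimate for a road segment, while the combined pool steadies it. The sources and readings below are invented for illustration.

    from statistics import mean

    # Invented probe speeds (km/h) for one road segment, by data source
    reports = {
        "carmaker_a": [42, 55],
        "carmaker_b": [38],
        "taxi_fleet": [47, 44, 41, 45, 43, 46],
    }

    for source, speeds in reports.items():
        print(f"{source:10s} n={len(speeds):2d}  estimate = {mean(speeds):5.1f} km/h")

    pooled = [s for speeds in reports.values() for s in speeds]
    print(f"{'pooled':10s} n={len(pooled):2d}  estimate = {mean(pooled):5.1f} km/h")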

 
