Big Data: A Revolution That Will Transform How We Live, Work, and Think
Of course, firms in many industries have shared information before, notably insurance underwriters’ laboratories and networked sectors like banking, energy, and telecoms, where exchanging information is critical to avoid problems and regulators at times require it. Market research firms have aggregated industry data for decades, as have companies for specialized tasks like the auditing of newspaper circulation. For some trade associations, it is the core of what they do.
The difference today is that the data is now raw material entering the marketplace; an asset independent of what it had previously aimed to measure. For example, Inrix’s information is more useful than it might seem on the surface. Its traffic analysis is used to measure the health of local economies because it can offer insights about unemployment, retail sales, and leisure activities. When the U.S. economic recovery started to sputter in 2011, signs of it were picked up by traffic analysis despite politicians’ denials that it was happening: rush hours had become less crowded, suggesting more unemployment. Also, Inrix has sold its data to an investment fund that uses traffic patterns around a major retailer’s stores as a proxy for its sales, which the fund uses to trade the company’s shares before its quarterly earnings announcements. More cars in the area correlate with better sales.
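The proxy logic is plain correlation: if car counts near the retailer’s stores have tracked its reported sales closely in the past, the traffic series becomes an early read on the next quarter. The sketch below shows that check with invented numbers and a hand-rolled Pearson correlation; it illustrates the idea only and does not reflect Inrix’s or the fund’s actual method.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical quarterly figures: average daily car counts near the
# retailer's stores, and the sales the company later reported.
traffic = [1180, 1225, 1150, 1290, 1340, 1260]   # cars per day (made up)
sales   = [410,  428,  395,  455,  470,  440]    # $ millions (made up)

r = pearson(traffic, sales)
print(f"traffic-sales correlation: {r:.2f}")
# A consistently high correlation is what would justify trading on the
# traffic numbers ahead of the earnings announcement; a weak one would not.
```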
Other such intermediaries are cropping up within the big-data value chain. An early player was Hitwise, later bought by Experian, which struck deals with Internet service providers to collect their clickstream data in return for some extra income. The data was licensed for a small fixed fee rather than a percentage of the value it produced. Hitwise captured the majority of the value as the intermediary. Another example is Quantcast, which measures online traffic to websites to help them know more about their visitors’ demographics and usage patterns. It gives away an online tool so sites can track visits; in return Quantcast gets to see the data, which enables it to improve its ad targeting.
These new intermediaries have identified lucrative niche positions without threatening the business models of the data holders from which they get their data. For the moment, Internet advertising is one of these niches, since that’s where the most data is, and where there’s a burning need to mine it to target ads. But as more of the world becomes datafied and more industries realize that their core business is learning from data, these independent information intermediaries will emerge elsewhere as well.
Some of the intermediaries may not be commercial enterprises, but nonprofits. For example, the Health Care Cost Institute was created in 2012 by a handful of America’s biggest health insurers. Their combined data amounted to five billion (anonymized) claims involving 33 million people. Sharing the records let the firms spot trends that they might not have been able to see in their smaller individual datasets. Among the first findings was that U.S. medical costs had increased three times faster than inflation in 2009–10, but with pronounced differences at a granular level: emergency-room prices grew by 11 percent while nursing facilities’ prices actually declined. Clearly health insurers would never have handed over their prized data to anything but a nonprofit intermediary. A nonprofit’s motives are less suspect, and the organization can be designed with transparency and accountability in mind.
The variety of big-data firms shows how the value of information is shifting. In the case of Decide.com, the price data is provided by partner websites on a revenue-sharing basis. Decide.com earns commissions when people buy goods through the site, but the companies that supplied the data also get a piece of the action. This suggests a maturation in the way industry works with data: In the past, ITA didn’t receive any commissions on the data it supplied Farecast, only a basic license fee. Now data providers are able to strike more appealing terms. For Etzioni’s next startup, one can presume that he’ll try to supply the data himself, since the value has migrated from the expertise to the idea and is now moving to the data.
Business models are being upended as the value shifts to those who control the data. The European carmaker that struck the intellectual property deal with its supplier had a strong in-house data-analysis team but needed to work with an outside technology vendor to uncover insights from the data. The tech firm was paid for its work, but the carmaker kept the bulk of the profits. Sniffing opportunity, however, the tech company has tweaked its business model to share some of the risk and reward with clients. It has experimented with working for a lower fee in return for sharing some of the wealth that its analysis unleashes. (As for auto-parts suppliers, it is probably safe to say that in the future they all will want to add measurement sensors to their products, or insist on access to performance data as a standard part of the sales contract, in order to continually improve their components.)
As for intermediaries, their lives are complicated because they need to convince companies of the value in sharing. For instance, Inrix has started to collect more than just geo-loco information. In 2012 it ran a trial of analyzing where and when cars’ anti-lock braking systems (ABS) kicked in, for a carmaker that designed its telemetry system to collect the information in real time. The idea is that frequent triggering of the ABS on a particular stretch of road may imply that conditions there are dangerous, and that drivers should consider alternative routes. So with this data Inrix could recommend not only the shortest route but the safest one as well.
Yet the carmaker doesn’t plan to share this data with others. Instead, it insists that Inrix deploy the system in its cars exclusively. The value of trumpeting the feature is seen to outweigh the gain from aggregating its data with others’ data to improve the system’s overall accuracy. That said, Inrix believes that, in time, all carmakers will see the utility of aggregating all their data. As a data intermediary, Inrix has a strong incentive to cling to such optimism: its business is built entirely on access to multiple data sources.
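The road-safety idea described above is, at bottom, a simple aggregation: group ABS activations by road segment, normalize by how many cars drove each segment, and flag the outliers. Here is a minimal sketch of that calculation; the event format, segment names, and threshold are invented for illustration and do not reflect Inrix’s actual pipeline.

```python
from collections import defaultdict

# Hypothetical telemetry events: (road_segment_id, abs_triggered)
events = [
    ("I-90_mile_42", True), ("I-90_mile_42", False), ("I-90_mile_42", True),
    ("SR-520_mile_7", False), ("SR-520_mile_7", False),
    ("I-90_mile_42", True), ("SR-520_mile_7", True),
]

traversals = defaultdict(int)   # cars that drove each segment
activations = defaultdict(int)  # ABS activations on each segment

for segment, abs_triggered in events:
    traversals[segment] += 1
    if abs_triggered:
        activations[segment] += 1

RISK_THRESHOLD = 0.3  # illustrative cutoff: flag segments where >30% of cars brake hard

for segment in traversals:
    rate = activations[segment] / traversals[segment]
    if rate > RISK_THRESHOLD:
        print(f"{segment}: ABS rate {rate:.0%} -- consider routing around it")
```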
Companies are also experimenting with different organizational forms in the business of big data. Inrix didn’t stumble upon its business model as many startups do—its role as an intermediary was established by design. Microsoft, which owned the essential patents to the technology, figured that a small, independent firm—rather than a big company—might be perceived as more neutral, and could bring together industry rivals and get the most from its intellectual property. Similarly, the MedStar Washington Hospital Center that used Microsoft’s Amalga software to analyze patient readmissions knew exactly what it was doing with its data: the Amalga system was originally the hospital’s own in-house emergency-room software, called Azyxxi, which it sold in 2006 to Microsoft so that it could be better developed.
In 2010 UPS sold an in-house data-analysis unit, called UPS Logistics Technologies, to the private equity firm Thoma Bravo. Now operating as Roadnet Technologies, the unit is freer to do route analysis for more than one company. Roadnet collects data from many clients to provide an industry-wide benchmarking service used by UPS and its competitors alike. As UPS Logistics, it never would have persuaded its parent firm’s rivals to hand over their datasets, explains Roadnet chief executive Len Kennedy. But after it became independent, UPS’s competitors felt more comfortable supplying their data, and ultimately everyone benefited from the improved accuracy that aggregation brings.
Evidence that data itself, rather than skills or mindset, will come to be most valued can be found in numerous acquisitions in the big-data business. For example, in 2008 Microsoft rewarded Etzioni’s big-data mindset by buying Farecast for around $110 million. But two years later Google paid $700 million to acquire Farecast’s data supplier, ITA Software.
The demise of the expert
In the movie Moneyball, about how the Oakland A’s became a winning baseball team by applying analytics and new types of metrics to the game, there is a delightful scene in which grizzled old scouts are sitting around a table discussing players. The audience can’t help cringing, not simply because the scene exposes the way decisions are made devoid of data, but because we’ve all been in situations where “certainty” was based on sentiment rather than science.
“He’s got a baseball body . . . a good face,” says one scout.
“He’s got a beautiful swing. When it connects, he drives it, it pops off the bat,” chimes in a frail, gray-haired fellow wearing a hearing aid. “A lot of pop off the bat,” another scout concurs.
A third man cuts the conversation short, declaring, “He’s got an ugly girlfriend.”
“What does that mean?” asks the scout leading the meeting.
“An ugly girlfriend means no confidence,” the naysayer explains matter-of-factly.
“OK,” says the leader, satisfied and ready to move on.
After spirited banter, a scout who had been silent speaks up: “This guy’s got an attitude. An attitude is good. I mean, he’s the guy, walks into a room, and his dick’s already been there two minutes.” Adds another: “He passes the eye-candy test. He’s got the looks, he’s ready to play the part. He just needs some playing time.”
“I’m just sayin’,” reiterates the naysayer, “his girlfriend’s a six—at best!”
The scene perfectly depicts the shortcomings of human judgment. What passes for reasoned debate is really based on nothing concrete. Decisions about millions of dollars’ worth of player contracts are made on gut instinct, in the absence of objective measures. Yes, it is just a film, but real life isn’t much different. Similar empty reasoning is employed from Manhattan boardrooms to the Oval Office to coffee shops and kitchen tables everywhere else.
Moneyball, based on the book by Michael Lewis, tells the true story of Billy Beane, the Oakland A’s general manager who threw out the century-old rulebook on how to value players in favor of a math-infused method that looks at the game through a new set of metrics. Out went time-honored stats like “batting average” and in came seemingly odd ways of thinking about the game like “on-base percentage.” The data-driven approach revealed a dimension to the sport that had always been present but hidden amid the peanuts and Cracker Jack. It didn’t matter how a player got on base, via a bouncy grounder or an ignoble walk, so long as he got on. When the data showed that stealing bases was inefficient, out went one of the most exciting, but least “productive,” elements of the game.
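The shift from the old metric to the new one is easy to state precisely: batting average counts only hits per official at-bat, while on-base percentage credits every way of reaching base, walks included. The standard definitions are sketched below with an invented season line.

```python
def batting_average(hits, at_bats):
    """The traditional stat: hits divided by official at-bats."""
    return hits / at_bats

def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP credits any way of reaching base, including the 'ignoble walk'."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return reached / chances

# Invented season line: a patient hitter with a modest average but many walks.
print(f"AVG: {batting_average(hits=140, at_bats=520):.3f}")
print(f"OBP: {on_base_percentage(hits=140, walks=95, hit_by_pitch=5,
                                 at_bats=520, sacrifice_flies=4):.3f}")
```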
Amid considerable controversy, Beane enshrined in the team’s front office the method known as sabermetrics, a term coined by the sportswriter Bill James in reference to the Society for American Baseball Research, which had until then been the province of a geeky subculture. Beane was challenging the dogma of the dugout, just as Galileo’s heliocentric views had affronted the authority of the Catholic Church. Ultimately he led the long-suffering team to a first-place finish in the American League West in the 2002 season, including a 20-game winning streak. From then on, statisticians supplanted the scouts as the sport’s savants. And lots of other teams scrambled to adopt sabermetrics themselves.
In the same spirit, the biggest impact of big data will be that data-driven decisions are poised to augment or overrule human judgment. In his book Super Crunchers, the Yale economist and law professor Ian Ayres argued that statistical analyses force people to reconsider their instincts. Through big data, this becomes even more essential. The subject-area expert, the substantive specialist, will lose some of his or her luster compared with the statistician and data analyst, who are unfettered by the old ways of doing things and let the data speak. This new cadre will rely on correlations without prejudgments and prejudice, just as Maury didn’t take at face value what wizened skippers had to say about a certain passage over a pint at the pub, but trusted the aggregated data to reveal practical truths.
We are seeing the waning of subject-matter experts’ influence in many areas. In media, the content that gets created and publicized on websites like Huffington Post, Gawker, and Forbes is regularly determined by data, not just the judgment of human editors. The data can reveal what people want to read about better than the instincts of seasoned journalists. The online education company Coursera uses data on what sections of a video lecture students replay to learn what material may have been unclear, and feeds the information back to teachers so they can improve. As we noted earlier, Jeff Bezos got rid of in-house book reviewers at Amazon when the data showed that algorithmic recommendations drove more sales.
This means that the skills necessary to succeed in the workplace are changing. It alters what employees are expected to bring to their organizations. Dr. McGregor, caring for premature babies in Ontario, doesn’t need to be the wisest doctor at the hospital, or the world’s foremost authority on neonatal care, to produce the best results for her patients. In fact, she is not a physician at all—she holds a PhD in computer science. But she avails herself of data amounting to more than a decade of patient-years, which the computer crunches and she parlays into recommendations for treatment.
As we’ve seen, the pioneers in big data often come from fields outside the domain where they make their mark. They are specialists in data analysis, artificial intelligence, mathematics, or statistics, and they apply those skills to specific industries. The winners of competitions on Kaggle, the online platform for big-data projects, are typically new to the sector in which they produce successful results, explains Kaggle’s chief executive Anthony Goldbloom. A British physicist developed near-winning algorithms to predict insurance claims and identify defective used cars. A Singaporean actuary led a competition to predict biological responses to chemical compounds. Meanwhile, at Google’s machine-translation group, the engineers celebrate their translations of languages that no one in the office speaks. Similarly, statisticians at Microsoft’s machine-translation unit relish trotting out an old quip: that the quality of translations increases whenever a linguist leaves the team.
To be sure, subject-area experts won’t die out. But their supremacy will ebb. From now on, they must share the podium with the big-data geeks, just as princely causation must share the limelight with humble correlation. This transforms the way we value knowledge, because we tend to think that people with deep specialization are worth more than generalists—that fortune favors depth. Yet expertise is like exactitude: appropriate for a small-data world where one never has enough information, or the right information, and thus has to rely on intuition and experience to guide one’s way. In such a world, experience plays a critical role, since it is the long accumulation of latent knowledge—knowledge that one can’t transmit easily or learn from a book, or perhaps even be consciously aware of—that enables one to make smarter decisions.
But when you are stuffed silly with data, you can tap that instead, and to greater effect. Thus those who can analyze big data may see past the superstitions and conventional thinking not because they’re smarter, but because they have the data. (And being outsiders, they are impartial about the squabbles within a field that can narrow an expert’s vision to whichever side she happens to be on.) This suggests that what it takes for an employee to be valuable to a company changes. What you need to know changes, whom you need to know changes, and so does what you need to study to prepare for professional life.
Mathematics and statistics, perhaps with a sprinkle of programming and network science, will be as foundational to the modern workplace as numeracy was a century ago and literacy before that. In the past, to be an excellent biologist one needed to know lots of other biologists. That hasn’t changed entirely. Yet today big-data breadth matters too, not just subject-expertise depth. Solving a puzzling biological problem may be as likely to happen through an association with an astrophysicist or a data-visualization designer as with a fellow biologist.
Video gaming is one industry where the lieutenants of big data have already elbowed their way to stand beside the generals of expertise, transforming the industry in the process. The video-game sector is big business, reaping more than the Hollywood box office annually worldwide. In the past, companies would design a game, release it, and hope it became a hit. On the basis of sales data, firms would either prepare a sequel or start a new project. Decisions over the pace of play and elements of the games like characters, plot, objects, and events were based on the creativity of the designers, who took their jobs with the same seriousness as Michelangelo painting the Sistine Chapel. It was art, not science; a world of hunches and instincts, much like that of the baseball scouts in Moneyball.
But those days are over. Zynga’s FarmVille, FrontierVille, FishVille, and other games are online and interactive. On the surface, online gaming allows Zynga to look at usage data and modify the games on the basis of how they’re actually played. So if players are having difficulty advancing from one level to another, or tend to leave at a certain moment because the action loses its pace, Zynga can spot those problems in the data and remedy them. But what is less evident is that the company can tailor games to the traits of individual players. There is not one version of FarmVille—there are hundreds of them.
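Spotting where players get stuck is, in essence, a drop-off calculation: for each level, what share of the players who reached it never got past it? The sketch below shows the arithmetic on made-up progression data; the schema is assumed for illustration and is not Zynga’s.

```python
from collections import Counter

# Hypothetical records: the highest level each player has reached.
highest_level_reached = [3, 5, 3, 7, 3, 4, 3, 6, 3, 5, 3, 3, 8, 3, 4]

stuck_at = Counter(highest_level_reached)
max_level = max(highest_level_reached)

players_remaining = len(highest_level_reached)
for level in range(1, max_level + 1):
    stopped_here = stuck_at.get(level, 0)
    drop_off = stopped_here / players_remaining if players_remaining else 0.0
    print(f"level {level}: {players_remaining} reached it, "
          f"{stopped_here} stopped here ({drop_off:.0%} drop-off)")
    players_remaining -= stopped_here

# A level where an unusually large share of players stall is a candidate
# for redesign -- easing the difficulty or re-pacing the action.
```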
Zynga’s big-data analysts study whether sales of virtual goods are affected by their color, or by players’ seeing their friends using them. For example, after the data showed that FishVille players bought a translucent fish at six times the rate of other creatures, Zynga offered more translucent species and profited handsomely. In the game Mafia Wars, the data revealed that players bought more weapons with gold borders and purchased pet tigers that were all white.