Big Data: A Revolution That Will Transform How We Live, Work, and Think


by Viktor Mayer-Schönberger


  In parallel, communities of web developers and visionary thinkers have formed around the data to figure out ways to get the most from it, such as Code for America and the Sunlight Foundation in the United States and the Open Knowledge Foundation in Britain.

  An early example of the possibilities of open data comes from a website called FlyOnTime.us. Visitors to the site can interactively find out (among many other correlations) how likely it is that inclement weather will delay flights at a particular airport. The website combines flight and weather information from official data sources that are freely available and accessible through the Internet. It was developed by open-data advocates to show the usefulness of information amassed by the federal government. Even the site’s software code is open-source, so others can learn from it and reuse it.

  FlyOnTime.us lets the data do the talking, and it often says surprising things. One can see that for flights from Boston to New York’s LaGuardia Airport, travelers need to be prepared for delays twice as long for fog as for snow. This probably isn’t what most people would have guessed as they milled about the departure lounge; snow would have seemed like a bigger reason for a delay. But it is the sort of insight that big data makes possible, when one crunches historical flight-delay data from the Bureau of Transportation Statistics with current airport information from the Federal Aviation Administration, alongside past weather reports from the National Oceanic and Atmospheric Administration and real-time conditions from the National Weather Service. FlyOnTime.us highlights how an entity that does not collect or control information flows, like a search engine or big retailer, can still obtain and use data to create value.
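  A minimal sketch of the kind of cross-tabulation the site performs might look like the following, assuming a hypothetical CSV of historical Boston-to-LaGuardia departures with a recorded delay and weather condition (the file name and column names are illustrative, not FlyOnTime.us’s actual schema):

```python
import pandas as pd

# Hypothetical input: one row per Boston-to-LaGuardia departure, with the
# delay in minutes and the weather condition recorded at departure time.
flights = pd.read_csv("bos_lga_departures.csv")  # columns: dep_date, delay_minutes, weather

# Average delay for each weather condition, mirroring the fog-vs-snow comparison.
avg_delay = (
    flights.groupby("weather")["delay_minutes"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_delay)
# Fog might show roughly twice the average delay of snow -- the kind of
# counterintuitive pattern the site surfaces.
```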

  Valuing the priceless

  Whether open to the public or locked away in corporate vaults, data’s value is hard to measure. Consider the events of Friday, May 18, 2012. On that day, Facebook’s 28-year-old founder Mark Zuckerberg symbolically rang NASDAQ’s opening bell from the company’s headquarters in Menlo Park, California. The world’s biggest social network—which boasted around one out of every ten people on the planet as a member—began its new life as a public company. The stock immediately jumped 11 percent, as many new technology stocks do on their first day of trading. Then something odd happened. Facebook shares began to fall. It didn’t help that there was a technical glitch with NASDAQ’s computers that temporarily halted trading. A bigger problem was afoot. Sensing trouble, the stock’s underwriters, led by Morgan Stanley, actually propped up the listing so that it would stay above its issue price.

  The evening before, Facebook’s banks had priced the company at $38 per share, which translated into a $104 billion valuation. (That, by way of comparison, was roughly the market capitalization of Boeing, General Motors, and Dell combined.) What was Facebook actually worth? In its audited financial accounts for 2011, with which investors sized up the company, Facebook reported assets of $6.3 billion. That represented the value of its computer hardware, office equipment, and other physical stuff. As for the book value placed on the vast stores of information that Facebook held in its corporate vault? Basically zero. It wasn’t included, even though the company is almost nothing but data.

  The situation gets odder. Doug Laney, vice president of research at Gartner, a market research firm, crunched the numbers during the period before the initial public offering (IPO) and reckoned that Facebook had collected 2.1 trillion pieces of “monetizable content” between 2009 and 2011, such as “likes,” posted material, and comments. Compared against its IPO valuation, this means that each item, considered as a discrete data point, had a value of about five cents. Another way of looking at it is that every Facebook user was worth around $100, since users are the source of the information that Facebook collects.
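  A back-of-the-envelope check of Laney’s arithmetic bears these figures out; the user count below is an assumption of roughly one billion, in line with “one out of every ten people on the planet”:

```python
# Figures from the text; the user count is an assumed round number.
ipo_valuation = 104e9    # $104 billion IPO valuation
content_items = 2.1e12   # 2.1 trillion pieces of "monetizable content"
users = 1.0e9            # roughly one in ten people on the planet

print(ipo_valuation / content_items)  # ~0.05  -> about five cents per item
print(ipo_valuation / users)          # ~100   -> about $100 per user
```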

  How to explain the vast divergence between Facebook’s worth under accounting standards ($6.3 billion) and what the market initially valued it at ($104 billion)? There is no good way to do so. Rather, there is widespread agreement that the current method of determining corporate worth, by looking at a company’s “book value” (that is, mostly, the worth of its cash and physical assets), no longer adequately reflects the true value. In fact, the gap between book value and “market value”—what the company would fetch on the stock market or if it were bought outright—has been growing for decades. The U.S. Senate even held hearings in the year 2000 about modernizing the financial reporting rules, which emerged in the 1930s when information-based businesses scarcely existed. The issue affects more than just a company’s balance sheet: the inability to properly value corporate worth arguably produces business risk and market volatility.

  The difference between a company’s book value and its market value is accounted for as “intangible assets.” It has grown from around 40 percent of the value of publicly traded companies in the United States in the mid-1980s to three-fourths of their value at the dawn of the new millennium. This is a hefty divergence. These intangible assets are considered to include brand, talent, and strategy—anything that’s not physical and part of the formal financial-accounting system. But increasingly, intangible assets are coming to mean the data that companies hold and use, too.

  Ultimately, what this shows is that there is currently no obvious way to value data. The day Facebook’s shares opened, the gap between its formal assets and its unrecorded intangible value was nearly $100 billion. This is ludicrous. Yet gaps like this must and will close as companies find ways to record the value of their data assets on their balance sheets.

  Baby steps in this direction are under way. A senior executive at one of America’s largest wireless operators confided that the carrier recognized the immense value of its data and studied whether to treat it as a corporate asset in formal accounting terms. But as soon as the company’s lawyers heard about the initiative, they stopped it in its tracks. Putting the data on the books may make the firm legally liable for it, the legal eagles argued, which they thought was not such a good idea.

  Meanwhile, investors will also start to take notice of the option value of data. Share prices may swell for companies that have data or can collect it easily, while others in less fortunate positions may see their market valuations shrink. The data does not have to formally show up on the balance sheets for this to happen. Markets and investors will price these intangible assets into their valuations—albeit with difficulty, as the seesawing of Facebook’s share price in its first few months attests. But as accounting quandaries and liability concerns are alleviated, it is almost certain that the value of data will show up on corporate balance sheets and emerge as a new asset class.

  How will data be valued? Calculating its worth will no longer mean simply adding up what’s gained from its primary use. Yet if most of data’s value is latent and derived from unknown future secondary uses, it is not immediately clear how one might go about estimating it. This resembles the difficulties of pricing financial derivatives prior to the development of the Black-Scholes equation in the 1970s, or the difficulty in valuing patents, where auctions, exchanges, private sales, licensing, and lots of litigation are slowly creating a market for knowledge. If nothing else, putting a price tag on data’s option value certainly represents a rich opportunity for the financial sector.

  One way to start is to look at the different strategies data holders apply to extract value. The most obvious possibility is for the firm’s own use. It is unlikely, however, that a company is capable of uncovering all of the data’s latent value. More ambitiously, therefore, one could license the data to third parties. In the big-data age, many data holders may want to opt for an arrangement that pays them a percentage of the value extracted from the data rather than a fixed fee. It is similar to how publishers pay a percentage of book, music, or movie sales as royalties to authors and performers. It also resembles intellectual property deals in biotechnology, where licensors may demand royalties on any subsequent inventions that spring from their technology. This way all parties have an incentive to maximize the value gained from data’s reuse.


  However, because the licensee may fail to extract the full option value, data holders may not want to grant access to their troves exclusively. Rather, “data promiscuity” may become the norm. That way data holders can hedge their bets.

  A number of marketplaces have sprung up to experiment with ways to price data. DataMarket, founded in Iceland in 2008, provides access to free datasets from other sources, such as the United Nations, the World Bank, and Eurostat, and earns revenue by reselling data from commercial providers like market research firms. Other startups have tried to be information middlemen, platforms for third parties to share their data either for free or for a fee. The idea is to let anyone sell the data they happen to have in their databases, just as eBay provided a platform for people to sell the stuff in their attic. Import.io encourages firms to license their data that might otherwise get “scraped” from the Net and used for free. And Factual, founded by a former Googler, Gil Elbaz, is making available datasets it takes the time to compile itself.

  Microsoft has entered the arena with the Windows Azure Marketplace. It aims to focus on high-quality data and oversee what is on offer, similar to the way Apple supervises the offerings in its app store. In Microsoft’s vision, a marketing executive working on an Excel spreadsheet may want to cross-tabulate her internal company data against GDP growth forecasts from an economic consultancy. So she clicks to buy the data then and there, and it instantly flows into her columns on the screen.

  So far there’s no telling how the valuation models will play out. But what’s certain is that economies are starting to form around data—and that many new players stand to benefit, while a number of old ones will probably find a surprising new lease on life. “Data is a platform,” in the words of Tim O’Reilly, a technology publisher and savant of Silicon Valley, since it is a building block for new goods and business models.

  The crux of data’s worth is its seemingly unlimited potential for reuse: its option value. Collecting the information is crucial but not enough, since most of data’s value lies in its use, not its mere possession. In the next chapter, we examine how the data is actually being used and the big-data businesses that are emerging.

  7

  IMPLICATIONS

  IN 2011 A CLEVER STARTUP in Seattle called Decide.com opened its online doors with fantastically bold ambitions. It wanted to be a price-prediction engine for zillions of consumer products. But it planned to start relatively modestly: with every possible tech gadget, from mobile phones and flat-screen televisions to digital cameras. Its computers sucked down data feeds from e-commerce sites and scoured the Web for any other price and product information they could find.

  Prices on the Web constantly change throughout the day, dynamically updating based on countless, intricate factors. So the company needed to collect pricing data at all times. It wasn’t just big data but “big text” too, since the system had to analyze words to recognize when a product was being discontinued or a newer model was about to launch—information that consumers ought to know and that affects prices.

  A year later, Decide.com was analyzing four million products using over 25 billion price observations. It identified oddities about retailing that people had never been able to “see” before, like the fact that prices might temporarily increase for older models once new ones are introduced. Most people would purchase the older one figuring it would be cheaper, but depending on when they clicked “buy,” they might pay more. As online stores increasingly use automated pricing systems, Decide.com can spot unnatural, algorithmic price spikes and warn consumers to wait. The company’s predictions, according to its internal measurements, are accurate 77 percent of the time and provide buyers with average potential savings of around $100 per product. So confident is the company that, in cases where its predictions prove incorrect, Decide.com will reimburse the price difference to paying members of the service.
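  The text does not describe Decide.com’s actual method, but a minimal sketch of flagging “unnatural, algorithmic” price spikes against a product’s recent price history might look like this (the file name, columns, window size, and the 25 percent threshold are all illustrative assumptions):

```python
import pandas as pd

# Hypothetical price observations scraped or licensed from e-commerce sites.
prices = pd.read_csv("price_observations.csv", parse_dates=["observed_at"])
prices = prices.sort_values(["product_id", "observed_at"])

# Trailing 30-observation median per product serves as a baseline price.
prices["baseline"] = (
    prices.groupby("product_id")["price"]
    .transform(lambda s: s.rolling(30, min_periods=5).median())
)

# Flag prices more than 25% above the recent baseline; a shopper might wait.
prices["likely_spike"] = prices["price"] > 1.25 * prices["baseline"]
print(prices.loc[prices["likely_spike"], ["product_id", "observed_at", "price", "baseline"]])
```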

  On the surface, Decide.com sounds like many promising startups that aim to harness information in new ways and earn an honest dollar for their effort. What makes Decide.com special isn’t the data: the company relies on information it licenses from e-commerce sites and scrapes off the Web, where it is free for the taking. It also isn’t technical expertise: the company doesn’t do anything so complex that the only engineers in the world capable of pulling it off are the ones at its own office. Rather, although collecting the data and technical skills are important, the essence of what makes Decide.com special is the idea: the company has a “big-data mindset.” It spied an opportunity and recognized that certain data could be mined to reveal valuable secrets. And if there seem to be echoes between Decide.com and the airfare prediction site Farecast, there is good reason: each is the brainchild of Oren Etzioni.

  In the previous chapter we noted that data is becoming a new source of value in large part because of what we termed its option value, as it’s put to novel purposes. The emphasis was on firms that collect data. Now our regard shifts to the companies that use data, and how they fit into the information value chain. We’ll consider what this means for organizations and for individuals, both in their careers and in their everyday lives.

  Three types of big-data companies have cropped up, which can be differentiated by the value they offer. Think of it as the data, the skills, and the ideas.

  First is the data. These are the companies that have the data, or at least have access to it. But perhaps that is not the business they are in, or they don’t necessarily have the right skills to extract its value or to generate creative ideas about what is worth unleashing. The best example is Twitter, which obviously enjoys a massive stream of data flowing through its servers but turned to two independent firms to license it to others to use.

  Second are skills. These are often the consultancies, technology vendors, and analytics providers who have special expertise and do the work, but probably have neither the data themselves nor the ingenuity to come up with the most innovative uses for it. In the case of Walmart and Pop-Tarts, for example, the retailer turned to the specialists at Teradata, a data-analytics firm, to help tease out the insights.

  Third is the big-data mindset. For certain firms, the data and the know-how are not the main reasons for their success. What sets them apart is that their founders and employees have unique ideas about ways to tap data to unlock new forms of value. An example is Pete Warden, the geeky co-founder of Jetpac, which makes travel recommendations based on the photos users upload to the site.

  So far, the first two of these elements get the most attention: the skills, which today are scarce, and the data, which seems abundant. A new profession has emerged in recent years, the “data scientist,” which combines the skills of the statistician, software programmer, infographics designer, and storyteller. Instead of squinting into a microscope to unlock a mystery of the universe, the data scientist peers into databases to make a discovery. The McKinsey Global Institute proffers dire predictions about the dearth of data scientists now and in the future (which today’s data scientists like to cite to feel special and to pump up their salaries).

  Hal Varian, Google’s chief economist, famously calls statistician the “sexiest” job around. “If you want to be successful, you want to be complementary and scarce to something that is ubiquitous and cheap,” he says. “Data is so widely available and so strategically important that the scarce thing is the knowledge to extract wisdom from it. That is why statisticians, and database managers and machine learning people, are really going to be in a fantastic position.”

  However, all the focus on the skills and the downplaying of the importance of the data may prove to be short-lived. As the industry evolves, the paucity of personnel will be overcome as the skills that Varian vaunts become commonplace. What’s more, there is a mistaken belief that just because there is so much data around, it is free for the taking or its value is meager. In fact, the data is the critical ingredient. To appreciate why, consider the different parts of the big-data value chain, and how they are likely to change over time. To start, let’s examine each category—data holder, data specialist, and big-data mindset—in turn.

  The big-data value chain

  The primary substance of big data is the information itself. So it makes sense to look first at the data holders. They may not have done the original collection, but they control access to information and use it themselves or license it to others who extract its value. For instance, ITA Software, the fourth-largest airline reservation network (after Amadeus, Travelport, and Sabre), provided data to Farecast for its airfare predictions, but did not do the analysis itself. Why not? ITA perceived its business as using the data for the purpose for which it was designed—selling airline tickets—not for ancillary uses. As such, its core competencies were different. Moreover, it would have had to work around Etzioni’s patent.

  The company also chose not to exploit the data because of where it sat on the information value chain. “ITA shied away from projects that involved making commercial use of data too closely related to airline revenue,” recalls Carl de Marcken, a co-founder of ITA Software and its former chief technology officer. “ITA had special access to such data, required to provide ITA’s service, and couldn’t afford to jeopardize that.” Instead, it delicately stayed an arm’s length away by licensing the data but not using it. The majority of the data’s secondary value went to Farecast: to its customers in the form of cheaper tickets, and to its employees and owners from the income Farecast earned off ads, commissions, and eventually the sale of the firm.

 
