The Fifth Risk
All sorts of natural phenomena might be modeled and understood with chaos theory. The collapse of the sardine population off the California coast, for example. Or the bizarre long landslides that occurred in the Mojave Desert, where the rocks ended up inexplicably far from where they’d started, given the slope of their journey. “These long run-out landslides are crazy. The question is: how did the rocks end up so far away?” In theory, the new math might explain it. In practice, there wasn’t enough data on the movement of the rocks, or the sardine holocaust, for him or anyone else to study them effectively. The same went for traffic jams, the boom-bust cycles in the wolf and deer populations in the American West, and countless other big events triggered by surprisingly small ones.
Then he happened upon the weather. He’d always been interested in it, but never thought of it as something he might study until he discovered that the U.S. government was sitting on a huge trove of weather data. It resided inside something called the National Oceanic and Atmospheric Administration, which was in turn inside the Department of Commerce—but he didn’t have any idea of that yet. He was just roaming around servers within the U.S. government, the sole supplier of the data he needed if he was going to get his PhD. “The only place I could get the data was the weather.”
Since the end of the Second World War, weather data collection has become one of the greatest illustrations of the possibilities of global collaboration and public-spiritedness. Every day thousands of amateur weather observers report data to their governments, as do a lot of experts aboard commercial planes in the sky and on ships at sea. Every day, twice a day, almost nine hundred weather balloons are released from nine hundred different spots on the globe, ninety-two of them by the U.S. government. A half-dozen countries, including the United States, deploy thousands of buoys to collect weather data from the ocean surface. Then there’s the data collected by billion-dollar satellites and fancy radar stations—in the United States alone, the National Weather Service maintains 159 high-resolution Doppler radar sites.
The United States shares its weather data with other countries—just as other countries share their weather data with the United States. But back in 1996, when DJ was hacking the Department of Commerce computer servers, weather data was not generally available to even the most enterprising hacker. “It wasn’t open to the public,” said DJ, “but it turned out there was a hole.” What came through that hole was such a vast trove of information that it overwhelmed the capacity of the computers in the University of Maryland’s math department, so DJ hunted for other computers at the university he might use. “You can get historical data and play with it,” he said. “It was the original idea of the internet. I was that guy. I didn’t have a supercomputer. So I just had to steal that, too.”
He’d start work at eight every night, when no one else was using the computers, and go until seven the next morning. He cobbled together enough storage to hold his borrowed treasure. “That was my academic claim to fame,” he said. “That I downloaded the Weather Service’s data.”
As he looked at the data, a couple of things became apparent. First, that the weather forecasts were improving more dramatically than he’d imagined. No one else was paying much attention to this, but for the first time in history the weatherman was becoming useful. Before the Second World War meteorology had been a bit like medicine in the nineteenth century: the demand for expertise was so relentless that the supply had no choice but to make fraudulent appearances. Right through the 1970s, the weather forecaster would look at the available weather information and, relying heavily on his judgment and personal experience, offer a prediction. His vision typically extended no more than thirty-six hours into the future, and even then it was blurry: snow will fall somewhere over these three states. For a very long time the weather had been only theoretically predictable—that is, people had some pretty good ideas about how it might be predicted, without being able actually to predict it.
Around the time DJ began downloading it, the weather data had led to practical progress that shocked even the theoreticians. On March 12, 1993, what became known as the Storm of the Century hit the eastern United States. Its force was incredible: waves in the Gulf of Mexico sank a two-hundred-foot ship. Roofs across southern states collapsed under the weight of the snow. Tornadoes killed dozens of people. Travel ceased along the entire Eastern Seaboard.
But the biggest difference between this storm and those that had come before it was that it had been predicted by a model. Following a segment on CBS Evening News about the siege of the Branch Davidian compound in Waco, Texas, Louis Uccellini, a meteorologist with the National Weather Service, had warned of the coming massive threat.
The TV hosts had treated the nation’s weatherman with amusement—they ended the story by saying, “The weatherman is usually wrong.” But this time he wasn’t. The National Weather Service had relied on its forecasting model, with no human laying hands on the results, and it had predicted the location and severity of the storm five days before it hit. “It was unheard-of,” said Uccellini. “When I started in the 1970s, the idea of predicting extreme events was almost forbidden. How can you see a storm before the storm can be seen? This time, states declared an emergency before the first flake of snow. It was just amazing for us to watch. We sat there wrapping our heads around what we’d done.” Six years after the storm, Uccellini described the advances in weather prediction from about the end of World War II as “one of the major intellectual achievements of the twentieth century.”
The achievements received surprisingly little attention, perhaps because they were, at least at first, difficult to see. It was not as if one day the weather could not be predicted and the next it could be predicted with perfect accuracy. What was happening was a shift in the odds that the weather forecast was right. It was the difference between an ordinary blackjack player and a blackjack player who was counting the cards. Over time the skill means beating, rather than losing to, the house. But at any given moment it is impossible to detect.
DJ could see that this progress was a big deal. A world-historic event. Here you could see chaos theory dramatized, but in reverse. You could rewind history and consider how things might have come out differently if our ability to predict the weather had been even a tiny bit better, or worse. “The failed hostage rescue in Iran was caused by a sandstorm we didn’t see coming,” said DJ. “The Kosovo offensive was so effective because we knew we wouldn’t have cloud cover.” You could pick almost any extreme weather event and imagine a different outcome for it, if only people had known it was coming. The hurricane that struck Galveston, Texas, back in 1900, before anyone thought to name such storms, had struck without warning and killed so many people that no one ever figured out exactly how many had died. Maybe six thousand or maybe twelve thousand. What their grandchildren would know about the weather might have saved them all.
Here was yet another illustration of chaos in life: even slight changes in our ability to predict the weather might have fantastic ripple effects. The weather itself was chaotic. Some slight change in the conditions somewhere on the planet could lead to huge effects elsewhere. The academic meteorologists around DJ knew this; the question was what to do about it. The Department of Meteorology at the University of Maryland, as it happened, had led a new movement in forecasting and spurred the National Weather Service to change its approach to its own models. Before December 1992 the meteorologists had simply plugged the data they had into their forecasting model: wind speeds, barometric pressure, ocean temperatures, and so on. But most of the planet’s weather went unobserved: there was no hard data. As a result, many of the model’s inputs were just estimates—you didn’t actually know the wind speed or barometric pressure or humidity or anything else at every spot on the planet.
An idea pursued at Maryland and a couple of other places was to run the weather model over and over, with different initial weather conditions. Alter the conditions slightly, in reasonable ways. Vary the wind speed, or barometric pressure at 10,000 feet, or the ocean temperature, or whatever seemed reasonable to vary. (How you did this was its own art.) Do it twenty times and you wind up with twenty different forecasts. A range of forecasts generated a truer prediction of the weather than a single forecast, because it captured the uncertainty of each one. Instead of saying, “Here’s where the hurricane is going,” or “We have no idea where the hurricane is going,” you could say, “We don’t know for sure where the hurricane might go, but we have a cone of probability you can use to make your decisions.”
“Ensemble forecasting,” the new technique was called. It implied that every weather forecast—and not just hurricanes—should include a cone of uncertainty. (Why they don’t is a great question.) “There’s a storm coming on Saturday” means one thing if all the forecasts in the ensemble say the storm is coming. It means another if some of the forecasts say there is no chance of rain on Saturday and others say that a storm is all but certain. Really, the weather predictions should reflect this uncertainty. “Why is the newspaper always giving us a five-day forecast?” asked DJ. “It should be a two-day forecast sometimes. And it should be a fourteen-day forecast other times.”
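The mechanics are simple enough to sketch in a few lines of code. What follows is a toy illustration, not the Weather Service’s model: it stands in the Lorenz-63 equations, a classic three-variable caricature of chaotic flow, for a real forecasting model, and the member count, noise level, and parameter values are all illustrative assumptions.

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the Lorenz-63 system, our stand-in
    # for a vastly more complicated weather model.
    x, y, z = state
    return state + dt * np.array([
        sigma * (y - x),
        x * (rho - z) - y,
        x * y - beta * z,
    ])

def forecast(state, steps):
    # Run the toy "weather model" forward from one set of initial conditions.
    for _ in range(steps):
        state = lorenz_step(state)
    return state

def ensemble_forecast(analysis, n_members=20, noise=0.05, steps=500, seed=0):
    # Perturb the best estimate of current conditions in twenty
    # reasonable ways and run the model once per perturbation.
    rng = np.random.default_rng(seed)
    members = analysis + rng.normal(0.0, noise, size=(n_members, 3))
    return np.array([forecast(m, steps) for m in members])

if __name__ == "__main__":
    analysis = np.array([1.0, 1.0, 1.0])  # an imperfect estimate of "today"
    outcomes = ensemble_forecast(analysis)
    print("ensemble mean:", outcomes.mean(axis=0))
    print("ensemble spread:", outcomes.std(axis=0))
```

The spread in the last line is the cone: when the twenty runs agree, it is narrow; when they diverge, the honest forecast is a wide one.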
By the time DJ discovered the security hole in the government’s database, the National Weather Service had taken to ensemble forecasting and was generating a dozen or more forecasts for each day. On some days the forecasts would be largely in agreement: slight changes in the estimates of current weather conditions did not lead to big changes in the future weather. At other times they varied radically. That is, sometimes the weather was highly chaotic and sometimes not. DJ quickly saw that instability was not in any way linked to severity: a Category 5 hurricane might keep on being a Cat 5 hurricane without a whole lot of doubt. Then, other times it wouldn’t. “Why in the case of one storm are the forecasts all the same, and in the case of another they are all different?” he asked. Why was the weather sometimes highly predictable and other times less so? Or as DJ put it, “Why does a butterfly flapping its wings in Brazil cause or not cause a tornado in Oklahoma?”
With the government’s data he was able to contribute a new idea: that the predictability of the weather might itself be quantified. “We all know the weather is chaotic,” he said. “The question is: how chaotic? You should be able to assess when a forecast is likely to go seriously bad, versus when the weather is stable.” In the end his thesis created a new statistic: how predictable the weather was at any given moment.
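The thesis itself isn’t reproduced here, but the underlying idea can be sketched: measure how fast an ensemble of slightly perturbed forecasts flies apart. A slowly growing spread means the weather is in a stable mood; a fast-growing one flags a forecast likely to go seriously bad. The snippet below is a hypothetical illustration built on the same toy Lorenz-63 model as before, not DJ’s actual statistic; the growth-rate measure and all parameter values are assumptions.

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the Lorenz-63 toy model (as above).
    x, y, z = state
    return state + dt * np.array([
        sigma * (y - x),
        x * (rho - z) - y,
        x * y - beta * z,
    ])

def predictability_index(analysis, n_members=20, noise=0.01, steps=1000, seed=0):
    # How chaotic is the weather right now? Track how quickly a cloud
    # of perturbed forecasts spreads, and report its average growth rate.
    rng = np.random.default_rng(seed)
    members = analysis + rng.normal(0.0, noise, size=(n_members, 3))
    spreads = []
    for _ in range(steps):
        members = np.array([lorenz_step(m) for m in members])
        spreads.append(members.std(axis=0).mean())
    # Per-step exponential growth rate of the ensemble spread:
    # near zero means predictable, large means chaotic.
    return np.log(spreads[-1] / spreads[0]) / steps

if __name__ == "__main__":
    for start in ([1.0, 1.0, 1.0], [10.0, 10.0, 25.0]):
        rate = predictability_index(np.array(start))
        print(f"initial conditions {start}: spread growth rate {rate:.4f}")
```

Run from different starting states, the index differs: the same model is sometimes highly predictable and sometimes not, which is precisely the distinction such a statistic is meant to capture.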
When he defended his thesis, in the summer of 2001, he was surprised by what the U.S. government’s data had enabled him to do. “As a grad student you’re just like, I hope I have something that doesn’t suck. You don’t actually expect your stuff to work.” He wasn’t a meteorologist. Yet he’d found new ways to describe the weather. He’d also found, in himself, a more general interest: in data. What else might it be used to discover?
The relevance of that ambition became a bit clearer after the terrorist attacks of September 11, 2001. “There was a sense that this was, among other things, a failure of data analysis,” he said. “If we had known how to distinguish signal from noise we’d have seen it and prevented it. ‘Hey, why are all these guys suddenly taking flight lessons?’” The hijackers’ use of credit cards alone, properly analyzed, would have revealed they were up to no good. “The image of a good network is messy,” said DJ. “It’s really hard to fake messiness. It’s hard to fake being an American with a credit card.”
The big question now in DJ’s world was: How, using data, do you identify threats to U.S. interests? By this time a young postdoc at Maryland, he attended a talk by a guy who ran something called the Defense Threat Reduction Agency. The agency, inside the U.S. Department of Defense, was charged with defending the country against weapons of mass destruction. It was trying to understand terrorist networks so it might disrupt them. “I hear the talk, and I was like, Wait a second,” DJ recalled. “The idea that if you push a network a certain way it might collapse. Is the network stable or unstable? It’s a lot like the question I was asking about weather forecasts.” A terrorist network, like a thunderstorm, might be chaotic. Terrorist networks, along with a lot of other security matters, might be better understood through chaos theory. “If you pull out a node in a terrorist cell, does it collapse? Or the opposite: How do we design our electricity grid so that if you take out a node it does NOT collapse?”
Thinking they would make use of his data skills, he went to work at the Department of Defense, where he expected to look for patterns in terrorist networks. But instead of sticking him at a computer, his new employer shipped him off to a couple of former Soviet republics, to track and understand the stockpiles of biological and chemical weapons left behind by the Russians. “They tell me, ‘We need you to go to Uzbekistan and Kazakhstan,’ and I’m like, ‘I’m a mathematician.’ That was the first question I asked: ‘Why me?’ They said, ‘Hey, you’re a doctor.’ And I said, ‘I’m not that kind of doctor.’ And they said, ‘Close enough, you’ll figure it out.’” After that, they sent him to Iraq, to help rebuild the school system. All of the work was interesting, and a lot of it useful, but it didn’t have much to do with his deep ambition. “People still didn’t really appreciate how you can use data to transform,” he said.
To his surprise, this was true even of people back home where he had grown up, in Silicon Valley, to which he soon returned. Even there he couldn’t get a job doing what he wanted to do with data. “I was just trying to figure out where I could be helpful,” he said. “Google passed on me. Yahoo! passed on me.” His mom knew someone at eBay, and so he was finally hired, the undignified way. At eBay he tried, and failed, to persuade his superiors to let him use the data on hand to find new ways to detect fraud.
At length he moved to a new, slow-growing company called LinkedIn, where job seekers posted their CVs and attempted to create their own little networks. His new bosses asked him to be Head of Analytics and Data Product Teams. There, for the first time, he found an audience receptive to his pitch. “The same tools you use to identify where bad guys are, you can do with job skills,” he said. “You can show people where skills cluster. Where they might belong in the economy. If you’re trained in the army in ordnance disposal, maybe you’d be good at mining.” The analytics he’d created at LinkedIn had done exactly that—prodded an army bomb expert to find work setting explosives in mines.
Along with much more: in the space of a few years, the interest in data analysis went from curiosity to fad. The fetish for data overran everything from political campaigns to the management of baseball teams. Inside LinkedIn, DJ presided over an explosion of job titles that described similar tasks: analyst, business analyst, data analyst, research scientist. The people in human resources complained to him that the company had too many data-related job titles. The company was about to go public, and they wanted to clean up the organization chart. To that end DJ sat down with his counterpart at Facebook, who was dealing with the same problem. What could they call all these data people? “Data scientist,” his Facebook friend suggested. “We weren’t trying to create a new field or anything, just trying to get HR off our backs,” said DJ. He replaced the job titles for some openings with “data scientist.” To his surprise, the number of applicants for the jobs skyrocketed. “Data scientists” were what people wanted to be.
In the fall of 2014 someone from the White House called him. Obama was coming to San Francisco and wanted to meet with him. “He’d seen the power of data in his campaign,” said DJ, “and he knew there was a new opportunity to use it to transform the country.” When the White House asked him if he wanted to bring his wife to the meeting, DJ figured that Obama was looking for more than a conversation. Inside of eight years he’d gone from being a guy who couldn’t get a job in Silicon Valley to being a guy the president of the United States wanted to offer a job he couldn’t refuse. When Obama did ask DJ to move to Washington, it was DJ’s wife who responded. “How do we know if any of this will be of any use?” she asked.
“If your husband is as good as everyone says he is, he’ll figure it out,” said Obama. Which of course made it even harder for DJ to refuse.
DJ went to Washington. His assignment was to figure out how to make better use of the data created by the U.S. government. His title: Chief Data Scientist of the United States. He’d be the first person to hold the job. He made his first call at the Department of Commerce, to meet with Penny Pritzker, the commerce secretary, and Kathy Sullivan, the head of the National Oceanic and Atmospheric Administration. They were pleased to see him but also a bit taken aback that he had come. “They seemed a little surprised I was there,” recalled DJ. “I said, ‘I’m the data guy and you’re the data agency. This is where a huge amount of the data is.’ And they’re like, ‘Yes, but how did you know?’”
Nobody understood what it did, but then, like so many United States government agencies, the Department of Commerce is seriously misnamed. It has almost nothing to do with commerce directly and is actually forbidden by law from engaging in business. But it runs the United States Census, the only real picture of who Americans are as a nation. It collects and makes sense of all the country’s economic statistics—without which the nation would have very little idea of how it was doing. Through the Patent and Trademark Office it tracks all the country’s inventions. It contains an obscure but wildly influential agency called the National Institute of Standards and Technology, stuffed with Nobel laureates, which does everything from setting the standards for construction materials to determining the definition of a “second” and of an “inch.” (It’s more complicated than you might think.) But of the roughly $9 billion spent each year by the Commerce Department, $5 billion goes to NOAA, and the bulk of that money is spent, one way or another, on figuring out the weather. Each and every day, NOAA collects twice as much data as is contained in the entire book collection of the Library of Congress. “Commerce is one of the most misunderstood jobs in the cabinet, because everyone thinks it works with business,” says Rebecca Blank, a former acting commerce secretary in the Obama administration and now chancellor of the University of Wisconsin. “It produces public goods that are of value to business, but that’s different. Every secretary who comes in thinks Commerce does trade. But trade is maybe ten percent of what Commerce does—if that.” The Department of Commerce should really be called the Department of Information. Or maybe the Department of Data.