The Signal and the Noise
The Weather Service’s origins are still apparent in its culture today. Its weather forecasters work around the clock for middling salaries27 and they see themselves as public servants. The meteorologists I met in Camp Springs were patriotic people, rarely missing an opportunity to remind me about the role that weather forecasting plays in keeping the nation’s farms, small businesses, airlines, energy sector, military, public services, golf courses, picnic lunches, and schoolchildren up and running, all for pennies on the dollar. (The NWS gets by on just $900 million per year28—about $3 per U.S. citizen—even though weather has direct effects on some 20 percent of the nation’s economy.29)
Jim Hoke, one of the meteorologists I met, is the director of the NWS’s Hydrometeorological Prediction Center. He is also a thirty-five-year veteran of the field, having taken his turn both on the computational side of the NWS (helping to build the computer models that his forecasters use) and on the operational side (actually making those forecasts and communicating them to the public). As such, he has some perspective on how man and machine intersect in the world of meteorology.
What is it, exactly, that humans can do better than computers that can crunch numbers at seventy-seven teraFLOPS? They can see. Hoke led me onto the forecasting floor, which consisted of a series of workstations marked with blue overhanging signs with such legends as MARITIME FORECAST CENTER and NATIONAL CENTER. Each station was manned by one or two meteorologists—accompanied by an armada of flat-screen monitors that displayed full-color maps of every conceivable type of weather data for every corner of the country. The forecasters worked quietly and quickly, with a certain amount of Grant’s military precision.30
Some of the forecasters were drawing on these maps with what appeared to be a light pen, painstakingly adjusting the contours of temperature gradients produced by the computer models—fifteen miles westward over the Mississippi Delta, thirty miles northward into Lake Erie. Gradually, they were bringing them one step closer to the Platonic ideal they were hoping to represent.
The forecasters know the flaws in the computer models. These inevitably arise because—as a consequence of chaos theory—even the most trivial bug in the model can have potentially profound effects. Perhaps the computer tends to be too conservative on forecasting nighttime rainfalls in Seattle when there’s a low-pressure system in Puget Sound. Perhaps it doesn’t know that the fog in Acadia National Park in Maine will clear up by sunrise if the wind is blowing in one direction, but can linger until midmorning if it’s coming from another. These are the sorts of distinctions that forecasters glean over time as they learn to work around the flaws in the model, in the way that a skilled pool player can adjust to the dead spots on the table at his local bar.
The unique resource that these forecasters were contributing was their eyesight. It is a valuable tool for forecasters in any discipline—a visual inspection of a graphic showing the interaction between two variables is often a quicker and more reliable way to detect outliers in your data than a statistical test. It’s also one of those areas where computers lag well behind the human brain. Distort a series of letters just slightly—as with the CAPTCHA technology that is often used in spam or password protection—and very “smart” computers get very confused. They are too literal-minded, unable to recognize the pattern once it’s subjected to even the slightest degree of manipulation. Humans, by contrast, out of pure evolutionary necessity, have very powerful visual cortexes. They rapidly parse through any distortions in the data in order to identify abstract qualities like pattern and organization—qualities that happen to be very important in different types of weather systems.
FIGURE 4-3: CAPTCHA
Indeed, back in the old days when meteorological computers weren’t much help at all, weather forecasting was almost entirely a visual process. Rather than flat screens, weather offices were instead filled with a series of light tables, illuminating maps that meteorologists would mark with chalk or drafting pencils, producing a weather forecast fifteen miles at a time. Although the last light table was retired many years ago, the spirit of the technique survives today.
The best forecasters, Hoke explained, need to think visually and abstractly while at the same time being able to sort through the abundance of information the computer provides them with. Moreover, they must understand the dynamic and nonlinear nature of the system they are trying to study. It is not an easy task, requiring vigorous use of both the left and right brain. Many of his forecasters would make for good engineers or good software designers, fields where they could make much higher incomes, but they choose to become meteorologists instead.
The NWS keeps two different sets of books: one that shows how well the computers are doing by themselves and another that accounts for how much value the humans are contributing. According to the agency’s statistics, humans improve the accuracy of precipitation forecasts by about 25 percent over the computer guidance alone,31 and temperature forecasts by about 10 percent.32 Moreover, according to Hoke, these ratios have been relatively constant over time: as much progress as the computers have made, his forecasters continue to add value on top of it. Vision accounts for a lot.
Being Struck by Lightning Is Increasingly Unlikely
When Hoke began his career, in the mid-’70s, the jokes about weather forecasters had some grounding in truth. On average, for instance, the NWS was missing the high temperature by about 6 degrees when trying to forecast it three days in advance (figure 4-4). That isn’t much better than the accuracy you could get just by looking up a table of long-term averages. The partnership between man and machine is paying big dividends, however. Today, the average miss is about 3.5 degrees, meaning that almost half the inaccuracy has been stripped out.
FIGURE 4-4: AVERAGE HIGH TEMPERATURE ERROR IN NWS FORECASTS
Weather forecasters are also getting better at predicting severe weather. What are your odds of being struck—and killed—by lightning? Actually, the odds are not constant; they depend on how likely you are to be outdoors when lightning hits and unable to seek shelter in time because you didn’t have a good forecast. In 1940, the chance of an American being killed by lightning in a given year was about 1 in 400,000.33 Today, it’s just 1 chance in 11,000,000, making it almost thirty times less likely. Some of this reflects changes in living patterns (more of our work is done indoors now) and improvement in communications technology and medical care, but it’s also because of better weather forecasts.
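The "almost thirty times" figure can be checked directly from the two sets of odds quoted above. The short Python sketch below does only that arithmetic; the function and variable names are illustrative choices, not anything taken from the NWS.

```python
# Annual odds of an American being killed by lightning, as quoted in the text.
odds_1940 = 1 / 400_000
odds_today = 1 / 11_000_000


def risk_ratio(earlier: float, later: float) -> float:
    """Return how many times smaller the later risk is than the earlier one."""
    return earlier / later


ratio = risk_ratio(odds_1940, odds_today)
print(f"The risk has fallen by a factor of about {ratio:.1f}")
```

The ratio works out to 27.5, which is the basis for rounding to "almost thirty."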
Perhaps the most impressive gains have been in hurricane forecasting. Just twenty-five years ago, when the National Hurricane Center tried to forecast where a hurricane would hit three days in advance of landfall, it missed by an average of 350 miles.34 That isn’t very useful on a human scale. Draw a 350-mile radius outward from New Orleans, for instance, and it covers all points from Houston, Texas, to Tallahassee, Florida (figure 4-5). You can’t evacuate an area that large.
FIGURE 4-5: IMPROVEMENT IN HURRICANE TRACK FORECASTING
Today, however, the average miss is only about one hundred miles, enough to cover only southeastern Louisiana and the southern tip of Mississippi. The hurricane will still hit outside that circle some of the time, but now we are looking at a relatively small area in which an impact is even money or better—small enough that you could plausibly evacuate it seventy-two hours in advance. In 1985, by contrast, it was not until twenty-four hours in advance of landfall that hurricane forecasts displayed the same skill. What this means is that we now have about forty-eight hours of additional warning time before a storm hits—and as we will see later, every hour is critical when it comes to evacuating a city like New Orleans.*
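To see why shrinking the average miss matters so much for evacuations, it helps to compare areas rather than distances. The sketch below assumes, as the passage does, that the zone at risk can be approximated by a circle whose radius is the average miss; the names are illustrative.

```python
import math


def circle_area(radius_miles: float) -> float:
    """Area in square miles of a circle with the given radius."""
    return math.pi * radius_miles ** 2


area_then = circle_area(350)  # average three-day miss of about 350 miles
area_now = circle_area(100)   # average three-day miss of about 100 miles

shrink_factor = area_then / area_now
print(f"Then: about {area_then:,.0f} square miles")
print(f"Now:  about {area_now:,.0f} square miles")
print(f"The zone at risk is roughly {shrink_factor:.1f} times smaller")
```

Because area grows with the square of the radius, cutting the miss by a factor of 3.5 cuts the area to cover by a factor of about 12.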
The Weather Service hasn’t yet slain Laplace’s Demon, but you’d think they might get more credit than they do. The science of weather forecasting is a success story despite the challenges posed by the intricacies of the weather system. As you’ll find throughout this book, cases like these are more the exception than the rule when it comes to making forecasts. (Save your jokes for the economists instead.)
Instead, the National Weather Service often goes unappreciated. It faces stiff competition from private industry,35 competition that occurs on a somewhat uneven playing field. In contrast to most of its counterparts around the world, the Weather Service is supposed to provide its model data free of charge to anyone who wants it (most other countries with good weather bureaus charge licensing or usage fees for their government’s forecasts). Private companies like AccuWeather and the Weather Channel can then piggyback off their handiwork to develop their own products and sell them commercially. The overwhelming majority of consumers get their forecast from one of the private providers; the Weather Channel’s Web site, Weather.com, gets about ten times more traffic than Weather.gov.36
I am generally a big fan of free-market competition, or competition between the public and private sectors. Competition was a big part of the reason that baseball evolved as quickly as it did to better combine the insights gleaned from scouts and statistics in forecasting the development of prospects.
In baseball, however, the yardstick for competition is clear: How many ballgames did you win? (Or if not that, how many ballgames did you win relative to how much you spent.) In weather forecasting, the story is a little more complicated, and the public and private forecasters have differing agendas.
What Makes a Forecast Good?
“A pure researcher wouldn’t be caught dead watching the Weather Channel, but lots of them do behind closed doors,” Dr. Bruce Rose, the affable principal scientist and vice president at the Weather Channel (TWC), informed me. Rose wasn’t quite willing to say that TWC’s forecasts are better than those issued by the government, but they are different, he claimed, and oriented more toward the needs of a typical consumer.
“The models typically aren’t measured on how well they predict practical weather elements,” he continued. “It’s really important if, in New York City, you get an inch of rain rather than ten inches of snow.37 That’s a huge [distinction] for the average consumer, but scientists just aren’t interested in that.”
Much of Dr. Rose’s time, indeed, is devoted to highly pragmatic and even somewhat banal problems related to how customers interpret his forecasts. For instance: how to develop algorithms that translate raw weather data into everyday verbiage. What does bitterly cold mean? A chance of flurries? Just where is the dividing line between partly cloudy and mostly cloudy? The Weather Channel needs to figure this out, and it needs to establish formal rules for doing so, since it issues far too many forecasts for the verbiage to be determined on an ad hoc basis.
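A formal rule of the kind described above might look something like the following Python sketch. The cutoffs and phrases here are entirely hypothetical, invented for illustration; they are not the Weather Channel's actual rules, which the text does not give.

```python
# HYPOTHETICAL cutoffs (percent of sky covered), chosen only for illustration.
# Each entry is (upper bound, phrase): a reading below the bound gets the phrase.
SKY_RULES = [
    (12.5, "sunny"),
    (37.5, "mostly sunny"),
    (62.5, "partly cloudy"),
    (87.5, "mostly cloudy"),
]


def sky_condition(cloud_cover_pct: float) -> str:
    """Translate a cloud-cover percentage into an everyday phrase."""
    if not 0 <= cloud_cover_pct <= 100:
        raise ValueError("cloud cover must be between 0 and 100 percent")
    for upper_bound, phrase in SKY_RULES:
        if cloud_cover_pct < upper_bound:
            return phrase
    return "cloudy"


for pct in (5, 50, 70, 95):
    print(f"{pct}% cloud cover -> {sky_condition(pct)}")
```

Keeping the cutoffs in one table means the dividing line between two phrases is set once and applied identically to every forecast, which is the point of replacing ad hoc wording with a rule.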
Sometimes the need to adapt the forecast to the consumer can take on comical dimensions. For many years, the Weather Channel had indicated rain on their radar maps with green shading (occasionally accompanied by yellow and red for severe storms). At some point in 2001, someone in the marketing department got the bright idea to make rain blue instead—which is, after all, what we think of as the color of water. The Weather Channel was quickly besieged with phone calls from outraged—and occasionally terrified—consumers, some of whom mistook the blue blotches for some kind of heretofore unknown precipitation (plasma storms? radioactive fallout?). “That was a nuclear meltdown,” Dr. Rose told me. “Somebody wrote in and said, ‘For years you’ve been telling us that rain is green—and now it’s blue? What madness is this?’”
But the Weather Channel also takes its meteorology very seriously. And at least in theory, there is reason to think that they might be able to make a better forecast than the government. The Weather Channel, after all, gets to use all of the government’s raw data as their starting point and then add whatever value they might be able to contribute on their own.
The question is, what is a “better” forecast? I’ve been defining it simply as a more accurate one. But there are some competing ideas, and they are pertinent in weather forecasting.
An influential 1993 essay38 by Allan Murphy, then a meteorologist at Oregon State University, posited that there were three definitions of forecast quality that were commonplace in the weather forecasting community. Murphy wasn’t necessarily advocating that one or another definition was better; he was trying to facilitate a more open and honest conversation about them. Versions of these definitions can be applied in almost any field in which forecasts or predictions are made.
One way to judge a forecast, Murphy wrote—perhaps the most obvious one—was through what he called “quality,” but which might be better defined as accuracy. That is, did the actual weather match the forecast?
A second measure was what Murphy labeled “consistency” but which I think of as honesty. However accurate the forecast turned out to be, was it the best one the forecaster was capable of at the time? Did it reflect her best judgment, or was it modified in some way before being presented to the public?
Finally, Murphy said, there was the economic value of a forecast. Did it help the public and policy makers to make better decisions?
Murphy’s distinction between accuracy and honesty is subtle but important. When I make a forecast that turns out to be wrong, I’ll often ask myself whether it was the best forecast I could have made given what I knew at the time. Sometimes I’ll conclude that it was: my thought process was sound; I had done my research, built a good model, and carefully specified how much uncertainty there was in the problem. Other times, of course, I’ll find that there was something I didn’t like about it. Maybe I had too hastily dismissed a key piece of evidence. Maybe I had overestimated the predictability of the problem. Maybe I had been biased in some way, or otherwise had the wrong incentives.
I don’t mean to suggest that you should beat yourself up every time your forecast is off the mark. To the contrary, one sign that you have made a good forecast is that you are equally at peace with however things turn out—not all of which is within your immediate control. But there is always room to ask yourself what objectives you had in mind when you made your decision.
In the long run, Murphy’s goals of accuracy and honesty should converge when we have the right incentives. But sometimes we do not. The political commentators on The McLaughlin Group, for instance, probably cared more about sounding smart on television than about making accurate predictions. They may well have been behaving rationally. But if they were deliberately making bad forecasts because they wanted to appeal to a partisan audience, or to be invited back on the show, they failed Murphy’s honesty-in-forecasting test.
Murphy’s third criterion, the economic value of a forecast, can complicate matters further. One can sympathize with Dr. Rose’s position that, for instance, a city’s forecast might deserve more attention if it is close to its freezing point, and its precipitation might come down as rain, ice, or snow, each of which would have different effects on the morning commute and residents’ safety. This, however, is more a matter of where the Weather Channel focuses its resources and places its emphasis. It does not necessarily impeach the forecast’s accuracy or honesty. Newspapers strive to ensure that all their articles are accurate and honest, but they still need to decide which ones to put on the front page. The Weather Channel has to make similar decisions, and the economic impact of a forecast is a reasonable basis for doing so.
There are also times, however, when the goals may come into more conflict, and commercial success takes precedence over accuracy.
When Competition Makes Forecasts Worse
There are two basic tests that any weather forecast must pass to demonstrate its merit:
It must do better than what meteorologists call persistence: the assumption that the weather will be the same tomorrow (and the next day) as it was today.
It must also beat climatology, the long-term historical average of conditions on a particular date in a particular area.
These were the methods that were available to our ancestors long before Richardson, Lorenz, and the Bluefire came along; if we can’t improve on them, then all that expensive computer power must not be doing much good.
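The two tests above can be made concrete with a few lines of Python. This is a minimal sketch, not the procedure any forecasting agency uses: the temperatures, the "normals," and the model forecasts are all made-up numbers, and mean absolute error is just one reasonable way to score a forecast.

```python
from statistics import mean

# HYPOTHETICAL daily high temperatures (degrees F) for one week, for illustration only.
observed = [41, 45, 52, 48, 39, 37, 44]
# HYPOTHETICAL long-term average highs for the same dates (climatology).
normals = [43, 43, 44, 44, 44, 45, 45]
# HYPOTHETICAL one-day-ahead forecasts from some model.
model = [40, 46, 50, 47, 41, 38, 43]


def mean_abs_error(forecasts, actuals):
    """Average size of the miss, ignoring its direction."""
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


# Score days 2 through 7, since persistence needs a previous day to copy.
targets = observed[1:]
persistence = observed[:-1]  # tomorrow will be the same as today

persistence_mae = mean_abs_error(persistence, targets)
climatology_mae = mean_abs_error(normals[1:], targets)
model_mae = mean_abs_error(model[1:], targets)

beats_baselines = model_mae < persistence_mae and model_mae < climatology_mae

print(f"persistence: {persistence_mae:.2f}")
print(f"climatology: {climatology_mae:.2f}")
print(f"model:       {model_mae:.2f}")
print(f"passes both tests: {beats_baselines}")
```

A forecast that cannot produce a smaller average miss than both of these free baselines is adding nothing, however sophisticated the model behind it.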
We have lots of data, going back at least to World War II, on past weather outcomes: I can go to Wunderground.com, for instance, and tell you that the weather at 7 A.M. in Lansing, Michigan, on January 13, 1978—the date and time when I was born—was 18 degrees with light snow and winds from the northeast.39 But relatively few people had bothered to collect information on past weather forecasts. Was snow expected in Lansing that morning? It was one of the few pieces of information that you might have expected to find on the Internet but couldn’t.
In 2002 an entrepreneur named Eric Floehr, a computer science graduate from Ohio State who was working for MCI, changed that. Floehr simply started collecting data on the forecasts issued by the NWS, the Weather Channel, and AccuWeather, to see if the government model or the private-sector forecasts were more accurate. This was mostly for his own edification at first—a sort of very large scale science fair project—but it quickly evolved into a profitable business, ForecastWatch.com, which repackages the data into highly customized reports for clients ranging from energy traders (for whom a fraction of a degree can translate into tens of thousands of dollars) to academics.
Floehr found that there wasn’t any one clear overall winner. His data suggests that AccuWeather has the best precipitation forecasts by a small margin, that the Weather Channel has slightly better temperature forecasts, and the government’s forecasts are solid all around. They’re all pretty good.
But the further out in time these models go, the less accurate they turn out to be (figure 4-6). Forecasts made eight days in advance, for example, demonstrate almost no skill; they beat persistence but are barely better than climatology. And at intervals of nine or more days in advance, the professional forecasts were actually a bit worse than climatology.