
The Signal and the Noise

by Nate Silver


  6

  HOW TO DROWN IN THREE FEET OF WATER

  Political polls are dutifully reported with a margin of error, which gives us a clue that they contain some uncertainty. Most of the time when an economic prediction is presented, however, only a single number is mentioned. The economy will create 150,000 jobs next month. GDP will grow by 3 percent next year. Oil will rise to $120 per barrel.

  This creates the perception that these forecasts are amazingly accurate. Headlines expressing surprise at any minor deviation from the prediction are common in coverage of the economy:

  Unexpected Jump in Unemployment Rate to 9.2% Stings Markets

  —Denver Post, July 9, 2011[1]

  If you read the fine print of that article, you’d discover that the “unexpected” result was that the unemployment rate had come in at 9.2 percent—rather than 9.1 percent[2] as economists had forecasted. If a one-tenth of a percentage point error is enough to make headlines, it seems like these forecasts must ordinarily be very reliable.

  Instead, economic forecasts are blunt instruments at best, rarely being able to anticipate economic turning points more than a few months in advance. Fairly often, in fact, these forecasts have failed to “predict” recessions even once they were already under way: a majority of economists did not think we were in one when the three most recent recessions, in 1990, 2001, and 2007, were later determined to have begun.[3]

  Forecasting something as large and complex as the American economy is a very challenging task. The gap between how well these forecasts actually do and how well they are perceived to do is substantial.

  Some economic forecasters wouldn’t want you to know that. Like forecasters in most other disciplines, they see uncertainty as the enemy—something that threatens their reputation. They don’t estimate it accurately, making assumptions that lower the amount of uncertainty in their forecast models but that don’t improve their predictions in the real world. This tends to leave us less prepared when a deluge hits.

  The Importance of Communicating Uncertainty

  In April 1997, the Red River of the North flooded Grand Forks, North Dakota, overtopping the town’s levees and spilling more than two miles into the city.*[4] Although there was no loss of life, nearly all of the city’s 50,000 residents had to be evacuated, cleanup costs ran into the billions of dollars,[5] and 75 percent of the city’s homes were damaged or destroyed.[6]

  Unlike a hurricane or an earthquake, the Grand Forks flood may have been a preventable disaster. The city’s floodwalls could have been reinforced using sandbags.[7] It might also have been possible to divert the overflow into depopulated areas—into farmland instead of schools, churches, and homes.

  Residents of Grand Forks had been aware of the flood threat for months. Snowfall had been especially heavy in the Great Plains that winter, and the National Weather Service, anticipating runoff as the snow melted, had predicted the waters of the Red River would crest to forty-nine feet, close to the all-time record.

  There was just one small problem. The levees in Grand Forks had been built to handle a flood of fifty-one feet. Even a small miss in the forty-nine-foot prediction could prove catastrophic.

  In fact, the river crested to fifty-four feet. The Weather Service’s forecast hadn’t been perfect by any means, but a five-foot miss, two months in advance of a flood, is pretty reasonable—about as well as these predictions had done on average historically. The margin of error on the Weather Service’s forecast—based on how well their flood forecasts had done in the past—was about plus or minus nine feet. That implied about a 35 percent chance of the levees being overtopped.[8]
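  That 35 percent figure is easy to sanity-check. Here is a minimal sketch, assuming the crest forecast error is roughly normal and treating the plus-or-minus-nine-foot margin as a 90 percent interval (the book does not state which coverage the margin represents, so that is an assumption):

```python
# Sanity check on the overtopping probability. Assumes the crest forecast
# error is roughly normal and that the +/- 9-foot margin of error is a
# 90 percent interval -- an assumption; the book does not state the coverage.
from scipy.stats import norm

forecast = 49.0   # predicted crest, feet
levees = 51.0     # height the levees could handle, feet
margin = 9.0      # stated margin of error, feet

sigma = margin / norm.ppf(0.95)                        # ~5.5 feet
p_overtop = norm.sf(levees, loc=forecast, scale=sigma)
print(f"P(crest > {levees:.0f} ft) ~ {p_overtop:.0%}")  # ~36%, near the book's 35%
```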

  FIGURE 6-1: FLOOD PREDICTION WITH MARGIN OF ERROR[9]

  The problem is that the Weather Service had explicitly avoided communicating the uncertainty in their forecast to the public, emphasizing only the forty-nine-foot prediction. The forecasters later told researchers that they were afraid the public might lose confidence in the forecast if they had conveyed any uncertainty in the outlook.

  Communicating the uncertainty, of course, would have made the public much better prepared—and possibly able to prevent the flooding by reinforcing the levees or diverting the river flow. Left to their own devices, many residents became convinced they didn’t have anything to worry about. (Very few of them bought flood insurance.[10]) A prediction of a forty-nine-foot crest in the river, expressed without any reservation, seemed to imply that the flood would hit forty-nine feet exactly; the fifty-one-foot levees would be just enough to keep them safe. Some residents even interpreted the forecast of forty-nine feet as representing the maximum possible extent of the flood.[11]

  An oft-told joke: a statistician drowned crossing a river that was only three feet deep on average. On average, the flood might be forty-nine feet in the Weather Service’s forecast model, but just a little bit higher and the town would be inundated.

  The National Weather Service has since come to recognize the importance of communicating the uncertainty in their forecasts accurately and honestly to the public, as we saw in chapter 4. But this sort of attitude is rare among other kinds of forecasters, especially when they predict the course of the economy.

  Are Economists Rational?

  Now consider what happened in November 2007. It was just one month before the Great Recession officially began. There were already clear signs of trouble in the housing market: foreclosures had doubled,[12] and the mortgage lender Countrywide was on the verge of bankruptcy.[13] There were equally ominous signs in credit markets.[14]

  Economists in the Survey of Professional Forecasters, a quarterly poll put out by the Federal Reserve Bank of Philadelphia, nevertheless foresaw a recession as relatively unlikely. Instead, they expected the economy to grow at a slightly below-average rate of 2.4 percent in 2008. And they thought there was almost no chance of a recession as severe as the one that actually unfolded.

  The Survey of Professional Forecasters is unique in that it asks economists to explicitly indicate a range of outcomes for where they see the economy headed. As I have emphasized throughout this book, a probabilistic consideration of outcomes is an essential part of a scientific forecast. If I asked you to forecast the total that will be produced when you roll a pair of six-sided dice, the correct answer is not any single number but an enumeration of possible outcomes and their respective probabilities, as in figure 6-2. Although you will roll 7 more often than any other number, it is not intrinsically any more or any less consistent with your forecast than a roll of 2 or 12, provided that each number comes up in accordance with the probability you assign it over the long run.
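  The dice enumeration in figure 6-2 is simple to reproduce; this short sketch tabulates all thirty-six equally likely rolls:

```python
# Enumerate the forecast for the sum of two six-sided dice: the correct
# forecast is the whole distribution, not just the most likely total.
from collections import Counter
from itertools import product

totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for total in sorted(totals):
    print(f"{total:2d}: {totals[total] / 36:.1%}")
# 7 is most likely (16.7%); 2 and 12 are least likely (2.8% each)
```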

  The economists in the Survey of Professional Forecasters are asked to do something similar when they forecast GDP and other variables—estimating, for instance, the probability that GDP might come in at between 2 percent and 3 percent, or between 3 percent and 4 percent. This is what their forecast for GDP looked like in November 2007 (figure 6-3):

  As I mentioned, the economists in this survey thought that GDP would end up at about 2.4 percent in 2008, slightly below its long-term trend. This was a very bad forecast: GDP actually shrank by 3.3 percent once the financial crisis hit. What may be worse is that the economists were extremely confident in their bad prediction. They assigned only a 3 percent chance to the economy’s shrinking by any margin over the whole of 2008.[15] And they gave it only about a 1-in-500 chance of shrinking by at least 2 percent, as it did.[16]

  Indeed, economists have for a long time been much too confident in their ability to predict the direction of the economy. In figure 6-4, I’ve plotted the forecasts of GDP growth from the Survey of Professional Forecasters for the eighteen years between 1993 and 2010.[17] The bars in the chart represent the 90 percent prediction intervals as stated by the economists.

  A prediction interval is a range of the most likely outcomes that a forecast provides for, much like the margin of error in a poll. A 90 percent prediction interval, for instance, is supposed to cover 90 percent of the possible real-world outcomes, leaving only the 10 percent of outlying cases at the tail ends of the distribution. If the economists’ forecasts were as accurate as they claimed, we’d expect the actual value for GDP to fall within their prediction interval nine times out of ten, or all but about twice in eighteen years.

  FIGURE 6-4: GDP FORECASTS: 90 PERCENT PREDICTION INTERVALS AGAINST ACTUAL RESULTS

  In fact, the actual value for GDP fell outside the economists’ prediction interval six times in eighteen years, or fully one-third of the time. Another study,[18] which ran these numbers back to the beginnings of the Survey of Professional Forecasters in 1968, found even worse results: the actual figure for GDP fell outside the prediction interval almost half the time. There is almost no chance[19] that the economists have simply been unlucky; they fundamentally overstate the reliability of their predictions.
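  One way to see why bad luck is implausible: treat each year as an independent trial with a 10 percent chance of landing outside an honestly calibrated 90 percent interval (a simplifying assumption, since forecast errors can be correlated from year to year) and ask how likely six or more misses in eighteen years would be:

```python
# Probability of 6+ misses in 18 years if the 90% intervals were honest.
# Treats years as independent 10%-miss trials -- a simplifying assumption,
# since forecast errors can be correlated across years.
from scipy.stats import binom

p_unlucky = binom.sf(5, n=18, p=0.10)   # P(X >= 6) = 1 - P(X <= 5)
print(f"{p_unlucky:.2%}")               # about 0.6% -- luck is a poor excuse
```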

  In reality, when a group of economists give you their GDP forecast, the true 90 percent prediction interval—based on how these forecasts have actually performed[20] and not on how accurate the economists claim them to be—spans about 6.4 points of GDP (equivalent to a margin of error of plus or minus 3.2 percent).*

  When you hear on the news that GDP will grow by 2.5 percent next year, that means it could quite easily grow at a spectacular rate of 5.7 percent instead. Or it could fall by 0.7 percent—a fairly serious recession. Economists haven’t been able to do any better than that, and there isn’t much evidence that their forecasts are improving. The old joke about economists’ having called nine out of the last six recessions correctly has some truth to it; one actual statistic is that in the 1990s, economists predicted only 2 of the 60 recessions around the world a year ahead of time.[21]
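  Translating the historically honest margin of error into a range around a headline number takes one line of arithmetic:

```python
# Convert a headline point forecast into the historically honest range.
point_forecast = 2.5   # announced GDP growth, percent
margin = 3.2           # true 90% margin of error from past performance, percent

low, high = point_forecast - margin, point_forecast + margin
print(f"90% interval: {low:+.1f}% to {high:+.1f}%")   # -0.7% to +5.7%
```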

  Economists aren’t unique in this regard. Results like these are the rule; experts either aren’t very good at providing an honest description of the uncertainty in their forecasts, or they aren’t very interested in doing so. This property of overconfident predictions has been identified in many other fields, including medical research, political science, finance, and psychology. It seems to apply both when we use our judgment to make a forecast (as Phil Tetlock’s political scientists did) and when we use a statistical model to do so (as in the case of the failed earthquake forecasts that I described in chapter 5).

  But economists, perhaps, have fewer excuses than those in other professions for making these mistakes. For one thing, their predictions have not just been overconfident but also quite poor in a real-world sense, often missing the actual GDP figure by a very large and economically meaningful margin. For another, organized efforts to predict variables like GDP have been around for many years, dating back to the Livingston Survey in 1946, and these results are well-documented and freely available. Getting feedback about how well our predictions have done is one way—perhaps the essential way—to improve them. Economic forecasters get more feedback than people in most other professions, but they haven’t chosen to correct for their bias toward overconfidence.

  Isn’t economics supposed to be the field that studies the rationality of human behavior? Sure, you might expect someone in another field—an anthropologist, say—to show bias when he makes a forecast. But not an economist.

  Actually, however, that may be part of the problem. Economists understand a lot about rationality—which means they also understand a lot about how our incentives work. If they’re making biased forecasts, perhaps this is a sign that they don’t have much incentive to make good ones.

  “Nobody Has a Clue”

  Given the track record of their forecasts, there was one type of economist I was most inclined to seek out—an economist who would be honest about how difficult his job is and how easily his forecast might turn out to be wrong. I was able to find one: Jan Hatzius, the chief economist at Goldman Sachs.

  Hatzius can at least claim to have been more reliable than his competitors in recent years. In November 2007, a time when most economists still thought a recession of any kind to be unlikely, Hatzius penned a provocative memo entitled “Leveraged Losses: Why Mortgage Defaults Matter.” It warned of a scenario in which millions of homeowners could default on their mortgages and trigger a domino effect on credit and financial markets, producing trillions of dollars in losses and a potentially very severe recession—pretty much exactly the scenario that unfolded. Hatzius and his team were also quick to discount the possibility of a miraculous postcrisis recovery. In February 2009, a month after the stimulus package had been passed and the White House had claimed it would reduce unemployment to 7.8 percent by the end of 2009, Hatzius projected unemployment to rise to 9.5 percent[22] (quite close to the actual figure of 9.9 percent).

  Hatzius, a mellow to the point of melancholy German who became Goldman Sachs’s chief economist in 2005,[23] eight years after starting at the firm, draws respect even from those who take a skeptical view of the big banks. “[Jan is] very good,” Paul Krugman told me. “I hope that Lloyd Blankfein’s malevolence won’t spill over to Jan and his people.” Hatzius also has a refreshingly humble attitude about his ability to forecast the direction of the U.S. economy.

  “Nobody has a clue,” he told me when I met him at Goldman’s glassy office on West Street in New York. “It’s hugely difficult to forecast the business cycle. Understanding an organism as complex as the economy is very hard.”

  As Hatzius sees it, economic forecasters face three fundamental challenges. First, it is very hard to determine cause and effect from economic statistics alone. Second, the economy is always changing, so explanations of economic behavior that hold in one business cycle may not apply to future ones. And third, as bad as their forecasts have been, the data that economists have to work with isn’t much good either.

  Correlations Without Causation

  The government produces data on literally 45,000 economic indicators each year.[24] Private data providers track as many as four million statistics.[25] The temptation that some economists succumb to is to put all this data into a blender and claim that the resulting gruel is haute cuisine. There have been only eleven recessions since the end of World War II.[26] If you have a statistical model that seeks to explain eleven outputs but has to choose from among four million inputs to do so, many of the relationships it identifies are going to be spurious. (This is another classic case of overfitting—mistaking noise for a signal—the problem that befell earthquake forecasters in chapter 5.)
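  A toy simulation makes the point concrete. With only eleven outcomes and a large pool of candidate indicators, pure noise will usually contain something that tracks the record almost perfectly (the fifty-thousand-series scale below is hypothetical, far smaller than four million, and the effect already shows up):

```python
# Toy illustration of overfitting: among many random "indicators," some will
# correlate strongly with a short outcome history by chance alone.
# (Hypothetical scale: 50,000 noise series rather than four million.)
import numpy as np

rng = np.random.default_rng(0)
n_outcomes = 11                                  # one per postwar recession
target = rng.standard_normal(n_outcomes)
candidates = rng.standard_normal((50_000, n_outcomes))

corrs = np.array([np.corrcoef(c, target)[0, 1] for c in candidates])
print(f"best |correlation| among pure noise: {np.abs(corrs).max():.2f}")
# typically above 0.9 -- impressive-looking, and entirely meaningless
```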

  Consider how creative you might be when you have a stack of economic variables as thick as a phone book. A once-famous “leading indicator” of economic performance, for instance, was the winner of the Super Bowl. From Super Bowl I in 1967 through Super Bowl XXXI in 1997, the stock market[27] gained an average of 14 percent for the rest of the year when a team from the original National Football League (NFL) won the game.[28] But it fell by almost 10 percent when a team from the original American Football League (AFL) won instead.

  Through 1997, this indicator had correctly “predicted” the direction of the stock market in twenty-eight of thirty-one years. A standard test of statistical significance,[29] if taken literally, would have implied that there was only about a 1-in-4,700,000 possibility that the relationship had emerged from chance alone.
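  A simple sign test (an illustrative choice, not necessarily the test behind the 1-in-4,700,000 figure) already shows how extreme twenty-eight of thirty-one looks against a coin-flip null:

```python
# Sign test: if the Super Bowl indicator were pure chance, each year would
# be a fair coin flip. Illustrative only -- not necessarily the significance
# test that produced the book's 1-in-4,700,000 figure.
from scipy.stats import binom

p = binom.sf(27, n=31, p=0.5)    # P(28 or more correct out of 31)
print(f"1 in {1 / p:,.0f}")      # roughly 1 in 430,000 -- tiny, yet spurious
```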

  It was just a coincidence, of course. And eventually, the indicator began to perform badly. In 1998, the Denver Broncos, an original AFL team, won the Super Bowl—supposedly a bad omen. But rather than falling, the stock market gained 28 percent amid the dot-com boom. In 2008, the NFL’s New York Giants came from behind to upset the AFL’s New England Patriots on David Tyree’s spectacular catch—but Tyree couldn’t prevent the collapse of the housing bubble, which caused the market to crash by 35 percent. Since 1998, in fact, the stock market has done about 10 percent better when the AFL team won the Super Bowl, exactly the opposite of what the indicator was fabled to predict.

  How does an indicator that supposedly had just a 1-in-4,700,000 chance of failing flop so badly? For the same reason that, even though the odds of winning the Powerball lottery are only 1 chance in 195 million,[30] somebody wins it every few weeks. The odds are hugely against any one person winning the lottery—but millions of tickets are bought, so somebody is going to get lucky. Likewise, of the millions of statistical indicators in the world, a few will have happened to correlate especially well with stock prices or GDP or the unemployment rate. If not the winner of the Super Bowl, it might be chicken production in Uganda. But the relationship is merely coincidental.
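  The lottery arithmetic works the same way; here is a quick sketch (the ticket volume is hypothetical, chosen only for illustration):

```python
# Long odds plus huge volume make a winner likely. The ticket count is
# hypothetical, for illustration only.
p_win = 1 / 195_000_000           # stated Powerball odds per ticket
tickets = 100_000_000             # hypothetical tickets across a few drawings

p_someone_wins = 1 - (1 - p_win) ** tickets
print(f"{p_someone_wins:.0%}")    # ~40%: somebody gets lucky every few weeks
```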

  Although economists might not take the Super Bowl indicator seriously, they can talk themselves into believing that other types of variables—anything that has any semblance of economic meaning—are critical “leading indicators” foretelling a recession or recovery months in advance. One forecasting firm brags about how it looks at four hundred such variables,[31] far more than the two or three dozen major ones that Hatzius says contain most of the economic substance.* Other forecasters have touted the predictive power of such relatively obscure indicators as the ratio of bookings to billings at semiconductor companies.[32] With so many economic variables to pick from, you’re sure to find something that fits the noise in the past data well.

  It’s much harder to find something that identifies the signal; variables that are leading indicators in one economic cycle often turn out to be lagging ones in the next. Of the seven so-called leading indicators in a 2003 Inc. magazine article,[33] all of which had been good predictors of the 1990 and 2001 recessions, only two—housing prices and temporary hiring—led the recession that began in 2007 to any appreciable degree. Others, like commercial lending, did not begin to turn downward until a year after the recession began.

  Even the well-regarded Leading Economic Index, a composite of ten economic indicators published by the Conference Board, has had its share of problems. The Leading Economic Index has generally declined a couple of months in advance of recessions. But it has given roughly as many false alarms—most infamously in 1984, when it sharply declined for three straight months,[34] signaling a recession, even as the economy continued to zoom upward at a 6 percent rate of growth. Some studies have even claimed that the Leading Economic Index has no predictive power at all when applied in real time.[35]

 
