ECRI’s recession call isn’t based on just one or two leading indexes, but on dozens of specialized leading indexes, including the U.S. Long Leading Index . . . to be followed by downturns in the Weekly Leading Index and other shorter-leading indexes. In fact, the most reliable forward-looking indicators are now collectively behaving as they did on the cusp of full-blown recessions.54
There’s plenty of jargon, but what is lacking in this description is any actual economic substance. Theirs was a story about data—as though data itself caused recessions—and not a story about the economy. ECRI actually seems quite proud of this approach. “Just as you do not need to know exactly how a car engine works in order to drive safely,” it advised its clients in a 2004 book, “you do not need to understand all the intricacies of the economy to accurately read those gauges.”55
This kind of statement is becoming more common in the age of Big Data.56 Who needs theory when you have so much information? But this is categorically the wrong attitude to take toward forecasting, especially in a field like economics where the data is so noisy. Statistical inferences are much stronger when backed up by theory or at least some deeper thinking about their root causes. There were certainly reasons for economic pessimism in September 201157—for instance, the unfolding debt crisis in Europe—but ECRI wasn’t looking at those. Instead, it had a random soup of variables that mistook correlation for causation.58
Indeed, the ECRI forecast seemed to mark an economic turning point—but it was a positive one. The S&P 500 gained 21 percent in the five months after ECRI announced its recession call,59 while GDP growth registered at a fairly healthy clip of 3.0 percent in the last quarter of 2011 instead of going into recession. ECRI kicked the can down the road, “clarifying” the call to say that it extended all the way into 2012 even though this is not what they had implied originally.60
When Biased Forecasts Are Rational
If you’re looking for an economic forecast, the best place to turn is the average or aggregate prediction rather than that of any one economist. My research into the Survey of Professional Forecasters suggests that these aggregate forecasts are about 20 percent more accurate61 than the typical individual’s forecast at predicting GDP, 10 percent better at predicting unemployment, and 30 percent better at predicting inflation. This property—group forecasts beat individual ones—has been found to be true in almost every field in which it has been studied.
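To see why averaging helps, consider a minimal simulation (the error magnitudes below are illustrative, not the survey’s actual figures): each forecaster’s error has a piece shared by everyone and a piece all his own, and averaging cancels only the idiosyncratic piece.

```python
import numpy as np

# Illustrative simulation: each forecaster's error has a component shared
# by everyone (common_noise) and an independent component (idio_noise).
# Averaging the forecasts cancels much of the independent component only.
rng = np.random.default_rng(0)

n_quarters, n_forecasters = 200, 30
true_gdp = rng.normal(2.5, 2.0, n_quarters)               # "true" GDP growth, %
common_noise = rng.normal(0.0, 1.0, n_quarters)            # error everyone shares
idio_noise = rng.normal(0.0, 1.5, (n_forecasters, n_quarters))

forecasts = true_gdp + common_noise + idio_noise            # one row per forecaster
aggregate = forecasts.mean(axis=0)                          # the consensus forecast

def rmse(pred):
    return np.sqrt(np.mean((pred - true_gdp) ** 2))

print(f"typical individual RMSE: {np.mean([rmse(f) for f in forecasts]):.2f}")
print(f"aggregate RMSE:          {rmse(aggregate):.2f}")
```

The aggregate beats the typical individual, but only the independent part of the error averages away; the error everyone shares remains, which is one reason even consensus forecasts can miss badly.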
And yet while the notion that aggregate forecasts beat individual ones is an important empirical regularity, it is sometimes used as a cop-out when forecasts might be improved. The aggregate forecast is made up of individual forecasts; if those improve, so will the group’s performance. Moreover, even the aggregate economic forecasts have been quite poor in any real-world sense, so there is plenty of room for progress.
Most economists rely on their judgment to some degree when they make a forecast, rather than just taking the output of a statistical model as is. Given how noisy the data is, this is probably helpful. A study62 by Stephen K. McNees, the former vice president of the Federal Reserve Bank of Boston, found that judgmental adjustments to statistical forecasting methods resulted in forecasts that were about 15 percent more accurate. The idea that a statistical model would be able to “solve” the problem of economic forecasting was somewhat in vogue during the 1970s and 1980s when computers came into wider use. But as was the case in other fields, like earthquake forecasting during that time period, improved technology did not cover for the lack of theoretical understanding about the economy; it only gave economists faster and more elaborate ways to mistake noise for a signal. Promising-seeming models failed badly at some point or another and were consigned to the dustbin.63
Invoking one’s judgment, however, also introduces the potential for bias. You may make the forecast that happens to fit your economic incentives or your political beliefs. Or you may be too proud to change your story even when the facts and circumstances demand it. “I do think that people have the tendency, which needs to be actively fought,” Hatzius told me, “to see the information flow the way you want to see it.”
Are some economists better at managing this trade-off than others? Is the economist who called the last recession right more likely to get the next one too? This question has an interesting answer.
Statistical tests designed to identify predictive skill have generally come up with negative results when applied to the Survey of Professional Forecasters.64 That is, if you look at that survey, there doesn’t seem to be much evidence that some economists are consistently better than others. Studies of another panel, the Blue Chip Economic Survey, have more often come up with positive findings, however.65 There is clearly a lot of luck involved in economic forecasting—economists who are permanently bearish or bullish are guaranteed to be right every now and then. But the studies of the Blue Chip panel seem to find that some economists do a little bit better than others over the long run.
What is the difference between the two surveys? The Survey of Professional Forecasters is conducted anonymously: each economist is assigned a random ID number that remains constant from survey to survey, but nothing is revealed about just who he is or what he does. In the Blue Chip panel, on the other hand, everybody’s forecast has his name and reputation attached to it.
When you have your name attached to a prediction, your incentives may change. For instance, if you work for a little-known firm, it may be quite rational for you to make some wild forecasts that will draw big attention when they happen to be right, even if they aren’t going to be right very often. Firms like Goldman Sachs, on the other hand, might be more conservative in order to stay within the consensus.
Indeed, this exact property has been identified in the Blue Chip forecasts:66 one study terms the phenomenon “rational bias.”67 The less reputation you have, the less you have to lose by taking a big risk when you make a prediction. Even if you know that the forecast is dodgy, it might be rational for you to go after the big score. Conversely, if you have already established a good reputation, you might be reluctant to step too far out of line even when you think the data demands it.
Either of these reputational concerns potentially distracts you from the goal of making the most honest and accurate forecasts—and they probably worsen forecasts on balance. Although the differences are modest, historically the anonymous participants in the Survey of Professional Forecasters have done slightly better at predicting GDP and unemployment than the reputation-minded Blue Chip panelists.68
Overcoming Bias
If it can be rational to produce bad forecasts, that implies there are consumers of these forecasts who aid and abet them. Just as there are political pundits who make careers out of making implausible claims to partisan audiences, there are bears, bulls, and contrarians who will always have a constituency in the marketplace for economic ideas. (Sometimes economic forecasts have expressly political purposes too. It turns out that the economic forecasts produced by the White House, for instance, have historically been among the least accurate of all,69 regardless of whether it’s a Democrat or a Republican in charge.)
When it comes to economic forecasting, however, the stakes are higher than for political punditry. As Robert Lucas pointed out, the line between economic forecasting and economic policy is very blurry; a bad forecast can make the real economy worse.
There may be some hope at the margin for economic forecasting to benefit from further technological improvements. Things like Google search traffic patterns, for instance, can serve as leading indicators for economic data series like unemployment.
“The way we think about it is if you take something like initial claims on unemployment insurance, that’s a very good predictor for unemployment rates, which is a good predictor for economic activity,” I was told by Google’s chief economist, Hal Varian, at Google’s headquarters in Mountain View, California. “We can predict unemployment initial claims earlier because if you’re in a company and a rumor goes around that there are going to be layoffs, then people start searching ‘where’s the unemployment office,’ ‘how am I going to apply for unemployment,’ and so on. It’s a slightly leading indicator.”
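A rough sketch of how one might test whether a search series leads initial claims is below; the weekly data and the two-week lead are made up for illustration, not drawn from Google’s or the government’s actual figures.

```python
import numpy as np

# Made-up weekly series: search interest reacts to a labor-market shock
# immediately, while initial claims respond about two weeks later.
rng = np.random.default_rng(1)
weeks = 300
shock = np.cumsum(rng.normal(0, 1, weeks))                  # slow-moving shock

search_volume = shock + rng.normal(0, 0.5, weeks)            # moves right away
initial_claims = np.roll(shock, 2) + rng.normal(0, 0.5, weeks)  # lags ~2 weeks

for lag in range(5):
    x = search_volume[: weeks - lag]
    y = initial_claims[lag:]
    r = np.corrcoef(x, y)[0, 1]
    print(f"search leads claims by {lag} week(s): correlation = {r:.2f}")
# The correlation peaks near the true lead (two weeks here), which is what
# makes the search series useful as a slightly leading indicator.
```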
Still, the history of forecasting in economics and other fields suggests that technological improvements may not help much if they are offset by human biases, and there is little indication that economic forecasters have overcome these. For instance, they do not seem to have been chastened much by their experience with the Great Recession. If you look at the forecasts for GDP growth that the Survey of Professional Forecasters made in November 2011 (figure 6-6), they still exhibited the same tendency toward overconfidence that we saw in 2007, with forecasters discounting both upside and downside economic scenarios far more than is justified by the historical accuracy of their forecasts.70
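One simple way to detect this kind of overconfidence is to check whether a forecaster’s stated prediction intervals have covered the actual outcomes as often as claimed. The sketch below uses invented numbers to illustrate the test.

```python
import numpy as np

# Made-up example: a forecaster claims 90 percent prediction intervals but
# assumes less uncertainty than actually exists in the forecast errors.
rng = np.random.default_rng(2)

true_sigma = 2.0                              # actual spread of forecast errors
claimed_sigma = 1.0                           # spread the forecaster assumes
errors = rng.normal(0, true_sigma, 500)       # realized forecast errors

half_width = 1.645 * claimed_sigma            # 90% interval under the claim
coverage = np.mean(np.abs(errors) <= half_width)
print(f"claimed coverage: 90%, actual coverage: {coverage:.0%}")
# Coverage far below 90 percent means upside and downside scenarios are
# occurring more often than the stated intervals allow -- overconfidence.
```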
If we want to reduce these biases—we will never be rid of them entirely—we have two fundamental alternatives. One might be thought of as a supply-side approach—creating a market for accurate economic forecasts. The other might be a demand-side alternative: reducing demand for inaccurate and overconfident ones.
Robin Hanson, an economist at George Mason University, is an advocate of the supply-side alternative. I met him for lunch at one of his favorite Moroccan places in northern Virginia. He’s in his early fifties but looks much younger (despite being quite bald), and is a bit of an eccentric. He plans to have his head cryogenically frozen when he dies.71 He is also an advocate of a system he calls “futarchy” in which decisions on policy issues are made by prediction markets72 rather than politicians. He is clearly not a man afraid to challenge the conventional wisdom. Fittingly, Hanson writes a blog called Overcoming Bias, in which he presses his readers to consider which cultural taboos, ideological beliefs, or misaligned incentives might constrain them from making optimal decisions.
“I think the most interesting question is how little effort we actually put into forecasting, even on the things we say are important to us,” Hanson told me as the food arrived.
“In an MBA school you present this image of a manager as a great decision maker—the scientific decision maker. He’s got his spreadsheet and he’s got his statistical tests and he’s going to weigh the various options. But in fact real management is mostly about managing coalitions, maintaining support for a project so it doesn’t evaporate. If they put together a coalition to do a project, and then at the last minute the forecasts fluctuate, you can’t dump the project at the last minute, right?
“Even academics aren’t very interested in collecting a track record of forecasts—they’re not very interested in making clear enough forecasts to score,” he says later. “What’s in it for them? The more fundamental problem is that we have a demand for experts in our society but we don’t actually have that much of a demand for accurate forecasts.”
Hanson, in order to address this deficiency, is an advocate of prediction markets—systems where you can place bets on a particular economic or policy outcome, like whether Israel will go to war with Iran, or how much global temperatures will rise because of climate change. His argument for these is pretty simple: they ensure that we have a financial stake in being accurate when we make forecasts, rather than just trying to look good to our peers.
We will revisit the idea of prediction markets in chapter 11; they are not a panacea, particularly if we make the mistake of assuming that they can never go wrong. But as Hanson says, they can yield some improvement by at least getting everyone’s incentives in order.
One of the most basic applications might simply be markets for predicting macroeconomic variables like GDP and unemployment. There are already a variety of direct and indirect ways to bet on things like inflation, interest rates, and commodities prices, but no high-volume market for GDP exists.
There could be a captive audience for these markets: common stocks have become more highly correlated with macroeconomic risks in recent years,73 so they could provide a means of hedging against them. These markets would also provide real-time information to policy makers, essentially serving as continuously updated forecasts of GDP. Adding options to the markets—bets on, say, whether GDP might grow by 5 percent, or decline by 2 percent—would punish overconfident forecasters and yield more reliable estimates of the uncertainties inherent in forecasting the economy.
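The logic is that of a proper scoring rule: once you must attach probabilities (or prices) to thresholds like these, overconfidence costs you on average. A small illustration, with made-up probabilities:

```python
import numpy as np

# Made-up example: pricing the bet "GDP declines by 2 percent or more."
# A proper score such as the Brier score rewards honest probabilities.
rng = np.random.default_rng(3)

true_prob = 0.2                                   # how often the event occurs
events = (rng.random(10_000) < true_prob).astype(float)

honest = np.full(events.size, 0.20)               # calibrated forecast
overconfident = np.full(events.size, 0.02)        # "it almost never happens"

def brier(p):
    return np.mean((p - events) ** 2)             # lower is better

print(f"honest forecaster:        {brier(honest):.3f}")
print(f"overconfident forecaster: {brier(overconfident):.3f}")
# The overconfident forecaster scores worse on average; a market with bets
# at several thresholds imposes the same kind of penalty in dollars.
```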
The other solution, the “demand-side” approach, is slower and more incremental. It simply means that we have to be better consumers of forecasts. In the context of the economic forecasting, that might mean turning the spotlight away from charlatans with “black box” models full of random assortments of leading indicators and toward people like Jan Hatzius who are actually talking economic substance. It might also mean placing more emphasis on the noisiness of economic indicators and economic forecasts. Perhaps initial estimates of GDP should be reported with margins of error, just as political polls are.
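Computing such a margin of error is straightforward if you have a history of how much the initial estimates were later revised; the figures in the sketch below are invented for illustration.

```python
import numpy as np

# Invented history of GDP revisions (final minus initial estimate, in
# percentage points), used only to illustrate the calculation.
revisions = np.array([ 0.7, -1.2,  0.4,  1.5, -0.6, -2.1,  0.9,  0.3,
                      -0.8,  1.1, -0.2,  1.8, -1.4,  0.5,  0.6, -0.9])

initial_estimate = 3.0                            # reported quarterly growth, %
margin = 1.96 * revisions.std(ddof=1)             # roughly a 95% margin of error

print(f"GDP growth: {initial_estimate:.1f}% "
      f"(+/- {margin:.1f} points, based on past revisions)")
```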
More broadly, it means recognizing that the amount of confidence someone expresses in a prediction is not a good indication of its accuracy—to the contrary, these qualities are often inversely correlated. Danger lurks, in the economy and elsewhere, when we discourage forecasters from making a full and explicit account of the risks inherent in the world around us.
7
ROLE MODELS
The flu hit Fort Dix like clockwork every January; it had almost become a rite of passage. Most of the soldiers would go home for Christmas each year, fanning out to all corners of the United States for their winter break. They would then return to the base, well-fed and well-rested, but also carrying whichever viruses might have been going around their hometowns. If the flu was anywhere in the country, it was probably coming back with them. Life in the cramped setting of the barracks, meanwhile, offered few opportunities for privacy or withdrawal. If someone—anyone—had caught the flu back home, he was more likely than not to spread it to the rest of the platoon. You could scarcely conjure a scenario more favorable to transmission of the disease.
Usually this was no cause for concern; tens of millions of Americans catch the flu in January and February every year. Few of them die from it, and young, healthy men like David Lewis, a nineteen-year-old private from Ashley Falls, Massachusetts, who had returned to Fort Dix that January, are rarely among the exceptions. So Lewis, even though he’d been sicker than most of the recruits and ordered to stay in the barracks, decided to join his fellow privates on a fifty-mile march through the snow-blanketed marshlands of central New Jersey. He was in no mood to let a little fever bother him—it was 1976, the year of the nation’s bicentennial, and the country needed order and discipline in the uncertain days following Watergate and Vietnam.1
But Lewis never made it back to the barracks: thirteen miles into the march, he collapsed and was later pronounced dead. An autopsy revealed that Lewis’s lungs were flush with blood: he had died of pneumonia, a common complication of flu, but not usually one to kill a healthy young adult like Lewis.
The medics at Fort Dix had already been nervous about that year’s flu bug. Although some of the several hundred soldiers who had gotten ill that winter had tested positive for the A/Victoria flu strain—the name for the common and fairly benign virus that was going around the world that year2—there were others like Lewis who had suffered from an unidentified and apparently much more severe type of flu. Samples of their blood were sent to the Center for Disease Control (CDC) in Atlanta for further testing.
Two weeks later the CDC revealed the identity of the mysterious virus. It was not a new type of flu after all but instead something altogether more disturbing, a ghost from epidemics past: influenza virus type H1N1, more commonly known as the swine flu. H1N1 had been responsible for the worst pandemic in modern history: the Spanish flu of 1918–20, which afflicted a third of humanity and killed 50 million,3 including 675,000 in the United States. For reasons of both science and superstition, the disclosure sent a chill through the nation’s epidemiological community. The 1918 outbreak’s earliest manifestations had also come at a military base, Fort Riley in Kansas, where soldiers were busy preparing to enter World War I.4 Moreover, there was a belief at that time—based on somewhat flimsy scientific evidence—that a major flu epidemic manifested itself roughly once every ten years.5 The flu had been severe in 1938, 1947, 1957, and 1968;6 in 1976, the world seemed due for the next major pandemic.
A series of dire predictions soon followed. The concern was not an immediate outbreak—by the time the CDC had positively identified the H1N1 strain, flu season had already run its course. But scientists feared that it foreshadowed something much worse the following winter. There had never been a case, a prominent doctor noted to the New York Times,7 in which a newly identified strain of the flu had failed to outcompete its rivals and become the global hegemon: wimpy A/Victoria stood no chance against its more virulent and ingenious rival. And if H1N1 were anywhere near as deadly as the 1918 version had been, the consequences might be very bad indeed. Gerald Ford’s secretary of health, F. David Mathews, predicted that one million Americans would die, eclipsing the 1918 total.8
President Ford found himself in a predicament. The vaccine industry, somewhat like the fashion industry, needs at least six months of lead time to know what the hip vaccine is for the new season; the formula changes a little bit every year. If they suddenly had to produce a vaccine that guarded against H1N1—and particularly if they were going to produce enough of it for the entire nation—they would need to get started immediately. Meanwhile, Ford was struggling to overcome a public perception that he was slow-witted and unsure of himself—an impression that grew more entrenched every weekend with Chevy Chase’s bumbling-and-stumbling caricature of him on NBC’s new hit show, Saturday Night Live. So Ford took the resolute step of asking Congress to authorize some 200 million doses of vaccine, and ordered a mass vaccination program, the first the country had seen since Jonas Salk had developed the polio vaccine in the 1950s.