The Signal and the Noise

by Nate Silver


  The name overfitting comes from the way that statistical models are “fit” to match past observations. The fit can be too loose—this is called underfitting—in which case you will not be capturing as much of the signal as you could. Or it can be too tight—an overfit model—which means that you’re fitting the noise in the data rather than discovering its underlying structure. The latter error is much more common in practice.

  To see how this works, let’s give ourselves an advantage that we’ll almost never have in real life: we’ll know exactly what the real data is supposed to look like. In figure 5-5, I’ve drawn a smooth parabolic curve, which peaks in the middle and trails off near the ends. This could represent any sort of real-world data that you might like: as we saw in chapter 3, for instance, it represents a pretty good description of how baseball players perform as they age, since they are better in the middle of their careers than at the end or the beginning.

  However, we do not get to observe this underlying relationship directly. Instead, it manifests itself through a series of individual data points and we have to infer the pattern from those. Moreover, these data points are affected by idiosyncratic circumstances—so there is some signal, but there is also some noise. In figure 5-5, I’ve plotted one hundred data points, represented by circles and triangles. This looks to be enough to detect the signal through the noise. Although there is some randomness in the data, it’s pretty clear that they follow our curve.

  What happens, however, when we have a more limited amount of data, as will usually be the case in real life? Then we have more potential to get ourselves in trouble. In figure 5-6a, I’ve limited us to about twenty-five of our one hundred observations. How would you connect these dots?

  Knowing what the real pattern is supposed to be, of course, you’ll still be inclined to fit the points with some kind of curve shape. Indeed, modeling this data with a simple mathematical expression called a quadratic equation does a very good job of re-creating the true relationship (figure 5-6b).

  When we don’t know the Platonic ideal for our data, however, sometimes we get greedy. Figure 5-6c represents an example of this: an overfit model. In figure 5-6c, we’ve devised a complex function57 that chases down every outlying data point, weaving up and down implausibly as it tries to connect the dots. This moves us further away from the true relationship and will lead to worse predictions.
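
  As a rough illustration (not Silver’s actual figures), here is a minimal Python sketch of the two fits described above: a quadratic that recovers a parabolic “true” relationship from about twenty-five noisy points, and a high-degree polynomial that chases every point. The curve, noise level, and polynomial degrees are arbitrary choices made for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(-1, 1, 25)                      # ~25 observations, as in figure 5-6a
true_curve = 1 - x**2                           # a smooth parabola: peaks in the middle
y = true_curve + rng.normal(0, 0.15, x.size)    # signal plus idiosyncratic noise

quadratic = np.polynomial.Polynomial.fit(x, y, deg=2)    # the figure 5-6b analogue
wiggly = np.polynomial.Polynomial.fit(x, y, deg=15)      # the figure 5-6c analogue

# Compare each fit against the true parabola on a fine grid: the quadratic
# stays close, while the high-degree fit weaves up and down between points.
grid = np.linspace(-1, 1, 200)
print("quadratic max error:", np.max(np.abs(quadratic(grid) - (1 - grid**2))))
print("wiggly    max error:", np.max(np.abs(wiggly(grid) - (1 - grid**2))))
```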

  This seems like an easy mistake to avoid, and it would be if only we were omniscient and always knew about the underlying structure of the data. In almost all real-world applications, however, we have to work by induction, inferring the structure from the available evidence. You are most likely to overfit a model when the data is limited and noisy and when your understanding of the fundamental relationships is poor; both circumstances apply in earthquake forecasting.

  If we either don’t know or don’t care about the truth of the relationship, there are lots of reasons why we may be prone to overfitting the model. One is that the overfit model will score better according to most of the statistical tests that forecasters use. A commonly used test is to measure how much of the variability in the data is accounted for by our model. According to this test, the overfit model (figure 5-6c) explains 85 percent of the variance, making it “better” than the properly fit one (figure 5-6b), which explains 56 percent. But the overfit model scores those extra points in essence by cheating—by fitting noise rather than signal. It actually does a much worse job of explaining the real world.58
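
  The “variance explained” test can be reproduced in the same toy setting. The 85 percent and 56 percent figures belong to Silver’s example, so this sketch will produce its own numbers, but the pattern is the same: the overfit model wins the in-sample statistic while tracking the true curve worse.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 25)
true_curve = 1 - x**2
y = true_curve + rng.normal(0, 0.25, x.size)

def variance_explained(y_obs, y_hat):
    """Share of the variability in the observations accounted for by the model (R^2)."""
    ss_res = np.sum((y_obs - y_hat) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1 - ss_res / ss_tot

grid = np.linspace(-1, 1, 200)
for deg in (2, 15):                                           # properly fit vs. overfit
    fit = np.polynomial.Polynomial.fit(x, y, deg=deg)
    in_sample = variance_explained(y, fit(x))                 # flatters the overfit model
    vs_truth = np.mean((fit(grid) - (1 - grid**2)) ** 2)      # what we actually care about
    print(f"degree {deg}: variance explained {in_sample:.2f}, error vs. truth {vs_truth:.4f}")
```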

  As obvious as this might seem when explained in this way, many forecasters completely ignore this problem. The wide array of statistical methods available to researchers enables them to be no less fanciful—and no more scientific—than a child finding animal patterns in clouds.* “With four parameters I can fit an elephant,” the mathematician John von Neumann once said of this problem.59 “And with five I can make him wiggle his trunk.”

  Overfitting represents a double whammy: it makes our model look better on paper but perform worse in the real world. Because of the latter trait, an overfit model eventually will get its comeuppance if and when it is used to make real predictions. Because of the former, it may look superficially more impressive until then, claiming to make very accurate and newsworthy predictions and to represent an advance over previously applied techniques. This may make it easier to get the model published in an academic journal or to sell to a client, crowding out more honest models from the marketplace. But if the model is fitting noise, it has the potential to hurt the science.

  As you may have guessed, something like Keilis-Borok’s earthquake model was badly overfit. It applied an incredibly complicated array of equations to noisy data. And it paid the price—getting just three of its twenty-three predictions correct. David Bowman recognized that he had similar problems and pulled the plug on his model.

  To be clear, these mistakes are usually honest ones. To borrow the title of another book, they play into our tendency to be fooled by randomness. We may even grow quite attached to the idiosyncrasies in our model. We may, without even realizing it, work backward to generate persuasive-sounding theories that rationalize them, and these will often fool our friends and colleagues as well as ourselves. Michael Babyak, who has written extensively on this problem,60 puts the dilemma this way: “In science, we seek to balance curiosity with skepticism.” This is a case of our curiosity getting the better of us.

  An Overfit Model of Japan?

  Our tendency to mistake noise for signal can occasionally produce some dire real-world consequences. Japan, despite being extremely seismically active, was largely unprepared for its devastating 2011 earthquake. The Fukushima nuclear reactor was built to withstand a magnitude 8.6 earthquake,61 but not a 9.1. Archaeological evidence62 is suggestive of historic tsunamis on the scale of the 130-foot waves that the 2011 earthquake produced, but these cases were apparently forgotten or ignored.

  A magnitude 9.1 earthquake is an incredibly rare event in any part of the world: nobody should have been predicting it to the exact decade, let alone the exact date. In Japan, however, some scientists and central planners dismissed the possibility out of hand. This may reflect a case of overfitting.

  In figure 5-7a, I’ve plotted the historical frequencies of earthquakes near the 2011 epicenter in Japan.63 The data includes everything up through but not including the magnitude 9.1 earthquake on March 11. You’ll see that the relationship almost follows the straight-line pattern that Gutenberg and Richter’s method predicts. However, at about magnitude 7.5, there is a kink in the graph. There had been no earthquakes as large as a magnitude 8.0 in the region since 1964, and so the curve seems to bend down accordingly.

  So how to connect the dots? If you go strictly by the Gutenberg–Richter law, ignoring the kink in the graph, you should still follow the straight line, as in figure 5-7b. Alternatively, you could go by what seismologists call a characteristic fit (figure 5-7c), which just means that it is descriptive of the historical frequency of earthquakes in that area. In this case, that would mean that you took the kink in the historical data to be real—meaning, you thought there was some good reason why earthquakes larger than about magnitude 7.6 were unlikely to occur in the region.

  Here is another example where an innocuous-seeming choice of assumptions will yield radically distinct conclusions—in this case, about the probability of a magnitude 9 earthquake in this part of Japan. The characteristic fit suggests that such an earthquake was nearly impossible—it implies that one might occur about every 13,000 years. The Gutenberg–Richter estimate, on the other hand, was that you’d get one such earthquake every three hundred years. That’s infrequent but hardly impossible—a tangible enough risk that a wealthy nation like Japan might be able to prepare for it.64
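
  As a hedged sketch of the arithmetic: the Gutenberg–Richter law says the logarithm of earthquake frequency falls off linearly with magnitude, so extrapolating the straight line out to magnitude 9 is a one-line calculation. The b-value of 1 and the assumed rate of one magnitude-7.5 earthquake per decade below are illustrative stand-ins, not the actual catalog for this part of Japan, but they land near the once-per-three-hundred-years figure quoted above.

```python
import math

# Gutenberg-Richter: log10 N(>= M) = a - b*M, i.e. each extra unit of magnitude
# makes an earthquake about 10x rarer (with the commonly assumed slope b ~ 1).
b = 1.0
rate_m75 = 1 / 10.0      # hypothetical: one magnitude >= 7.5 event per decade in the region

# Extend the straight line past the "kink" out to magnitude 9:
rate_m9 = rate_m75 * 10 ** (-b * (9.0 - 7.5))
print(f"one magnitude >= 9 earthquake every {1 / rate_m9:.0f} years")   # roughly 300

# A characteristic fit that bends the curve down at ~7.5 instead implies such
# an event roughly once every 13,000 years -- effectively never.
```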

  The characteristic fit matched the recent historical record a bit more snugly. But as we’ve learned, this type of pattern-matching is not always a good thing—it could imply an overfit model, in which case it will do a worse job of matching the true relationship.

  In this case, an overfit model would dramatically underestimate the likelihood of a catastrophic earthquake in the area. The problem with the characteristic fit is that it relied on an incredibly weak signal. As I mentioned, there had been no earthquake of magnitude 8 or higher in this region in the forty-five years or so prior to March 11, 2011. However, these are rare events to begin with: the Gutenberg–Richter law posits that they might occur only about once per thirty years in this area. It’s not very hard at all for a once-per-thirty-year event to fail to occur in a forty-five-year window,65 no more so than a .300 hitter having a bad day at the plate and going 0-for-5.66 Meanwhile, there were quite a few earthquakes with magnitudes in the mid- to high 7’s in this part of Japan. When such earthquakes had occurred in other parts of the world, they had almost always suggested the potential for larger ones. What justification was there to think that this part of Japan would be a special case?
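
  The “not very hard at all” claim is easy to check. Treating magnitude-8 earthquakes as a Poisson process with a rate of one per thirty years—an assumption, but the standard one for this kind of back-of-the-envelope estimate—the chance of a quiet forty-five-year stretch works out to about one in five.

```python
import math

rate = 1 / 30      # assumed: one magnitude >= 8 earthquake per thirty years (Gutenberg-Richter)
window = 45        # years without such an earthquake before March 11, 2011

# Modeled as a Poisson process, the probability of zero events in the window:
p_quiet = math.exp(-rate * window)
print(f"{p_quiet:.0%}")    # about 22 percent -- not remotely surprising
```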

  Actually, seismologists in Japan and elsewhere came up with a few rationalizations for that. They suggested, for instance, that the particular composition of the seafloor in the region, which is old and relatively cool and dense, might prohibit the formation of such large earthquakes.67 Some seismologists observed that, before 2004, no magnitude 9 earthquake had occurred in a region with that type of seafloor.

  This was about like concluding that it was impossible for anyone from Pennsylvania to win the Powerball jackpot because no one had done so in the past three weeks. Magnitude 9 earthquakes, like lottery winners, are few and far between. Before 2004, in fact, only three of them had occurred in recorded history anywhere in the world. This wasn’t nearly enough data to support such highly specific conclusions about the exact circumstances under which they might occur. Nor was Japan the first failure of such a theory; a similar one had been advanced about Sumatra68 at a time when it had experienced lots of magnitude 7 earthquakes69 but nothing stronger. Then the Great Sumatra Earthquake, magnitude 9.2,70 hit in December 2004.

  The Gutenberg–Richter law would not have predicted the exact timing of the Sumatra or Japan earthquakes, but it would have allowed for their possibility.71 So far, it has held up remarkably well when a great many more elaborate attempts at earthquake prediction have failed.

  The Limits of Earthquakes and Our Knowledge of Them

  The very large earthquakes of recent years are causing seismologists to rethink what the upper bounds of earthquakes might be. If you look at figure 5-3b, which accounts for all earthquakes since 1964 (including Sumatra and Japan), it now forms a nearly straight line through all the data points. A decade ago, you would have detected more of a kink in the graph (as in the chart in figure 5-7a). What this meant was that there were slightly fewer megaquakes than the Gutenberg–Richter law predicted. But recently we have been catching up.

  Because they occur so rarely, it will take centuries to know what the true rate of magnitude 9 earthquakes is. It will take even longer to know whether earthquakes larger than magnitude 9.5 are possible. Hough told me that there may be some fundamental constraints on earthquake size from the geography of fault systems. If the largest continuous string of faults in the world ruptured together—everything from Tierra del Fuego at the southern tip of South America all the way up through the Aleutians in Alaska—a magnitude 10 is about what you’d get, she said. But it is hard to know for sure.

  Even if we had a thousand years of reliable seismological records, however, it might be that we would not get all that far. It may be that there are intrinsic limits on the predictability of earthquakes.

  Earthquakes may be an inherently complex process. The theory of complexity that the late physicist Per Bak and others developed is different from chaos theory, although the two are often lumped together. Instead, the theory suggests that very simple things can behave in strange and mysterious ways when they interact with one another.

  Bak’s favorite example was that of a sandpile on a beach. If you drop another grain of sand onto the pile (what could be simpler than a grain of sand?), it can actually do one of three things. Depending on the shape and size of the pile, it might stay more or less where it lands, or it might cascade gently down the small hill toward the bottom of the pile. Or it might do something else: if the pile is too steep, it could destabilize the entire system and trigger a sand avalanche. Complex systems seem to have this property, with large periods of apparent stasis marked by sudden and catastrophic failures. These processes may not literally be random, but they are so irreducibly complex (right down to the last grain of sand) that it just won’t be possible to predict them beyond a certain level.
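
  Bak’s sandpile can be simulated directly. The sketch below is a minimal version of the Bak–Tang–Wiesenfeld model—a standard idealization, not anything specific to earthquakes: grains are dropped on a small grid, any site holding four or more grains topples onto its neighbors, and the size of each resulting avalanche is recorded. The grid size and number of drops are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                                   # a small beach: 20 x 20 sites
pile = np.zeros((N, N), dtype=int)

def drop_grain(pile):
    """Drop one grain at a random site, topple until stable, return the avalanche size."""
    r, c = rng.integers(0, N, size=2)
    pile[r, c] += 1
    topples = 0
    while True:
        unstable = np.argwhere(pile >= 4)        # sites holding four or more grains
        if len(unstable) == 0:
            return topples
        for r, c in unstable:
            pile[r, c] -= 4                      # topple: shed four grains...
            topples += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < N and 0 <= nc < N:  # ...onto the neighbors; edge grains fall off
                    pile[nr, nc] += 1

sizes = [drop_grain(pile) for _ in range(20000)]
print("typical avalanche:", int(np.median(sizes)), "largest:", max(sizes))
```

  Most drops do almost nothing; a few trigger avalanches that sweep across much of the grid—long stretches of apparent stasis punctuated by sudden, outsized failures.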

  The Beauty of the Noise

  And yet complex processes produce order and beauty when you zoom out and look at them from enough distance. I use the terms signal and noise very loosely in this book, but they originally come from electrical engineering. There are different types of noise that engineers recognize—all of them are random, but they follow different underlying probability distributions. If you listen to true white noise, which is produced by random bursts of sound over a uniform distribution of frequencies, it is sibilant and somewhat abrasive. The type of noise associated with complex systems, called Brownian noise, is more soothing and sounds almost like rushing water.72
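
  For the curious, the two kinds of noise are easy to generate: white noise is a sequence of independent random values, while Brownian noise is simply its running sum. The sampling rate below is an arbitrary choice; the point is only the contrast between a flat spectrum and one that falls off steeply.

```python
import numpy as np

rng = np.random.default_rng(0)

white = rng.normal(0, 1, 44100)     # one second of white noise at a 44.1 kHz sampling rate
brown = np.cumsum(white)            # Brownian noise: the running sum of white noise
brown /= np.max(np.abs(brown))      # normalize to [-1, 1] so it could be written out as audio

# White noise has a flat power spectrum (equally "loud" at every frequency);
# Brownian noise falls off as 1/f^2, which is why it sounds deeper and smoother.
```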

  Meanwhile, the same tectonic forces that carve fault lines beneath the earth’s surface also carve breathtaking mountains, fertile valleys, and handsome coastlines. What that means is that people will probably never stop living in them, despite the seismic danger.

  Science on Trial

  In a final irony of the L’Aquila earthquake, a group of seven scientists and public officials were quite literally put on trial for manslaughter in 2011.73 Prosecutors from the city of L’Aquila alleged that they had failed to adequately notify the public about the risk of a Big One after the earthquake swarm there.

  The trial was obviously ridiculous, but is there anything the scientists could have done better? Probably there was; there is fairly clear evidence that the risk of a major earthquake increases substantially—perhaps temporarily becoming one hundred to five hundred times higher than its baseline rate74—following an earthquake swarm. The risk was nevertheless extremely low—most earthquake swarms do not produce major quakes—but it was not quite right to imply that everything was normal and that people should sit down and have a glass of wine.

  This book takes the view that the first duty of a forecaster is always fealty to the truth of the forecast. Politics, broadly defined, can get in the way of that. The seismological community is still scarred by the failed predictions in Lima and Parkfield, and by having to compete against the likes of Giuliani. This complicates their incentives and distracts them from their mission. Bad and irresponsible predictions can drive out good ones.

  Hough is probably right that the Holy Grail of earthquake prediction will never be attained. Even if individual seismologists are behaving responsibly, we nevertheless have the collective output of the discipline to evaluate, which together constitutes thousands of hypotheses about earthquake predictability. The track record suggests that most of these hypotheses have failed and that magic-bullet approaches to earthquake prediction just aren’t likely to work.

  However, the track record of science as a whole is a remarkable one; that is also a clear signal. It is probably safe to conclude that the same method attempted over and over with little variation is unlikely to yield different results. But science often produces “unpredictable” breakthroughs.

  One area in which seismologists have made some progress is in the case of very short term earthquake forecasts, as might have been relevant in L’Aquila. Next to the Gutenberg–Richter law, the knowledge that major earthquakes essentially always produce aftershocks is the most widely accepted finding in the discipline. Some seismologists I spoke with, like John Rundle of UC Davis and Tom Jordan of the University of Southern California, are concentrating more on these near-term forecasts and increasingly take the view that they should be communicated clearly and completely to the public.

  Jordan’s research, for instance, suggests that aftershocks sometimes move in a predictable geographic direction along a fault line. If they are moving in the direction of a population center, they can potentially be more threatening to life and property even if they are becoming less powerful. For instance, the magnitude 5.8 earthquake in Christchurch, New Zealand, in 2011, which killed 185, was an aftershock of a 7.0 earthquake that occurred in September 2010 in a remote part of the country.75 When it comes to aftershocks, there is clearly a lot of signal, so this may be the more natural place to focus.

  Finally, technology is always pushing forward. Recent efforts by NASA and by Rundle to measure fault stress through remote sensing systems like GPS satellites have shown some promise.76 Although the efforts are crude for the time being, there is potential to increase the amount of data at seismologists’ disposal and get them closer to understanding the root causes of earthquakes.

  • • •

  These methods may eventually produce some forward progress. If success in earthquake prediction has been almost nonexistent for millennia, the same was true for weather forecasting until about forty years ago. Or it may be that as we develop our understanding of complexity theory—itself a very new branch of science—we may come to a more emphatic conclusion that earthquakes are not really predictable at all.

  Either way, there will probably be some failed predictions first. As the memory of our mistakes fades, the signal will again seem to shimmer over the horizon. Parched for prediction we will pursue it, even if it is a mirage.
