The Signal and the Noise
Furthermore, it follows that an earthquake that measured 7.0 or greater would occur about once every three hundred years near Tehran. This is the earthquake that Susan Hough fears. The Haiti earthquake of 2010, which measured magnitude 7.0 and killed 316,000,32 showed the apocalyptic consequences that earthquakes can produce in the developing world. Iran shares many of Haiti’s problems—poverty, lax building codes, political corruption33—but it is much more densely populated. The USGS estimates, on the basis of high death tolls from smaller earthquakes in Iran, that between 15 and 30 percent of Tehran’s population could die in the event of a catastrophic tremor there.34 Since there are about thirteen million people in Tehran’s metro area, that would mean between two and four million fatalities.
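The arithmetic behind that range can be checked in a few lines. This is only a sketch: the population and the fatality shares are the figures quoted above, while the function and variable names are illustrative assumptions.

```python
# Figures quoted in the text; the names below are illustrative, not from any source.
METRO_POPULATION = 13_000_000          # "about thirteen million people"
SHARE_LOW, SHARE_HIGH = 0.15, 0.30     # "between 15 and 30 percent"


def fatality_range(population, low_share, high_share):
    """Return the (low, high) fatality estimates implied by the two shares."""
    return population * low_share, population * high_share


low, high = fatality_range(METRO_POPULATION, SHARE_LOW, SHARE_HIGH)
print(f"Estimated fatalities: {low:,.0f} to {high:,.0f}")
```

The result, a little under two million to a little under four million, is what the text rounds to "between two and four million."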
What the Gutenberg–Richter law does not tell us is when the earthquake would strike. (Nor does it suggest that Tehran is “due” for an earthquake if it hasn’t experienced one recently.) Countries like Iran and Haiti do not have the luxury of making contingency plans for a once-every-three-hundred-year event. The earthquake forecasts produced using the Gutenberg–Richter law provide a good general guide to the hazard in an area. But like weather forecasts determined from statistical records alone (it rains 35 percent of the time in London in March), they don’t always translate into actionable intelligence (should I carry an umbrella?). Geological time scales occupy centuries or entire millennia; human life spans are measured in years.
The Temptation of the Signal
What seismologists are really interested in—what Susan Hough calls the “Holy Grail” of seismology—are time-dependent forecasts, those in which the probability of an earthquake is not assumed to be constant across time.
Even seismologists who are skeptical of the possibility of making time-dependent earthquake forecasts acknowledge that there are some patterns in the earthquake distribution. The most obvious is the presence of aftershocks. Large earthquakes are almost always followed by dozens or even thousands of aftershocks (the 2011 earthquake in Japan produced at least 1,200 of them). These aftershocks follow a somewhat predictable pattern.35 Aftershocks are more likely to occur immediately after an earthquake than days later, and more likely to occur days later than weeks after the fact.
This, however, is not terribly helpful when it comes to saving lives. This is because aftershocks, by definition, are always less powerful than the initial earthquake. Usually, if a particular fault produces a sufficiently powerful earthquake, there will be a few aftershocks and then that’ll be the end of the fireworks for a while. This isn’t always the case, however. For example, the incredibly powerful earthquake that hit the New Madrid Fault on the Missouri-Tennessee border on December 16, 1811, evaluated by seismologists as magnitude 8.2, was followed just six hours later by another shock of about the same magnitude. And the fault was not yet quiesced: the December 16 quakes were succeeded by another magnitude 8.1 earthquake on January 23, and then yet another, even more powerful 8.3 earthquake on February 7. Which ones were the foreshocks? Which ones were the aftershocks? Any interpretation is about as useless as any other.
The question, of course, is whether we can predict earthquakes before the fact: can we tell the foreshocks and aftershocks apart in advance? When we look at data that shows the distribution of earthquakes across time and space, it tempts us with the possibility that there might be some signal in the noise.
Figure 5-4a, for instance, shows the distribution of earthquakes near L’Aquila36 from 2006 until the magnitude 6.3 earthquake hit in 2009.37 All the data in this chart, except the large black circle that indicates the main earthquake, shows earthquakes that occurred before the main shock. In the case of L’Aquila, there does seem to be a discernible pattern. A big cluster of earthquakes, measuring up to about magnitude 4, occurred just before the main shock in early 2009—much higher than the background rate of seismic activity in the area.
A more debatable case is the Japan earthquake of 2011. When we make one of these plots for the region (figure 5-4b), we see, first of all, that it is much more seismically active than Italy. But are there patterns in the timing of the earthquakes there? There seem to be some; for instance, there is a cluster of earthquakes measuring between magnitude 5.5 and magnitude 7.0 in mid-2008. These, however, did not precipitate a larger earthquake. But we do see an especially large foreshock, magnitude 7.5, on March 9, 2011, preceding the magnitude 9.1 earthquake38 by about fifty hours.
Only about half of major earthquakes are preceded by discernible foreshocks,39 however. Haiti’s was not (figure 5-4c). Instrumentation is not very good in most parts of the Caribbean, so we don’t have records of magnitude 2 and 3 earthquakes, but seismometers in the United States and other areas should be able to pick up anything that registers at 4 or higher. The last time there had been even a magnitude 4 earthquake in the area was in 2005, five years before the magnitude 7.0 earthquake hit in 2010. There was just no warning at all.
Complicating matters further are false alarms—periods of increased seismic activity that never result in a major tremor. One case well known to seismologists is a series of smaller earthquakes near Reno, Nevada, in early 2008. The Reno earthquake swarm looks a lot like the one we saw before L’Aquila in 2009. But it never amounted to anything much; the largest earthquake in the series was just magnitude 5.0 and no major earthquake followed.
FIGURE 5-4D: EARTHQUAKES NEAR RENO, NEVADA JANUARY 1, 2006–DECEMBER 31, 2011
This is just a taste of the maddening array of data that seismologists observe. It seems to exist in a purgatory state—not quite random and not quite predictable. Perhaps that would imply that we could at least get halfway there and make some progress in forecasting earthquakes—even if we can never get to hard-and-fast predictions. But the historical record of attempts to predict earthquakes is one of almost complete failure.
A Parade of Failed Forecasts
Hough’s 2009 book, Predicting the Unpredictable: The Tumultuous Science of Earthquake Prediction, is a history of efforts to predict earthquakes, and is as damning to that enterprise as Phil Tetlock’s study was to political pundits. There just seems to have been no progress at all, and there have been many false alarms.
Lima, Peru
One of the more infamous cases involved a geophysicist named Brian Brady, who had a Ph.D. from MIT and worked at Colorado School of Mines. Brady asserted that a magnitude 9.2 earthquake—one of the largest in recorded history—would hit Lima, Peru, in 1981.40 His prediction initially had a fair amount of support in the seismological community—an early version of it had been coauthored with a USGS scientist. But as the theory became more elaborate—Brady would eventually invoke everything from the rock bursts he had observed in his studies of mines to Einstein’s theory of relativity in support of it—colleagues had started telling him that theory was beyond their understanding:41 a polite way of saying that he was nuts. Eventually, he predicted that the magnitude 9.2 earthquake would be just one in a spectacular series in Peru, culminating in a magnitude 9.9 earthquake, the largest in recorded history, in August 1981.42
The prediction was leaked to the Peruvian media and terrified the population; this serious-seeming American scientist was sure their capital city would be in ruins. Their fear only intensified when it was reported that the Peruvian Red Cross had requested 100,000 body bags to prepare for the disaster. Tourism and property values declined,43 and the U.S. government eventually dispatched a team of scientists and diplomats to Peru in an effort to calm nerves. It made front-page news when there was no Great Peruvian Earthquake in 1981 (or even a minor one).
Parkfield, California
If Lima had provided a warning that false alarms can exact a substantial psychological and economic cost on the population, it did not stop seismologists from seeking out the Holy Grail. While Brady had been something of a lone wolf, there were cases when earthquake prediction had much more explicit backing from the USGS and the rest of the seismological community. These efforts did not go so well either.
Among the most studied seismic zones in the world is Parkfield, California, which sits along the San Andreas Fault somewhere between Fresno, Bakersfield, and the next exit with an In-N-Out Burger. There had been earthquakes in Parkfield at what seemed to be regular intervals about twenty-two years apart: in 1857, 1881, 1901, 1922, 1934, and 1966. A USGS-sponsored paper44 projected the trend forward and predicted with 95 percent confidence that there would be another such earthquake at some point between 1983 and 1993, most likely in 1988. The next significant earthquake to hit Parkfield did not occur until 2004, however, well outside of the prediction window.
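A naive version of "projecting the trend forward" can be sketched from the dates listed above. This is an assumption-laden illustration, not the method used in the USGS-sponsored paper: it simply averages the gaps between the listed earthquakes and adds that average to the last date.

```python
# Earthquake years at Parkfield as listed in the text.
parkfield_years = [1857, 1881, 1901, 1922, 1934, 1966]

# Gap in years between each consecutive pair of earthquakes.
intervals = [later - earlier
             for earlier, later in zip(parkfield_years, parkfield_years[1:])]

# Simple average gap; an illustrative stand-in for a real recurrence model.
mean_interval = sum(intervals) / len(intervals)

# Naive projection: last earthquake plus the average gap.
projected_year = parkfield_years[-1] + mean_interval

print("Intervals:", intervals)
print(f"Mean interval: {mean_interval:.1f} years")
print(f"Naive projection: {projected_year:.1f}")
print("Shortest and longest gaps:", min(intervals), max(intervals))
```

The average gap comes out near twenty-two years and the projection lands close to 1988, but the individual gaps range from twelve to thirty-two years, which hints at how much irregularity the average conceals.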
Apart from being wrong, the Parkfield prediction also seemed to reinforce a popular misconception about earthquakes: that they come at regular intervals and that a region can be “due” for one if it hasn’t experienced an earthquake in some time. Earthquakes result from a buildup of stress along fault lines. It might follow that the stress builds up until it is discharged, like a geyser erupting with boiling water, relieving the stress and resetting the process.
But the fault system is complex: regions like California are associated with multiple faults, and each fault has its own branches and tributaries. When an earthquake does strike, it may relieve the stress on one portion of a fault, but it can transfer it along to neighboring faults, or even to some faraway portion of the same fault.45 Moreover, the stress on a fault is hard to observe directly—until an earthquake hits.
What this means is that if San Francisco is forecasted to have a major earthquake every thirty-five years, it does not imply that these will be spaced out evenly (as in 1900, 1935, 1970). It’s safer to assume there is a 1 in 35 chance of an earthquake occurring every year, and that this rate does not change much over time regardless of how long it has been since the last one.
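The constant-rate assumption can be made concrete with a short sketch. It assumes, purely for illustration, that each year is an independent trial with the same 1 in 35 chance; the function names are hypothetical.

```python
ANNUAL_PROB = 1 / 35  # the "1 in 35 chance" from the text


def prob_at_least_one(annual_prob, years):
    """Chance of one or more earthquakes in a span of independent years."""
    return 1 - (1 - annual_prob) ** years


def conditional_hazard(annual_prob, quiet_years):
    """Chance of an earthquake next year, given `quiet_years` with none.

    Computed as P(quiet stretch, then a quake) / P(quiet stretch).
    Under the independence assumption the quiet stretch cancels out.
    """
    p_quiet = (1 - annual_prob) ** quiet_years
    p_quiet_then_quake = p_quiet * annual_prob
    return p_quiet_then_quake / p_quiet


for quiet in (0, 10, 35, 100):
    print(quiet, "quiet years ->", round(conditional_hazard(ANNUAL_PROB, quiet), 4))

print("Chance of at least one quake in 35 years:",
      round(prob_at_least_one(ANNUAL_PROB, 35), 3))
```

Under these assumptions the chance for the coming year is the same after ten quiet years as after a hundred, which is the sense in which a region is never "due." Note also that a thirty-five-year span is far from guaranteed to contain an earthquake.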
Mojave Desert, California
The Brady and Parkfield fiascoes seemed to suppress efforts at earthquake prediction for some time. But they came back with a vengeance in the 2000s, when newer and seemingly more statistically driven methods of earthquake prediction became the rage.
One such method was put forward by Vladimir Keilis-Borok, a Russian-born mathematical geophysicist who is now in his late eighties and teaches at UCLA. Keilis-Borok had done much to advance the theory of how earthquakes formed and first achieved notoriety in 1986 when, at a summit meeting in Reykjavík with Mikhail Gorbachev, President Reagan was handed a slip of paper predicting a major earthquake in the United States within the next five years, an event later interpreted to be the Loma Prieta quake that struck San Francisco in 1989.46
In 2004, Keilis-Borok and his team claimed to have made a “major breakthrough” in earthquake prediction.47 By identifying patterns from smaller earthquakes in a given region, they said, they were able to predict large ones. The methods that Keilis-Borok applied to identify these patterns were elaborate and opaque,48 representing past earthquakes with a series of eight equations, each of which was applied in combination with the others at all conceivable intervals of time and space. But, the team said, their method had correctly predicted 2003 earthquakes in San Simeon, California, and Hokkaido, Japan.
Whether the San Simeon and Hokkaido predictions were publicly communicated ahead of time remains unclear;49 a search of the Lexis-Nexis database of newspapers reveals no mention of them in 2003.50 When we are evaluating the success of a forecasting method, it is crucial to keep “retrodictions” and predictions separate; predicting the past is an oxymoron and obviously should not be counted among successes.51
By January 2004, however, Keilis-Borok had gone very public with another prediction:52 an earthquake measuring at least magnitude 6.4 would hit an area of the Mojave Desert in Southern California at some point within the subsequent nine months. The prediction began to attract widespread attention: Keilis-Borok was featured in the pages of Discover magazine, the Los Angeles Times, and a dozen or so other mainstream publications. Someone from Governor Schwarzenegger’s office called; an emergency panel was convened. Even the famously skeptical USGS was willing to give some credit; their Web site conceded that “the work of the Keilis-Borok team is a legitimate approach to earthquake prediction research.”53
But no major earthquake hit the Mojave Desert that year, and indeed, almost a decade later, none has. The Keilis-Borok team has continued to make predictions about earthquakes in California, Italy, and Japan but with little success: a 2010 analysis found three hits but twenty-three misses among predictions that they had clearly enunciated ahead of time.54
Sumatra, Indonesia
There is another type of error, in which an earthquake of a given magnitude is deemed unlikely or impossible in a region—and then it happens. David Bowman, a former student of Keilis-Borok who is now the chair of the Department of Geological Sciences at Cal State Fullerton, had redoubled his efforts at earthquake prediction after the Great Sumatra Earthquake of 2004, the devastating magnitude 9.2 disaster that produced a tsunami and killed 230,000 people. Bowman’s technique, like Keilis-Borok’s, was highly mathematically driven and used medium-size earthquakes to predict major ones.55 However, it was more elegant and ambitious, proposing a theory called accelerated moment release that attempted to quantify the amount of stress at different points in a fault system. In contrast to Keilis-Borok’s approach, Bowman’s system allowed him to forecast the likelihood of an earthquake along any portion of a fault; thus, he was not just predicting where earthquakes would hit, but also where they were unlikely to occur.
Bowman and his team did achieve some initial success; the massive aftershock in Sumatra in March 2005, measuring magnitude 8.6, had its epicenter in an area his method identified as high-risk. However, a paper that he published in 200656 also suggested that there was a particularly low risk of an earthquake on another portion of the fault, in the Indian Ocean adjacent to the Indonesian province of Bengkulu. Just a year later, in September 2007, a series of earthquakes hit exactly that area, culminating in a magnitude 8.5. Fortunately, the earthquakes occurred far enough offshore that fatalities were light, but it was devastating to Bowman’s theory.
Between a Rock and a Hard Place
After the model’s failure in 2007, Bowman did something that forecasters very rarely do. Rather than blame the failure on bad luck (his model had allowed for some possibility of an earthquake near Bengkulu, just not a high one), he reexamined his model and decided his approach to predicting earthquakes was fundamentally flawed—and gave up on it.
“I’m a failed predictor,” Bowman told me in 2010. “I did a bold and stupid thing—I made a testable prediction. That’s what we’re supposed to do, but it can bite you when you’re wrong.”
Bowman’s idea had been to identify the root causes of earthquakes—stress accumulating along a fault line—and formulate predictions from there. In fact, he wanted to understand how stress was changing and evolving throughout the entire system; his approach was motivated by chaos theory.
Chaos theory is a demon that can be tamed—weather forecasters did so, at least in part. But weather forecasters have a much better theoretical understanding of the earth’s atmosphere than seismologists do of the earth’s crust. They know, more or less, how weather works, right down to the molecular level. Seismologists don’t have that advantage.
“It’s easy for climate systems,” Bowman reflected. “If they want to see what’s happening in the atmosphere, they just have to look up. We’re looking at rock. Most events occur at a depth of fifteen kilometers underground. We don’t have a hope of drilling down there, realistically—sci-fi movies aside. That’s the fundamental problem. There’s no way to directly measure the stress.”
Without that theoretical understanding, seismologists have to resort to purely statistical methods to predict earthquakes. You can create a statistical variable called “stress” in your model, as Bowman tried to do. But since there’s no way to measure it directly, that variable is still just expressed as a mathematical function of past earthquakes. Bowman thinks that purely statistical approaches like these are unlikely to work. “The data set is incredibly noisy,” he says. “There’s not enough to do anything statistically significant in testing hypotheses.”
What happens in systems with noisy data and underdeveloped theory—like earthquake prediction and parts of economics and political science—is a two-step process. First, people start to mistake the noise for a signal. Second, this noise pollutes journals, blogs, and news accounts with false alarms, undermining good science and setting back our ability to understand how the system really works.
Overfitting: The Most Important Scientific Problem You’ve Never Heard Of
In statistics, the name given to the act of mistaking noise for a signal is overfitting.
Suppose that you’re some sort of petty criminal and I’m your boss. I deputize you to figure out a good method for picking combination locks of the sort you might find in a middle school—maybe we want to steal everybody’s lunch money. I want an approach that will give us a high probability of picking a lock anywhere and anytime. I give you three locks to practice on—a red one, a black one, and a blue one.
After experimenting with the locks for a few days, you come back and tell me that you’ve discovered a foolproof solution. If the lock is red, you say, the combination is 27-12-31. If it’s black, use the numbers 44-14-19. And if it’s blue, it’s 10-3-32.
I’d tell you that you’ve completely failed in your mission. You’ve clearly figured out how to open these three particular locks. But you haven’t done anything to advance our theory of lock-picking—to give us some hope of picking them when we don’t know the combination in advance. I’d have been interested in knowing, say, whether there was a good type of paper clip for picking these locks, or some sort of mechanical flaw we can exploit. Or failing that, if there’s some trick to detect the combination: maybe certain types of numbers are used more often than others? You’ve given me an overly specific solution to a general problem. This is overfitting, and it leads to worse predictions.
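The lock example can be written out as a tiny sketch. The three practice combinations are the ones given above; the "green" and "yellow" locks and their combinations are hypothetical, invented only to stand in for locks the method has never seen.

```python
# The three practice locks and their combinations, as given in the text.
practice_locks = {
    "red": (27, 12, 31),
    "black": (44, 14, 19),
    "blue": (10, 3, 32),
}

# Hypothetical unseen locks; these colors and numbers are made up for illustration.
new_locks = {
    "green": (5, 20, 8),
    "yellow": (36, 1, 17),
}


def memorized_picker(color):
    """The 'foolproof' method: look up the combination by lock color.

    Returns None for any color that was not among the practice locks.
    """
    return practice_locks.get(color)


def accuracy(picker, locks):
    """Fraction of locks for which the picker returns the right combination."""
    hits = sum(picker(color) == combo for color, combo in locks.items())
    return hits / len(locks)


print("Accuracy on practice locks:", accuracy(memorized_picker, practice_locks))
print("Accuracy on new locks:", accuracy(memorized_picker, new_locks))
```

The method is perfect on the locks it was built from and useless on anything else, which is the signature of an overfit model: an excellent fit to past data and poor predictions on new data.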