by Nate Silver
Inaccurate data was also responsible for some of the poor predictions about swine flu in 2009. The fatality rate for H1N1 was apparently extremely high in Mexico in 2009 but turned out to be extremely low in the United States. Although some of that has to do with differences in how effective the medical care was in each country, much of the discrepancy was a statistical illusion.
The case fatality rate is a simple ratio: the number of deaths caused by a disease divided by the number of cases attributed to it. But there are uncertainties on both sides of the equation. On the one hand, there was some tendency in Mexico to attribute deaths related to other forms of the flu, or other diseases entirely, to H1N1. Laboratory tests revealed that as few as one-quarter of the deaths supposedly linked to H1N1 in fact showed distinct signs of it. On the other hand, there was surely underreporting, probably by orders of magnitude, in the number of cases of H1N1. Developing countries like Mexico have neither the sophisticated reporting systems of the United States nor a culture of going to the doctor at the first sign of illness;58 that the disease spread so rapidly once the United States imported it suggests that there might have been thousands, perhaps even tens of thousands, of mild cases of the flu in Mexico that had never become known to authorities.
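The two distortions pull the ratio in the same direction, and their combined effect is easy to see in a short sketch (Python here, with invented round numbers rather than actual Mexican surveillance figures):

```python
# A sketch of the two distortions described above, using invented round
# numbers rather than actual surveillance data.

def case_fatality_rate(deaths, cases):
    """CFR: deaths attributed to a disease divided by cases attributed to it."""
    return deaths / cases

# As reported: say 100 deaths against 1,000 laboratory-known cases.
reported = case_fatality_rate(100, 1_000)

# Corrected: lab tests confirm only about one-quarter of the deaths, while
# mild cases were underreported by an order of magnitude.
corrected = case_fatality_rate(100 * 0.25, 1_000 * 10)

print(f"apparent CFR:  {reported:.2%}")
print(f"corrected CFR: {corrected:.2%}")
```

Overstating the numerator fourfold while understating the denominator tenfold inflates the apparent fatality rate by a factor of forty — enough to make a garden-variety flu look like something far more lethal.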
In fact, H1N1 may have been circulating in southern and central Mexico for months before it came to the medical community’s attention (especially since they were busy looking for bird flu in Asia). Reports of an attack of respiratory illness had surfaced in the small town of La Gloria, Veracruz, in early March 2009, where the majority of the town had taken ill, but Mexican authorities initially thought it was caused by a more common strain of the virus called H3N2.59
In contrast, swine flu was the subject of obsessive media speculation from the very moment that it entered the United States. Few cases would have gone unnoticed. With these higher standards of reporting, the case fatality rate in the United States was reasonably reliable and took some of the worst-case scenarios off the table—but not until it was too late to undo some of the alarming predictions that had been made on the public record.
Self-Fulfilling and Self-Canceling Predictions
In many cases involving predictions about human activity, the very act of prediction can alter the way that people behave. Sometimes, as in economics, these changes in behavior can affect the outcome of the prediction itself, either nullifying it or making it more accurate. Predictions about the flu and other infectious diseases are affected by both sides of this problem.
A case where a prediction can bring itself about is called a self-fulfilling prediction or a self-fulfilling prophecy. This can happen with the release of a political poll in a race with multiple candidates, such as a presidential primary. Voters in these cases may behave tactically, wanting to back a candidate who could potentially win the state rather than waste their vote, and a well-publicized poll is often the best indication of whether a candidate is fit to do that. In the late stages of the Iowa Republican caucus race in 2012, for example, CNN released a poll that showed Rick Santorum surging to 16 percent of the vote when he had been at about 10 percent before.60 The poll may have been an outlier—other surveys did not show Santorum gaining ground until after the CNN poll had been released.61 Nevertheless, the poll earned Santorum tons of favorable media coverage and some voters switched to him from ideologically similar candidates like Michele Bachmann and Rick Perry. Before long, the poll had fulfilled its own destiny, with Santorum eventually winning Iowa while Bachmann and Perry finished far out of the running.
More subtle examples of this involve fields like design and entertainment, where businesses are essentially competing with one another to predict the consumer’s taste—but also have some ability to influence it through clever marketing plans. In fashion, there is something of a cottage industry to predict which colors will be popular in the next season62—this must be done a year or so in advance because of the planning time required to turn around a clothing line. If a group of influential designers decide that brown will be the hot color next year and start manufacturing lots of brown clothes, and they get models and celebrities to wear brown, and stores begin to display lots of brown in their windows and their catalogs, the public may well begin to comply with the trend. But they’re responding more to the marketing of brown than expressing some deep underlying preference for it. The designer may look like a savant for having “anticipated” the “in” color, but if he had picked white or black or lavender instead, the same process might have unfolded.63
Diseases and other medical conditions can also have this self-fulfilling property. When medical conditions are widely discussed in the media, people are more likely to identify their symptoms, and doctors are more likely to diagnose (or misdiagnose) them. The best-known case of this in recent years is autism. If you compare the number of children who are diagnosed as autistic64 to the frequency with which the term autism has been used in American newspapers,65 you’ll find that there is an almost perfect one-to-one correspondence (figure 7-4), with both having increased markedly in recent years. Autism, while not properly thought of as a disease, presents parallels to something like the flu.
“It’s a fascinating phenomenon that we’ve seen. In diseases that have no causal mechanism, news events precipitate increased reporting,” I was told by Dr. Alex Ozonoff of the Harvard School of Public Health. Ozonoff received his training in pure mathematics and is conversant in many data-driven fields, but now concentrates on applying rigorous statistical analysis to the flu and other infectious diseases. “What we find again and again and again is that the more a particular condition is on people’s minds and the more it’s a current topic of discussion, the closer the reporting gets to 100 percent.”
Ozonoff thinks this phenomenon may have been responsible for some of the velocity with which swine flu seemed to have spread throughout the United States in 2009. The disease was assuredly spreading rapidly, but some of the sharp statistical increase may have come from people reporting symptoms to their doctors which they might otherwise have ignored.
If doctors are looking to estimate the rate at which the incidence of a disease is expanding in the population, the number of publicly reported cases may be misleading. The situation can be likened to crime reporting: if the police report an increased number of burglaries in a neighborhood, is that because they are being more vigilant and are catching crimes they had missed before, or because they have made it easier to report them?* Or is it because the neighborhood is becoming more dangerous? These problems are extremely vexing for anyone looking to make predictions about the flu in its early stages.
Self-Canceling Predictions
A self-canceling prediction is just the opposite: a case where a prediction tends to undermine itself. One interesting case is the GPS navigation systems that are coming into more and more common use. There are two major north-to-south routes through Manhattan: the West Side Highway, which borders the Hudson River, and the FDR Drive, which is on Manhattan’s east side. Depending on her destination, a driver may not strongly prefer either thoroughfare. However, her GPS system will tell her which one to take, depending on which has less traffic—it is predicting which route will make for the shorter commute. The problem comes when a lot of other drivers are using the same navigation systems—all of a sudden, the route will be flooded with traffic and the “faster” route will turn out to be the slower one. There is already some theoretical66 and empirical67 evidence that this has become a problem on certain commonly used routes in New York, Boston, and London, and that these systems can sometimes be counterproductive.
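The feedback loop can be caricatured in a few lines of Python. The route names are Manhattan’s, but the loads and travel-time parameters are invented for illustration:

```python
# Toy model of a self-canceling traffic prediction: every GPS recommends
# whichever route was faster yesterday, so today everyone floods it.
# All loads and travel-time parameters are invented for illustration.

def travel_time(cars, base=20.0, minutes_per_car=0.01):
    """Travel time in minutes, growing linearly with congestion."""
    return base + minutes_per_car * cars

def simulate(days=6, drivers=2_000):
    loads = {"West Side Highway": 1_500, "FDR Drive": 500}
    recommendations = []
    for _ in range(days):
        times = {route: travel_time(cars) for route, cars in loads.items()}
        faster = min(times, key=times.get)   # the "prediction"
        recommendations.append(faster)
        # Everyone follows the same advice, emptying the other route.
        loads = {route: (drivers if route == faster else 0) for route in loads}
    return recommendations

print(simulate())
```

The recommendation flips every day: each prediction of the faster route is precisely what makes it the slower one.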
This self-defeating quality can also be a problem for the accuracy of flu predictions because their goal, in part, is to increase public awareness of the disease and therefore change the public’s behavior. The most effective flu prediction might be one that fails to come to fruition because it motivates people toward more healthful choices.
Simplicity Without Sophistication
The Finnish scientist Hanna Kokko likens building a statistical or predictive model to drawing a map.68 It needs to contain enough detail to be helpful and do an honest job of representing the underlying landscape—you don’t want to leave out large cities, prominent rivers and mountain ranges, or major highways. Too much detail, however, can be overwhelming to the traveler, causing him to lose his way. As we saw in chapter 5, these problems are not purely aesthetic. Needlessly complicated models may fit the noise in a problem rather than the signal, doing a poor job of replicating its underlying structure and causing predictions to be worse.
But how much detail is too much—or too little? Cartography takes a lifetime to master and combines elements of both art and science. It probably goes too far to describe model building as an art form, but it does require a lot of judgment.
Ideally, however, questions like Kokko’s can be answered empirically. Is the model working? If not, it might be time for a different level of resolution. In epidemiology, the traditional models that doctors use are quite simple—and they are not working that well.
The most basic mathematical treatment of infectious disease is called the SIR model (figure 7-5). The model, which was formulated in 1927,69 posits that there are three “compartments” in which any given person might reside at any given time: S stands for being susceptible to a disease, I for being infected by it, and R for being recovered from it. For simple diseases like the flu, the movement from compartment to compartment is entirely in one direction: from S to I to R. In this model, a vaccination essentially serves as a shortcut,* allowing a person to progress from S to R without getting ill. The mathematics behind the model is relatively straightforward, boiling down to a handful of differential equations that can be solved in a few seconds on a laptop.
FIGURE 7-5: SCHEMATIC OF SIR MODEL
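A minimal numerical version of the model takes only a few lines. The sketch below, in Python, uses simple fixed-step (Euler) integration rather than a proper ODE solver, and the transmission rate (beta) and recovery rate (gamma) are illustrative values, not parameters fitted to any real outbreak:

```python
# SIR in its simplest form: three compartments, one-way flow S -> I -> R.
# beta and gamma below are illustrative, not estimated from data.

def sir(beta, gamma, s0=0.999, i0=0.001, days=200, dt=0.1):
    """Integrate the SIR equations with fixed Euler steps of size dt."""
    s, i, r = s0, i0, 0.0
    for _ in range(int(days / dt)):
        infections = beta * s * i * dt   # flow from S to I
        recoveries = gamma * i * dt      # flow from I to R
        s -= infections
        i += infections - recoveries
        r += recoveries
    return s, i, r

# beta/gamma gives the basic reproduction ratio; here R0 = 0.5/0.25 = 2.
s, i, r = sir(beta=0.5, gamma=0.25)
print(f"never infected: {s:.1%}, recovered: {r:.1%}")
```

Vaccination enters the model by starting people in R rather than S—the “shortcut” in the diagram above.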
The problem is that the model requires a lot of assumptions to work properly, some of which are not very realistic in practice. In particular, the model assumes that everybody in a given population behaves the same way—that they are equally susceptible to a disease, equally likely to be vaccinated for it, and that they intermingle with one another at random. There are no dividing lines by race, gender, age, religion, sexual orientation, or creed; everybody behaves in more or less the same way.
An HIV Paradox in San Francisco
It is easiest to see why these assumptions are flawed in the case of something like a sexually transmitted disease.
The late 1990s and early 2000s were accompanied by a marked rise in unprotected sex in San Francisco’s gay community,70 which had been devastated by the HIV/AIDS pandemic two decades earlier. Some researchers blamed this on increasing rates of drug use, particularly crystal methamphetamine, which is often associated with riskier sexual behavior. Others cited the increasing effectiveness of antiretroviral therapy—cocktails of medicine that can extend the lives of HIV-positive patients for years or decades: gay men no longer saw an HIV diagnosis as a death sentence. Yet other theories focused on generational patterns—the San Francisco of the 1980s, when the AIDS epidemic was at its peak, was starting to feel like ancient history to a younger generation of gay men.71
The one thing the experts agreed on was that as unprotected sex increased, HIV infection rates were liable to do so as well.72
But that did not happen. Other STDs did increase: the number of new syphilis diagnoses among men who have sex with men (MSM)73—which had been virtually eradicated from San Francisco in the 1990s—rose substantially, to 502 cases in 2004 from 9 in 1998.74 Rates of gonorrhea also increased. Paradoxically, however, the number of new HIV cases did not rise. In 2004, when syphilis reached its highest level in years, the number of HIV diagnoses fell to their lowest figure since the start of the AIDS epidemic. This made very little sense to researchers; syphilis and HIV are normally strongly correlated statistically, and they also have a causal relationship, since having one disease can make you more vulnerable to acquiring the other one.75
The solution to the paradox, it now appears, is that gay men had become increasingly effective at “serosorting”—that is, they were choosing sex partners with the same HIV status that they had. How they were able to accomplish this is a subject of some debate, but it has been documented by detailed behavioral studies in San Francisco,76 Sydney,77 London, and other cities with large gay populations. It may be that public health campaigns—some of which, wary of “condom fatigue,” instead focused on the notion of “negotiated safety”—were having some positive effect. It may be that the Internet, which to some extent has displaced the gay bar as the preferred place to pick up a sex partner, has different norms for disclosure: many men list their HIV status in their profiles, and it may be easier to ask tough questions (and to get honest responses) from the privacy of one’s home than in the din of the dance hall.78
Whatever the reason, it was clear that this type of specific, localized behavior was confounding the simpler disease models—and fortunately in this case it meant that they were overpredicting HIV. Compartmental models like SIR assume that every individual has the same rate of susceptibility to a disease. That won’t work as well for diseases that require more intimate types of contact, or where risk levels are asymmetric across different subpopulations. You don’t just randomly walk into the grocery store and come home with HIV.
How the Models Failed at Fort Dix
Even in the case of simpler diseases, however, the compartmental models have sometimes failed because of their lax assumptions. Take measles, for instance. Measles is the first disease that most budding epidemiologists learn about in their Ph.D. programs because it is the easiest one to study. “Measles is the model system for infectious disease,” says Marc Lipsitch, a colleague of Ozonoff’s at Harvard. “It’s unambiguous. You can do a blood test, there’s only one strain, and everyone who has it is symptomatic. Once you have it you don’t have it again.” If there’s any disease that the SIR models should be able to handle, it should be measles.
But in the 1980s and early 1990s, there were a series of unusually severe measles outbreaks in Chicago that epidemiologists were having a tough time predicting. The traditional models suggested that enough Chicagoans had been vaccinated that the population should have achieved a condition known as “herd immunity”—the biological equivalent of a firewall in which the disease has too few opportunities to spread and dies out. In some years during the 1980s, however, as many as a thousand Chicagoans—most of them young children—caught measles; the problem was so frightening that the city ordered nurses to go door-to-door to administer shots.79
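The herd-immunity arithmetic itself is standard: an outbreak cannot sustain itself once more than 1 − 1/R0 of the population is immune, because each case then infects, on average, less than one other person. In the Python sketch below, the R0 values are textbook ballparks, not figures estimated from the Chicago data:

```python
# Standard herd-immunity threshold: the share of the population that must
# be immune for the disease to die out. R0 values here are commonly cited
# ballparks, not estimates from the Chicago outbreaks.

def herd_immunity_threshold(r0):
    return 1 - 1 / r0

print(f"measles (R0 ~ 15):       {herd_immunity_threshold(15):.0%}")
print(f"seasonal flu (R0 ~ 1.2): {herd_immunity_threshold(1.2):.0%}")
```

The catch, as Chicago showed, is that this is an average over the whole population: a city can clear the bar overall while particular neighborhoods fall well short of it.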
Dr. Robert Daum, a pediatrician and infectious disease specialist who works at University of Chicago hospitals, has studied these measles outbreaks in great depth. Daum is a doctor’s doctor, with a dignified voice, a beard, and a detached sense of humor. He had just returned from Haiti, where he had assisted with relief efforts for the 2010 earthquake, when I visited him and two of his colleagues in Chicago.
Chicago, where I lived for thirteen years, is a city of neighborhoods. Those neighborhoods are often highly segregated, with little mixing across racial or socioeconomic lines. Daum discovered that the neighborhoods also differed in their propensity toward vaccination: inner-city residents in Chicago’s mostly poor, mostly black South Side were less likely to have had their children get their MMR (measles, mumps, and rubella) shots. Those unvaccinated children were going to school together, playing together, coughing and sneezing on one another. They were violating one of the assumptions of the SIR model called random mixing, which assumes that any two members of the population are equally likely to come into contact with each other. And they were spreading measles.
This phenomenon of nonrandom mixing may also have been the culprit in the swine flu fiasco of 1976, when scientists were challenged to extrapolate the national H1N1 threat from the cases they had seen at Fort Dix. The swine flu strain—now known as A/New Jersey/76—appeared so threatening in part because it had spread very quickly throughout the base: 230 confirmed cases were eventually diagnosed in a period of two or three weeks.80 Thus, scientists inferred that it must have had a very high basic reproduction ratio (R0)—perhaps as high as the 1918 pandemic, which had an R0 of about 3.
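The stakes of that inference are easy to see from the standard SIR “final size” relation, z = 1 − e^(−R0·z), which links R0 to the fraction z of a well-mixed population eventually infected. A short Python sketch solves it by fixed-point iteration:

```python
import math

# SIR final-size relation: the outbreak fraction z satisfies
#   z = 1 - exp(-R0 * z),
# solved here by fixed-point iteration (stable for these R0 values).

def final_size(r0, iterations=200):
    z = 0.9   # initial guess: a large outbreak
    for _ in range(iterations):
        z = 1 - math.exp(-r0 * z)
    return z

for r0 in (1.2, 3.0):   # the Fort Dix reality vs. the 1918-like fear
    print(f"R0 = {r0}: roughly {final_size(r0):.0%} of a population infected")
```

The gap between an R0 of 1.2 and 3 is the gap between an outbreak touching roughly a third of a well-mixed population and one touching nearly all of it—which is why the extrapolation from Fort Dix mattered so much.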
A military platoon, however, is an unusually disease-prone environment. Soldiers are in atypically close contact with one another, in cramped settings in which they may be sharing essential items like food and bedding materials, and in which there is little opportunity for privacy. Moreover, they are often undergoing strenuous physical exercise—temporarily depleting their immune systems—and the social norm of the military is to continue to report to work even when you are sick. Infectious disease has numerous opportunities to be passed along, and so transmission will usually occur very quickly.
Subsequent study81 of the outbreak at Fort Dix has revealed that the rapid spread of the disease was caused by these circumstantial factors, rather than by the disease’s virulence. Fort Dix just wasn’t anything like a random neighborhood or workplace somewhere in America. In fact, A/New Jersey/76 was nothing much to worry about at all—its R0 was a rather wimpy 1.2, no higher than that of the seasonal flu. Outside a military base, or a roughly analogous setting like a college dormitory or a prison, it wasn’t going to spread very far. The disease had essentially lived out its life span at Fort Dix, running out of new individuals to infect.
The fiasco over A/New Jersey/76—like the HIV/syphilis paradox in San Francisco, or the Chicago measles outbreaks of the 1980s—speaks to the limitations of models that make overly simplistic assumptions. I certainly do not mean to suggest that you should always prefer complex models to simple ones; as we have seen in other chapters in this book, complex models can also lead people astray. And because complex models often give more precise (but not necessarily more accurate) answers, they can trip a forecaster’s sense of overconfidence and fool him into thinking he is better at prediction than he really is.