My goal here is not to take a strong side in this dispute, although on the whole I do find Dr. Rosenhan’s critique compelling and think the data does indeed support his thesis. Regardless, I wish to point out that the legacy of the Rosenhan experiment is almost universally understood through the lens of psychiatry and what it means for that profession, and nobody ever considers the wider implications it may have for the rest of medicine. Psychiatrists are often, if only politely and subtly, dismissed by their colleagues for engaging in what is seen as a quasi-medieval enterprise. Internists, surgeons, and obstetricians can all employ tests that provide binary or quantifiable results. Their ability to diagnose conditions is vastly superior to the crude tools of psychiatry, the reasoning goes. Why should they be surprised that someone showed that psychiatrists, however bright they are, don’t know what they’re doing?
But I would argue the experiment’s results can in fact be applied well beyond the realm of psychiatry—they can be generalized to all doctors when they consider all diagnoses. And there is a growing body of research showing that nonpsychiatric doctors can be just as flawed in their judgments as insanity doctors who are unable to see sane people standing right in front of them. Rosenhan thought he was doing an experiment on diagnosis in psychiatry, but at a deeper level he was performing an experiment on diagnosis itself, and it wasn’t psychiatrists per se who failed in this regard, it was doctors. The fact that very few physicians outside of psychiatry who have read this research think it applies to them speaks more to the provincialism of these specialties than to anything else. As we shall see, there is increasing evidence that not only are all physicians prone to the same kinds of cognitive errors witnessed in the Rosenhan experiment, but there is a good chance physicians are doing harm as a result.
This book is primarily concerned with data. As I will try to show in the chapters to come, most data in the realm of medicine is inherently fuzzy—tests almost never establish with 100 percent certainty that a disease is present or absent, and there are real-world consequences related to just how certain one can be that a test’s result matches the underlying reality. We are only very rarely at the far end of the spectrum of certainty, where we can be extremely confident of our diagnoses and the benefits of our treatments. Some diseases require only a single test whose results can be interpreted with a high degree of certainty, but many others require a careful consideration of several pieces of information, at least some of which can be contradictory.
The human brain making a diagnosis can be thought of as another illustration of this principle. The psychiatrists in the Rosenhan experiment made two kinds of diagnostic errors. First, they diagnosed insanity in patients who were not insane: we call that kind of diagnosis a “false positive.” Likewise, when tipped off that pseudopatients would be wandering the halls of their institutions, the psychiatrists suddenly saw sanity in patients with genuine mental illness, and that is known as a “false negative.” As I will discuss in the chapters dealing with mammography and Lyme disease, these two concepts of false-positive and false-negative errors are critically important to seeing how uncertainty affects doctors and patients alike. The error rates associated with false-positive and false-negative diagnoses can be quantified. Thus, doctors who process this data and offer a diagnosis can be thought of as “diagnosis machines,” and, like all tests, these diagnosis machines are subject to overcalls and undercalls.
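To make the arithmetic concrete, here is a minimal sketch in Python of how a “diagnosis machine’s” overcalls and undercalls can be tallied into false-positive and false-negative rates. The counts are invented for illustration only; they are not drawn from the Rosenhan experiment or any real test.

```python
# A minimal sketch of quantifying the error rates of any "diagnosis
# machine" -- a lab test or a doctor. All counts below are hypothetical.

true_positives = 80    # sick people correctly called sick
false_negatives = 20   # sick people wrongly called healthy (undercalls)
true_negatives = 900   # healthy people correctly called healthy
false_positives = 100  # healthy people wrongly called sick (overcalls)

# False-positive rate: how often healthy people are overcalled.
fpr = false_positives / (false_positives + true_negatives)

# False-negative rate: how often sick people are undercalled.
fnr = false_negatives / (false_negatives + true_positives)

print(f"False-positive rate: {fpr:.0%}")  # 10%
print(f"False-negative rate: {fnr:.0%}")  # 20%
```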
This chapter, then, looks at human cognition, but it does so with the perspective that we are, in some sense, error machines whose characteristics can be mathematically described in similar ways to all the other tests in medicine. To do so, we will need to consider cognitive biases, the “how” of our mental errors, as well as the motivations behind them—the “why.” This is only a brief detour, as there are many fine books written about the subject of cognition. In particular, the experimental psychologist Daniel Kahneman’s book Thinking, Fast and Slow, whose quote began the chapter and to which we will return shortly, takes on this topic specifically.
Why did the psychiatrists in the Rosenhan experiment fail so profoundly? Were they merely negligent? Or was there some biological principle at work that predisposed them to miss the obvious? I would argue for the latter. Consider the following story about one of the pseudopatients who truthfully reported the state of his emotional life to a psychiatrist during his initial evaluation. He said he was close with his mother as a child and distant from his father, but as he grew older he found himself becoming closer to his father and more distant from his mother. He was happily married but reported having occasional spats, and had children who on rare occasion had been spanked (keeping in mind this was in the early 1970s when spanking was a less frowned-upon child-rearing method than it is today). In short, whatever the powerful psychological issues that drove his perception of his relationships, none of this story seems particularly disturbing and could describe any number of perfectly sane people we know, including us.
In the psychiatrist’s mind, however, the story was a demonstration of the severe pathology lurking underneath:
This white 39-year-old male . . . manifests a long history of considerable ambivalence in close relationships, which begins in early childhood. A warm relationship with his mother cools during his adolescence. A distant relationship with his father is described as becoming very intense. Affective stability is absent. His attempts to control emotionality with his wife and children are punctuated by angry outbursts and, in the case of the children, spankings. And while he says that he has several good friends, one senses considerable ambivalence embedded in those relationships also. (my emphasis)
One can’t claim that the psychiatrist’s misdiagnosis was due to being distracted during the evaluation: the note suggests that the doctor paid careful attention to his history. For instance, it points to the pseudopatient’s changing relationships with his parents as psychologically significant. But everyone has a rich inner psychological life filled with emotional conflicts worthy of exploration, whether in the setting of psychotherapy or not. In this case, that fact was interpreted as a complete absence of “affective stability,” which is psychiatry-speak for an inability to form solid, lasting relationships with people. Routine conflicts with his wife—it would be pathological not to have occasional disputes in the course of a marriage—indicated a controlling personality. Finally, the fact that he had friends should have been a powerful piece of evidence suggesting he was living a healthy life, but instead the psychiatrist used this fact to conclude nearly the complete opposite, by sensing “considerable ambivalence” of the patient toward his friends.
The process by which a psychiatrist manages to involute a person’s mundane-but-healthy relationships into deep psychopathology requiring prolonged hospitalization is known as the confirmation bias. It is powerful, and everyone is capable of suffering from it. Confirmation bias occurs when we misinterpret incoming information because we have a strongly held hypothesis that makes us want to see things as they are not. Ultimately, confirmation bias takes the form of circular reasoning. A thirty-nine-year-old man discussing occasional conflicts in his personal life? That’s a sign of an unstable personality. How do we know he is personally unstable? Because he presented to a psychiatric institution complaining of auditory hallucinations. But how do we know that he has paranoid schizophrenia rather than having suffered from an isolated sensory event? Well, just look at the conflict in his interpersonal relationships! The mistake seems so obvious from our perch as spectators, but in the case of the Rosenhan experiment, we were in on the fix from the start. How well would we have fared had we been in their shoes?
Confirmation bias can be subtle, and it is far from the only cognitive error to which doctors—that is, human beings—can be prone. Psychological theorists are not fully settled on the question of why we make so many of these errors in data interpretation, but over the past generation a group of psychologists has sought to explain our systematic tendency toward various cognitive errors through the lens of evolution. They have proposed a model known as error management theory, which incorporates uncertainty right at its heart. As humans moving through a world constantly bombarded by data, we are forced to make hypotheses about our environments and to act on them for survival, despite our inability to be completely certain our hypotheses are correct. Is it safe to go outside? Is a storm coming? Is this other human an enemy, a friend, or neither? When we face questions such as these, we are in the milieu of uncertainty, and our responses to these questions can sometimes have life-or-death consequences. Error management theory posits that, through evolution, we have developed into creatures with a predilection for specific cognitive biases. Over thousands of generations, the theory goes, it paid off for us to look at the world in a skewed way. There is an evolutionary advantage in misinterpreting our environment to an extent.
Why might it be advantageous to us to make repeated mental mistakes? Consider the following scenario: You are living in a forest tens of thousands of years ago, at the dawn of modern humanity, when our ancestors were more or less the same as we are in a physiological and anatomical sense. As you move through this forest seeking food, you are aware that the forest floor has all manner of objects. Many of them are just sticks, but some of them are poisonous snakes. Most of the time there is enough visual information for you to separate the sticks from the snakes, but what happens when the two blur together? What do you do when, say, it is twilight, and some small rodent of which you were unaware had scurried away and in doing so made a stick roll over so that it can’t be easily distinguished from a moving snake?*
*The snake/stick hypothetical is the creation of Professor Martie Haselton of UCLA; see the bibliography for further information.
Error management theory suggests that we can make two errors of interpretation in this context. We can overdiagnose the situation by regarding the stick as a snake, and flee—that’s a false-positive error. Alternatively, we can underdiagnose the situation by regarding a harmful snake as a stick, which is a false-negative error. It should be clear that these two misinterpretations of the environment, although equal in the degree of cognitive error, have very different real-world consequences. If we think a stick is a poisonous snake, we flee from a harmless object at little overall cost (unless, of course, we run headlong into a real snake). If, however, we ignore the threat of a real snake, we can die. These two cognitive errors, therefore, are asymmetric: if you started out with tribes of people divided equally between those predisposed to snake overdiagnosis and those predisposed to snake underdiagnosis, you would eventually see more snake overdiagnosers in subsequent generations simply because the snake underdiagnosers would eliminate themselves from the gene pool through lethal bites. Error management theory argues that we are hardwired by evolution to misinterpret our surroundings, and that we are much more likely to overread a situation than to do the opposite, seeing patterns in data where in fact there is nothing but noise. We not only look for the snowballs in blizzards, we instinctively see ones that aren’t there. It is how we are built.
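The asymmetry is easy to see in a toy simulation. The probabilities and population sizes below are invented assumptions, not estimates from the error management theory literature; the point is only that when one error is cheap and the other is occasionally fatal, the cautious misreaders come to dominate.

```python
# A toy illustration (hypothetical numbers) of why asymmetric error costs
# favor "overdiagnosers": fleeing a stick is cheap, ignoring a snake is not.
import random

random.seed(1)

P_SNAKE = 0.05             # chance an ambiguous object is really a snake
P_DEATH_IF_IGNORED = 0.5   # chance a real, ignored snake proves lethal
ENCOUNTERS = 200           # ambiguous objects met over a lifetime

def survives(flees_from_everything: bool) -> bool:
    """Simulate one lifetime of stick-or-snake encounters."""
    for _ in range(ENCOUNTERS):
        is_snake = random.random() < P_SNAKE
        if flees_from_everything:
            continue  # in this toy model, fleeing costs energy, never a life
        if is_snake and random.random() < P_DEATH_IF_IGNORED:
            return False  # the underdiagnoser ignored a real snake
    return True

population = 10_000
overdiagnosers = sum(survives(True) for _ in range(population))
underdiagnosers = sum(survives(False) for _ in range(population))
print(f"Surviving overdiagnosers:  {overdiagnosers}")   # all of them
print(f"Surviving underdiagnosers: {underdiagnosers}")  # only a few dozen
```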
In medicine, this means we have a powerful tendency to overdiagnose disease. We see patterns of data that suggest to us that patients have disease when in fact they do not. And because of rapid advances in diagnostic technology, we have been doing more overdiagnosis in medicine than ever before.
The Rise of Pseudodisease
As part of a series wrapping up 2013 in news, National Public Radio highlighted some stories they regarded as the “best good news” of the year. Since news is typically about things gone wrong, and good news stories aren’t inherently predisposed to flashy headlines (“Something Works!”), the report noted that too much news could make you think humans can’t do anything right. To provide some pick-me-up perspective, they offered the three following examples: it is safer to fly now than ever before, life in sub-Saharan Africa is getting a lot better, and death from cancer is decreasing. “A person in their mid-50s has a chance of dying from cancer that’s 20 percent lower than a person of the same age in 1990, 1991,” said Dr. Otis Brawley, the chief medical officer of the American Cancer Society. He specifically cited the drops in the death rates of breast and colorectal cancer as being exemplary.
While factually accurate, the cancer entry might have been better served had a big asterisk been placed alongside it, insofar as one can place an asterisk on the radio.* Nestled as it was between two other items whose improvement came about almost exclusively through technology, a layperson could reasonably conclude that the “cancer is down” message is due to medicine’s ongoing march of progress via breakthrough scientific achievements. Yet the reality is a good deal more complicated, and our progress with respect to cancer is, at best, more of a mixed bag. For instance, lung cancer death rates are declining, but this is due almost completely to the reduction in smoking. It’s unquestionably a public health triumph, but the science of it was worked out nearly sixty years ago and isn’t due to any fancy technology. (If anything, the fact that it took so long for us to see our lung cancer mortality fall should be cause for mild embarrassment rather than celebration.) Not only is the drop in the lung cancer death rate not attributable to any scientific game changers, but as we have adopted more sophisticated technology to screen for cancers, we’ve had to cope with effects more appropriately described as subtly sinister than unambiguously beneficial.
*To be clear, Dr. Brawley has frequently attempted to raise public awareness about the problems associated with overdiagnosis. I include his quote here because it serves as a useful illustration of how overdiagnosis can lead to statistical versions of optical illusions.
Cancer in the early twenty-first century has had some unquestionable successes, at least some of which are due to the kind of genuine laboratory wizardry that local TV news outlets are so fond of displaying for their “health minute” broadcasts that I will cover in depth toward the end of the book.† But what is left unsaid in the stream of happy analysis is that medicine’s ability to recognize cancer is starting to show the same problems we witnessed in the psychiatrists of the Rosenhan experiment: we are seeing diseases that aren’t really there. Note that I say “diseases” and not “cancers”—more on that shortly.
Arguably, the most important advance in cancer treatment in the past generation has been the development of the drug imatinib, which goes by the trade name Gleevec, for chronic myelogenous leukemia, or CML. Gleevec has largely turned CML from a fatal disease into a chronic one, and consequently should be thought of as a true game changer for that disease. But only about 6,000 people are diagnosed with CML annually, far fewer than are diagnosed with the true killers in cancer, those of the lung, colon, breast, pancreas, and prostate, which together account for just under 1 million diagnoses and 300,000 deaths per year.
How do we know this? There are two indirect but very persuasive lines of evidence. The first involves cancer statistics over decades. The details are hidden within a database that is freely available online and is known as the Surveillance, Epidemiology, and End Results program, or SEER, as it is more commonly called. SEER is a by-product of the Nixon administration’s War on Cancer; today it contains all of the information on every type of cancer diagnosed in the United States since the early 1970s, as well as all the deaths that result from those cancer diagnoses. Because of Nixon’s political calculation, we have an extremely good idea of how well our therapies stack up against the past and how well we are performing with respect to cancer treatment.‡ It is at least as important a legacy of his time in Washington as Watergate.
‡Irresistible political footnote: SEER, which is a tremendous achievement of public health and epidemiology, is only possible through a well-run, nonpartisan, technocratic organization funded by a centralized federal government. It would simply be impossible for this to be undertaken by a conglomeration of academic medical centers. I so tire of the canard that governments can’t do anything. Only a government could build SEER!
When the statistics relating to the diagnosis and mortality of cancers are analyzed over this period, several of them show a remarkably similar pattern despite their underlying biological differences: the total number of diagnoses, year after year, tends to rise, while the total rate of death remains basically the same. In the Journal of the National Cancer Institute in 2010, H. Gilbert Welch, a professor of medicine at Dartmouth and the author of the groundbreaking book Overdiagnosed, performed a survey of five different cancers and saw this pattern, as seen on the following page.
Either this rise in the total number of cancer cases is due to overdiagnosis, Welch argued, or a genuine increase in incidence is occurring at the same time as major advances in these fields that are improving survival. But the latter seems implausible, as no such advances are hailed in the medical literature. Moreover, if there really were dramatic improvements, one would expect the data to “jump” with quantum changes in relatively short periods. The death rates of these cancers, by contrast, are smooth lines across time.
Why the rise? The answer is almost certainly because we have created technologies that are ever more sensitive at detecting abnormalities that we call cancer, and at a cellular level they are cancers, but they turn out not to be the kind of cancers that threaten lives—in short, we find “cancer” that isn’t “cancer” in the way both doctors and patients understand the term.
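The arithmetic behind this statistical illusion is simple enough to sketch with made-up numbers (these are not SEER figures): if more sensitive detection doubles the number of diagnoses while the same number of people die each year, apparent survival improves even though no one’s life has been extended.

```python
# A back-of-the-envelope illustration, with hypothetical numbers, of how
# overdiagnosis makes survival look better while the death toll is unchanged.

deaths_per_year = 300      # unchanged by the extra diagnoses

diagnoses_before = 1_000   # before sensitive screening
diagnoses_after = 2_000    # after screening sweeps in indolent "cancers"

survival_before = 1 - deaths_per_year / diagnoses_before   # 70%
survival_after = 1 - deaths_per_year / diagnoses_after     # 85%

print(f"Apparent survival before screening: {survival_before:.0%}")
print(f"Apparent survival after screening:  {survival_after:.0%}")
# Survival "improves" from 70% to 85%, yet exactly as many people die:
# the denominator grew with overdiagnosed cases, the numerator did not.
```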