With this rapid improvement in CT resolution came an ever-greater ability to find pulmonary emboli, and the incidence of PEs from 1998 (the year in which CT pulmonary angiography made its debut) to 2006—a scant eight years—nearly doubled. However, the mortality from PE hardly budged during this time. As with the cancers discussed above, the CT enabled us to find many actual pulmonary emboli, but finding them has not, on the whole, helped prevent deaths.
FIGURE 1.4. A CT pulmonary angiogram. It does not take a radiographer to see the superior detail of a CT when compared to a VQ scan. The dark gray areas in the midst of the white area that looks like an upside-down “T” are blood clots. This patient has large PEs that are almost certainly clinically significant, but these scans can also find much smaller clots in more distal arteries, the importance of which is unclear. Nevertheless, these smaller blood clots are also usually treated, with equally unclear benefits.
SOURCE: Image courtesy of Dr. Hao Lo of UMass Memorial Medical Center.
One of the important causes of the rise of overdiagnosis, especially in PEs, can be traced to the speed with which these technologies have arisen. The testing has developed at a rate that has outpaced the researchers trying to study its effectiveness, and so we cannot be certain of the true meaning of the pathology we discover when some tests turn up positive. This leads to one of the most important ways by which we rush headlong from overdiagnosis into overtreatment: we reify our diagnoses. That is, we regard a diagnosis as a thing—a real, unambiguous, categorically true disease that requires treatment, and in many cases fairly serious treatment, which always carries the risk of harm.
Suppose a fifty-year-old male patient comes to the emergency room with some vague chest discomfort and a rapid heart rate after a long flight across the country. He has no history of heart disease, but both of his parents had heart attacks at young ages, so he is concerned the same could be happening to him. During his stay in the ER, the rapid heart rate goes away, as does the chest discomfort. The workup for a heart attack is negative, and the physician, rightfully concerned that the long flight could have led to a pulmonary embolism (long periods of limited movement are one of the biggest risk factors), orders a CT pulmonary angiogram. An hour later, the physician receives a report from the radiologist that the patient has a “subsegmental pulmonary embolus”—fancy medical talk meaning that there’s a blood clot in one of the smaller arteries in the lungs. Not long after, the patient is admitted to the hospital and started on a blood-thinning medication, heparin.
This could be a story of a genuine PE that might have taken this man’s life. Equally, it might be a story of overdiagnosis. For the most part, we don’t know what happens to people with small or medium-sized pulmonary emboli if they are left alone; there is some evidence from a small number of studies that these people do perfectly fine without treatment, but there’s just not enough data to know that with great confidence. It may, in fact, be quite normal for everyone on long flights to have some level of blood clots in their bodies, whether in their arms or legs or lungs, just because they have moved so little over several hours’ time. Then, over a period of hours or days, those minor blood clots resolve, never coming to medical attention. Did this man, who will soon be committed to months of blood-thinning medication, have one of those kinds of pulmonary emboli, and might he have ignored his symptoms had it not been for his family history of a different problem altogether?
The answer to this question isn’t currently known, but what is clear is that he now carries an unambiguous diagnosis. The ER physician who was seeing this gentleman has the unenviable task of taking care of potentially dozens of patients simultaneously, some with trivial medical problems not even requiring an ER visit, and some with life-threatening issues. ER doctors do not have the luxury of deliberating carefully on the underlying biology of their patients’ conditions; they have to make quick, real-time verdicts as to whether someone has a disease, whether treatment is required, and whether admission is necessary. So, when a radiologist’s reading of a pulmonary embolism comes to that physician’s attention, he or she will, quite understandably, reify it.* It’s a thing—a PE—and will be approached like all other PEs that provoke a response, for it is a life-threatening condition requiring blood-thinning medications that come with real risks.
* “Reify” comes from the Latin res, meaning “thing.” To use the term “thingify” would prevent a doctor from sounding, well, doctory.
The process of reification then continues: the physician who admits the patient to the hospital will be given the diagnosis in advance by the ER physician. It is not uncommon for the handoff between the ER and another specialist to consist of a few words such as “he’s a fifty-year-old with a PE.” In some hospitals, this exchange may never even take place directly if the ER physician has finished his or her shift and the admitting physician hasn’t started evaluating the patient. By hearing the terse summary of a busy ER doctor and reading the radiology report, an admitting physician will already be heavily influenced in favor of thinking about this as a PE, and it may never occur to that physician to think about it as a “PE”—that is, a disease that may not really be a disease, a diagnosis with quotation marks around it. Needless to say, the patient, who came to the ER in the first place because something felt wrong, will be powerfully motivated to believe in the diagnosis as well. This is how inherently fuzzy data morphs into a full-blown categorical diagnosis. We can see the effects at the population level, but it’s very difficult to recognize the process, much less stop it, at the individual level.
What conclusions can we draw from this? I don’t mean to suggest that patients and their families should be conversant with all manner of tests such that they can form their own independent opinions about a given diagnosis. But it is reasonable to have conversations about the confidence doctors have in their diagnoses—where on the spectrum of certainty does a given diagnosis fall? I’ll talk more about conversations with doctors toward the end of the book. I also think that both doctors and patients need to carefully consider the downsides of treatments, especially in an age of expanding diagnoses. It’s instructive to remember the bedrock principle of our profession: first, do no harm.
More Medicine. Better Medicine?
Thus far, I’ve described the process of overdiagnosis purely in terms of how a physician tries to solve a problem, but other factors drive overdiagnosis with equal force. Consider the CT scan used to find the might-be-PE in our hypothetical patient. Hospitals function in a competitive marketplace, where patient volume (and the insurance reimbursement that comes with it) constitutes their lifeblood. Owning the most advanced scanner in the area allows for a flashy advertisement showcasing that hospital’s state-of-the-art facilities. CT scanners are very expensive, so the optimal way for that hospital to recoup its costs is by scanning as many people as possible.
Lest you think this is the beginning of a conspiracy story: nobody is forcing the ER physician to use the scanner. Hospital CEOs don’t send out memos offering bonuses for doctors who scan the most patients per month, and they don’t walk around the wards trying to drum up business for radiology. But they don’t have to, because there are no mechanisms to discourage scanning either. When hospitals put up billboards advertising their state-of-the-art radiology facilities to their communities, physicians can rightly assume that they should feel free to scan away.
The structural factors driving overdiagnosis hardly end there, however. The makers of the scanners, of course, also have a financial interest in selling as much product as possible, and they advertise accordingly. The makers of pharmaceuticals used to treat these various conditions benefit from pseudodisease: more diagnoses means more patients, more patients means more medication, and more medication means more revenue. Overdiagnosis is good for business.
I don’t think it requires cynicism to understand overdiagnosis, however. Pharmaceutical company executives as well as their rank and file may genuinely believe that some new medication they have developed for a serious condition carries a real benefit, even in the face of growing evidence suggesting that a given disease is overdiagnosed and overtreated. No conspiracy is required because everyone involved is already motivated to believe that more medicine equals better medicine.
What’s important in understanding overdiagnosis is that, although the incentives of hospitals and the biomedical industry may differ from those of doctors and patients, the latter are no less susceptible to motivated thinking. The manner in which it impairs clinical decision making is just as insidious, and its effects run just as deep. As health-care professionals, we want to believe that we are providing the best care for our patients by performing tests, and so we have become overconfident in what those tests tell us. As patients, we want to believe that greater technology creates greater benefits, especially because that is demonstrably true in so many other aspects of our lives, and so we tend to greet diagnoses without skepticism, even when we feel well.
By such means are sticks turned into snakes.
Overdiagnosis can be thought of as the perfect storm of our technological progress. We are hardwired by evolution to overreact to possible threats to our lives and by doing so ensure that we react appropriately to actual dangers; we develop tests that likewise find more problems than actually exist. These two factors work synergistically to expand the number of diagnoses of major disease, particularly over the past generation. Moreover, the system in which doctors and patients function is designed to create strong incentives to “make a diagnosis” (the satisfaction of a job well done for a doctor, and the relief of knowing for a patient) and create disincentives to “miss” a diagnosis (lawsuits). And, because of uncertainty, we cannot know which patients are being overdiagnosed—we can only see that the process is happening by looking at group data. Yet the logical consequence of a diagnosis is a treatment, and, as I’ll discuss in detail later, treatments carry risks. Thus, the ultimate effect of overdiagnosis is that we consign a number of patients to needless harm.
I’ll carry this subject forward as we look at screening mammograms. Screening mammograms, like screening PSA tests, are designed to detect disease in people who have no outward evidence of disease. The very absence of symptoms has a dramatic effect on the utility of such a test; in other words, the amount of uncertainty in interpreting a screening mammogram is much higher than in interpreting one done because a woman presented with a lump in her breast—even though it is the exact same test. If this sounds strange, it is. How this happens is something I’ll explore in detail, for it is crucial to appreciate the mathematics of overdiagnosis to grasp how much misunderstanding surrounds the practice of mammography. So, to help familiarize ourselves with how a patient’s overall state of health can influence the uncertainty of a test, before diving into mammograms let’s take a brief look at one patient’s experience with a different screening test, the results of which led him to assume his life would be over very soon.
2
VIGNETTE: THE PERILS OF PREDICTIVE VALUE
Perception requires imagination because the data people encounter in their lives are never complete and always equivocal.
—LEONARD MLODINOW
“The data people encounter in their lives are never complete and always equivocal”—so says physicist and author Leonard Mlodinow in his book The Drunkard’s Walk, a meditation on randomness and how people choose to incorporate it, or ignore it altogether, as they go about their daily lives. But how much imagination is required for a person to perceive the equivocal nature of a blood test that informs them they are going to die?
A lot, as it turns out.
Yet these tests are administered all the time, and only infrequently do patients or doctors account for their equivocality. We saw this in the previous chapter when we looked at the prostate specific antigen test: positive tests only very rarely uncovered disease that would have led to terrible outcomes, and yet because of the equivocal data that the PSA testing produced in groups, many men ended up enduring fairly terrible treatments that they otherwise would not have undergone.
Another way of thinking about this is to ask the following question: What happens when we approach the middle of the spectrum of certainty? In this equivocal territory, it becomes vitally important to understand the size of the risks and the magnitude of the benefits. Again, we observed this with PSA: the risks of being overdiagnosed were quite real, and fairly common, while the benefit, if one exists at all, is on the order of one life saved per one thousand men over ten years’ time. When I go on to discuss screening mammograms in the following chapter, we’ll need to keep this in mind.
But how do we overdiagnose? What are its statistical mechanics? Why can’t we just develop a test that’s 99 percent accurate and be done with it?
In fact, we can, and we have. Most tests aren’t that good, but some are, and despite this we can still produce overdiagnosis. To understand this point is to understand at least part of the controversy about screening mammogram recommendations. So to more fully appreciate this phenomenon, let’s see how this played out when one patient learned the news of a routine blood test.
Mlodinow’s Story
On a Friday afternoon in 1989, one man in California received some very discouraging news from his doctor. The chances that he would be dead within a decade were “999 out of 1,000,” according to the doctor. “I’m really sorry,” the doc added as he relayed the news, by telephone.
The test was, of course, for HIV, and it came back positive as part of a routine insurance screen. The gentleman in question had been diagnosed with the virus that would eventually cause AIDS and lead to his demise. At that time, there was very little in the way of treatment: AZT, the first drug for HIV, had been approved two years before, but patients who took the drug got better initially only to succumb to illness as the virus became resistant to the drug’s effects. So-called triple therapy, which has allowed doctors to turn HIV into a chronic and manageable disease, was still more than five years away. This test signified a death sentence, although in the world of medicine at that time it constituted yet another routine blip in the ever-growing pile of cases as the HIV epidemic spread, especially in California.
There was one aspect of this test that was unusual, however. It involved the person being tested: the very same Leonard Mlodinow whose quote opened this chapter. Because of his training, Mlodinow understood the nature of numbers and statistics. After what must have been a very harrowing few days and perhaps weeks of concentrated thought and research on the HIV test, he was able to figure out something quite remarkable: his “positive” test for HIV could actually be interpreted to mean that it probably wasn’t positive after all. Which, in fact, was the case: Mlodinow wasn’t infected with HIV, and he has kicked around ever since, producing several very readable works of popular science for an audience grateful for the misdiagnosis.
How could this be? Overall, at the time the HIV test was, in fact, quite accurate. A person with HIV was very likely to have a positive test, and an uninfected person such as Professor Mlodinow was very likely to have a negative one. And yet the counterintuitive third fact is that, despite these two statistical truths, a random positive HIV test was very likely to be a mistake. The insurance company and the physician both got it badly wrong.
Mlodinow’s story throws a few features of modern medicine into sharp relief. The first and most obvious is the way in which highly accurate tests can nevertheless lead to deeply inaccurate interpretations. A second issue Mlodinow’s story raises is the process by which we understand what it is for a physician to “know” something. Part of why the story is so jarring is how spectacularly the physician fails Stats 101: rather than having 999 in 1,000 odds of being infected with HIV, Mlodinow relates that more likely he had about 1 in 9 odds. This is a whopper of a mistake, and what makes it so troubling is that we’re not inclined to think of physicians as the kind of people who make such critical errors. When coupled with his questionable judgment in relaying such news over the phone rather than scheduling a face-to-face visit in the clinic, the doc doesn’t come across as particularly professional.
But again, how could this be? The answer can be found in the idea of what constitutes predictive value. Predictive value refers to whether a given test result can truly be thought of as representing the presence or absence of disease—that is, if a test is positive and has a high positive predictive value, then that person probably does have the disease. Similarly, if a test is negative and has a high negative predictive value, then a negative test really is cause for reassurance. For instance, in a few chapters, we’ll see how the Lyme disease test has very good negative predictive value if a patient has been symptomatic for several months—if the test is negative, then whatever the problem is, it isn’t Lyme.
However, accuracy and predictive value aren’t the same thing, and this is because predictive value is determined in part by the probability that someone has a disease. Unsurprisingly, this is referred to as the pretest probability. In other words, even when dealing with a fairly accurate test, if the pretest probability of someone having a given disease is low, then the positive predictive value will suffer. The lower the pretest probability, the lower the positive predictive value. Similarly, the lower a test’s accuracy, the lower the positive predictive value.
The reason Leonard Mlodinow’s positive HIV test was unlikely to be a true positive is that his pretest probability was low. You can’t perform a blood test to define someone’s pretest probability, but we can infer that it was low because Mlodinow was being tested as part of an insurance screen without any signs of illness. If, by contrast, he had been experiencing unintentional weight loss, moderate fatigue, and a persistent nagging cough, especially as someone living in that place at that time, his pretest probability would have been much higher, and so the likelihood that his positive test was truly positive would have been much higher.
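For readers curious about the arithmetic, here is a minimal sketch, in Python, of how pretest probability drives positive predictive value. The sensitivity, specificity, and pretest probabilities below are illustrative assumptions, not the figures Mlodinow himself worked through; they are chosen only to show how the very same highly accurate test can yield a positive result that is probably a false alarm in a low-risk screening setting and almost certainly real in a high-risk symptomatic one.

```python
# A minimal sketch of the arithmetic behind positive predictive value (PPV).
# All numbers below are illustrative assumptions, not figures from the book.

def positive_predictive_value(sensitivity, specificity, pretest_probability):
    """Probability that a positive result reflects true disease (Bayes' rule)."""
    true_positives = sensitivity * pretest_probability
    false_positives = (1 - specificity) * (1 - pretest_probability)
    return true_positives / (true_positives + false_positives)

# Assume a very accurate test: it misses 1 in 1,000 true cases and
# falsely flags 1 in 1,000 healthy people.
sensitivity = 0.999
specificity = 0.999

# Low pretest probability: an asymptomatic person screened for insurance,
# assumed here to have roughly a 1-in-10,000 chance of infection.
print(positive_predictive_value(sensitivity, specificity, 0.0001))  # ~0.09, roughly 1 in 11

# High pretest probability: a symptomatic patient assumed to have a
# 1-in-10 chance of infection before the test is even drawn.
print(positive_predictive_value(sensitivity, specificity, 0.1))     # ~0.99
```

Under these assumed numbers, the identical test goes from being wrong about nine times out of ten in the screening scenario to being almost certainly right in the symptomatic one, which is the whole point of pretest probability.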