Snowball in a Blizzard
FIGURE 1.1. Cancer incidence and annual cancer mortality over time.
SOURCE: Welch, H. G., “Overdiagnosis in Cancer,” Journal of the National Cancer Institute 2010; 102:605–613.
The single most convincing illustration of this can be found in the graphic on prostate cancer. As you can see in the figure, prostate cancer incidence rises slowly but steadily from 1975 until 1990, after which the diagnosis rate quickly spiked to more than double the number seen fifteen years earlier. The number of men diagnosed with prostate cancer has since declined (a feature not shared by any of the other diseases here, with the possible exception of breast cancer), but it is still well above the incidence when the statistics were first kept. Meanwhile, the death rate hardly budged during that time, hovering right around twenty-five deaths per 100,000 people.
What led to this sharp increase in prostate cancer during the 1990s? Could it have been that men who had served in wars were reaching the age at which prostate cancer becomes common, and that some chemical they were exposed to during the war, such as Agent Orange, caused prostate cancer? It’s a plausible hypothesis, but there’s a better one: by the early 1990s the prostate-specific antigen test, or PSA, was becoming commonplace in the offices of urologists and primary care physicians across the country. Because no widely available test for prostate cancer had existed before the PSA, its arrival led to extensive use and a sharp rise in the incidence of prostate cancer. The death rate proceeded as if none of this were happening at all.
The rise in rates of most of the cancers shown in the figure can be attributed to some technology becoming popular. In addition to the PSA and the mammogram, ultrasound has become the mainstay for the initial diagnosis of thyroid cancer, while the CT scan has played the same role for kidney cancer. Only melanoma, whose diagnosis relies initially on physical exam, can’t be explained by a novel screening technology. In that case, the rise is due to greater physician awareness fostered by systematic education in medical school and residency; ironically, physician education seems to have the same effect on overdiagnosis of melanoma as a CT scan has on, say, kidney cancer.
Cancer cannot be diagnosed by any of these technologies, however. What is required for a diagnosis is a piece of tissue viewed under a microscope by a trained pathologist. If this is so, one might wonder what pathologists are finding when given biopsies triggered by those ultrasounds, mammograms, and CT scans. The answer is that they are definitely finding cancer. The problem is that the additional cancers they are finding are unlikely ever to kill the patients who have them. In short, the rise in diagnoses of these cancers can be seen as the principle of medical uncertainty in action: we can see that there are unnecessary diagnoses only by looking at the numbers in aggregate, where incidence rises while the death rate remains unchanged.
Earlier, I was careful to say that cancer diagnosis over the past generation has resembled insanity diagnosis in the Rosenhan experiment because we have been seeing diseases that aren’t really there. I wrote that instead of saying that we were seeing nonexistent cancers because that’s not actually true. In the growing number of biopsies, pathologists are finding clumps of cells whose overall appearance is bizarre, with the distorted architecture that is the hallmark of cancer. In many cases, however, these cells appear to be of little physiologic significance. They may be “cancer” in the strict technical sense of the word: a group of cells whose reproductive machinery has gone haywire. But they are not “cancer” in the way that most patients and doctors understand it, namely, something that will kill if left to its own devices.
We know this from a series of autopsy studies of people who died from causes other than cancer. This is the second line of evidence suggestive of an epidemic of cancer overdiagnosis: many, many people, especially older people, die with organs containing cancers that appear to be harmless. For instance, one study published in 2005 examined prostate tissue from Hungarian men who died with no known history of urological problems. It found that nearly 40 percent of them had evidence of prostate cancer. Among the oldest age group, the proportion was more than double that. Even more telling was a study published in 1985 that looked at the thyroids of Finnish adults who died of other causes. As with the prostate study, about 40 percent of these patients were found to have some evidence of thyroid cancer. But many of these cancers were so small that others like them could easily have fallen in the gaps between the tissue sections that were examined and been missed. When this was taken into account, the researchers estimated that they could probably find some kind of cancerous cells in every patient if they had examined enough sections. Studies similar to these exist for many cancers, and they all indicate that many people, especially older people, develop cancers during the course of their lives that don’t amount to much in any meaningful sense of the word.
One of the central problems of cancer overdiagnosis is directly related to uncertainty: we know that some patients with these cancers are at risk of dying from the disease, but we have only a vague idea of which ones are most at risk. A different way to picture the problem was suggested by psychology researcher Hal Arkes of Ohio State University, discussing the effect that the PSA has had on prostate cancer screening. In a 2012 article in the journal Psychological Science, he invited readers to consider two auditoriums, each filled with one thousand men aged fifty or older. The men in one auditorium have an annual PSA test for ten consecutive years; the men in the other auditorium do not. Arkes then drew upon a number of studies in prostate cancer research to reach some surprising conclusions.
At the end of those ten years, about seventy men in the unscreened auditorium will be diagnosed with prostate cancer (that is, they won’t be diagnosed until they present with symptoms requiring an evaluation that leads to the diagnosis). In the screened auditorium, however, ninety will be diagnosed with prostate cancer. Nevertheless, by the end of those ten years, most studies show that about seven men in each auditorium will die of prostate cancer, meaning that the twenty extra diagnoses have no impact on actual mortality. These numbers are estimates and as such are fiercely contested, but they provide a sense of perspective on the controversy. The debate about the value of PSA boils down to whether the number of men who die from prostate cancer in the screened auditorium is six rather than seven; in other words, whether one life is saved for every thousand men screened over ten years.
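The arithmetic of Arkes’s thought experiment can be laid out in a few lines of Python. This is a minimal sketch, not Arkes’s own analysis: the variable names are hypothetical, and the only inputs are the approximate figures quoted above (70 versus 90 diagnoses, about 7 deaths in each auditorium, and the contested scenario of 6 deaths among the screened men).

```python
# Sketch of Arkes's two-auditorium thought experiment, using the approximate
# ten-year figures quoted in the text for 1,000 men per auditorium.
MEN_PER_AUDITORIUM = 1000

unscreened = {"diagnosed": 70, "prostate_cancer_deaths": 7}
screened = {"diagnosed": 90, "prostate_cancer_deaths": 7}

# Diagnoses that occur only because of screening.
extra_diagnoses = screened["diagnosed"] - unscreened["diagnosed"]

# Deaths averted under the central estimate (about 7 deaths in each room).
lives_saved_central = (
    unscreened["prostate_cancer_deaths"] - screened["prostate_cancer_deaths"]
)

# The contested optimistic scenario: 6 deaths instead of 7 in the screened room.
OPTIMISTIC_SCREENED_DEATHS = 6
lives_saved_optimistic = (
    unscreened["prostate_cancer_deaths"] - OPTIMISTIC_SCREENED_DEATHS
)

# Even in the optimistic case, how many extra diagnoses accompany each life saved?
extra_diagnoses_per_life_saved = extra_diagnoses / lives_saved_optimistic

print(f"Extra diagnoses from screening: {extra_diagnoses}")
print(f"Lives saved, central estimate: {lives_saved_central}")
print(f"Lives saved, optimistic estimate: {lives_saved_optimistic} "
      f"per {MEN_PER_AUDITORIUM} men")
print(f"Extra diagnoses per life saved (optimistic): {extra_diagnoses_per_life_saved:.0f}")
```

Even under the optimistic scenario, screening produces twenty extra diagnoses for each life it might save.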
Harm, and the Optics of Benefit
For that saved life that may not even be there, the costs (both the literal cash expenditures of the medical system and the physical toll on the many men with elevated PSA tests) are profound. About 200 screened men will have a PSA level high enough to merit a biopsy, meaning that roughly 130 men will undergo an unnecessary invasive procedure. Nine men from this unnecessary biopsy group will require brief hospitalizations for complications such as infection or bleeding. Twenty screened men will receive a diagnosis of prostate cancer that would never otherwise have come to their attention, and most will receive some form of treatment (such as radiation) that can have major side effects; as many as a quarter of them will undergo radical prostatectomy with its attendant complications, including incontinence, recurrent urinary tract infections, and impotence. These data are summarized in Figure 1.2.
FIGURE 1.2. The effects of PSA screening. This figure estimates the difference that ten years of annual PSA screening would make for one thousand men. More than 10 percent of all men would undergo unnecessary prostate biopsies (i.e., they wouldn’t have been biopsied if they hadn’t been screened with PSA). Twenty men would receive a diagnosis of prostate cancer that would never have harmed them and would suffer the consequences of treatment. The debate among statisticians, physicians, and other researchers is whether one life per thousand men, over ten years, is saved by this process.
Although different professional societies have provided different recommendations about which men should be tested with PSA (or whether they should be tested at all*), not even the most vociferous advocates for PSA believe that its maximal benefit exceeds one life saved per thousand men screened over ten or more years.
The US Preventive Services Task Force, which we will encounter when discussing mammography, now recommends against PSA screening for all men. As of 2013, the American Urological Association recommends that men aged fifty-five to sixty-nine “talk with their doctors about the benefits and harms of testing” but no longer makes a firm recommendation for PSA use in that age range. For all other men, it does not recommend the test.
But among those patients who are diagnosed with cancer following an abnormal PSA (or a mammogram, or a thyroid ultrasound, or any of the other screening methods that have become increasingly prevalent), there’s a very different psychological effect. These people believe that they were saved by the screen, even if we can demonstrate in population studies that this isn’t the case. At best, only one man in every thousand who is screened for prostate cancer can claim to have been saved by PSA, but in reality ninety men, nearly 10 percent of all men screened, feel as if they were, because they reasonably assume that their PSA picked up an asymptomatic abnormality that would otherwise have killed them. Not only do patients feel this way, but their doctors do as well, and that explains in part the undiminished belief of some mainstream physicians in the effectiveness of PSA despite these numbers.
If you have a screening test that suggests you have cancer, and you undergo a biopsy that tells you that you have cancer, and you have surgery or chemotherapy or both to remove that cancer, and you are alive several years after all of this, it is completely understandable to believe that your life was saved by that screen. But more often than not it is an optical illusion, a pseudodisease. It appears in every way to be disease, only it would have no meaningful biological effect on the course of one’s life if left alone. We may live in a technologically advanced society with the tools of science within easy reach, but we (both doctors and patients) are committing the same kind of Type I, false-positive error as the mythical human ancestor who fled from a snake that was really only a stick. In that case, however, the costs of fleeing were relatively low. That same mental error, in the context of a “cancer snake” that’s more akin to a cancer stick, can result in unnecessary prostatectomies, mastectomies, thyroidectomies, and many other -ectomies, to say nothing of the side effects of radiation and chemotherapy. Invariably, some of these procedures will actually become snakes, so to speak (that is, there will be major complications from the procedures themselves), and lead directly to the deaths of some of these stick-fleeing patients.
The optical illusion of pseudodisease can be teased out only with the kind of population studies that simulate Dr. Arkes’s “auditorium” thought experiment. These studies take years to perform and typically don’t receive the same amount of news coverage as the story of a famous person whose cancer was detected by a screen. Thus, not only do patients and their doctors become convinced of the utility of screening, but the broader public does as well.
This illusion extends to the realm of statistics. Even though the death rate from many of these cancers has remained unchanged over decades, the increasing number of cancer diagnoses has had a profound impact on a different way to measure cancer treatment: survival rates. Most cancers are measured in terms of five-year survival; if you are alive and cancer-free five years after your initial diagnosis and treatment, you are considered cured of that cancer. If twice as many people are diagnosed with a given cancer but the death rate from that cancer remains unchanged, then the share of diagnosed patients who die is cut in half, and the five-year survival rate rises sharply even though not one additional life has been saved. This makes the screen look like an even better bargain and explains to a great extent what enables overdiagnosis. It’s so easy to spot the inherently bizarre behavior of psychiatrists who appear to regard everyone as insane, but much more difficult to see that same behavior in one’s primary care physician. Or in ourselves, for that matter.
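A minimal sketch shows how five-year survival can improve while deaths stay fixed. The numbers are made up for illustration, the function name is hypothetical, and it assumes nobody dies of other causes or is lost to follow-up. Holding cancer deaths at 50 while diagnoses double from 100 to 200 cuts the share of diagnosed patients who die in half, and five-year survival climbs from 50 to 75 percent without a single additional life saved.

```python
def five_year_survival(diagnosed: int, cancer_deaths: int) -> float:
    """Fraction of diagnosed patients alive five years after diagnosis.

    Simplifying assumption: every death counted here is a cancer death
    within five years; other causes of death and loss to follow-up are ignored.
    """
    return (diagnosed - cancer_deaths) / diagnosed


CANCER_DEATHS = 50  # hypothetical; identical with or without screening

before_screening = five_year_survival(diagnosed=100, cancer_deaths=CANCER_DEATHS)
after_screening = five_year_survival(diagnosed=200, cancer_deaths=CANCER_DEATHS)

print(f"Five-year survival before screening: {before_screening:.0%}")  # 50%
print(f"Five-year survival after screening:  {after_screening:.0%}")   # 75%
print(f"Cancer deaths in both scenarios:     {CANCER_DEATHS}")
```

The survival statistic improves only because the denominator grew; the number of people who die is the same in both scenarios.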
Reification
Thus far, I have discussed overdiagnosis almost exclusively through the lens of cancer, but the psychological factors that lead to a rash of cancer-that-isn’t-cancer diagnoses apply to a much broader set of diseases, in theory to every disease. In the example of our mythical ancestor assessing whether an object is a stick or a snake, nobody would doubt that there is a very real difference between those two things and that, given the proper time and the ability to judge from a safe distance, our ancestor could have figured out which was which. In medicine, we are forced to make high-stakes snap judgments only rarely. Usually, we have plenty of time to consider the implications of tests, yet there is considerable evidence that we nevertheless overdiagnose many diseases. This occurs in part because it is much more difficult to distinguish a disease from a pseudodisease than a snake from a stick.
But that still doesn’t explain the mechanism by which we overdiagnose. To understand that, let’s consider the diagnosis of a blood clot in the lungs—a pulmonary embolism, or PE, as it is commonly called. A pulmonary embolus is, like most cancers, a potentially life-threatening diagnosis. Also like some cancers, PEs are difficult to detect because they present with very nonspecific symptoms that can be easily confused with other conditions. Patients suffering from PEs can have fever, shortness of breath, pain when taking a deep breath, or a rapid heart rate, among other symptoms, but so can pneumonia, a heart attack, a gallbladder infection, lymphoma, and so on. Thus, it’s quite valuable to have a method by which we can distinguish a PE from these other various maladies.
For the first half of the twentieth century, the PE was a well-understood and much-feared diagnosis, but there was no reliable way to diagnose it other than clinical suspicion. In the 1930s, EKG pioneers noticed a peculiar pattern in patients who later died from pulmonary emboli, which were found at autopsy.* Some eighty years later, medical students still dutifully memorize this pattern, but it suffers from a problem we will see in example after example in this book: if patients had a PE, they had a reasonable chance of showing this pattern on EKG, but it didn’t follow that if patients had this EKG pattern, they were likely to have a PE. Similarly, at about the same time, a radiologist at Massachusetts General Hospital named Aubrey Hampton performed postmortem chest X-rays of patients who were found to have PEs at autopsy and noticed that several had a hump-like opacity roughly corresponding to the site of the embolus. Unfortunately, “Hampton’s hump,” as it came to be known, was typically observed only in patients with the largest of clots, by which point it was often too late to make a difference. So nobody knew how to find PEs in time to intervene.
EKG stands for electrocardiogram, a graphic record of the electrical conduction patterns in the heart. The test is also known as an ECG; the “K” remains in common use as a historical nod to the German spelling of the test’s name, Elektrokardiogramm, in which the root for “heart” is kardio.
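The asymmetry described above, in which the EKG pattern is reasonably common among patients with a PE while a patient with the pattern is still unlikely to have a PE, is an application of Bayes’ theorem. The sketch below uses invented numbers, not measured values for any real EKG finding: a 5 percent prevalence of PE among patients being evaluated, a 50 percent chance of the pattern in patients with a PE, and a 10 percent chance in patients without one. The function name is hypothetical.

```python
def probability_of_disease_given_finding(
    prevalence: float,
    p_finding_given_disease: float,
    p_finding_given_no_disease: float,
) -> float:
    """Return P(disease | finding) via Bayes' theorem."""
    true_positives = prevalence * p_finding_given_disease
    false_positives = (1 - prevalence) * p_finding_given_no_disease
    return true_positives / (true_positives + false_positives)


# Invented numbers for illustration only.
PREVALENCE = 0.05             # share of evaluated patients who actually have a PE
P_PATTERN_GIVEN_PE = 0.50     # chance a PE patient shows the EKG pattern
P_PATTERN_GIVEN_NO_PE = 0.10  # chance a patient without a PE shows it anyway

p_pe_given_pattern = probability_of_disease_given_finding(
    PREVALENCE, P_PATTERN_GIVEN_PE, P_PATTERN_GIVEN_NO_PE
)

print(f"P(pattern | PE) = {P_PATTERN_GIVEN_PE:.0%}")  # 50%
print(f"P(PE | pattern) = {p_pe_given_pattern:.0%}")  # 21%
```

Because most patients being evaluated don’t have a PE, the false positives among them outnumber the true positives, so in this example roughly four out of five patients with the pattern would not have a PE.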
Then, in the 1960s, came a nuclear imaging study known as the ventilation/perfusion, or VQ, scan, shown in Figure 1.3.
VQ scans look for discrepancies between two sets of images: a ventilation scan that shows where air is flowing in the lungs, and a perfusion scan that shows where blood is flowing in the lungs. (The patient first inhales one radioactive tracer for the ventilation scan and then receives a second radioactive tracer intravenously for the perfusion scan.) Any part of the scan that shows a mismatch, for instance a dark gray splotch in one part of the lung on the ventilation scan that appears nearly white on the perfusion scan, indicates lung that is receiving air but not blood, and therefore suggests a clot. It was a big improvement over the crude measures of the EKG and Hampton’s hump, but as you can see from Figure 1.3, the images were coarse and difficult to interpret. Ultimately, each scan was read at one of three levels of probability: low, intermediate, or high. A large number of scans, however, were read as intermediate probability.*
In an effort to keep the discussion as simple as possible, I’ve omitted another technology that came of age in the 1970s and is now called catheter-directed pulmonary angiography. This was the gold standard for PE diagnosis until CT pulmonary angiography displaced it, but it was not used as frequently as the VQ scan because it carried moderate risks, including kidney damage and bleeding from the catheter injection site in the groin, among other harms. There are still more tests, such as a blood test known as a D-dimer, that are also used in diagnosing PE, but explaining each of them doesn’t change the underlying principle of PE overdiagnosis described here.
FIGURE 1.3. A VQ scan.
SOURCE: Image courtesy of Drs. Heeseop Chin, Monique Tyminski, and Robert Licho of UMass Memorial Medical Center.
VQ scans were still commonplace when I was a medical student in the late 1990s, but within only a few years they were displaced by CAT scans. CT pulmonary angiography, as it is more formally called, provides a detailed picture of the lungs and their blood supply.† Finding a clot in these finely detailed pictures is a good deal simpler than sifting through the grainy images of a VQ scan, and the switch substantially reduced the number of equivocal reads. The resolution of these scans improved very quickly; the scans we perform today can generate hundreds of images with an astonishing level of precision. Comparing today’s CT scans to ones from the 1990s is a bit like comparing an iPhone with Siri to a cell phone from twenty years ago. It’s practically a different technology altogether, with only the most rudimentary of resemblances.
As with EKG and ECG, CAT scans and CT scans are the same thing. The full name is computed axial tomography. Axial images allow viewers to look at patients as if they were looking up at them from the feet, in cross-sections moving up the body. Over the past several years, the computerized reconstruction of images has become so much more sophisticated that scans provide not only axial cuts but also coronal (slicing front-to-back) and sagittal (side view) cuts, and now even three-dimensional images, so calling it axial imaging is something of a misnomer. The informal name CAT scan persists through common usage. Also worth noting, given that this discussion focuses on overdiagnosis: the CT scanner is sometimes derisively called “the donut of truth” by physicians who lament the lost art of the history and physical.