Ending Medical Reversal
In the first three chapters, we have explored major categories of medical reversal. For practices meant to make us live longer, we have seen reversal happen when evidence supporting a practice was weak or flawed. This included times when the evidence relied on surrogate end points. For practices meant to make us feel better, we have seen how powerful the placebo effect is and noted reversals when treatments were later tested using appropriate controls, such as sham procedures. Screening recommendations have also been reversed. In many ways, reversals that involve screening are the worst kind. Unlike medical therapies, screening tests are performed on healthy people. An ineffective screening test will affect not only a few people with a disease who are seeking treatment, but an enormous number of healthy people who just want to stay that way. An ineffective screening test can turn millions of healthy people into patients.
Screening for a disease just seems to make sense. We suggest screening tests because they offer the promise of prevention. We have been brought up on the proverb “An ounce of prevention is worth a pound of cure.” It is hard to accept that such a simple and sensible mantra might not apply. When it becomes clear that it does not, screening tests become the subject of real controversy. The controversy is magnified because screening guidelines do tend to change. Even the most evidence-based screening guidelines, often made by the U.S. Preventive Services Task Force (USPSTF), will almost certainly be modified in the years to come for legitimate reasons as new data become available. Understanding screening—what it is, why we do it, how we know when it is beneficial, and the controversies that surround it—requires understanding three related topics. The first is the most basic: the goals of screening and the evidence that supports some of our most commonly done tests. The second is overdiagnosis, a counterintuitive phenomenon that often undermines our best efforts. The third is how screening tests perform in the real world, outside of the studies meant to support or refute their utility.
GOALS AND EVIDENCE FOR SCREENING
What is screening, and why do we do it? Screening for a disease means going out and trying to find a disease early, before it has caused symptoms. We screen large numbers of healthy patients to try to find a disease. The logic is that if we find a disease early, detect a cancer when it is small, we can cure it. We screen for diseases only because we hope that if we find and treat them early, people will live longer than if we waited until the disease became symptomatic to treat it.
The history of screening, specifically cancer screening, goes back more than 100 years. In the early 20th century, when surgery was the only way to treat cancers, surgeons realized that patients diagnosed with large tumors usually died, while those with smaller ones sometimes lived.* The earliest cancer-education campaigns, therefore, sought to teach people how to recognize cancer at its earliest stage. This was the beginning of cancer screening. As time went on, evidence suggested that we could do even better. Though diagnosing a small tumor is good, diagnosing a cancer before it has become a cancer is even better. The hunt for “precancerous” or “premalignant” lesions is the basis of Pap smears and modern colon cancer screening. In our current age, we have gone from screening for the anatomic (visible lesions) to the microscopic (precancerous cells) to the molecular (abnormal genes). Today there are debates about whether we should treat people who have abnormal genes—who are at risk for cancer—with procedures like prophylactic mastectomy.
We now have recommendations to screen for diseases as diverse as diabetes, hepatitis C, coronary-artery disease, and HIV. Cancer screening, however, is what is advised most broadly in practice and is what people talk about most. For that reason, we focus on it. The principles we discuss apply to all screening interventions. Cancer-screening tests, such as mammography (breast cancer), the PSA blood test (prostate cancer), colonoscopy (colon cancer), CT-scan screening (lung cancer), and Pap smears (cervical cancer) are designed to catch cancer when it is treatable. Colon and cervical cancer screening aim to detect premalignant lesions, while breast, prostate, and lung cancer screening really try to detect small, curable cancers. Doctors believe that, left alone, these precancerous lesions or early cancers would grow larger and spread, eventually killing you—taking many years of the life you otherwise would live. An effective cancer-screening test should thus accomplish three goals:
1. It should find cancers early.
2. It should lower the rate of dying from the cancer it is meant to find.
3. It should improve overall survival (decrease the rate of dying from anything).
A screening intervention cannot accomplish goal 2 unless it can accomplish goal 1. It also cannot accomplish goal 3 unless it accomplishes goal 2. Goal 3 is what really matters. People are screened for cancer—have mammograms and colonoscopies—only to live longer. Goals 1 and 2 are only means to an end, steps toward an ultimate goal. Imagine if a test effectively finds a cancer, decreases the rates of death from that cancer, but doubles or triples the rate of heart attacks. If, on the whole, people die sooner, it does not matter how effective that test is as a cancer screen.
How do we test screening tests? Ideally, screening tests would be tested with randomized controlled trials. We have mentioned these trials already, and we discuss them in quite a bit of detail in chapter 8. In a trial of a screening test, tens of thousands of people would be divided into two groups. One group would get the screening test and the other would be simply monitored. We would then follow the groups to see if the screened group lived longer than the group who was not screened—goal 3 from the list above. At this time, when it comes to screening tests, we usually do not have that kind of data. For some tests the trials just have not been done; for others, the results of studies do not show us what we expect. Obviously, practicing without this sort of evidence base means that we may be recommending things that do not actually work. We leave ourselves open to the possibility of future reversal.
Any screening test that you have heard about accomplishes goal 1: it finds cancers early. Colonoscopies, PSA testing, mammograms, Pap smears, and now CT scans for lung cancer find cancers before they become detectable by the patient or that patient’s doctor. Surprisingly, the evidence showing that common screening tests accomplish goal 3 (and even goal 2) is pretty weak. To begin with prostate cancer, some randomized trials (but not others) have shown that PSA testing reduces dying from prostate cancer (goal 2) but have not shown that it makes people live longer (goal 3). This was partially behind the USPSTF’s decision to stop recommending PSA testing.
Randomized trials of mammography have shown that this screening test reduces the chance of dying from breast cancer (goal 2). These findings, however, are quite variable, with some trials reporting large reductions and others reporting only small ones. Mammograms probably do not decrease the risk of dying of breast cancer for women in their forties. Whole books have been written about the mammogram question, but suffice it to say that mammography has not been shown to improve overall survival (goal 3) at any age.* As it stands, the USPSTF only recommends mammograms every other year for women between ages 50 and 74.
Pap smears have for years been the standard to which screening tests are compared. This test has likely saved innumerable lives around the world. Because this test came into use before the advent of evidence-based medicine, it has not been rigorously tested in randomized trials. One trial we do have studied a one-time Pap smear in rural India. Although this trial did not show a benefit in terms of death rates related to cervical cancer (goal 2), a one-time Pap smear is not what doctors do in the United States, so it is hard to apply this study to the developed world. Even without good data, the idea that Pap smears save lives is widely held, but there is a lot of debate regarding how often they need to be done—once every three years, every five years?
If you are over 50, your doctor has almost certainly recommended a colonoscopy for colon cancer screening. Colonoscopies are currently being tested in randomized trials, and the results should be back in the early 2020s. Two other ways to screen for colorectal cancer have been tested, and the results from these trials have been extrapolated to colonoscopy. Sigmoidoscopy and fecal occult blood testing (a way to look for microscopic drops of blood in the stool) have both been shown to reduce the risk of dying of colon cancer (goal 2) but not the risk of death overall (goal 3). The USPSTF presently recommends colon cancer screening for people from age 50 to 75.
The new kid on the block is CT-scan screening for lung cancer. This test is a little different from the others we have discussed, because instead of being offered to everybody over a certain age, it is recommended only for a group at high risk, specifically people between the ages of 55 and 80 with a heavy-smoking history. This test has been shown to reduce overall death rates as well as death rates from lung cancer (goals 3 and 2, respectively). This is the only screening test to date that makes you live longer. Even here there are some caveats. Most of the abnormalities that are found with this test, and thus demand evaluation, turn out to not be cancers. (In one study, 96 percent of abnormal findings were false alarms.) Also there is the intriguing finding that the total number of lives saved by lung cancer screening seems to be greater than the lives saved from treating lung cancer. In other words, the improvement in goal 3 was bigger than the improvement in goal 2. How a screening test for lung cancer saves lives beyond its effect on lung cancer needs to be understood. This is an odd result.
Putting aside CT-scan screening for lung cancer, which applies only to a relatively small group of people, why have no other trials of screening tests shown improvement in all causes of mortality? There are two possibilities. The first is that these trials are not powerful enough to see a difference. For instance, in studies of colorectal cancer with 30 years of follow-up, for every 10,000 people, 192 people die of colon cancer in the unscreened arm, while 128 die in the screened arm. This is a statistically significant difference that shows the beneficial effect of screening. However, when you look at overall mortality, not just mortality related to colon cancer, 7,109 out of 10,000 die in the unscreened group versus 7,111 out of 10,000 in the screened group. This is not a significant difference. Seeing a statistical difference in the overall mortality is harder and may require many more participants in the trial. This explanation is what most experts believe is happening. It is a reasonable argument and certainly one potential explanation.
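The power argument can be checked with a rough calculation. The sketch below runs a standard two-proportion z-test on the figures quoted above. The text gives deaths per 10,000 but not the actual trial sizes, so the arm size of 10,000 is an assumption made only for illustration, and the function name is hypothetical.

```python
import math


def two_proportion_z(deaths_a: int, deaths_b: int, n_per_arm: int) -> float:
    """z statistic for the difference in death rates between two equal-sized arms."""
    p_a = deaths_a / n_per_arm
    p_b = deaths_b / n_per_arm
    # Pooled death rate under the hypothesis that screening makes no difference.
    pooled = (deaths_a + deaths_b) / (2 * n_per_arm)
    standard_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    return (p_a - p_b) / standard_error


# ASSUMPTION: 10,000 people per arm, taken from the "per 10,000" denominators
# in the text. The real trial sizes are not stated there.
N_PER_ARM = 10_000

# Unscreened arm first, screened arm second.
z_colon_cancer = two_proportion_z(192, 128, N_PER_ARM)
z_all_causes = two_proportion_z(7_109, 7_111, N_PER_ARM)

# |z| above about 1.96 is the conventional threshold for p < 0.05 (two-sided).
print(f"colon cancer deaths: z = {z_colon_cancer:.2f}")
print(f"deaths from any cause: z = {z_all_causes:.2f}")
```

Under this assumption the colon cancer comparison clears the conventional 1.96 threshold comfortably, while the all-cause comparison is nowhere near it. A gap of a few dozen deaths is easy to see against a few hundred cancer deaths but is swamped by the ordinary variation among thousands of deaths from all causes.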
The other possibility is that the gains in preventing deaths from cancer are offset by deaths from other causes. Maybe screening for prostate cancer decreases prostate cancer deaths but increases deaths from heart attacks. To understand how this might happen, we must turn to the second concept of cancer screening: overdiagnosis.
OVERDIAGNOSIS
Most experts agree that all cancer screening leads to some amount of overdiagnosis. Overdiagnosis occurs when some of the cancers that are found through screening are insignificant. They are cancers that have no potential to make people sick or die. If a person had not had a screening test, these cancers would have gone undetected and he would have been no worse off. Over time he would have felt nothing and would have died of something else. To Americans, who seem to have a fear of cancer and the need to fight it at all costs written into our DNA, this might seem shocking. That said, there is no doubt that overdiagnosis is a reality.
A simple case might be the best way to illustrate the concept of overdiagnosis. Consider a person who is screened for prostate cancer. He is terribly unlucky: the screening test finds a cancer; he is treated for it; and then he dies of a heart attack the following year. Had this man not been screened for prostate cancer, he still would have died of a heart attack a year later and that cancer would never have caused problems.* Sometimes the cancers we find with screening are so small and grow so slowly that 1 year becomes 20 or 30 years, and dying from a heart attack becomes dying from heart attack, or pneumonia, or a car accident, or even a different cancer altogether. In other words, if you find really slow-growing cancers with a screening test, there is a good chance they are not destined to affect a patient’s life expectancy. For all people with these cancers, the treatment for the cancer provides no benefit. They would have lived just as long, dying on the same day they otherwise would have, of something else entirely.
Can we tell which cancers are the overdiagnosed ones and which are the deadly ones? We hope that someday we will be able to, but currently there is no way to distinguish one from the other. Because of this, all screening programs find a mix of deadly cancers and “sit and do nothing” cancers. The ratio of these cancers is of critical importance. In prostate cancer, the ratio tends toward the “sit and do nothing” cancers. When we screen for prostate cancer, we end up treating around 40 cancers for every cancer that will kill.† For mammography, the ratio between dangerous and harmless cancers is contentious. The best studies suggest that if a mammogram finds breast cancer, and it is treated, there is a 13 percent chance that the mammogram will have saved a life. This means that of roughly every eight cancers diagnosed and treated, only about one is dangerous. Most of the cancers we find, when screening for breast and prostate cancer, would have gone unnoticed if not for screening.
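The two ratios in this paragraph are stated in different forms: a percentage for mammography and a count of cancers treated for prostate screening. As a rough arithmetic check, the sketch below converts each into the other form. The function names are hypothetical, and the inputs are simply the figures quoted in the text.

```python
def treated_per_dangerous(chance_dangerous: float) -> float:
    """Cancers treated for each dangerous one, given the chance that a treated cancer is dangerous."""
    return 1 / chance_dangerous


def chance_dangerous(treated_per_dangerous_cancer: float) -> float:
    """Chance that a treated cancer is dangerous, given how many are treated per dangerous one."""
    return 1 / treated_per_dangerous_cancer


# Mammography: a 13 percent chance that treating a screen-detected cancer saved a life.
breast_treated = treated_per_dangerous(0.13)

# Prostate: around 40 cancers treated for every cancer that would kill.
prostate_chance = chance_dangerous(40)

print(f"breast: about {breast_treated:.1f} cancers treated per dangerous cancer")
print(f"prostate: about {prostate_chance:.1%} of treated cancers are dangerous")
```

Put on the same scale, the mammography figure works out to about one dangerous cancer in every eight treated, against one in forty for prostate screening.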
Why is overdiagnosis problematic? Overdiagnosis is expensive and potentially dangerous. Presently, we spend almost $8 billion a year on mammography screening in the United States. A large part of that goes to finding cancers that we did not need to find. Cancer treatment is not benign. It often involves radiation, surgery, and hormonal therapy. If screening is finding many unimportant cancers, people are being exposed to this treatment unnecessarily. The side effects of these treatments worsen their quality of life. Thirty to forty men with prostate cancer experience side effects of treatment in order to lengthen one life. Some side effects shorten lives. If the side effects reduce life spans (by just a little bit), we might break even, saving lives from the cancer while taking a life with the treatment.
You may think this is all a theoretical and unsubstantiated worry. Unfortunately, there is evidence that screening for prostate cancer does increase the rate of death from non–prostate cancer causes, specifically from cardiovascular death and suicide. This sort of reasoning, balancing quality and quantity of life, is part of why the USPSTF recently changed its recommendation for prostate cancer screening, now recommending that men do not get screened. For mammography screening, many experts think the testing is favorable if you are 50 to 69 years old but probably not if you are in your forties. For the same reason, the USPSTF advised against routine screening in the younger age group. This was a controversial decision in 2009, but it has been supported by recent data.
POPULATION DATA
The final important topic to understand about screening is population data. This means moving away from the experimental trials and looking at data from the real world. For our discussion, we will focus on breast cancer and prostate cancer because these have been evaluated most rigorously.
Population data have undermined our faith in the effectiveness of our screening tests for these two cancers. There are two numbers that really tell the story: the incidence of early cancers and the incidence of advanced cancers. Incidence is the number of new cancers that are diagnosed in a given year—how many people were told, for the first time, they have cancer. Early cancers are those that are contained within the breast or prostate. Advanced cancers are those that are found after they have spread from the organ of origin. These are metastatic cancers and are generally very bad. If you find advanced cancer, the thinking goes, you found this cancer too late. If screening works, we should find more early, localized cancers and fewer advanced cancers.
Because the population is always growing, incidence is always reported as a number out of 100,000 people. That way if the size of the United States grows from 200 million to 300 million, the incidence of prostate cancer over time can be compared, for instance 171 per 100,000 in 1990, and 145 per 100,000 in 2010.
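A minimal sketch of this normalization follows. The case counts are not real data: they are illustrative numbers back-calculated from the rates and round population sizes mentioned above, chosen only to show why raw counts mislead when the population grows. The function name is hypothetical.

```python
def incidence_per_100k(new_cases: int, population: int) -> float:
    """New diagnoses in one year, expressed per 100,000 people."""
    return new_cases / population * 100_000


# ILLUSTRATIVE inputs only, not real case counts.
earlier_cases, earlier_population = 342_000, 200_000_000
later_cases, later_population = 435_000, 300_000_000

earlier_rate = incidence_per_100k(earlier_cases, earlier_population)
later_rate = incidence_per_100k(later_cases, later_population)

print(f"earlier: {earlier_cases:,} cases, {earlier_rate:.0f} per 100,000")
print(f"later:   {later_cases:,} cases, {later_rate:.0f} per 100,000")
```

In this made-up example the raw number of diagnoses goes up while the incidence goes down, which is why the comparison has to be made on the per-100,000 figure.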
Whenever you debut a screening test, you expect the following. Immediately after the test begins to be used, the incidence of early cancer should rise, as you detect cases of cancer that otherwise would have gone unnoticed. This is the definition of screening. Next, after several years, as screening gets adopted throughout the population, the incidence of early cancer should peak and then drift back to nearly the baseline incidence, that is, the incidence prior to the introduction of the test. Figure 4.1 shows this graphically. We say that we end at nearly the baseline incidence because, over time, we are still catching the same number of total cancers as we did before, but some of the cancers we are catching early rather than late, when they are advanced cancers.
For advanced cancer, you expect the following. Immediately after a test is debuted, the incidence of advanced cancer goes up. This is because when you begin screening, you catch some cancers that have already spread. Then, over the next 10 or 20 years, the beneficial effect of screening sets in. In a few decades, the incidence of advanced cancer falls because you are now finding cancers early, curing these patients so they no longer end up with a new cancer that has already spread. Some 20 years after a screening test is put into practice, the patients missing from the advanced-cancer totals should be in the early-cancer figure. (This is the “nearly” from the paragraph above.) Figure 4.1 also shows this.