Snowball in a Blizzard


by Steven Hatch


  Dr. Feld was subsequently diagnosed with cancer, although one can safely assume that she speaks for tens of thousands of women who have had similar experiences. Among those women under the age of fifty, the majority would never have experienced such emotions had they not undergone the mammogram in the first place.*

  Dr. Feld’s breast cancer came to her attention when she noticed a lump in her breast rather than by screening mammography. It’s worth emphasizing here yet again that the above discussion about mammography refers to its value as a screen—that is, its ability to find disease in people without any indication of disease. When women such as Dr. Feld notice a lump, they must be evaluated by a physician and, in most cases, referred for mammography. My single biggest fear in writing this chapter is that this message will get twisted in the public discourse in such a way that women will take to heart the message that somehow mammograms should never be performed, which couldn’t be further from the truth.

  But there are more sinister complications than mere dread, and they relate to the uncertainty inherent in the next layer of testing: the biopsy itself. It is difficult to know how many biopsies are read incorrectly, and how many unnecessary lumpectomies or mastectomies (along with chemotherapy and radiation, and all of the attendant complications and disfigurement of such procedures) take place. One of the more sobering assessments came in 2004, when researchers in Norway and Sweden indicated that as many as one-third of cases of invasive breast cancer were overdiagnosed. It’s worth letting that sink in: they estimated that one out of every three women who were told they had invasive cancer in fact had no cancer at all. Although these researchers were not focused on the question of a mammogram’s accuracy in women under age fifty, their findings hint at the relative harm that can come of a screening tool used in the wrong population for the wrong purposes.

  The balance of benefit and harm was the central issue behind the second recommendation of the US Preventive Services Task Force that caused such a furor: that women aged fifty to seventy-four should no longer receive annual mammograms, but have them every other year instead. The researchers used a variety of advanced statistical models to analyze the effect of how often a woman underwent a mammogram, tying these models into the known growth rate of breast cancer in most women.

  Reading the reports that formed the basis of the guidelines is a difficult exercise even for those with a working knowledge of study design and statistical analysis. Dozens of mathematical models are simulated and analyzed, each with a table and figure spitting out dozens of numbers. Reading it can induce vertigo. Yet in the midst of all the if-this-then-that scenarios, one message cannot be missed: the number of women whose lives are saved by mammography is not appreciably different whether they have an annual or biennial (i.e., every other year) mammogram. What is different is the number of false-positive diagnoses: the number of women who suffer the harm of a false positive is reduced by nearly half under the biennial strategy.

  To give a sense of exactly what kind of numbers I’m talking about, let’s look at one particular calculation that can serve as representative of the various models. According to this calculation, in women aged fifty to seventy-four, about ten lives were saved for every thousand women screened over twenty-five years. (That is, if one thousand women in this age group eschewed annual mammography, about ten more of them would succumb to breast cancer than in a similar group of women who had annual mammography over this time.) The cost, however, would be in the number of false positives: for the ten saved lives there were 110 unnecessary biopsies, which as I have discussed may in turn lead to further unnecessary interventions, and maybe even more than thirty additional false-positive diagnoses of invasive breast cancer, if the Scandinavian data above are to be believed.

  At any rate, if the same thousand women changed their screening strategy to every other year, the number of lives saved would indeed drop, though only to eight. But in exchange for those two “missed” lives, the number of false-positive diagnoses drops to sixty-six. That is a dramatic reduction, and it represents in its distilled mathematical essence the reason for the every-other-year recommendation. Both the annual and the every-other-year strategies entail uncertainty, but in the case of the every-other-year strategy, the uncertainty is associated with a significantly lower level of harm, without a huge corresponding cost in the number of lives saved.
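  To see the trade-off in one place, here is a minimal sketch in Python using the per-thousand-women figures just quoted (ten lives and 110 false positives for annual screening; eight lives and sixty-six for biennial). The “false positives per life saved” ratio is my own illustrative summary measure, not a statistic taken from the task force report.

```python
# Benefit-harm comparison per 1,000 women aged 50-74 screened over
# 25 years, using the model figures quoted in the text. The text
# describes the harms as unnecessary biopsies (annual, 110) and
# false-positive diagnoses (biennial, 66); both are grouped here
# under "false_positives" for simplicity.
strategies = {
    "annual":   {"lives_saved": 10, "false_positives": 110},
    "biennial": {"lives_saved": 8,  "false_positives": 66},
}

for name, s in strategies.items():
    # An illustrative summary ratio (my own, not from the report):
    # how many false positives are incurred for each life saved.
    ratio = s["false_positives"] / s["lives_saved"]
    print(f"{name}: {s['lives_saved']} lives saved, "
          f"{s['false_positives']} false positives, "
          f"{ratio:.1f} false positives per life saved")
# annual: 10 lives saved, 110 false positives, 11.0 false positives per life saved
# biennial: 8 lives saved, 66 false positives, 8.2 false positives per life saved
```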

  Absolute and Relative Benefits

  What I’ve tried to do thus far is walk you through the logic of the US Preventive Services Task Force when they issued what appeared to be revolutionary guidelines in 2009. I have not gone into detail about the Task Force’s recommendation to stop mammography of any kind after the age of seventy-four. Suffice it to say that the principles that guided such a recommendation are part and parcel of what is written above, and that the panel was not convinced of any demonstrable benefit for mammograms in this age group, although they uncovered fairly good evidence of potential harms.

  But there is one last matter that is critical to an understanding of the task force guidelines. Because of the tempest surrounding the under-fifty recommendations, to say nothing of the advice to have biennial rather than annual mammograms, this last matter was mostly an afterthought in the squawking that surrounded the report. It was couched in the very cautious language of scientists, and so its importance may have been underestimated. But it is no less important than the debates about when to start mammograms or when to stop them or how often one should have them in between.

  The heart of the matter was this: the Task Force wasn’t convinced there was unequivocal evidence that mammography was the remarkable lifesaving force that its proponents often proclaim it is.

  To help unlock this, let’s use some data from an old study: the original major mammography trial, begun in the 1960s and run by the New York Health Insurance Plan, which I’ll call the HIP study for short. The HIP study is generally not included in modern analyses of mammography trials due to a variety of design flaws. Because of these flaws, most researchers believe that the HIP study overstates the value of mammography by nearly a factor of two. This is precisely the reason I want to use these numbers, for even in a study that shows what nearly all statisticians today consider an outsized effect, you will see that the absolute benefit is very different from the relative benefit of mammography.

  Recall from the beginning of the chapter that, in principle, measuring the benefit of a mammogram is simply a matter of dividing women into a “mammography group” and a “no mammography group,” counting over a long span of time how many women in each group are diagnosed with breast cancer, and comparing at the end of that period how many women in each group died from the disease. Assuming the two groups were otherwise equal, if more women died from breast cancer in the no mammography group, then mammography can be said to be beneficial, and it simply becomes a matter of describing the effect in mathematical terms.

  After about ten years of the trial, the researchers from the HIP study presented their data at the 1977 Conference on Breast Cancer, supported jointly by the White House, the National Cancer Institute, and the American Cancer Society. Their presentation showed something fairly dramatic: a roughly 33 percent relative reduction in breast cancer mortality.

  This was, indeed, a number worthy of attention. But what does it mean? Here the actual data are crucial. At about the ten-year mark, the number of women in the trial who had died from breast cancer was 91 in the mammography group and 128 in the no mammography group. Because the total number of cases of breast cancer in each group was roughly equal, the relative difference in the percentages of breast cancer mortality was about one-third.* Relative mortality measures only how much greater or smaller one group’s proportion of deaths is compared to the other’s.

  There were 299 cases of breast cancer in the mammography arm and 285 in the no mammography arm, for mortality rates among the diagnosed women of 0.30 (91 divided by 299) and 0.45 (128 divided by 285). As a statistical side note, these numbers demonstrate that the lower the mortality of a given cancer, the harder it is for screening technology to provide a benefit. Breast cancer, in a relative sense, carries a good prognosis, as these numbers show. Even down to Stage IIIa cancer, five-year survival rates are well above 50 percent; more than 90 percent of women with breast cancer present at a stage that has greater than 80 percent five-year survival.

  But these numbers represent only the numerators; in other words, they tell you the number of breast cancer deaths without telling you the total size of the group studied. The real question is how many women needed to be recruited and followed to find this difference. The answer to that question is more than 60,000 (31,000 in each group). That’s a staggering number of women to follow in order to find what appears in retrospect to be a small number of lives saved. This is known as the absolute risk reduction, and it’s incredibly small: about one-tenth of one percent (91/31,000 compared to 128/31,000). The small size of the absolute risk reduction seems especially underwhelming when one considers the number of false positives, and the potential harms that can follow from them, discussed above.
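  For readers who want to check the arithmetic, the sketch below (Python, purely illustrative) reproduces both figures from the HIP numbers quoted above: the roughly one-third relative reduction, computed among the women diagnosed with breast cancer, and the roughly one-tenth-of-one-percent absolute risk reduction, computed across all women enrolled.

```python
# HIP study figures quoted in the text; the derived percentages are
# simple arithmetic on those figures, not additional data.
deaths_mammo, deaths_control = 91, 128   # breast cancer deaths at ~10 years
cases_mammo, cases_control = 299, 285    # breast cancer diagnoses per arm
n_per_arm = 31_000                       # women enrolled in each arm

# Mortality among the women diagnosed with breast cancer
# (the 0.30 and 0.45 cited above).
mort_mammo = deaths_mammo / cases_mammo        # ~0.304
mort_control = deaths_control / cases_control  # ~0.449

# Relative risk reduction: the "roughly one-third" headline figure.
rrr = (mort_control - mort_mammo) / mort_control

# Absolute risk reduction: deaths as a fraction of ALL women followed.
arr = deaths_control / n_per_arm - deaths_mammo / n_per_arm

print(f"relative risk reduction: {rrr:.0%}")   # 32%
print(f"absolute risk reduction: {arr:.2%}")   # 0.12%
```

  The contrast between the two printed numbers is the entire point: the same thirty-seven averted deaths look like a one-third reduction when measured against diagnosed cases, and like a rounding error when measured against the more than 60,000 women who had to be followed to find them.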

  Depending on whether one focuses on relative risk reduction or absolute risk reduction, one can view the same data and arrive at very different conclusions regarding mammography’s benefits. Through the lens of relative risk reduction, mammography is clearly beneficial; through the lens of absolute risk reduction, one may be more guarded and skeptical of its value.

  This in part explains the abundant confusion of the mammography debate. If one is predisposed to the former view, mammograms sound like a great public health bargain: a 30 percent reduction in mortality. If one takes the latter view, mammograms don’t seem to be the enormous benefit they’re billed to be, if they are a benefit at all. Most recent studies quote the relative risk reduction as somewhere in the range of 15–20 percent. Yet, according to one study, the absolute risk reduction for women undergoing mammograms is 0.05 percent (90.25 versus 90.30 percent after ten years). Even if one were to cite rosier estimates, it’s a very small number.

  Because the total number of women who were either diagnosed with or died from breast cancer is so small, even a minor, unintentional bias could artificially create a benefit where none really exists. For instance, how do we decide that someone has died from breast cancer? This may seem like a ludicrous question, but it’s actually quite difficult. Consider the following problem: a sixty-four-year-old woman is diagnosed with breast cancer and undergoes lumpectomy, local radiation, and adjuvant chemotherapy. The following year she has a heart attack and dies. Should she be categorized as having died from breast cancer? What if she died of a pulmonary embolus, a common complication seen in people with various cancers? What if she died in a car accident, which might have happened because her brain bled from a metastatic tumor or might have happened because she was simply at the wrong intersection at the wrong time?

  These are the kinds of questions that face researchers, and there are no simple, right-versus-wrong answers for such questions. In short, these questions cut right to the core of uncertainty in medical research. We cannot know that every patient who dies within one year of a breast cancer diagnosis died of breast cancer itself. We can be fairly certain in some cases but have no idea in others. If there are large numbers of patients, and fairly strict and agreed-upon criteria to categorize patient outcomes, then our degree of certainty can be a source of comfort. But small numbers—such as roughly 100 deaths out of 30,000 over ten years—suggest much uncertainty.

  Nevertheless, one can attempt to minimize bias in such studies. Some of it can be minimized by having researchers who know nothing of the mammogram study decide on the cause of death for all of the deceased women in the trial, because these “blinded” researchers won’t have a horse in the race. That is, you don’t want the very researchers who might have the strongest hopes for mammography’s success in charge of deciding post hoc who actually died from breast cancer and who didn’t; you want the most objective person possible so that no bias creeps into the results. (The categorization of deaths in the HIP study was done by its organizers, a fact routinely cited as one of the important reasons its data are considered unreliable.)

  Particularly when the numbers are so small, if only a few borderline calls of “breast cancer death” are included in the control group while a similar few are excluded from the screening group, a very long, painstaking, and expensive study can end up creating a benefit where none exists, or erasing one that does. (Another way of eliminating bias is simply to look at all deaths regardless of cause, because a true benefit should still be observable even if we include everyone who died; it is just a smaller benefit at that point. I’ll take up this topic of evaluating studies by looking at “all-cause mortality” in the next chapter and beyond.)

  How does a scientific body like the USPSTF boil this down into a bite-sized chunk for the interested and educated layperson? The simple answer is that they hedge. In issuing their recommendation for biennial mammograms in women aged fifty to seventy-four, the task force judged that there was high certainty of a moderate benefit, or alternatively moderate certainty of a moderate-to-substantial benefit. This is known as a Grade B recommendation. A Grade A recommendation is defined as showing high certainty of a substantial benefit. The task force couldn’t say this with confidence, and so mammography fell to the second tier of beneficial tests.

  In changing its guidelines on mammography in women under the age of fifty, the task force was equally careful in its wording. This recommendation carried a C grade, which meant that “professional judgment” and “patient preferences” could be taken into account and that mammograms might be selectively offered. The Grade C classification also meant that they believed there was a moderate certainty that the net benefits would be small. In short, they were stating that screening mammograms for women under fifty were in the dead center of the spectrum of certainty. But based on the data, they did not feel that it was reasonable to issue a Grade D recommendation, which would have unambiguously advised avoiding the practice. In essence, the task force tried to skirt the controversy about the cultural value of a screening mammogram, providing just enough intellectual wiggle room to sanction the test in patients with, say, strong genetic predispositions or other risk factors, even though there was no evidence one way or the other to indicate this was a lifesaving strategy.

  Before I conclude this chapter, let’s revisit the essay from the NBC News website. “How did the poor scientists of the U.S. Preventive Services Task Force go from being the ‘gold standard’ for deciding what works in medical screening on Monday, to a bunch of irrelevant nerds by Wednesday?” the writer asked with a verbal flourish. He then supplied his own data-minimizing answer: “that’s because data and evidence have not, do not and never will be the sole determinants of health coverage.”

  “Data” can come in the form of a story just as much as a number, and so perhaps we should consider the story of Monica Long of Cheboygan, Michigan. Long’s story was reported by Stephanie Saul in the New York Times in July 2010, not long after the task force’s recommendations were issued. It makes for grim reading. Thorough in its details, exquisite in its reportage of shock, horror, and regret, the piece documents Long’s saga through the medical system, one that can be understood as the living, nightmarish embodiment of a false positive. For Monica Long turned out not to have breast cancer, an error that was discovered only after additional review of several slides from the original biopsy. This review occurred only after part of her right breast was excised by a surgeon’s knife and she was referred to a new oncologist for follow-up. “Psychologically, it’s horrible,” she said in the interview. “I should never have had to go through what I did.”

  The Times piece goes on to note that the field of biopsy interpretation, especially in the early stages of breast cancer, is contentious in the extreme.* “As it turns out, diagnosing the earliest stage of breast cancer can be surprisingly difficult, prone to both outright error and case-by-case disagreement over whether a cluster of cells is benign or malignant,” the article states. Even in major, prestigious academic medical centers, where pathologists have significant experience reading breast biopsy slides, there are no universally agreed-upon standards as to what truly constitutes cancer and what does not. Whether you are referred for lumpectomy and radiation might turn on a factor as random as whether you live in Boston or San Francisco.

  Specifically, this stage is known as “ductal carcinoma in situ.” Several extra paragraphs could be devoted to the special challenges involving mammogram interpretation when the subsequent biopsy reveals ductal carcinoma in situ. I have opted not to discuss it here because one can get bogged down in technicalities fairly quickly, even though the underlying themes about the precision of a mammogram, and its overall benefit, aren’t dramatically affected.

  Monica Long’s story is unusual in only one key respect: she learned that her original diagnosis was wrong after the fact. In essence, she became retrospectively aware of her status as a false positive, which is rare indeed. Surely the realization that her retrospective nondiagnosis meant she was unlikely to die anytime soon from breast cancer must have been cold comfort in the extreme. But she is only the tip of an iceberg. We know that there are thousands of other women out there, just like her. Do their stories not carry ethical weight?

 
