Snowball in a Blizzard
Note, however, that I’m not talking about certainties in either direction. A positive result from a low positive predictive value test still does sometimes truly indicate that someone has disease, and a positive result in a high positive predictive value test is sometimes wrong. Uncertainty is ever present, but there is power in being able to quantify the uncertainty. Certainly someone receiving a diagnosis of HIV in the days before effective therapy would have been disconsolate at being told that the odds the positive result was correct were 999 in 1,000; the same person who learned the chances of really having HIV were 1 in 9 would likely have breathed more deeply, though perhaps not altogether normally.*
* That was true of HIV screening in the late 1980s but has not been true for many years because of additional testing that has eliminated the false-positive problem for that disease. So, please, get your HIV screening test done!
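For readers who like to see the arithmetic, the gap between "999 in 1,000" and "1 in 9" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is my own, and the inputs (a prevalence of 1 in 8,000, a test that never misses a true case, and a false-positive rate of 1 in 1,000) are hypothetical values chosen so that the result lands near 1 in 9. They are not figures quoted in the text.

```python
from fractions import Fraction


def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Chance that a person with a positive result truly has the disease."""
    # Share of everyone tested who has the disease AND tests positive.
    true_positives = prevalence * sensitivity
    # Share of everyone tested who is healthy AND still tests positive.
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, assumed for illustration only (not from the text).
ppv = positive_predictive_value(
    prevalence=Fraction(1, 8000),           # assumed: 1 in 8,000 people infected
    sensitivity=Fraction(1),                # assumed: no true case is missed
    false_positive_rate=Fraction(1, 1000),  # assumed: 1 healthy person in 1,000 tests positive
)

print(f"Chance of truly having the disease: about 1 in {float(1 / ppv):.1f}")
```

Under these assumed inputs the healthy people who test positive outnumber the infected people who test positive by roughly eight to one, which is why a test that is rarely wrong can still yield a positive result that is usually wrong. Raising the assumed prevalence (the pretest probability) pushes the predictive value sharply upward.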
In the next chapter, we’ll keep this concept of positive predictive value in mind, as I give numbers to the predictive value of screening mammograms, looking at the diagnosis of breast cancer in women who learn such a diagnosis in much the same way that Leonard Mlodinow learned his test meant that he was infected with HIV. I’ll quantify the levels of uncertainty surrounding screening mammograms, and in doing so illuminate one of the most impassioned topics in public health today.
3
SNOWBALL IN A BLIZZARD
Two enlargements required for [education] are: first, to feel sympathy even when the sufferer is not an object of special affection; secondly, to feel it when the suffering is merely known to be occurring, not sensibly present. The second of these enlargements depends mainly upon intelligence. It may only go so far as sympathy with suffering which is portrayed vividly and touchingly, as in a good novel; it may, on the other hand, go so far as to enable a man to be moved emotionally by statistics. This capacity for abstract sympathy is as rare as it is important.
—BERTRAND RUSSELL, “THE AIMS OF EDUCATION,” 1926
When uncertainty in medicine is ignored, one consequence can be confusion in the public dialogue, where misunderstandings arise because different parties are operating with different sets of assumptions about the utility of some technology. We saw how our sometimes misplaced faith in any number of advanced diagnostic tests can lead to the phenomenon of overdiagnosis. This occurs precisely because doctors have failed to account for uncertainty, thinking they were at the far left of the spectrum of uncertainty rather than somewhere near the middle.
Here, I consider a special case of overdiagnosis. Screening mammography—that is, the practice of using mammogram technology to look for breast cancer in otherwise healthy women—has been regarded by many as one of the cornerstones of women’s health. Nevertheless, there has been a slow and inexorable reconsideration of its value. But the public discourse about it has been so fraught with heated accusations and charges that it becomes difficult to sort out the data that led to this reconsideration. This chapter will try to strip away the rhetoric and see the uncertainty in the numbers.
Before I begin, I want to underline that what follows is a discussion about screening mammography. Part of my goal in writing this book is to encourage readers to view national health guidelines with a healthy skepticism, and to use uncertainty as a tool to know whether they really should take a medication or sign up for an annual test or stop smoking and so on. In discussing screening mammograms, I hope to show that their positive predictive value is not nearly as high as most people assume, just like Leonard Mlodinow’s HIV “positive” test that we encountered in the last chapter. But I cannot emphasize enough that this does not apply to women who feel new lumps. The presence of a new lump in one’s breast dramatically changes one’s pretest probability of having breast cancer, and consequently a mammogram for such a woman is an unambiguously useful and lifesaving tool. If you are a woman reading this and discover a previously unknown lump in your breast, please see your doctor immediately for further evaluation, which may include a mammogram.
The Public Health Earthquake
In November 2009 a relatively unknown branch of the US federal government made a very big splash by compiling, in the words of one of its critics, “a pile of bland data” and drawing some conclusions based on what that little pile of data showed. The group, the US Preventive Services Task Force, or USPSTF, had just revised its recommendations about when and how often women should obtain screening mammograms. Based on this data, its members believed that screening mammography was no longer an unambiguous lifesaver in particular populations of women. Moreover, the report indicated something even more shocking, which was that screening mammograms, when used in the wrong context, might in fact represent a significant danger to women.
The language was in the characteristically cautious style of medical science, with several built-in linguistic hedges about the strength of the evidence. The bottom line, however, was unmistakable: it was time to reconsider the practice of recommending annual mammograms for all women after age forty.
A political firestorm ensued. Outraged women called up congressfolk and other government officials and complained about how their lives were being devalued. The New York Times ran a story within twenty-four hours of the release of the new guidelines detailing the backlash. “My big fear is that coverage will be diminished and that a very valuable tool to detect something at an early stage could be taken away from me,” said Karen Young-Levi of the organization breastcancer.org. In stronger wording, its founder, Dr. Marisa Weiss, said in the same article that the new recommendations were “a giant step backwards and a terrible mistake.” The American College of Radiology suggested that the Obama administration had released the new recommendations in a push to save health-care dollars as part of its support for the health-care reform legislation working its way through Washington at the time. The GOP put up a post on its website describing the new guidelines as “bogus scientific analysis.”
For its part, the White House ran away from the recommendations with a don’t-look-at-me shrug and pointed the finger at the previous tenant. Kathleen Sebelius, the secretary for the Department of Health and Human Services, distanced the administration from the new guidelines by noting, in an interview with CNN, that the panel had been appointed under the tenure of President Bush. Either sensing an opportunity or feeling the pressure, the US Senate rushed to include an amendment to the health-care bill to cover the mammograms its own USPSTF no longer recommended.
The members of the task force seemed to be caught flat footed. The vice chairwoman of the committee, Dr. Diana Petitti, said she was taken aback by the reaction, noting that she had been relatively unaware of the intensity of the controversy that surrounded mammography screening. She was surprised that the task force’s report would be met with such a visceral rejection by public health and women’s health advocates. “I have been made aware of it now,” she added ruefully.
The essence of the controversy lay in the task force’s insinuation that screening mammograms really didn’t save that many lives, even under best-case scenarios, or at least weren’t especially likely to do so. This came as a splash of cold water to the public health community, for many practitioners had been reared to think of screening mammograms as one of the most important lifesaving technologies of modern medicine. Consider my own training: I had gone to medical school and residency and had memorized the previous iteration of the guidelines. I knew exactly what every woman should do based on her age and was pleased with myself when I displayed this knowledge to the senior physicians.
Yet, despite my rote memorization of these recommendations, like nearly everyone else I had no idea that the benefits of mammography were modest, and moreover came with some real risks. We’ll look at the data to support those contentions shortly. Not many people, whether doctors or laypeople, were keenly aware of this. The controversy was caused by this disconnect; the experts knew one thing, and the general public, something else entirely.
The vehement reaction to the 2009 USPSTF guidelines raises two separate questions: First, how did this disconnect come to be? Second, why was there such fierce resistance to a change in the guidelines for a medical technology whose overall utility was somewhat good in some age groups, marginal in others, and quite likely harmful in others still?
There are many answers. Control was a theme flowing beneath the surface: women—entirely appropriately—wanted to be in control of their bodies and not subject to the whims and misconceptions of what until recently was a nearly exclusively male profession. After generations of women dealt with medical management largely at the hands of these men, in ways ranging from mildly patronizing to actively hostile to what can only be thought of as physical assault and battery, the screening mammogram had become one among many symbols of women taking control of their destinies. Though I am assured that it is not among the more pleasant of procedures to endure, given the positive vibe with which the mammogram had become associated, it was understandable that there was such an emotional response. It felt like doctors were trying to take back that control. But, as I will show, that feeling can interfere with a cool appraisal of the data; the message that got lost in the translation was that the task force members were, in fact, actively trying to avoid repeating the mistakes of previous generations.
Another entirely understandable narrative involved the faceless committee that issued the recommendations. If one hears of a scientific committee sitting in some conference room somewhere, it is hard not to conjure up an image of mostly middle-aged men, which reinforced the notion of male control over female bodies. The faceless committee was perceived as having no understanding of the social implications of the data and, because its members were not women (the unstated assumption), they had no notion of the impact of mammography on preserving women’s lives.
This particular narrative was demonstrably untrue. In the popular press, because there was little actual investigation of the data that led to the revised guidelines, most of the coverage was devoted to the heated accusations being levied at the committee. But there was other data beyond the raw numbers and advanced statistics used to evaluate mammography that could have been publicized. Consider the following list:
Diana
Kimberly
Rosanne
Lucy
Bernadette
Virginia
Judith
This series of names is data in the true meaning of the Latin word datum, roughly meaning “information bit.” Although it may not seem so at first blush, this list is not really different from the kind of data generated by multimillion-dollar biomedical research projects. What defines data is that it doesn’t simply speak for itself: one can generate data in all manner of ways, but whether it comes in the form of a 10,000-cell Excel file or a New York Times headline, it must be contextualized and analyzed to be understood.
The context of these data points is that they represent the first names of the seven women who served on the US Preventive Services Task Force on mammography in 2009. In total, they comprised nearly half of the committee. Physicians, epidemiologists, and biostatisticians fill their ranks. In their professional lives they are referred to as “Doctor” because that is what they are, and highly accomplished ones at that. They have mothers, they have sisters, they have daughters, they have lovers. And they have themselves to see in the mirror on a daily basis. Surely, in addition to their professional pursuits, they know something of womanhood, and the devastating impact that a disease like breast cancer can have on a woman’s sense of self.
Yet in the bewildered public reaction following the report, the fact that those exact names were intimately involved in drafting the recommendations was almost entirely overlooked. Because it didn’t fit into the narrative of the faceless male committee, the inherent assumption behind the more vociferous criticisms of the panel was that this was yet another chapter in medical misogyny, a further organized attempt on the part of mostly male physicians to codify indifference to women’s suffering through the power of guidelines.
But it wasn’t so. In part, the recommendations were based on the kind of mathematics that can generate seeming paradoxes and counterintuitive conclusions, concepts that are far too difficult to encapsulate in a headline. So, newspeople around the country—of whom only a precious few had any acquaintance with basic statistics, even among reporters whose job it is to cover science and medicine—were faced with a simple question: What will be the headline? The answer seemed obvious, and every major outlet ran with a variation on the theme of “Task Force Recommends Fewer Mammograms, and None Under Fifty.” It seemed like reasonable journalism because it was entirely factual. Whether such stories helped to provide a deeper understanding of what was going on is a different matter entirely.
Any woman unfamiliar with something as obscure as an advisory panel with the tongue-twisting name of the United States Preventive Services Task Force, and no knowledge of terms like “false-positive test” or even a mental framework to make sense of the term, could only read that headline with the deeper narrative of medicine’s war on women in the back of her head. Doctors want less for you, is the easily grasped message. Had the headline been written along the lines of “Task Force Finds Mammograms Associated with Harms to Women,” the entire reception might have been substantially different.
Perhaps the panel members might have been more media savvy about announcing their new recs, and perhaps they might have more shrewdly strategized how to get this information out. Ironically, that very naïveté came about as a consequence of the USPSTF’s reason for existence: the group was deliberately designed to insulate scientists and physicians from the politics of screening recommendations. The original task force was created during the Reagan administration, when many of the screening methods for breast cancer, heart disease, and diabetes were just becoming widespread, and also when the costs of many of these tests weren’t being covered by insurers. It was consciously constructed to have an arm’s length relationship with its sponsor, the Department of Health and Human Services. Moreover, the task force was explicitly prohibited from considering the cost of tests when it issued guidelines, which made the charges that this was motivated by penny pinching as part of a desire to pass “Obamacare” seem particularly amusing.
While the task force could be chided for its lack of preparation for the resultant media tempest that its recommendations produced, the notion that the revised guidelines are part of science’s ongoing war against women is not merely shortsighted, it almost certainly ends up harming women. Yet it’s this very concept that seems to have been uncritically bandied about by people in positions of authority in both government and academia following the release of the 2009 guidelines—and hasn’t much abated, though nearly seven years have passed. That several people who should have known better were yawping through the airwaves, scurrying about to champion a practice under all circumstances for all women, even when the evidence of its utility among some women was vanishingly thin, speaks to a general lack of appreciation of the subtleties involved in evaluating diagnostic technologies that form the basis of national health panel recommendations and the reason these panels recommend what they do.
A good example of the rush to judgment can be found in the online archives of NBC News. Immediately following the announcement of the recommendations, one commentator took to the electronic soapbox to explain the tempest as the cluelessness of a bunch of stats geeks. In making his case, the author performed a rather remarkable intellectual pirouette, choosing not to deny the scientific validity of the task force’s conclusions, but arguing that they had failed to appreciate the cultural significance of mammograms. Since women feel that mammography is helpful, he appeared to reason, we should plow ahead, even if their value is found to be marginal:
Screening is what responsible and health-conscious women do to take control of their bodies and prevent disease. Those are commendable and powerful virtues, and—it seems—more compelling than a pile of bland data.
Doing the right thing and taking the time to protect yourself against breast cancer has moral weight that policy makers, as Secretary Sebelius found out, ignore at their peril.
There is no reason to doubt the accuracy of the scientists’ finding that evidence does not support routine mammography for most women under 50. But there is every reason to doubt that the numbers they compiled will be sufficient to overturn a medical practice that carries so much ethical weight for women. (my emphasis)
In sum, according to this commentator, the problem with the recommendations was that they took no account of what screening mammograms meant to women and offered up only cold, lifeless statistics. The argument wasn’t that some smart number-cruncher got the equation wrong, but that they weren’t “doing the right thing.” The right thing was self-evident, for it was a “powerful virtue” that had “moral weight.” So of course the task force missed the mark. It was only looking at the spreadsheet and not thinking about the real lives the data represented.
Never mind, for the moment, the significant number of women serving on the task force, who might have found this (male) writer’s lecture about “taking control of their bodies” faintly patronizing. Condescending or not, his commentary is a perfect illustration of the intellectual thickets in which one can get caught when casually disregarding carefully assembled data as being merely “bland.” Nearly one hundred years ago the philosopher Bertrand Russell, whose quote began this chapter, had understood that numbers can tell stories equally compelling—and indeed, as tragic—as those portrayed in a good novel. And the numbers reviewed by the task force do tell stories. They are not quite simple and straightforward stories, but neither are they so complicated that they can’t be understood in broad outlines. I would also argue that the task force members were perfectly aware of the stories that the data told, in part because they had thought about the uncertainty inherent in the technology, and what that could mean for a woman.