Snowball in a Blizzard
Statins and SSRIs are arguably the key examples of modern medicine’s success. They stop heart disease in its tracks and lift our spirits skyward. As we consume them, we consume health itself.
But is that portrayal accurate? Let’s return to the question first posed at the beginning of this book: How do we know that medicines work? What evidence allows us to regard these medications as the blockbusters they have become? Where do such drugs fall on the spectrum of certainty—toward the far left end, where we can be very confident of net benefits at the cost of very low risks, or more toward the middle, where they work only a little and carry the possibility of some harm?
Heart
In the case of statin drugs like Mevacor, finding the answer relied on an approach that James Lind would have been able to grasp in general outline, although the structure of the research was considerably more sophisticated than what he had used in investigating treatments for scurvy. Statins were developed as treatments for coronary artery disease—the gradual occlusion of the arteries that supply the heart muscle itself with the blood that keeps it healthy and pumping. Because many people die from heart disease, evaluating these drugs for effectiveness involved splitting patients into treatment (statin) and no-treatment (i.e., placebo) arms, following them over time, and counting the number of events caused by heart disease.
By “events,” I’m being deliberately vague here. You can measure events in a number of ways, and dozens of trials of statin drugs done over the past twenty years have measured them in different ways: you can count heart attacks, or you can count events that aren’t classically thought of as heart attacks but indicate a heart attack may be imminent (for instance, tabulating the number of admissions to the hospital for chest pain). You could, in fact, just look at the total number of hospital admissions and compare them, hypothesizing that the benefits of statins extend even into “events” that in theory have nothing to do with cardiac chest pain. More on that in a moment. At any rate, the strategy is to take two groups, give placebos to one and the active drug to the other, define some event worth measuring, and wait. And count.
These methods introduce uncertainty into their numbers, however, because sometimes events can be missed by the researchers tabulating the data. Some heart attacks are so obvious even medical students can correctly diagnose them with the proper testing, but others are subject to interpretation, where even seasoned cardiologists have differences of opinion. What happens if an eighty-year-old with underlying heart disease develops pneumonia, causing the heart to beat quickly from the physiologic stress of the infection, which in turn leads to a small amount of heart muscle not being able to receive adequate oxygen through the blood? We call this process “demand ischemia”; sometimes it actually leads to a full-fledged heart attack, but often it doesn’t. The specific blood tests used to diagnose heart attacks are frequently positive in these situations, and thus the criteria used to distinguish coronary events from noncoronary events can be somewhat arbitrary. Therefore, some of these measurements are capable of introducing bias into the process of categorization, and the fuzzier the definition of some event, the greater the uncertainty that the results have any real meaning.
There is one category of measurement that ostensibly is immune to this kind of misinterpretation, however: mortality. If the process involves, to put it somewhat indelicately, counting the bodies in the treatment group at the end of some defined period and comparing that figure to the number of bodies in the placebo group, you can observe the difference and apply statistical models to gauge how likely it is that any observed difference is due to mere chance rather than legitimately attributable to the effects of the drug.
Even here, however, investigators have had some false starts: Do you count mortality from heart attacks alone, or do you simply look at everyone who has died in both groups? If you opt for the former strategy, you run the same risk of miscategorization as you do by tallying “cardiac events.” For instance, if statins really are tremendously beneficial at reducing risk of heart disease, then in theory they should also be beneficial even when patients develop noncardiac problems, because patients taking statins have healthier hearts and can cope with other stresses, such as pneumonia or a gastrointestinal bleed or any number of other conditions where a weaker heart can be the difference between life and death. This is generally why opting for the latter strategy, or looking at what is called “all-cause mortality,” has become the standard in studies on “lifesaving” drugs. (Compare this to the trials looking at the relationship between blood pressure and deaths due to stroke—often the reduction in all-cause mortality was lower or in some cases wasn’t even evaluated, and the lack of concordance is probably due, at least in part, to the fact that lowering blood pressure too far led people to be saved from strokes but to die of something else like a broken hip and its complications.)
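Before returning to the history, it may help to see what “applying statistical models” to two body counts can actually look like. The sketch below, in Python, runs a simple two-proportion z-test on hypothetical round numbers; it illustrates the general logic of asking whether an observed gap could be due to chance, not the specific analysis any particular statin trial used.

```python
import math

def two_proportion_z_test(deaths_a, n_a, deaths_b, n_b):
    """How surprising is the observed gap in death rates if the drug did nothing?"""
    p_a, p_b = deaths_a / n_a, deaths_b / n_b
    # Pooled death rate under the assumption of "no drug effect"
    p_pool = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 120 deaths among 1,000 on placebo vs. 90 among 1,000 on the drug
z, p = two_proportion_z_test(deaths_a=120, n_a=1000, deaths_b=90, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # roughly z = 2.2, p = 0.03: unlikely to be chance alone
```

The smaller the p-value, the harder it is to credit the difference to luck; none of this, of course, settles whether the right events were counted in the first place.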
The first of these “count ’em up” statin trials, which looked at the same kinds of endpoints as James Lind did in his scurvy research, didn’t actually get published until seven years after Mevacor was approved. Somewhat surprisingly, the FDA had approved Mevacor without any direct proof that it could save lives. There was good evidence, based on other trials with other drugs, that lowering cholesterol (particularly the so-called bad cholesterol known as LDL) saved lives, so the makers of Mevacor argued that they didn’t have to demonstrate that it saved lives, but simply that it could reduce cholesterol levels. This Mevacor did, and did so fairly dramatically, with fewer side effects than the standard drug being used at the time, cholestyramine. James Lind, though, would have found this approach baffling; it would have been like measuring the heart rate of his sailors instead of counting who lived and who died. The FDA approvals of Mevacor and other statins were all based on indirect reasoning. In the next chapter, I’ll highlight a case where that kind of indirect reasoning seemed ironclad but ultimately didn’t work out very well.
Happily, the early statin trials of the mid-1990s that looked at patient outcomes rather than cholesterol levels were indeed successful. Moreover, those first papers were the beginning of an impressive streak: essentially every large-scale trial performed since, more than two dozen in all, has shown fewer events among people taking statins than among those taking placebos. Initially, “events” was defined narrowly—usually a heart attack—but subsequent studies expanded the measurement to include the “all-cause mortality” parameter mentioned above. It didn’t really matter how the investigators measured the effects of statins. They were lifesavers. They are lifesavers. The data at this point is overwhelming.
But how big a lifesaver is a statin? Compared to vitamin C in the setting of scurvy, the answer is, not much. But vitamin C is, of course, essential to survival. Essential as in you cannot live without it. You need to enroll only one patient, deprive them of vitamin C, produce scurvy, and then give the patient orange juice and watch the healing to make your point. Statins, by contrast, are nowhere near that successful, although that shouldn’t be much of a surprise, so some recalibration is in order.
One of the earliest outcome trials of statins, known as the 4S Study, was published in the British journal the Lancet in 1994. Its investigators enrolled 4,444 subjects who were already known to have coronary artery disease—proving that doctors can occasionally, if only accidentally, demonstrate a flair for the dramatic—and followed them over a span of five years. At the end of the trial, about 250 of the 2,200 patients in the placebo arm had died, compared to only about 180 patients in the statin arm. Thus, based on this trial, statins could reasonably be estimated to save, or at least significantly prolong, the lives of about 70 people out of every 2,200 over a five-year span. That’s pretty impressive: few other medications can match it, especially those used to treat chronic conditions like coronary artery disease.
One might be tempted to wrap up the statin story on this feel-good note, but there is more to it, and this is the part where uncertainty takes center stage. For it is worth keeping in mind that although that number is big, a lot of people in this trial did fine regardless of whether they got the medication (and, of course, a lot of people still died as well). Therefore, a different way to measure the lifesaving impact of the drug is to consider how many people need to be taking it in order to save a life. In this trial, a rough estimate is that thirty people need to be on a statin in order for one person to be saved. This “number needed to treat” statistic can be thought of as a direct measurement of the certainty of benefit, for if one needs to treat, say, two hundred people in order to save one life, patients and their doctors might rightly assume that there’s a great deal of uncertainty about the benefit a given drug may have for them alone.
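For readers who like to see the arithmetic, here is a back-of-the-envelope sketch in Python using the rounded figures quoted above (the exact published 4S counts differ slightly); it shows how the “roughly thirty” estimate falls out of the raw death counts.

```python
# Rounded figures quoted above from the 4S trial (approximations, not the exact published counts)
placebo_deaths, placebo_n = 250, 2200
statin_deaths, statin_n = 180, 2200

# Five-year risk of death in each arm
risk_placebo = placebo_deaths / placebo_n      # ~0.114
risk_statin = statin_deaths / statin_n         # ~0.082

# Absolute risk reduction: how much the statin lowered the chance of dying
arr = risk_placebo - risk_statin               # ~0.032

# Number needed to treat: how many people must take the drug for five years
# so that, on average, one additional person survives
nnt = 1 / arr                                  # ~31

print(f"absolute risk reduction: {arr:.3f}, number needed to treat: {nnt:.0f}")
```

Note how sensitive that last number is to the baseline risk: if the death rates in both arms were half as large, the gap between them would shrink by half as well, and the number needed to treat would roughly double. That is precisely the dynamic at work as statins are recommended to healthier and healthier people.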
Earlier, I noted that the justification for statins was based on the indirect reasoning that high cholesterol was associated with an increased risk of death, especially death from heart disease, so lowering cholesterol should bring about better outcomes. In the 1980s, many studies demonstrated this effect, which can be seen in the figure below.
What you can see is that there isn’t a completely linear relationship between cholesterol levels and death: the death rate of someone with a total cholesterol of 240 is roughly double that of someone with 200, but those with 200 have only a modestly higher risk of death than someone whose cholesterol is 160.
FIGURE 6.1. From the Report of the National Cholesterol Education Program Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults, 1988. Referenced in Witztum, J. L., “Current Approaches to Drug Therapy for the Hypercholesterolemic Patient,” Circulation 1989; 80:1101–1114.
So if the “cholesterol theory” is correct—and there are decades of research suggesting that this is, at the very least, partially true—and lowering cholesterol through medications does prolong lives, then the biggest bang for the buck should be seen in people with either very high cholesterol levels or existing heart disease or both. The 4S trial, and many others like it, owe their great success in part to the fact that they enrolled exactly the kind of person most likely to benefit from statins.* But as time has gone on and statins have enjoyed ever-greater success, in both the commercial and the scientific sense of the word, there has been a steady expansion of the “statin market”—that is, the kind of people for whom statins are recommended. With this expansion, the dramatic beneficial effects of statins become smaller, the number of patients who need to be treated for a life to be saved rises steadily, and one can be less certain that taking such drugs will be of unquestionable benefit.
The average total cholesterol for the 4S cohort was about 260, with an average LDL of just under 190. These are extremely high numbers. Additionally, to be eligible for the 4S trial, one had to have a history of heart disease, so these subjects were, in some sense, cherry-picked to be the most likely to benefit from statins.
Late in 2013, the American Heart Association and the American College of Cardiology issued a new set of guidelines for treating cholesterol with statins. The revised guidelines rely on a complicated algorithm that is difficult even for physicians to grasp; a calculator is required to assess one’s lifetime risk of having heart disease for which a statin should provide protection. Complex or not, the guidelines appear to have dramatically increased the total number of people for whom statins are now recommended, and moreover they appear to depart significantly from using “bad” LDL cholesterol levels as one of the principal points of entry for assessing statin eligibility. An analysis of the new guidelines that appeared in the New England Journal of Medicine suggested that as many as 13 million new people are now theoretically in the statin pool, representing a 30 percent increase from the previous guidelines.
Whether the vast majority of these newly statin-eligible patients will benefit from treatment, however, is much less clear, and we lurch toward the middle of the spectrum of certainty. A major segment of this new group includes people who have a variety of increased risks for heart disease regardless of their LDL cholesterol level (such as smoking and high blood pressure). But there have been precious few trials that have evaluated whether using a cholesterol-lowering medication in people whose cholesterol is already relatively low will make much of a difference.
“As compared with the [previous] guidelines, the new guidelines would recommend statin therapy for more adults who would be expected to have future cardiovascular events . . . but would also include many adults who would not have future cardiovascular events,” the authors of the New England Journal article wrote (my emphasis). Therein lies the rub. As the pool of patients considered appropriate for statin therapy grows, the number of people who need to be treated for one life to be saved grows as well. Would it make sense to take a drug when one has a 1 in 200 chance of it preventing one’s heart attack, especially if one could have an equal risk of suffering a moderate or serious side effect from that medication, such as liver toxicity or diabetes, which are small but well-known complications of treatment? Statins are generally safe drugs, but as their benefits become ever smaller, those risks loom ever larger.
Soul
Compared to statin manufacturers, the makers of SSRIs have faced a much more difficult challenge to prove their drugs are worthy of widespread use: How do you measure outcomes?
How, indeed, do you measure depression? Fancy medical degree or no, many people can recognize a depressed person when they see one, but how does one quantify this in order to know that an antidepressant makes one less depressed? James Lind needed only to see whether his patients lived or died, and statin makers ultimately did so as well. Moreover, statin makers had a surrogate metric (cholesterol levels) that allowed them to measure the potential benefits of their drug, albeit indirectly. The same cannot be said for researchers evaluating the possible benefits of SSRIs like Prozac.*
One could, of course, theorize that SSRI-treated patients are less likely to commit suicide, but unlike heart disease, which is incredibly common in the developed world, suicide remains a relatively rare event, and a trial testing that theory would require tens of thousands of patients and years, perhaps decades, of follow-up. In short, it’s a study that can be performed only as a thought experiment, although researchers have looked retrospectively at suicidal risk factors. A discussion of the challenges of retrospective analysis can be found in the following chapter.
The solution to this problem lay in a variety of “depression scales,” more than a dozen in all, that attempt to quantify the severity of mood disturbance. One of the most commonly used scales is known as the Hamilton Depression Rating Scale, which was first devised by the eponymous psychiatrist in 1960 but has been revised multiple times. The Hamilton Scale consists of approximately twenty questions scored by the evaluator, in which responses suggestive of depression are given points. The worse the symptoms, the more points. The number is tallied and, based on that score (which ranges from zero to fifty-two), one is categorized as having either no depression, or depression rated as mild, moderate, or severe.
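To make the mechanics concrete, here is a minimal sketch in Python of how a tally-and-threshold instrument of this kind works. The items and cutoff scores below are illustrative placeholders, not the published Hamilton criteria; the point is simply that a handful of evaluator-assigned integers gets summed and then bucketed into a category.

```python
# Illustrative sketch of a tally-and-threshold depression scale.
# Items and cutoffs are placeholders, not the published Hamilton criteria.

def categorize(total_score: int) -> str:
    """Map a summed score to a depression category using illustrative cutoffs."""
    if total_score <= 7:
        return "no depression"
    elif total_score <= 13:
        return "mild"
    elif total_score <= 18:
        return "moderate"
    else:
        return "severe"

# Each item is scored by the evaluator; higher means worse symptoms.
# A one-point difference in judgment on a couple of items can move a
# patient into a different category.
item_scores = {
    "depressed mood": 2,
    "psychological anxiety": 2,   # e.g., "worrying about minor matters"
    "work and activities": 1,
    "insomnia (early)": 1,
    "insomnia (middle)": 1,
    "insomnia (late)": 1,
}

total = sum(item_scores.values())
print(total, categorize(total))   # 8 -> "mild" with these made-up numbers
```

Nothing in that tally is measured by an instrument in the laboratory sense; every point rests on the evaluator’s judgment, which is where the trouble begins.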
A close look at the scale, though, makes the subjective, fuzzy nature of the test apparent. One of the questions evaluates psychological anxiety: “no difficulty” is given zero points, with increasing points provided for “subjective tension and irritability,” “worrying about minor matters,” “apprehensive attitude apparent in face or speech,” and finally, “fears expressed without questioning.” Worrying about minor matters could just as easily be defined as being an adult, which means that a large number of people reading this are now two points closer to a diagnosis of depression than they were at the beginning of this paragraph. Similarly, a “work and activities” question would provide points to anyone who exhibits any reaction to the usual stresses and anxieties associated with the workplace. Three questions in total deal with insomnia, potentially inflating the size of that problem. And most of the questions don’t quite work their way around the problem that everyone has a bad day at least once in a while. How many days have to be blue before one is labeled depressed? The Hamilton Scale doesn’t directly answer that question.†
Several professionals consider the overemphasis on insomnia a major flaw of the Hamilton Scale. It is worth emphasizing that there are more than a dozen of these depression scales in all, each subtly different from the others, but all face the same problem of applying precision to a medical and psychiatric condition that cannot easily be quantified in a precise, or even reproducible, way.
Looking over the Hamilton Scale, it becomes apparent that one can separate those who are not depressed from those who are moderately or severely depressed with great confidence. Unfortunately, it’s much harder to distinguish mild from moderate, and moderate from severe depression—a few answers that are different here or there and you suddenly find yourself in a different category, all of which may be due to the subjective impressions of the person who is evaluating you. And that fact becomes very important when we consider the value of SSRIs, for much the same reason that we can be confident about the value of statins for patients with very high cholesterol, but less so for those who have only mildly elevated cholesterol.
As with statins for heart disease, there have been dozens of trials performed on SSRIs for depression, and, like the statin trials, the vast majority of SSRI trials have shown a clinical benefit. However, unlike the statin trials, whose primary aims were generally to evaluate how many lived or died, SSRI trials for effectiveness have to look for improvement in depression scale scores. Taken as a group, these studies have generally shown the benefits of SSRIs to be in the range of about two to four points—a benefit, to be sure, and one that is often reproduced, but a mild one (though if you read only one footnote in this book, read this one).* Moreover, those with higher scores (i.e., those more depressed) tend to reap the biggest benefits, much as the sickest patients reap the biggest benefits from statins.