THE RANDOMIZED CONTROLLED TRIAL
Let’s start with a famous claim: simvastatin (a cholesterol-lowering medication) will help someone live longer if the person has high cholesterol and has had a heart attack. This claim was addressed with a randomized controlled trial (RCT), the 4S trial. “4S” stands for the Scandinavian Simvastatin Survival Study. The randomized controlled trial is the gold standard for evidence in clinical medicine. We have referenced these trials continually in this book as proof of what works and what does not. Randomized controlled trials are experiments. Not the kind of experiments we all performed in high school chemistry and biology. Instead of test tubes and Petri dishes, they involve people.
Until this study was published in the British journal Lancet in 1994, simvastatin’s effectiveness was not clear. Although high cholesterol was long known to be a predictor of heart attacks, previous attempts to decrease cardiac events by lowering cholesterol had been unsuccessful. Earlier medications had successfully lowered cholesterol, but any reduction in the death rate from cardiovascular causes was offset by increased death rates from other causes.
Like any RCT, the 4S trial was designed in three steps. First, selection criteria were defined, and only people who met these criteria were entered into the trial. In this case, people had to have angina or a history of a heart attack. They also had to have high cholesterol. Second, an intervention was made. In the 4S trial the intervention was the use of either simvastatin or a perfectly matched placebo pill. This is the randomized and controlled part of the randomized controlled trial. Randomization ensures that the people in each group are, on average, the same: participants in the treatment group are no more healthy, wealthy, or wise than those in the placebo group. The control is something as close as possible to the actual intervention but without the active ingredient. Lastly, an end point was chosen. The end point in the 4S study was the ultimate end point, death. The final analysis of the study was a comparison between the number of people who died taking the placebo pill and the number who died taking simvastatin.
The results of the 4S study nicely show how scientists report the results of RCTs. Of the patients in the simvastatin group, 8 percent died, compared to 12 percent in the placebo group. This was an absolute risk reduction of 4 percent. More commonly, researchers report this type of result as a relative risk, the ratio between these numbers, or a relative risk reduction, the percentage decrease in risk because of the intervention. The relative risk in the 4S study was 0.66, and the relative risk reduction was 33 percent. A final useful number is “the number needed to treat” (NNT), the number of patients who must be treated to save one life. In the 4S trial the number needed to treat was 25. (The NNT is the inverse of the absolute risk reduction.) A relative risk reduction always sounds more impressive than a number needed to treat. NPR and Science Times usually report relative risk reductions and rarely mention the NNT. In practice, however, the number needed to treat is the more important result. Twenty-five is an impressively low number.
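These effect measures are simple arithmetic on the two groups' death rates. The sketch below (the function name is ours, not from any statistical library) uses the rounded 8 and 12 percent figures quoted above, so the relative risk comes out near two-thirds, close to the study's reported 0.66:

```python
# Effect measures from a two-arm trial, using the rounded 4S figures in the text.
# "risk_measures" is an illustrative helper, not a standard library function.

def risk_measures(risk_treated, risk_control):
    """Return absolute risk reduction, relative risk,
    relative risk reduction, and number needed to treat."""
    arr = risk_control - risk_treated   # absolute risk reduction
    rr = risk_treated / risk_control    # relative risk (ratio of the two risks)
    rrr = 1 - rr                        # relative risk reduction
    nnt = 1 / arr                       # the NNT is the inverse of the ARR
    return arr, rr, rrr, nnt

# 4S trial: 8 percent of the simvastatin group died vs. 12 percent on placebo.
arr, rr, rrr, nnt = risk_measures(0.08, 0.12)
print(f"ARR = {arr:.0%}, RR = {rr:.2f}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

Note how the same comparison can be framed three ways: a 4 percent absolute reduction, a 33 percent relative reduction, or 25 patients treated to save one life.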
The strength of an RCT is that it is a prospectively planned experiment. The study’s design controls for all the factors, other than the planned intervention, that might lead to different outcomes. This is terrifically powerful. Not only are known risk factors for the outcome, such as smoking status, controlled for, but so are those not yet known. Back in 1994 nobody suspected that increased levels of blood vessel inflammation would predict death, but even so, we can be sure that there were approximately equal numbers of “inflamed” patients in each group.
RCTs are not perfect. First, they are expensive studies to perform and require that people agree to be placed in either a treatment or a placebo group. Second, there are subtle ways that RCTs can reach misleading conclusions. We have discussed one situation in which this might happen, when we considered surrogate end points, and we discuss a few more in chapter 10. Lastly, the results of a single RCT do not provide a sort of biblical truth. What they do is present strong evidence about a finding; evidence that often becomes more nuanced as later RCTs examine the same question.
THE COHORT TRIAL
In contrast to the designed experiments that are RCTs, another type of medical study, the cohort study, describes natural, unplanned events. In a cohort study, two groups (cohorts), which differ in some important or interesting way, are identified. These two groups are then followed to discover what proportion of each cohort reaches some predetermined end point. Although the data that come from cohort studies are very similar to those that come from RCTs (relative risk, relative risk reduction, and number needed to treat) there are enormous differences between the two types of studies. While the RCT is an experimental study, the cohort study is an observational study—we do not do anything; we just watch. This difference explains how different study designs can reach different conclusions.
Here is a question: should women take estrogen and progesterone after menopause to lower their rate of heart disease? But first, why even ask this question? This is where the scientist comes in. We know that during the years when women naturally produce estrogen, they have much lower rates of heart disease than men. After menopause, when their level of estrogen drops, their rate of heart disease rises. Could we maintain women’s low rate of heart disease if we gave them estrogen? Time for the clinician to step in. The Nurses’ Health Study (NHS), a cohort study begun in 1976, sought to answer this question.
For the NHS, 127,000 nurses, between ages 30 and 55, filled out detailed questionnaires every two years. These questionnaires cataloged medical histories, daily diet habits, and major life events. About 90 percent of the questionnaires were returned, and the data filled more than 600,000 typed pages. With an enormous resource like this, it would seem easy to answer the question. Just make two piles: all the women who took hormones in one pile and all the women who did not in the other. Then see which group did better. More or less, this is what researchers did, and in 1996 they released one of the most cited articles in biomedicine.
According to the NHS, estrogen users had 40 percent fewer heart attacks than women who did not take hormones. A few years later, the NHS confirmed this in women who took estrogen and progesterone compared to women who took neither. Hormones seemed to reduce cardiovascular risk, and with this information in hand, doctors wrote millions of prescriptions.
Of course, you already know what happened, and you probably already know why. The NHS supported the claim that postmenopausal women who take estrogen do better than those who do not. The authors, doctors, and the public took this claim and restated it this way: if a postmenopausal woman starts estrogen, she will do better. Was this claim true?
In contrast to the NHS, the Women’s Health Initiative (WHI) was a randomized controlled trial that studied the same topic. A group of women who were past menopause were randomly assigned to hormones or matching placebo pills. It is worth mentioning here an issue about placebos. Many people think that patients in the placebo group get a raw deal. They take a pill that is inert and are lied to about what it is. But the truth is, a randomized trial can only be done, ethically, if there is real uncertainty as to which is better, the treatment or the placebo. In an RCT, both groups are potentially getting the inferior treatment. Having two groups allows the researchers to blow the whistle when one group starts doing worse than the other. Each group needs the other for its protection. The WHI is a great example of a trial in which the placebo group actually did better than those taking the active medicine and, in effect, bailed out the women on treatment.
The Women’s Health Initiative randomized 16,000 women, between ages 50 and 79, to estrogen and progesterone or placebo. The women were watched carefully for numerous medical conditions. The study was stopped three years earlier than originally planned, because the women receiving hormone replacement therapy were developing breast cancer, heart disease, stroke, and pulmonary embolism at a higher rate than those receiving placebo. The authors calculated that for 10,000 women over the course of one year, there would be 7 more cardiac events, 8 more strokes, 8 more pulmonary embolisms, and 8 more invasive breast cancers in those receiving hormones compared with those taking a placebo.
The WHI is a landmark study, not just because of its implications for hormone replacement. The WHI implies that even a well-designed observational (cohort) trial can be completely off the mark.
Why did the NHS fail? It is a question that many thoughtful people tried to tackle in the aftermath of the WHI. The bottom line is that the women who took hormones were different from the women who did not. Compared with nonusers, women who used hormones in the NHS were less likely to have a family history of heart disease, be hypertensive, have diabetes, or smoke. They were more likely to take aspirin, birth control pills, and vitamins. They were younger, drank more alcohol, and consumed more saturated fat. They were also wealthier. Researchers call each of these differences “confounders,” which are differences between the groups that could be an alternative explanation of the outcome. Maybe it was not the difference in hormone use that caused the outcome; maybe it was one (or many) of the confounding variables.
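To see how a confounder can manufacture an apparent benefit, consider a toy calculation with entirely invented numbers (not NHS data). Within each stratum of a confounder, hormone users and nonusers die at exactly the same rate; yet because users are concentrated in the low-risk stratum, the crude comparison makes hormones look strongly protective:

```python
# Invented numbers illustrating confounding: within each stratum the death
# risk is identical for users and nonusers, but users sit mostly in the
# low-risk stratum, so the unstratified comparison looks protective.

# (deaths, total) per group, split by a hypothetical confounder
users    = {"low_risk": (40, 800),  "high_risk": (30, 200)}
nonusers = {"low_risk": (10, 200),  "high_risk": (120, 800)}

def risk(deaths, total):
    return deaths / total

# Crude (unstratified) comparison across all women in each group
crude_user_risk = sum(d for d, _ in users.values()) / sum(n for _, n in users.values())
crude_nonuser_risk = sum(d for d, _ in nonusers.values()) / sum(n for _, n in nonusers.values())
print(f"crude relative risk: {crude_user_risk / crude_nonuser_risk:.2f}")

# Stratified comparison: once the confounder is held fixed, no effect remains
for stratum in users:
    rr = risk(*users[stratum]) / risk(*nonusers[stratum])
    print(f"{stratum}: relative risk = {rr:.2f}")
```

"Adjusting for confounders," as the NHS authors did, is a more sophisticated version of this stratified comparison; the catch is that you can only adjust for the confounders you know about and have measured.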
The authors of the NHS knew what they were doing. They considered confounders. In the 1996 paper they wrote: “Women who take hormones are a self-selected group and usually have healthier lifestyles with fewer risk factors than women who do not take hormones. In general-population samples, hormone users, as compared with nonusers, have more years of education, are leaner, drink more alcohol, and participate in sports more often, even before starting to use hormones.”
They then used explanations and statistical adjustments to explain away the effect of these confounders. In other words, the authors seem to say, “Sure, confounders exist, but we anticipated them and adjusted for them and we can still say that hormone replacement reduces heart disease.” Reading the NHS today is difficult for us—passages have become classic examples of “eating your words” in the medical literature.
Although it is only a method—like a technique to poach eggs—the randomized controlled trial is best thought of as a medical technology. First devised in the late 1940s (in a study of tuberculosis treatments), the RCT is arguably the most important medical technology of the 20th century. When large and well designed, it is the most powerful way to elucidate the truth in all of medical science. And though it cannot tell you about the mechanism of how a treatment works biologically, it can show conclusively whether a given intervention accomplishes a given goal. In medicine, it is the gold standard for proving a claim.
Observational studies are also useful. They are great at describing the course of medical disease. If you are interested in the rate of lung cancer in nonsmokers, smokers, and former smokers, there is a wonderful cohort study from Denmark that has followed these groups.* Observational studies are also useful for laying the groundwork for an RCT. Observational studies may support a claim that a risk exists or that use of a medication is associated with a better outcome. An RCT can then be designed to investigate whether lessening the risk or using the medication helps people. But when it comes to addressing questions of medical practice, observational studies are sometimes wrong. Notice, we say sometimes. They are not always wrong. If they were always wrong, it would be easy—always do the opposite. Research says observational studies are wrong somewhere between 15 and 50 percent of the time. Unfortunately, we have no way of telling when they are wrong.
CASE-CONTROL STUDIES
There is one more study type we should discuss. Here is a question you might ask: “Does my energy drink increase my chance of getting pancreatic cancer?” Let’s begin by noticing how different this question is from the questions we have already talked about. Here, we are asking about something that people already do and whether it is associated with a rare harm.
For several reasons, you cannot do an RCT to answer this question. First of all, RCTs are done to test interventions with potential benefits. You could randomize Red Bull drinkers to a stop-drinking-energy-drinks campaign and then measure pancreatic cancer, but you cannot do the opposite (ask them to drink) if you think the exposure is bad. But more to the point, without some other evidence of harm, launching a “Quit Red Bull” campaign seems unfounded. You need to establish that the hypothesis is plausible. A cohort study would seem an obvious way to answer the question. You could recruit energy-drink users and nonusers, follow them, and compare the outcomes. The problem here is that pancreatic cancer is quite rare, so you would need to enroll hundreds of thousands of people and follow them for years. Such a study would be prohibitively expensive. Who will pay for it? Certainly not the makers of the energy drinks.
Such a situation is where a case-control study becomes useful. Case-control studies generally examine a harmful exposure that potentially causes a rare event. The studies are retrospective in nature. The researchers start by identifying people who have had the rare event (the cases) and similar people who have not (the controls). The study then looks back in time to discover if the cases were more likely to be exposed to something than the controls. It is like a cohort study run in reverse, starting with the outcome and looking back to the exposure.
These studies can, however, be rife with problems. The cases and controls may differ in regard to things other than the outcome of interest, so the studies tend to run into the same problems with confounders that the Nurses’ Health Study did. There also needs to be a reliable way of determining exposure. Can we trust the 60-year-old with pancreatic cancer to accurately recall how many energy drinks he drank in his thirties?
An excellent example of a case-control trial was published in 2000, addressing the concern that caffeine intake causes miscarriage. This question could not be studied in any other way. You could not jump into a costly “stop drinking coffee” RCT, because no one had established that there is any risk to our favorite morning beverage. A cohort study would be difficult because miscarriage is, fortunately, quite rare, so the number of women who would need to be followed would be huge. A case-control study was designed in which women who had miscarried were recruited and interviewed about their caffeine intake. Their use of caffeine was compared with that of women who had successfully carried their babies to term. As in a cohort study, statistical adjustment was required because the patients who had miscarried were different from those who did not: they were older, had had more previous miscarriages, and had less morning sickness. In the end this study showed that women who ingested large amounts of caffeine were about twice as likely to miscarry as those who consumed only small amounts.
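How does a study that starts from the outcome produce a "twice as likely" estimate? Because a case-control study samples by outcome rather than by exposure, it cannot compute risks directly; instead it compares the odds of exposure among cases and controls, yielding an odds ratio, which for a rare outcome approximates the relative risk. The table below uses invented counts, not the actual study's data:

```python
# Hypothetical case-control table (counts invented for illustration, not
# from the 2000 caffeine study). A case-control study compares the odds of
# exposure among cases versus controls: the odds ratio.

cases_exposed, cases_unexposed = 100, 200        # women who miscarried
controls_exposed, controls_unexposed = 100, 400  # women who carried to term

# Cross-product form of the odds ratio: (a * d) / (b * c)
odds_ratio = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)
print(f"odds ratio = {odds_ratio:.1f}")
```

Here exposed women show up twice as often among cases as the control pattern would predict, which is the kind of arithmetic behind a finding that heavy caffeine users were "about twice as likely" to miscarry.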
Case-control studies, like cohort studies, are not experimental. They are observational studies that cannot prove cause and effect. We do not know that drinking more caffeine puts women at higher risk for miscarriage. We only know that women who ingest more caffeine are more likely to miscarry. It might be the caffeine, or it might be the aggravation caused by their local Starbucks’ barista. That said, case-control trials have given us crucial information. These studies have suggested that cigarettes cause lung cancer, that thalidomide causes birth defects, and that people with sleep apnea are more likely to have automobile accidents.
What have we learned? The key to evidence in medicine is the claim. And the most important claim is that some medical practice is beneficial. If you really want to know, you have to do a randomized controlled trial. RCTs are not perfect, but when large and well done, they provide stronger evidence than any other study design. This is a fact that has been well accepted in medicine for at least the past 40 years. Observational studies are useful, particularly for showing the natural history of something: what percentage of heavy smokers will get lung cancer? what percentage of people with hypertension will have a stroke? And case-control studies are great to show whether exposures or habits (often unrelated to medicine) are associated with bad, but rare, outcomes. You can think of each of these studies as the right tool for a specific job. For this analogy, think of the RCT as the hammer, the observational study as the wrench, and the case-control trial as the screwdriver. As it turns out, the clinician is mostly in the hammering business.
10 WHAT REALLY MADE YOU BETTER: WHEN EVIDENCE GETS COMPLICATED
IMAGINE YOU ARE WALKING HOME from work and feel your phone vibrate. You pull the phone from your pocket and see that you have a new e-mail. You think, “I’ll read it later, but let me just see who it is from.” You then notice the subject, “Urgent,” and the sender, your boss. You start reading it and the content stops you in your tracks. Suddenly, someone grabs you by the shoulder and pulls you sharply backward. You fall onto the sidewalk. A bus blows by in a rush of hot, diesel-tinged air. On the ground next to you is the stranger who grabbed you. “That guy just saved your life,” exclaims an onlooker.
Some of us have had an experience like this during our lives. In a moment, what might have been a life-altering (or life-shortening) event is changed by the action of another person. Without a doubt, that person saved your life; had he or she not grabbed you and pulled, the outcome is clear.
In our world it is easy to talk about cause and effect. Your child is running in the house, bumps the table, and the glass of water sitting on the table spills. “What did I tell you about running?” you say for the twelfth time. The cause (child running) and the effect (water spilling) are clearly linked. Without the former, there would not be the latter. This is how we see the world. Our minds simplify actions and reactions to this sort of cause-and-effect relationship. Often, however, it is not so clear.
Ending Medical Reversal Page 12