by Steven Hatch
There is another aspect of Lind’s work worth noting, equally rudimentary and equally important to our modern structure of clinical research: he compared only one variable—just dietary supplements—when evaluating treatments for scurvy. As noted above, Lind had complex theories of scurvy’s causes, theories that were shared by his colleagues. They were all wrong, we now know, but it didn’t seem so to them, and not all of their thinking was inept. The idea that damp played a critical role in scurvy no doubt came about from comparing the empirical differences between a sailor’s life and a landlubber’s, since sailors were much more liable to develop scurvy. It’s easy to see the appeal of that hypothesis, especially at a time when nobody knew anything about vitamins. As we’ll see in the next chapter, a similar type of hypothesis generation took place at the advent of the AIDS crisis and is a common feature of gumshoe epidemiologic work. Regardless, Lind didn’t worry about the whole theory of scurvy when he performed his trial. He just focused on one variable, and in doing so he was able to draw stronger conclusions about the effects of those interventions. The power of that approach went unappreciated not only by Lind himself but by pretty much the entire profession of medicine, and it remained so for a lot longer than it took doctors to solve the problem of scurvy.
Now that we’ve seen what’s vaguely modern in Lind’s scurvy trial, let’s focus again on what makes it seem so premodern. First, everyone in the scurvy trial was treated with something. Like advanced cancer today, scurvy tends to have an inexorable path toward death. But the majority of diseases aren’t like that: people often naturally recover from a given malady. Think of allergies, depression, and reflux, or even a much more serious life-threatening condition like bacterial pneumonia: do nothing at all, and a certain percentage of people will get better anyway. In experimental conditions, we artificially introduce that “do nothing” element: we call it a placebo, and our comparisons are “placebo controlled.”
Of course, to make that comparison, everyone needs to believe that they are in the trial as equal participants, and so modern trials also engage in a process known as blinding. In his scurvy trial, Lind knew all the details: he knew which sailors were given each treatment, and for that matter he knew which sailors weren’t given any treatment at all. That may seem trivial for his research because the apparent effects of oranges and lemons were so dramatic, but if he had a particular theory as to which of his interventions was most likely to succeed, he might have been unconsciously biased to interpret his results either favorably or unfavorably depending on the treatment. It turned out by accident that none of the other treatments in the scurvy trial contained any appreciable amount of vitamin C. Had those treatments contained varying levels of the vitamin such that some of his sailors made a partial recovery, the data might have been much less clear-cut, and his interpretation of them could have been affected by internal biases.
Researchers can often be motivated to believe in a theory so powerfully that they neglect evidence that would seem obvious to a disinterested observer. In Chapter 1 we observed a similar phenomenon in overdiagnosis; individual doctors can become powerfully attached to their diagnoses in like manner. Nobody really understood this in Lind’s time, but now we have a deeper appreciation of the role that psychological factors can play in creating bias in clinical trials. In today’s drug studies that bear a surface resemblance to the scurvy trial, we’ve learned to avoid introducing this bias at all costs. And we don’t merely blind the scientists performing the experiment: patients likewise do not know whether they are receiving active drugs or placebos.
Finally, part of the process of blinding involves deciding which patients are selected for the treatment arm and which for the placebo arm. Again, subtle biases may lead a researcher, for instance, to unconsciously place healthier subjects in the treatment group and sicker ones in the placebo group. That would then create the appearance of benefit where none actually exists. So we pick who goes where blindly as well—that is, we randomize patients. Nowadays, we usually randomize through a computer program. Someone enrolls in a study, the research staff enters the data into a computer, and the computer spits out an identifier that will be used later to analyze the data, because at a meaningful level only the computer now knows which arm of the study that patient is in. The pharmacists dispensing the experimental drug, as well as the doctors and nurses and other staff following the patient, to say nothing of the patients themselves, have no idea whether any given patient is getting the actual drug or a sugar pill.
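To make the mechanics concrete, here is a minimal sketch in Python of how such a program might allocate participants while hiding the key from everyone who interacts with them. The names and identifiers are my own illustration, not any real trial system’s software, and real trials typically use more elaborate schemes (such as blocked randomization) to keep the arms balanced.

```python
# A minimal sketch (hypothetical, not any trial system's actual software) of
# blinded randomization: the computer assigns each participant to an arm, but
# study staff and patients see only an opaque code.
import random
import uuid

def randomize(participant_ids, seed=None):
    rng = random.Random(seed)
    allocation = {}       # the secret key: participant -> arm (held by the computer)
    blinded_labels = {}   # what everyone else sees: an uninformative identifier
    for pid in participant_ids:
        allocation[pid] = rng.choice(["treatment", "placebo"])
        blinded_labels[pid] = uuid.uuid4().hex[:8]  # code printed on the pill bottle
    return allocation, blinded_labels

secret_key, labels = randomize(["P001", "P002", "P003", "P004"], seed=2024)
print(labels)   # safe to share with pharmacists, doctors, nurses, and patients
# secret_key stays locked away until the blind is lifted at the end of the trial.
```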
Thus, the modern standard for a drug trial is that it be randomized, double-blind, and placebo-controlled. The first of this kind of trial actually didn’t occur until almost exactly two centuries after Lind’s groundbreaking work, when a group of British researchers performed a series of studies on male patients using an experimental drug known as streptomycin. The first of these papers was published in 1948, and even then, although the trial was randomized in much the same way that trials are today (though without computers), it was a single-blind trial: the physicians knew which patients were getting the streptomycin. Moreover, it wasn’t placebo controlled, as the controls were simply given the standard treatment of the day for severe tuberculosis, which was bed rest. In fact, the controls didn’t even realize that they were involved in a trial at all. (This would not be considered ethical by today’s research standards, or at least it would have serious problems if the design were brought before an ethical review board, the type of group involved in the oversight of all legitimate modern medical research.) But the streptomycin trials laid the groundwork for genuinely modern medicine, and they mark the first time scientists truly understood the purest way that we can know whether some treatment actually works.
The Mechanics of Certainty and Uncertainty
This is the how of modern medicine. It is simple in outline. In practice, however, it is cumbersome and expensive. The process of blinding and random allocation, the creation of placebos (it costs money to make fake pills that look exactly like real pills, and to put them in packaging that can’t be distinguished by even the most skilled pharmacist), the generation of heaps of documentation, and the efforts of countless administrators, scientists, and other support staff required to oversee the work are phenomenally costly and time-consuming. When people speak of the medical industrial complex, this is the creature to which they are referring. This book is a consideration of some of the dilemmas that result from the products of this massive machine. But it does work on the whole, and it works largely because of this simple principle of making comparisons between two relatively equal groups, with much modern science and physiology undergirding the work.
The process of comparison making in biomedical research on humans applies not only to drug trials but also to investigations on the causes of disease, the utility of blood tests, the natural history of a condition like, say, schizophrenia, and all manner of scientific inquiries besides. In this chapter, however, I will focus specifically on how we use comparisons to decide which drugs make the cut and which do not, and we’ll consider the implications of these findings.
One important quality of James Lind’s scurvy research that appears on the surface to have very little in common with modern trials is the size of his cohort, as his work involved a scant twelve sailors. That doesn’t seem to square with the contemporary trials we read about in the news, where thousands, or tens of thousands, of volunteers are followed, sometimes for years. But why do we recruit so many today, and why was his trial a success with so few back then? The answer lies in part in the nature of scurvy, and again Lind was remarkably lucky in choosing the disease he did to perform his investigations. Because scurvy’s effects are so profound, and because complete recovery from scurvy happens in a miraculously short time, Lind was able to witness a radical transformation in his patients almost instantaneously when administering his vitamin C–supercharged oranges and lemons.
Most modern diseases, however, don’t work like that. Symptoms are more subtle, and healing through medications can take longer. What, for instance, would have become of Lind’s treatise if the natural history of scurvy required a month rather than a week of treatment for recovery to commence, and Lind had abandoned his work after a fortnight? What would he have done if scurvy weren’t so lethal, such that some people succumbed to the disease but most did not, similar to how influenza affects us today? If only one in twenty sailors died of scurvy, and most just puttered on in a weakened state, Lind would have had a strong chance of not noticing the effect by monitoring only twelve subjects in his trial. Moreover, although he might have stopped scurvy in the two sailors who did get oranges and lemons, surviving in a weakened state would likely have been par for the course in the crowded, fetid below-decks atmosphere in which low-ranking sailors lived, so he could easily have overlooked their recovery and misinterpreted his results. Solving problems like this is a daily task in the field of work we call biostatistics. You don’t have to become a biostatistician to appreciate the basic principle of how it works, however: the more dramatic the impact a treatment has on a disease, the fewer people are required to demonstrate that effect; treatments with a less dramatic impact require many more people.
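That trade-off can be made roughly quantitative. The sketch below, in Python, uses the standard two-proportion sample-size formula under conventional assumptions (a two-sided significance level of 0.05 and 80 percent power); the scenarios and numbers are my own hypothetical illustration, not figures from Lind or from any actual trial.

```python
# A rough illustration of "bigger effect, fewer patients needed," using the
# standard two-proportion sample-size formula. Standard library only.
from statistics import NormalDist

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = abs(p_control - p_treatment)
    return ((z_alpha + z_beta) ** 2 * variance) / effect ** 2

# A scurvy-like, dramatic effect: mortality falls from 90 percent to 10 percent.
print(round(n_per_arm(0.90, 0.10)))  # roughly 2 per arm by the formula
# A modest effect: mortality falls from 5 percent to 4 percent.
print(round(n_per_arm(0.05, 0.04)))  # roughly 6,700 per arm
```

The formula’s normal approximation is crude at such extremes, and real trialists use more careful methods and add margin for dropouts, but the asymmetry is the point: a scurvy-sized effect shows up in a handful of patients, while a one-percentage-point difference demands thousands per arm.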
But what does any of this have to do with uncertainty? Haven’t I just argued that, whatever happened in the past, we’ve now solved the problem of how to know things through the practice of randomization, blinding, and controlling with placebos?
The answer depends in part on the meaning of the word “know.” Knowledge in medicine is virtually never absolute, and it cannot in general be thought of in the same way we think of facts—that is, unimpeachable bits of information. Knowing the benefits of a medicine, for instance, much more often than not means knowing the relative likelihood that the medicine will make things better. Take this drug for this condition, and there’s a very good chance you’ll get better; take that drug for that problem, by contrast, and it may be helpful, but because the effect is smaller, we can be less certain it will be helpful in any individual case.
This lack of certainty relates directly to the question of sample sizes when investigating the benefits of a medication. Suppose you have a disease with a 100 percent mortality rate—not unlike untreated scurvy, in fact. Now recruit twenty people for a trial to administer some experimental drug in the modern randomized, double-blinded, and placebo-controlled method. At the end of the study, ten of the patients have made a full recovery, and ten are dead. The blinding is lifted, and lo and behold, all ten patients who recovered were given the experimental drug, and all ten who received the placebo succumbed to the disease. That would seem to be extremely strong evidence that the drug worked, right? If someone argued that we needed to repeat the experiment with twenty thousand people instead of just twenty, you would almost certainly reply that it’s a waste of time and resources, and likely unethical, especially if the disease were spreading.
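The intuition can be checked with a short back-of-the-envelope calculation (my own framing of the hypothetical above, not a computation from the trial literature): if the drug did nothing at all, the ten survivors would simply be a random ten of the twenty participants, and we can ask how often all ten would happen to be the ten who got the drug.

```python
# How often would a do-nothing drug produce the perfect 10-versus-10 split by luck?
from math import comb

favorable_splits = 1                # only one way: all ten survivors got the drug
all_possible_splits = comb(20, 10)  # 184,756 equally likely ways to pick the 10 survivors
print(favorable_splits / all_possible_splits)  # about 0.0000054, or 1 in roughly 185,000
```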
Now let’s complicate matters a bit, and suppose there’s a disease with a 50 percent mortality rate. For this study, with another drug, we decide to recruit a larger number of people—say, two hundred, with one hundred participants in each arm. At the end of this trial, fifty of the subjects in the placebo arm have passed away, entirely consistent with the predicted mortality rate of 50 percent. Meanwhile, in the treatment arm, only forty-eight have died. That is, among those who received the experimental drug, two additional people are alive compared to the placebo group. So is this drug effective? What if the number of people who died in the treatment arm is only forty-six compared to fifty in the placebo arm? What if it was forty-four? Or thirty-eight? At what point do you cross over the line and have the same level of confidence in this drug that you had in the much smaller hypothetical trial in the previous paragraph?
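Running these scenarios through a simple two-proportion z-test, the standard-library sketch below shows how each outcome translates into a probability of arising from chance alone; the choice of test and the resulting numbers are my own illustration, not an analysis prescribed by any particular trial.

```python
# How likely is each treatment-arm outcome under chance alone, versus 50 deaths
# out of 100 on placebo? A simple two-proportion z-test, standard library only.
from math import sqrt
from statistics import NormalDist

def two_sided_p(deaths_treatment, deaths_placebo=50, n_per_arm=100):
    p_treat = deaths_treatment / n_per_arm
    p_placebo = deaths_placebo / n_per_arm
    pooled = (deaths_treatment + deaths_placebo) / (2 * n_per_arm)
    standard_error = sqrt(pooled * (1 - pooled) * (2 / n_per_arm))
    z = abs(p_placebo - p_treat) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

for deaths in (48, 46, 44, 38):
    print(deaths, round(two_sided_p(deaths), 2))
# Roughly 0.78, 0.57, 0.4, and 0.09: with only one hundred patients per arm,
# even 38 deaths versus 50 does not clear the conventional 0.05 threshold.
```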
The value of these examples lies in their demonstration that the ability to compare a treatment group with a placebo group, in a precise, randomized, blinded manner, doesn’t in and of itself provide you with clear answers; it provides you with numbers. And the numbers have to be interpreted. Moreover, because the numbers can fall anywhere along a continuum, there is no single moment when one can look at the results of a drug trial and exclaim, “Eureka!” knowing that a drug has shifted from the status of totally useless to totally effective. If the two groups of subjects look exactly the same at the end of a trial, the drug probably has no effect, but as the two groups come to differ in the proportions of patients who improve, one becomes only gradually more certain that the observed difference is due to the drug.
In other words, there is a constant recognition that all the changes observed could be due to chance alone, and the question is, How likely is it that chance explains them? Because the oranges and lemons made such a huge difference to Lind’s sailors, he didn’t need to perform a trial with hundreds of people. But when we deal with drugs that have smaller effects, our ability to know their value is much more appropriately expressed in terms of levels of confidence. In modern medicine, we have introduced statistical cutoffs by which we allow people to claim that a result is “significant,” but significance is different from truth. Significance simply allows us to say that the observed differences between two groups are so large that it is highly unlikely those differences could be due to chance alone.
The drugs of antiquity that have endured into the age of modern medicine—the salicylates, the sennas, the opiates of the world—had such obvious and profound benefits that this kind of technical study was not required to know their value. But whether a plant extract can ease pain immediately is much easier to observe than whether a synthetic compound can produce a 30 percent reduction in the risk of a heart attack three years hence or, for that matter, whether saw palmetto “supports healthy prostate function,” as the herbal supplement company General Nutrition Center claims on its website. Modern drug trials have largely been able to quantify with a fair degree of precision the value of new treatments for the maladies that afflict us, and they are responsible for the big industrial machinery of medicine required to generate those benefits.
But these treatments aren’t perfect, and viewing them through the lens of uncertainty can help reveal their limitations. Let’s consider some of the most important medications of our time, looking at them in part through James Lind’s eyes, and see what they can tell us about medicine today.
A Tale of Two Drugs
In the annals of pharmacology, 1987 will be remembered as an auspicious year. Two new medications were approved by the FDA that year to treat diseases that had come to be among the dominant illnesses of the industrial age: coronary artery disease and depression, literally drugs to treat the ailing heart and soul. It wasn’t that there weren’t already medications for these maladies, but these two drugs represented wholly new molecular approaches to their respective diseases, and as such constituted a quantum leap in treatment. Their approval represented a moment when advanced biochemistry that would have been unimaginable to a physician from the early twentieth century finally produced highly effective treatments with relatively minimal side effects, treatments that could make a reasonable claim to the title “miracle drugs.” Although the specific drugs that were approved have since been eclipsed in sales by others within their respective classes, the classes of drugs themselves have remained central to medicine to this day.
The first, whose generic name is lovastatin and trade name is Mevacor, is perhaps not well recognized by most. However, the trade names of others within this class of drugs, commonly known as statins, are omnipresent: Zocor (simvastatin), Pravachol (pravastatin), Crestor (rosuvastatin), and the biggest-selling drug ever, with total sales at more than $125 billion, Lipitor (atorvastatin). Critically important drugs for treating heart and other vascular diseases, they are all household names, as much cultural phenomena as pills dispensed by specialists: indeed, one can hardly sit through a televised football game without seeing an advertisement for one of these medications, typically showing some middle-aged male performing a faux end-zone dance in front of bemused family members, all for the joys of having gotten a new prescription for his beloved drug. (Such commercials often come on the heels of ads for nationwide fast food chains such as Buffalo Wild Wings or Dave & Buster’s, establishments that, if attended regularly and with gusto, will contribute quite directly to the metabolic problems leading to the prescription in the first place.) Make no mistake: as measured in raw dollars, collectively they have been a gigantic success. Somewhere, in the cellars of many a pharmaceutical executive, are a substantial number of bottles of fine cabernet sauvignon, all acquired as a result of the healing properties of this new class of cardiac medication.
Although Mevacor itself never attained great popularity, the depression drug approved that same year of 1987 needs little introduction. Having attained the status of cultural icon, Prozac (fluoxetine) has become as tied to notions of mental health as the anxiety medication Valium was for the previous generation. Prozac was famous not only for its mood-altering effects but also as a symbol of the medicalization of everyday life and the particular societal ills of the past generation, perhaps because of the depersonalization that the medication was said to be capable of inducing. The psychiatrist Peter Kramer observed this strange new drug in his clinical practice and wrote of it in the classic medical reflection Listening to Prozac in 1993; the following year a young woman named Elizabeth Wurtzel wrote a memoir of her bouts with depression, and the role the drug played in treating them, with the all-encompassing title Prozac Nation. Both were international best sellers; Prozac Nation was adapted into a movie in 2001.
Like Mevacor, Prozac was merely the first of a group of drugs that work by the same mechanism, known as selective serotonin reuptake inhibition, and the drugs are thus known as SSRIs. Other SSRIs would follow, such as Paxil (paroxetine), Zoloft (sertraline), Luvox (fluvoxamine), Celexa (citalopram), and Lexapro (escitalopram). Along with other classes that work in relatively similar ways, these newer medications account for the bulk of drugs used for depression in the world today, having replaced the then-dominant class, the tricyclic antidepressants, or TCAs (amitriptyline, nortriptyline, and imipramine, among several others), which have become second-line agents because of their higher side-effect profiles.