by David Healy
Between April 1802 and December 1805, 1,002 patients were admitted to the Salpêtrière, and Pinel was able to follow these individuals during their stay to see who recovered and who didn't, whether patients in particular diagnostic groups fared better than others—and hence whether diagnoses in use at the time were worthwhile or not. This was a first example of what later came to be called a statistical approach to illness. Why do it? Pinel laid out his reasons.
In medicine it is difficult to come to any agreement if a precise meaning is not given to the word experiment, since everyone vaunts their own results, and only more or less cites the facts in favor of their point of view. However, to be genuine and conclusive, and serve as a solid basis for any method of treatment, an experiment must be carried out on a large number of patients following the same rules and a set order. It must also be based on a consistent series of observations recorded very carefully and repeated over a certain number of years in a regular manner. Finally it must equally report both events, which are favorable and those which are not, quoting their respective numbers, and it must attach as much importance to one set of data as to the other. In a nutshell it must be based on the theory of probabilities, which is already so effectively applied to several questions in civil life and on which from now on methods of treating illnesses must also rely if one wishes to establish these on sound grounds. This was the goal I set myself in 1802 in relation to mental alienation when the treatment of deranged patients was entrusted to my care and transferred to the Salpêtrière.6
There had never been anything like this in medicine before. Overall, 47 percent of the patients recovered, Pinel found, but of those who had been admitted for the first time, who had never been treated elsewhere, who had a disorder of acute onset, and who were treated only using Pinel's methods, up to 85 percent responded. When left to recover naturally, many more of the first-timers did so than did those among the patients who had been treated previously by other methods. Not only that, within a short time of admission Pinel could tell who was likely to recover and who was not based on their clinical features. In other words there seemed to be different disorders, and people suffering from some types would recover if left alone while inmates with some other types would not regardless of what treatments they were given. Finally, following the patients after discharge brought a whole new group of periodic disorders into view for the first time, laying the basis for the later discovery of manic-depressive illness and other recurrent mental disorders.
Aware of the pioneering nature of his research, Pinel presented his data on February 9, 1807, to the mathematical and physical sciences faculty at the National Institute of France rather than to the country's Academy of Medicine. This was hard science and the first time in medicine that results were presented as ratios across a number of patients studied, rather than as accounts of individual cases.
In reporting these findings, Pinel showed that he was well aware that his personal bias could have colored the results. But, as he noted, while an individual patient in London could not properly be compared to one in Paris or Munich, the results of complete groups of patients could be, and the registers of Salpêtrière patients were publicly available. So he confidently challenged others to contest his findings based on their outcomes.
The scientists were impressed. The physicians weren't. It took thirty years before another French physician picked up the baton and further unsettled the medical establishment with numbers. In 1836, Pierre Louis outlined a new numerical method that controlled for variations by using large numbers of patients: “in any epidemic, let us suppose five hundred of the sick, taken indiscriminately, to be subjected to one kind of treatment, and five hundred others, taken in the same manner, to be treated in a different mode; if the mortality is greater among the first than among the second, must we not conclude that the treatment was less appropriate, or less efficacious in the first class than in the second?”7
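Louis's proposal amounts to comparing death rates between two groups selected in the same way. A minimal Python sketch of that comparison follows. The death counts are purely hypothetical (the passage from Louis gives only the group sizes of five hundred), and the function names are illustrative rather than anything Louis described.

```python
def mortality_rate(deaths: int, patients: int) -> float:
    """Proportion of patients in a group who died."""
    if patients <= 0:
        raise ValueError("patients must be positive")
    return deaths / patients


def compare_treatments(deaths_a: int, n_a: int, deaths_b: int, n_b: int) -> dict:
    """Compare mortality between two treatment groups of patients."""
    rate_a = mortality_rate(deaths_a, n_a)
    rate_b = mortality_rate(deaths_b, n_b)
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        # A positive difference means more deaths under treatment A.
        "difference": rate_a - rate_b,
        "ratio": rate_a / rate_b if rate_b > 0 else float("inf"),
    }


# Hypothetical counts for illustration only; these are not Louis's data.
result = compare_treatments(deaths_a=100, n_a=500, deaths_b=75, n_b=500)
print(result)
```

With these made-up counts, mortality is higher in the first group, which is the situation in which Louis asks whether the first treatment must be judged less appropriate. The sketch reports only the raw comparison; it says nothing about whether a difference of that size could have arisen by chance.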
The treatment Louis assessed was bleeding—which in fact works well in disorders such as heart failure. But when he compared bleeding to doing nothing in a sufficiently large number of patients during the course of an epidemic, he sparked a crisis in therapeutics. Doctors expected bleeding to work better than doing nothing, but “the results of my experiments on the effects of bleeding in inflammatory conditions are so little in accord with common opinion [those who were bled were more likely to die, he found] that it is only with hesitation that I have decided to publish them. The first time I analyzed the relevant facts, I believed I was mistaken, and I repeated my work but the result of this new analysis remains the same.”8
These results led to howls of outrage from physicians who claimed that it was not possible to practice medicine by numbers, that the duty of physicians was always to the patient in front of them rather than to the population at large, and that every doctor had to be guided by what he found at the bedside.
Ironically, it was Louis and Pinel who were calling on physicians to be guided by what was actually happening to their patients, not by what the medical authorities traditionally had to say. As the marketers from GlaxoSmithKline and other companies might have told Louis and Pinel, though, for many physicians to be convinced there has to be a theory, a concept about the illness and its treatment, to guide the doctor. “The practice of medicine according to this [Louis's] view,” went one dismissal, “is entirely empirical, it is shorn of all rational induction, and takes a position among the lower grades of experimental observations and fragmentary facts.”9
Louis's struggles in Paris had their counterpart in Vienna where, in 1847, Ignaz Semmelweis noted that mortality was much higher on an obstetric ward run by physicians and medical students than one run by student midwives. Suspecting that the physicians were coming to women in labor with particles of corpses from the dissection room still on their hands, he got them to wash more thoroughly with a disinfectant and was able to show that antiseptic practice made a difference. No one paid any heed. A few years later, in 1860, Joseph Lister introduced antiseptic practice to the Glasgow Royal Infirmary, and postoperative putrefaction rates subsequently declined. The later discovery that infection with bacteria led to putrefaction provided a concept to explain these observations, but until then Lister, like Semmelweis, had trouble getting his colleagues to take his findings seriously.
One of the weaknesses in these early manifestations of evidence-based medicine, as the examples of Pinel, Louis, Semmelweis, and Lister make clear, was their inability to shed much light on what lay behind the figures: they showed associations but explained nothing about cause. There are commonly tensions between broad associations of this type, the specific evidence that comes from laboratory experiments, the evidence of our own eyes, and what currently dominant theories may dictate. To the relief of most doctors, the tensions between broad associations and more specific evidence were eased to a degree with the emergence in the second half of the nineteenth century of laboratory science, which more clearly linked cause and effect.
THE CAUSES OF DISEASES
In the 1870s, a set of laboratory sciences emerged to form the bedrock of the new scientific and diagnostic work that would transform much of medicine and underlie the practice of doctors like Richard Cabot and the rise of hospitals such as Massachusetts General, as noted in chapter 1. Advances in bacteriology were among the key scientific developments that led to new treatments as well as hope that science would lead to further breakthroughs. In France, Louis Pasteur provided the first evidence that germs, later called bacteria, were the causative factors in a series of infections such as rabies,10 and he supplied both evidence and a rationale for vaccinations and antiseptic procedures.11 In Germany, Robert Koch set up the first laboratory dedicated to the pursuit of the microbial causes of disease, and his most famous protégé, Paul Ehrlich, who more than anyone else developed the dyes that helped distinguish among bacteria, later developed the drugs that killed some of them. It was Ehrlich who coined the term magic bullet, for a drug that would specifically target the cause of an illness and leave the patient otherwise unaffected.12 For generations afterward, until the 1960s, the glamour and importance of their discoveries and those of their successors, written up in books such as the Microbe Hunters, attracted students to medicine.13
In 1877, Koch transmitted the lethal disease anthrax by injecting noninfected animals with the blood of infected animals; he then isolated anthrax spores and demonstrated that these spores, if grown in the eye of an ox for several generations, could also cause the infection. Where Lister met resistance for recommending an antiseptic approach in surgery on the basis of a comparison of the numbers of infections with and without antiseptic conditions, Koch could show the existence of bacilli under a microscope and later growing on a Petri dish, and then demonstrate the efficacy of sterilization in killing the bacillus. Where it had been difficult to overcome resistance to revolutionary ideas about antiseptics using only comparative numbers, for many seeing was believing.
The impact on medicine of this new science of bacteriology and the germ theory of disease can be seen with wonderful clarity in the case of cholera. From the 1830s to the 1860s, before the role of germs in disease was recognized, a series of cholera epidemics struck Europe, killing tens of thousands. Because no one knew what caused this plague or how to protect themselves from a grisly death, there was widespread public panic. In 1854, in a now-celebrated series of investigations, John Snow, a London physician, mapped the appearances of the disease around London. He made a connection between clusters of those with the disease and contamination of the water supply and famously recommended removal of the handle from the pump in Broad Street so residents would get their water from other sources.14
Snow's work rather than that of Pinel, Louis, or Semmelweis is often cited as the first step in a new science of epidemiology, which maps the progress of a disease (or a treatment) through a population to pin down its course and its effects. But, while Snow is celebrated now, he was ignored in his own day, and the pump handle was not removed, because he could not point to a cause. The association he proposed was but one of many competing theories at the time. The data alone were not persuasive.
The later detection by Koch's group of a cholera bacillus in the drinking water of people who became ill both confirmed and undercut Snow's work. It confirmed Snow's suggestion of a link to the water supply rather than the other theories prevalent at the time. But it also made an approach to a disease like cholera that required tracking what happened to thousands of people over whole areas of a city seem crude and needlessly labor-intensive. Snow died two decades before Koch's work, but Lister, who in his antiseptic investigations had done something similar to Snow, came over to the bacterial theory of infections when it was demonstrated to him that bacteria caused putrefaction. Louis's and Snow's figures provide part of a story that needs to be matched with the evidence of our own eyes and the evidence that comes from the laboratory.
Koch's laboratory (or experimental) approach didn't triumph without a fight, however. Many at the time refused to believe bacteria caused disease. Max von Pettenkofer, a professor of medical hygiene in Munich, for example, argued that cholera was not caused by Koch's recently isolated bacillus but depended on an interplay of factors, many of which lay within the host. To demonstrate the point he brewed a broth of several million “cholera” bacilli and drank them, without suffering significant consequences. Faced with this challenge, Koch was forced to grapple with how we know something has caused something else. In von Pettenkofer's case, Koch argued that stomach acid had likely killed the bacillus; still, there was room for doubt.15
Koch's solution to von Pettenkofer's challenge and to the general problem of how to link cause and effect was to outline a number of rules. First, if you challenge (expose the person) with the cause, the effect should appear. Second, remove the cause and the effect should go. Third, a rechallenge should reproduce the effect. Fourth, the greater the exposure to the cause (the higher the dose), the more likely the effect should be. Fifth, an antidote to the drug (or bacterium) should reverse the effect. Sixth, there should be some temporal relationship between exposure to the drug or bacterium and the appearance of the effect. Finally, there should be some biological mechanism that ideally can be found that links the cause to the effect.
These are just the rules we now apply in deciding whether a drug like alcohol or an industrial chemical has a particular effect on us and whether a controlled trial has shown that a particular drug produces a benefit in a disease. Doctors attempting to make sense of what is happening to the patient in front of them will also use just the same rules. Whether direct observation by a doctor or a controlled trial run by a pharmaceutical company in hundreds of patients is the better approach depends on the question you're addressing. In the case of a patient with a suspected adverse drug reaction, when it is possible to vary the dose of treatment or stop treatment and rechallenge with a suspect drug, direct observation is just as scientific and may be much more informative than a controlled trial. In practice, however, if a hazard of treatment has not been revealed in whatever controlled-trial results have been published, doctors are likely to deny it is happening, despite the evidence of their own eyes. To see how this situation has come to pass we have to turn to the study of fertilizers and the origin of randomized controlled trials.
FERTILIZERS IN MEDICINE?
In the best possible sense, doubt is the business of an epidemiologist. From John Snow onward, statistical epidemiologists worth their salt can provide several explanations that might account for the associations found in a quantitative study. Ronald Fisher (1890-1962), a Cambridge mathematician who did most of his key work in developing modern statistical methods from the 1920s to the 1940s while associated with an agricultural college, was typical of the genre. Photographs commonly show him smoking. To the end of his life in 1962, at the age of 72, he argued that lung cancer was not caused by smoking, that all we have for evidence are numbers linking people who smoke and cancer but that may simply mean that people prone to cancer were also more likely to smoke, and that correlations are not proof of causation.
Fisher's work centered on the question of whether the application of various fertilizers might improve grain yields. In grappling with how to determine this he introduced two ideas—randomization and statistical significance—that have come to dominate modern medicine. Indeed, they extend well beyond medicine and are worth looking at because they underlie so much of what is reported about science.
Misleadingly, a fertilizer may appear to increase the yield of grain if spread in an uncontrolled way: a myriad of soil, light, drainage, and climate factors might come into play, for example, and bias the results. Just as when trying to determine whether a drug works or not, the experimenter has to control for these known unknowns. Fisher anticipated Donald Rumsfeld's unknown unknowns by eighty years. But his key insight was how to control for these factors that he didn't know about—the way to do so was to allocate fertilizer randomly to the plants under study.
In early controlled drug trials, investigators allocated one male, say, to the new drug, the next one to placebo (a dummy pill), and so on, while also attempting to ensure there were an equal number of people of one age on the new treatment as on the control, or placebo, treatment. Now, in contrast, patients are divided into treatment and control groups according to a sequence of numbers that have been generated randomly before the trial starts. After the randomized controlled trial is over, investigators can check whether obvious factors like age and sex are distributed equally across treatment groups—which they invariably are. In one go, random allocation takes care of both the known and the unknown unknowns.
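The procedure described here, an allocation sequence generated before the trial starts and a balance check afterward, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the patient records are invented, the function names are hypothetical, and shuffling a balanced list is only one simple randomization scheme among several used in practice.

```python
import random
from collections import Counter


def allocation_sequence(n_patients: int, seed: int) -> list:
    """Generate a treatment/control sequence before any patient is enrolled."""
    arms = ["treatment", "control"] * (n_patients // 2)
    if n_patients % 2:
        arms.append("treatment")  # odd total: one extra treatment slot
    rng = random.Random(seed)  # seeded so the sequence can be reproduced
    rng.shuffle(arms)
    return arms


def balance_by(patients: list, arms: list, key: str) -> dict:
    """Count how one patient characteristic is distributed across the arms."""
    counts = {"treatment": Counter(), "control": Counter()}
    for patient, arm in zip(patients, arms):
        counts[arm][patient[key]] += 1
    return counts


# Hypothetical patients for illustration; sex and age band are made up.
rng = random.Random(0)
patients = [
    {"sex": rng.choice(["F", "M"]), "age_band": rng.choice(["<40", "40-64", "65+"])}
    for _ in range(1000)
]
arms = allocation_sequence(len(patients), seed=42)

# After allocation, check whether obvious factors ended up evenly spread.
print(balance_by(patients, arms, "sex"))
print(balance_by(patients, arms, "age_band"))
```

Nothing in the allocation step looks at age or sex, yet with groups of this size the counts in each arm tend to come out close. The same logic is what allows randomization to spread factors nobody has measured, although in small trials chance imbalances can still occur.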
In addition to taking care of the unknown unknowns, randomization greatly reduces the number of subjects, whether plants or people, that need to be recruited to a study to get a clear-cut answer. At a stroke, the advent of randomization in controlled trials in the 1950s turbocharged epidemiology. It did away with the need to carefully balance controls by age, sex, social class, and ethnicity that made a nonrandomized approach slow and cumbersome because of the requirement for huge numbers of patients to produce a clear result.
Random assignment helped Fisher decide whether a fertilizer worked or not. But there is a key point to note. The question facing Fisher was whether a fertilizer would increase yield. Would there consistently be more bushels of grain, say, in the fertilized patches than in the unfertilized ones? This question is straightforward when the outcome of interest—in this instance, bushels of grain—is the only criterion. What, however, if the yield was greater but much of the grain was moldy? Would one still say the fertilizer worked?
Asked what it means to say a medical treatment works, most people would respond that it saves lives. Even if a treatment comes with side effects, staying alive ordinarily trumps these, and conversely death trumps any benefits. But many medicines, perhaps most, do not save lives. Once we look at outcomes other than death, we move into an arena where competing values come into play. We may not want the type of sleep induced by a hypnotic or the sex life produced by Viagra. The extreme, as we shall see in later chapters, is when we end up with claims by companies that a new treatment “works” because it can be shown to have effects on things the company values—even though the same clinical trials throw up more dead bodies on the new drug than on the placebo, as is the case for the cholesterol-lowering statin drugs, the Cox-2 inhibiting analgesics, blood-sugar-lowering drugs, beta-agonists for asthma, along with various antidepressants and antipsychotics. This happens when trials throw up a trivial but clear-cut and marketable benefit, with indicators of more problematic risks that companies ignore. This complex mix of benefits and risks is in fact the case for almost all of the current best-selling drugs in medicine, but all most doctors or patients get to hear about are the benefits.