by Allan Brandt
Doll’s career was also dramatically influenced by tuberculosis. He had shown great promise in mathematics as a student but followed his father into medicine. Although eager to find ways to employ his interest in research and quantification, Doll struggled to find a situation in medicine that would demand these skills. During World War II, he contracted TB and had to have a kidney removed. Following his military discharge in 1945, at the age of thirty-two, Doll became involved in a study with Dr. Avery Jones at Central Middlesex Hospital on occupational factors in peptic ulcers. His meticulous survey, funded by the MRC, came to the attention of Bradford Hill just as he was about to begin his investigation of the causes of the rise of lung cancer. Hill asked Doll to join him at the MRC in January 1948. Together, they made a formidable partnership in the evolution of modern epidemiology. The young research assistant quickly became a full collaborator on the project.24
Hill sought to adapt his new type of clinical trial to investigations where randomization was not possible, specifically cigarette smoking. Obviously, he and Doll could not divide individuals into two groups, apply the intervention (in this case the cigarette), and later evaluate the impact on a double-blind basis. The period of exposure necessary to produce disease (probably twenty years) was far too long, there could be no placebo control group, and most significantly, since such a study would randomly subject individuals to unknown and possibly serious harms, it was clearly unethical. So Hill and Doll worked to develop a strategy that would follow—to the degree possible—the methodological rigors of randomization: how could one systematically design and implement observational studies in populations that invoked the strategies of an experimental “trial”?
The answer, they concluded, was to turn the randomized trial back-to-front. Instead of a “prospective” study—in which the subjects were randomly divided into two groups, and half of them subjected to some clinical treatment—Doll and Hill would run a “retrospective” study, taking a group of lung cancer patients and pairing each one with a carefully matched but healthy control, in order to analyze the differences in the two groups’ long-term behavior. Doll and Hill were therefore eager to bring quantitative rigor to the medical and scientific assessment of both the causes of disease and the effectiveness of treatment. In this respect, they saw no fundamental tensions between the randomized clinical trial and the retrospective (case-control) study, which they devised in 1948 to determine the causes of lung cancer. Both types of designs—prospective and retrospective—rested on a foundation of statistical inference and the systematic collection and evaluation of data. And both relied on carefully structured observations of specific variables, clinical interventions, and “risk factors.” Mathematically, researchers could treat the risk factor—cigarette smoking, poor diet, or any behavioral or social variable—as if it were an intervention, such as a new drug or a surgical procedure. Researchers could then compare those exposed to the specific risk factor to controls who had not received the “intervention.”25
Like Graham, Doll and Hill began their investigation with considerable skepticism about smoking’s influence on lung cancer. In their first list of factors that might account for the disease, the rise of the cigarette appeared as but one of several possibilities. Doll believed the introduction of the automobile, the widespread expansion of paved roads, and the consequent changes in air quality would emerge as the most important factors. As he later reflected:
Motor cars . . . were a new factor and if I had had to put money on anything at the time, I should have put it on motor exhausts or possibly the tarring of roads. Because of course the whole road system in the country had changed with the advent of the motor car, and we knew . . . that the tar that was put on roads contained many carcinogens.26
In contrast, “cigarette smoking was such a normal thing and had been for such a long time that it was difficult to think it could be associated with any disease.”27
But as their data began to accrue in late 1948 and early 1949, it became clear to Doll and Hill that cigarettes were the crucial factor in the rise of lung cancer. Doll, who checked the diagnoses himself, later recounted:
As I went through and checked the diagnoses I saw that patient after patient in the “lung cancer” group who was regarded as a non-smoker turned out not to have lung cancer; whereas, in those who were heavy smokers the diagnoses seldom had to be changed. . . . This was a striking finding and quickly drew our attention to the importance of smoking.28
Even without any sophisticated statistical analysis, the findings were impressive: among the 649 men with lung cancer entered into Doll and Hill’s study, 647 smoked and only 2 were nonsmokers. When Doll computed the p value for statistical significance, it turned out to be 0.00000064! In other words, the possibility that this was a “chance” finding was less than one in a million. Among those who smoked more heavily, lung cancer was correspondingly more prevalent, confirming the dose-response effect noted a decade earlier by Raymond Pearl.29
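For readers who want to see how a comparison of this kind yields a p value, here is a minimal sketch in Python. It is not a reconstruction of Doll's own calculation. The counts are hypothetical and chosen only for illustration, the function names are invented for this example, and the one-sided Fisher exact test is assumed as a reasonable stand-in for whatever method was actually used.

```python
from math import comb


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio for a 2x2 case-control table.

    Undefined (division by zero) if there are no unexposed cases
    or no exposed controls.
    """
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def fisher_one_sided(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Chance of at least this many exposed cases if exposure were unrelated
    to disease, holding all row and column totals fixed
    (upper tail of the hypergeometric distribution)."""
    n_cases = exposed_cases + unexposed_cases
    n_exposed = exposed_cases + exposed_controls
    n_unexposed = unexposed_cases + unexposed_controls
    n_total = n_exposed + n_unexposed
    upper = min(n_cases, n_exposed)
    tail = sum(
        comb(n_exposed, k) * comb(n_unexposed, n_cases - k)
        for k in range(exposed_cases, upper + 1)
    )
    return tail / comb(n_total, n_cases)


# Hypothetical counts, NOT the study's data:
# 100 lung cancer patients and 100 matched controls.
cases_smokers, cases_nonsmokers = 90, 10
controls_smokers, controls_nonsmokers = 70, 30

table = (cases_smokers, cases_nonsmokers, controls_smokers, controls_nonsmokers)
print("odds ratio:", odds_ratio(*table))
print("one-sided p value:", fisher_one_sided(*table))
```

The smaller the p value, the harder it is to attribute the excess of smokers among the patients to chance. As the surrounding discussion makes clear, a small p value says nothing by itself about bias or confounding.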
The comparison of lung cancer patients to carefully matched controls was central to Doll and Hill’s investigation. During the 1920s and 1930s, control groups had become a key feature of clinical trials, a way to eliminate possible bias on the part of the researcher by providing rigorous comparisons. Now Doll and Hill’s quasi-experimental studies employed controls as well.30 The two men understood that their conclusions rested fundamentally on explicit comparisons: lung cancer patients to other patients (similar in every other way); smokers to nonsmokers (similar in every other way). The systematic use of such comparisons constituted a critical analytic tool of modern epidemiology and indeed, of all medical knowledge.
Another key aspect of this study was its analysis of the different rates of lung cancer in men and women. These variations, Doll and Hill suggested, were not due to any inherent sex difference but instead reflected historical patterns of cigarette consumption. “Although increasing numbers of women are now beginning to smoke,” they wrote, “the great majority of women now of cancer age have either never smoked or have only recently started to do so.” Few women had lung cancer in 1948 because relatively few had been smoking long enough. Implicit in this conclusion was the stark prediction—soon to be realized—that rates of lung cancer among women would rise significantly in the second half of the twentieth century.31
Contemporary assessments of the Doll and Hill and Wynder and Graham studies often drew attention to the potential for bias—on the part of both patients and investigators—in a retrospective study. Patients might recount their histories inaccurately, or they might tend to overestimate or underestimate their exposure to cigarette smoke. Interviewers, too, might have preconceived assumptions. Those who were eager to substantiate the smoking-cancer link might unconsciously or consciously skew their questions. Although Doll and Hill hoped to keep the patients’ diagnoses hidden from their interviewers, this had proven impossible in practice. The two researchers were highly aware of such opportunities for bias: “Serious consideration must therefore be given to the possibility of interviewers’ bias affecting the results (by the interviewers tending to scale up the smoking habits of the lung-carcinoma cases).”32 Doll carefully checked the diagnoses of each patient entered into the study; additionally, he obtained histologies and information about the part of the lung in which the tumor had originated.
Sharply aware that their data and conclusions could be critiqued based on these personal histories, Doll and Hill assessed their reliability by reinterviewing a group of controls six months later. They found only small changes, concluding that the original interviews were “reliable enough to indicate general trends and to substantiate material differences between groups.”33 They understood the importance of recording detailed smoking histories from their respondents. “It was necessary to define,” they wrote, “what was meant by a smoker,” a category that did not include “the woman who took one cigarette annually after her Christmas dinner.”34 Doll entered the data by hand into columns in a record book and added the columns. He later commented, “The whole thing was done with a nineteenth-century clerical technique.”35 This systematic collection and analysis of data harkened back to the historic epidemiologic assessments of figures like John Snow investigating cholera in mid-nineteenth-century London.
Although the findings were striking, Doll and Hill understood that it would be easy to dismiss them—as the tobacco industry would repeatedly try to do—as “merely” statistical. So they meticulously described the criteria they applied before any “association” between smoking and lung cancer could be identified as a genuine causal relationship. The problem with epidemiological studies was the potential that some bias or some unanticipated variable might obscure an alternative explanation for the apparent causal relationship. This is why critics of these studies frequently warned that a statistical “association” should not be assumed to be a conclusive demonstration of a cause. No one was more aware of these problems than Doll and Hill. Every apparent limitation of their work was clearly articulated, addressed in detail, and rebutted.
Even aside from its groundbreaking results, this study was important for its explicit commitment to investigatory science, hypothesis testing, and experimental method. Doll and Hill worked to eliminate the possibility of bias in the selection of patients and controls, as well as in reporting and recording their histories; they emphasized the significance of a clear temporal relationship between exposure and the subsequent development of disease; and they sought to rule out any other factors that might distinguish controls from patients with disease. This explicit search for, and elimination of, possible “confounders” was a critical step toward their conclusion. Further, they insisted on carefully addressing all possible alternative explanations for their findings. Was there some other explanation that could plausibly account for the same data? “Consideration,” they wrote,
has been given to the possibility that the results could have been produced by the selection of an unsuitable group of control patients, by patients with respiratory disease exaggerating their smoking habits, or by bias on the part of the interviewers. Reasons are given for excluding all these possibilities, and it is concluded that smoking is an important factor in the cause of carcinoma of the lung.36
Doll and Hill’s first paper on smoking and lung cancer appeared in September 1950 in the British Medical Journal, four months after Wynder and Graham’s article in JAMA. Although Doll and Hill regretted not publishing first (they had held off, at the suggestion of MRC Secretary Harold Himsworth, to collect more data from patients outside London), their paper differed from that of their American counterparts in ways that would ultimately be of great significance. While they understood the importance of their conclusion, they had a complementary commitment to demonstrating the power of epidemiologic methods in investigating causal questions in medicine and public health.37
Both Doll and Hill would spend their careers applying these methods to tobacco and other risk factors, but also working to demonstrate their utility for addressing questions poorly suited to laboratory investigation. They sought to identify a scientific approach that could be used to investigate disease in situ, especially in instances where laboratory experimentation and clinical observation were so significantly limited in determining causality and outcome. The framework they sought to develop was specifically designed to address the inherent limitations of these other forms of knowing.
The issue of causal criteria would be debated for decades. Absent some clearly articulated physical mechanism, was a statistical argument sufficient to prove that A causes B? Although their criteria would be refined and expanded, Doll and Hill brilliantly and explicitly outlined the basis for a systematic epidemiological approach to determining causality in noninfectious chronic disease. In this sense, modern epidemiology was constructed around the problem of determining the harms of smoking.38
Although observers would later debate the “priority” of the Doll/Hill and Wynder/Graham investigations, such discussions obscured the fact that priority in epidemiology was not like priority in physics or chemistry. No single study could be definitive. Given the complex variables being assessed, a conclusive judgment on cigarettes as a cause of disease would require the accumulation of many studies, similar in design yet distinct from one another. No single study could conclusively demonstrate a causal relationship between smoking and cancer.
Following Doll and Hill and Wynder and Graham, a number of investigations reported strongly consistent findings.39 There now seemed little doubt that among patients with lung cancer, there was a disproportionate number of heavy smokers (and few nonsmokers). “We would never have said, on the case control study alone,” explained Doll, “that cigarette smoking was a cause of carcinoma of the lung.” The move from an “association” to a causal relationship was made only in light of the consistency of a wide range of evidence. For example, Doll and Hill collected international data to see if there was any country where smoking had been prevalent for a long time, but that had a low incidence of lung cancer. None existed. In countries where the cigarette was not introduced until late, lung cancer was uncommon. Additionally, they found, risk of disease was lower among light smokers worldwide.40
Although the many researchers now entering the field would assert a healthy competitiveness, their combined work formed an important collaboration-in-kind. Researchers conducting retrospective studies on cigarette smoking and lung cancer, using a variety of methods and populations, consistently replicated and validated the most important findings.41 As more studies accrued, so too did medical and public confidence in the conclusion. This aggregative process marked a significant difference in scientific epistemology from the traditional notions of individual investigators “making” scientific “discoveries.” In epidemiology, discovery and proof were iterative, as no specific experimental situation could be precisely replicated. Researchers now sought to take advantage of this variability; “consistency” across multiple studies would become another criterion for defining causality.
Retrospective studies, such as those reported by Doll and Hill and Wynder and Graham, were subject to extensive criticism from those who understood their methods, and from many who did not. Some dismissed the findings as but a figment of statistical manipulation (although little highly sophisticated statistical analysis was actually applied). Others focused on suspicion of bias. Both patients and interviewers, they suggested, might overestimate smoking, skewing the results.
One of the most strident critics of the new epidemiological studies came from the world of statistics. Joseph Berkson continually raised questions about possible bias in the selection of individuals in the respective epidemiological investigations. Berkson had trained in medicine at Johns Hopkins, where he also received a doctorate in statistics. After serving as a fellow in physiology at the Mayo Clinic in 1931, he joined the Statistics Division there. In 1934, he was named head of Biometry and Medical Statistics at the Mayo Clinic, a post he would hold for more than thirty years. Berkson found himself drawn to controversy and cherished his identity as a skeptic of the emerging consensus about lung cancer and smoking. According to Berkson, the use of hospitalized patients as subjects and of volunteers as interviewers in a number of the retrospective studies introduced confounding factors. This critique was repeatedly addressed and rebutted by the epidemiologists. Berkson was also suspicious because cigarette smoking seemed to cause not only more cases of lung cancer but also higher mortality from multiple causes. When such investigation “turns out to indicate that smoking causes or provokes a whole gamut of diseases, inevitably it raises the suspicion that something is amiss.” But smoking was eventually linked to many different diseases. Berkson’s a priori commitment to specificity (one cause, one disease) led him to erroneously dismiss significant findings. Despite numerous answers to his critiques, he never relented in his skepticism.42
Another vocal critic of the lung cancer findings was Sir Ronald Fisher, the leading biometrician and geneticist in Great Britain during the first half of the twentieth century and a man deeply committed to bringing statistical analysis to genetics and agricultural experimentation. His 1925 book, Statistical Methods for Research Workers, quickly became a classic, leading to academic appointments at University College London and Cambridge University. Fisher’s critiques were similar to Berkson’s. The ethical impossibility of conducting a randomized experiment led him to question the results of the epidemiological studies. As a believer in genetic notions of cancer causality, Fisher speculated that there was some constitutional factor that led individuals both to become smokers and to get lung cancer, even though smoking and lung cancer might not be causally related. Doll and Hill repeatedly rebutted this theory, returning to the critical question of how to account for the rise in lung cancers during the twentieth century if the disease was simply “constitutional.”43
While Fisher and Berkson raised important questions, their critiques were no match for the overwhelming evidence of repeated studies. Nonetheless, the industry broadcast and rebroadcast these attacks and ultimately hired both Fisher and Berkson as paid consultants. Although both men identified themselves as “independent” skeptics, they brought both a priori assumptions and, later, conflicts of interest to their unremitting critiques.44
In 1951, Wynder wrote to his mentor Graham about the ongoing attacks by Fisher and Berkson:
It seems strange that after the British paper there should still be statisticians who find serious doubt in our findings in regard to errors of memory that patients may have. Our critics seem not to note that similar errors of memory would apply to our controlled population. . . . I am glad to report that the statistical powers . . . at the National Cancer Institute have been all on our side since we were so thoroughly confirmed by the Doll and Hill paper.45
Doll and Hill understood—as did their American colleagues—that these studies demanded additional, confirmatory investigations using other methods. The language in these reports varied from observing an “association” between cancer and cigarettes to claiming “causality.” Doll and Hill concluded that additional investigations of patients with lung cancer would not resolve the ongoing doubts about this relationship. “Further retrospective studies of the same kind would seem to us unlikely to advance our knowledge materially or to throw any new light upon the nature of the association,” they wrote in 1954.46 They began designing additional studies that would employ different research strategies to confirm and sustain their earlier findings. To counter the charge of bias that had been leveled against their earlier studies, in 1951 they initiated a major prospective study to follow health outcomes among healthy smokers paired with nonsmoking controls. They sent 60,000 questionnaires to British physicians about their smoking practices and got back some 40,000 replies. Doll and Hill chose doctors for their study for a number of reasons. First, they wanted to attract the attention of the medical profession. Second, they expected that physicians might offer more accurate replies to their questionnaires. And most importantly, they knew that all physicians were registered by the government, facilitating identification and follow-up.