by Burch, Druin
What Hill proposed in his Lancet articles and in his book was ‘the allocation of alternate cases’. As in the trial of horse serum for treating pneumonia, you could take one person and put them in one group, then place whoever came next in the other group. Given enough people, all differences would balance out. The glory of the method was that the differences would balance regardless of whether you knew about them or not. Here was the answer to the problem, the method that could produce groups of people that were identical not only in the ways you knew about, but also the ones you didn’t know. Of course you could never guarantee completely that the groups matched, but you could get close. Flip a coin a thousand times in a row and there is a chance it will always come up heads. The more times you flip, though, the more confident you can be that heads and tails will – on average – spread themselves evenly.
Within Hill’s writing, though, were some notable omissions. As an example of the effects of alternate allocation, Hill presented data from the MRC’s trial of serum therapy for pneumonia. In one very obvious respect, the resulting spread of characteristics was clearly a failure. Young patients were more likely to have got the serum, older ones more likely not to. Given that the older you were, the more fatal pneumonia became, that was a problem. Hill did not comment on how it was that the two groups of men came to differ; the implication was that the trial had simply been too small to smooth the groups into similarity. It included only 322 patients. Hill noted that 159 had been controls, and 163 had been treated with the serum. He did not remark that there was something wrong with those numbers. If the patients were really being allocated alternately, to one group and then another, then the total of 322 should have divided equally down the middle. It didn’t.
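The arithmetic behind that observation can be checked with a minimal sketch. It assumes strict alternation beginning with the serum group; the function name is hypothetical, and only the totals (322, 159 and 163) come from the account above.

```python
def alternate_allocation(n_patients):
    """Assign patients strictly in turn: serum, control, serum, control, ..."""
    counts = {"serum": 0, "control": 0}
    for i in range(n_patients):
        # Assumption: the first patient goes to the serum group.
        group = "serum" if i % 2 == 0 else "control"
        counts[group] += 1
    return counts


expected = alternate_allocation(322)
reported = {"serum": 163, "control": 159}

print("expected under strict alternation:", expected)
print("reported by Hill:", reported)
print("gap between reported groups:", reported["serum"] - reported["control"])
```

Under strict alternation the two groups can never differ by more than one patient, whichever group goes first, so a gap of four cannot come from the allocation rule alone.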
There was only one explanation, and Hill did not mention it. The doctors had cheated. The alternate allocation was a failure because the doctors could influence it. A hunch that this person should get the serum, or that another should not, and the trial had become compromised. There was always an urge to do as much as possible for the youngest patients. Doctors were unable to resist the temptation to try to help. It seemed that if there was a way for doctors to influence which patients got which therapies, they did so. Picking different sorts of people for the different options meant that the results of the trial became meaningless. It was not badly intentioned, it was not something Hill was incapable of sympathising with, but it was a problem all the same. Overcoming it was the one thing that Hill’s book did not discuss.
As a boy, Ronald Aylmer Fisher was uncertain whether to follow his interests in biology or in mathematics. According to his Royal Society memoir, the decision was made when he saw a museum display of a cod. The bones of the fish’s skull were separated out and neatly labelled. Fisher promptly chose maths.
At Cambridge, he came across the work of Karl Pearson. It influenced him greatly, although the two men later fell out over a statistical concept known as likelihood ratios. When the First World War broke out, Fisher tried to join the army. They rejected him, saying his eyesight was too poor. His work on statistics therefore continued. It brought him acclaim and, from Pearson, the offer of a job. Sensing the beginnings of what was to become a lifelong feud, Fisher turned him down. His interest in identifying problems that required mathematical solutions continued. ‘A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup,’ he wrote in 1935. ‘We will consider the problem of designing an experiment by means of which this assertion can be tested.’
Working in an agricultural research institute, he applied himself to studying the way in which statistics could most reliably aid scientific discovery. What he came up with was one of the most important, as well as the most obscurely obvious, ideas in human history: namely that in certain circumstances, ‘only randomisation can provide valid tests of significance’.
Scientific method does not just mean doing a test. It means doing a reliable test. Without that, all the trappings of an experiment are a lie, a way for people to convince themselves and others that they possess something truthful when they do not. Fisher figured out the step that was missing from Hill’s explanation of trial design. Alternate allocation simply was not robust enough to stop doctors cheating. That meant patients were not distributed randomly between groups.
Fisher was not thinking of patients but of plots of ground. His agricultural research was aimed at replacing myths with facts. How could you figure out what actually worked? No two fields were ever quite the same and it was never entirely possible to pick out two groups of plots that perfectly matched. They varied like people did, both in ways you could see and measure and also in ones you could not. But if you took enough fields, and allocated them at random, then everything should balance out. Your tests should work, your science should be reliable and the conclusions you reached as helpful as the truth could possibly be.
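The claim that random allocation balances even the characteristics nobody measured can be illustrated with a small simulation. This is a sketch under stated assumptions, not Fisher's own procedure: each plot is given one hidden, normally distributed characteristic, the plots are split at random into two equal groups, and the function name and parameters are hypothetical.

```python
import random
import statistics


def mean_imbalance(n_plots, n_trials=200, seed=0):
    """Average gap between group means of a hidden trait after random allocation."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Assumption: one unmeasured trait per plot, drawn from a standard normal.
        hidden = [rng.gauss(0, 1) for _ in range(n_plots)]
        rng.shuffle(hidden)  # random allocation: shuffle, then split in half
        half = n_plots // 2
        treated, control = hidden[:half], hidden[half:]
        gaps.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(gaps)


for n in (10, 100, 1000):
    print(n, "plots: average hidden imbalance", round(mean_imbalance(n), 3))
```

The gap never reaches exactly zero, but it shrinks steadily as the number of plots grows, which is the sense in which everything "should balance out".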
Stirred by the commercial success of streptomycin in America, British drug companies were gearing up to produce it themselves. Until they did so, supplies were exceedingly limited. America was willing to export a certain amount of the drug to Britain. Fifty kilos was a substantial quantity, but it was limited all the same. It came at the hefty price, in 1946, of a third of a million dollars.
Reading accounts of what happened next, you can get the impression that doctors did the only thing they could with the quantity of drug available to them. With tuberculosis such a common disease, fifty kilos (110 pounds) was nowhere near enough to treat everyone suffering from it. The Medical Research Council decided that the drug should be given with reasonable freedom to those afflicted by rapidly fatal forms of tuberculosis, like the meningitis it produced when it got into the coverings of the brain. Those people were an unfortunate minority. For them, any outcome other than death would be proof that the new drug worked – and side effects were not an issue.
The bulk of tuberculosis, though, was a different thing. Doctors knew that many people were likely to recover without streptomycin, and that they often made full recoveries. For such patients, side effects mattered very much. And since lots of people recovered from tuberculosis without any help, it was difficult to determine how much difference the drug made. The Medical Research Council described its actions as though there was only one possible course of action: a well-organised trial.
Today, that would be true. In 1946 it was propaganda. The streptomycin could easily have been sold on the open market, or distributed evenly for doctors to use as they saw fit. Those were the normal processes; that was how America was behaving. Part of the reason that the MRC took such care to make it seem that a trial was the only way forward was because it needed to persuade people who still found the idea of trials repugnant. It was not just trying to establish the effects of a new drug; it was trying to set a precedent.
The MRC’s trial of streptomycin became iconic; it seemed the start of a revolution.1 It was not a revolution that the wider community of doctors had looked for. They were still too reliant on their self-confidence, too little affected by the small amount of statistical theory they knew. The Medical Research Council, however, found itself in a position to lead doctors into behaving as it thought best. Since a 1911 Act of Parliament, the care of tuberculous patients had been put in the hands of local authorities. That centralised the control of therapies to a helpful degree. Now this drug was being imported in a limited amount, all of which was going to be controlled by the MRC. If doctors wanted to get hold of it for their patients, they needed to do as they were told. There was going to be no room for those who decided they did not believe in control groups, or thought they possessed the skill to predict the drug’s actions without participating in a reliable experiment. The council’s committee was perfectly familiar with the fact that many tuberculous therapies turned out to be useless or harmful. It knew about Sanocrysin, and understood its lesson.
Between January and September 1947, 109 patients were recruited into the MRC’s trial of streptomycin for pulmonary tuberculosis. Using a prearranged list of random allocations, fifty-five patients were assigned to get streptomycin and bed rest. Another fifty-two got what was believed to be the best alternative treatment – bed-rest alone in a specialist TB sanatorium. Two others died within a week of joining the trial, before they could be allocated to either group.
As much as possible, in order to stop people’s expectations affecting the outcomes, the existence of the trial was kept secret. Neither the patients receiving the streptomycin nor those getting the bed-rest were told of what was going on. For the latter, no fake drug was administered. The MRC knew that this was a potential problem, and understood that the placebo effect of being hospitalised might not be the same as that of being hospitalised and injected. Streptomycin, however, was given in four painful injections a day, not into a vein but deep into a muscle, and those injections were repeated for months. It would have made the groups more similar to inject water into the controls, but it was agreed that the difference was unlikely to matter enough to inflict this on people.
After six months of the trial, doctors found that there were still tubercle bacilli in the majority of patients, whether they were getting the streptomycin or not. More encouragingly, though, death rates differed markedly between the two groups of patients. Fourteen of the controls were dead; only four of those on the antibiotic. ‘The difference between the two series is statistically significant,’ reported the trialists. They explained their calculation. The probability of the difference having occurred by chance alone was less than one in a hundred.
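The account does not say which test the trialists used. As a rough check, the sketch below assumes a Pearson chi-square test without continuity correction on the deaths reported above (four of fifty-five on streptomycin, fourteen of fifty-two controls); the function name is hypothetical.

```python
import math


def chi_square_2x2(dead_a, total_a, dead_b, total_b):
    """Pearson chi-square for a 2x2 table (1 degree of freedom, no correction)."""
    alive_a = total_a - dead_a
    alive_b = total_b - dead_b
    n = total_a + total_b
    numerator = n * (dead_a * alive_b - alive_a * dead_b) ** 2
    denominator = total_a * total_b * (dead_a + dead_b) * (alive_a + alive_b)
    chi2 = numerator / denominator
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Streptomycin group: 4 deaths among 55; control group: 14 deaths among 52.
chi2, p_value = chi_square_2x2(4, 55, 14, 52)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

On these assumptions the statistic comes to roughly 7.4 and the probability falls below one in a hundred, in line with the trialists' statement. A continuity-corrected or exact test would give a somewhat different figure.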
The eventual effects of streptomycin turned out to be reasonably complicated. Tuberculosis, even in an individual patient, quickly grew drug resistant. That meant that although the benefits were real, the drug actually performed badly if given by itself as a cure for the disease. (In the six months after the trial finished, nine more of the control patients died – but this time the number of deaths in the streptomycin group was eight, no longer a significant difference.)
In some ways, the MRC trial of streptomycin was less innovative than it appeared. Ideas about using controls and randomisation were spreading, and not just in medicine. Their successful use was going to come; this just happened to be the trial that brought it in sooner rather than later. It was not quite even the first medical trial to make full use of this new technique of randomisation. Another MRC trial, also involving Hill, was under way on whooping cough vaccines, but the streptomycin trial was the first one to be finished and reported.
Nor was the bureaucracy that made the trial possible entirely new. The power of central groups of doctors and other researchers had been growing for a while. Industrialisation, the cost of drugs, the emergence of state-run health care services, all of these things combined to make it easier for the MRC to have its way. ‘As in numerous other contemporary projects,’ commented Richard Doll, ‘such as the wartime penicillin studies in the United States, central control enabled researchers to follow their methodological predispositions.’ And that made all the difference.
Towards the end of his life, Austin Bradford Hill revealed the reason that his book and articles spoke only about alternate allocation. His initial omission of the concept of randomisation was straightforward, and deliberate. The talk of allocating a patient randomly, Hill believed at the time, was simply too much for doctors to cope with. He suspected it would scare them off completely, at least until a trial demonstrating its usefulness could be forced upon them. ‘There were too few physicians,’ wrote Doll, ‘leave alone surgeons, who were willing to expose their theories to cold scientific investigation.’
A shortage of streptomycin, and a concentration of power, made the MRC’s trend-setting trial possible. But it also came about because a small number of people were convinced that statistical methods were the only way of telling pharmacology from fantasy. Their persuasiveness meant that medicine based on reliable evidence – not rumour, impression and intuition – began to take hold. That, more than streptomycin, was the most successful blow struck against tuberculosis. Other new antibiotics were soon developed, and they were tested and retested until combinations were found that showed the greatest likelihood of destroying the bug and leaving the patient unharmed.
Tuberculosis used to be called the Captain of the Men of Death. It seemed that way. It was to be seen everywhere, striking people down at will: the old and the young, the strong and the feeble, the rich and poor. The traditional treatments were useless. Koch did something tremendous when he discovered the tubercle bacillus, revealing it to the world and to the astonished Paul Ehrlich. The development of the randomised controlled trial, less heralded, was more important.
Statistics has remained a generally unpopular discipline, unappealing by virtue of its technicalities and its sums. It can be a difficult field, easy to get lost in and vulnerable to mistakes and misunderstandings as well as deliberate misuse. The value of it, however, is real. Numbers might be cold and critical, but the benefits they offer can be warm, compassionate and humane.
Particularly in journalism, statistics are used in such meaningless ways that people are encouraged to see them all as worthless and even dishonest. Numerators without denominators, inappropriate comparisons, a general fuzziness about where a number comes from or what it actually means: all of these things devalue statistics. Certain questions about the world, however, require a numerical approach. In medicine, where many of these questions arise, arriving at a wrong answer means people suffer and die – and the alternative to counting is relying on a guess.
* * *
1 A much abused word. Originally it meant an upheaval that returned the world to the way it was to begin with, the way it was supposed to be, to an Eden before the Fall. Here, for once, the original sense is accurate. Throughout history, medicine was supposed to be providing treatments that helped people. For almost all of that time it was failing to do so. Now the MRC was going to make absolutely sure that it understood streptomycin well enough to use it only for good. Medicine was going to be returned, not to what it had actually been, but to what it was always meant to be.
17 Ethics and a Glimpse of the Future
WHEN IT PUBLISHED the results of the streptomycin trial in 1948, the Medical Research Council was on the offensive. The earlier results of the drug in America, it noted, were promising but inconclusive. So many people got better from TB anyway that it was difficult to tell if a new drug worked. The previous history of drug treatment for the disease, it pointed out, was catastrophic – ‘The exaggerated claims made for gold treatment [Sanocrysin], persisting over 15 years, provide a spectacular example.’
Something that troubled the MRC team, and that it suspected would trouble its readers in the British Medical Journal, was the fact that it chose not to treat some patients. The existence of a control group was controversial. To an extent, the sheer limitations of the streptomycin supply provided a neat excuse. The team could fall back on claiming that there was not enough of the drug to go around anyway. Given the situation, said Hill, ‘it would have been unethical not to have seized the opportunity to design a strictly controlled trial which could speedily and effectively reveal the value of the treatment’.
The fact that it had only fifty kilos of streptomycin was true enough, but this was also dodging the issue, and the team knew it. Here was a radically effective new way of exploring whether a treatment worked. The MRC wanted to encourage other trials of the same nature, whether its members were testing drugs that were in limited or even in plentiful supply. Untreated controls were an essential part of their method.
What did the doctors involved really think about deliberately not treating half of their patients? They could have argued, using the example of Sanocrysin, that streptomycin was as likely to be harmful as to be helpful. On that basis, not treating people was as ethically reasonable as treating them. During the trial, however, a senior MRC doctor fell ill with pulmonary tuberculosis. If his colleagues really felt unsure about the qualities of the new drug, they could have encouraged him to enter the trial, like the other patients, and to take his chances. Instead, the MRC arranged for their doctor to receive streptomycin. They treated him outside the trial, so as not to bias its results with the entry of a patient who had no wish to be randomised, but they treated him all the same.
There was a general feeling that the drug was likely to help. The doctors behaved towards their patients in one manner and towards themselves in another.
On Christmas Eve 1947, Eric Arthur Blair was admitted to hospital with pulmonary tuberculosis. Since meeting Cochrane during the Spanish Civil War, Blair had become famous under his adopted name of George Orwell. Two months after his admission, in the midst of a Scottish February, he wrote to his publisher about his doctor’s plans to give him ‘some new American drug called streptomycin’.
It took some doing, since Orwell was not part of the MRC trial. It involved buying the drug directly in America, with dollars earned from Orwell’s Animal Farm. Even then there were problems: the US export restrictions, and the British Board of Trade. Orwell, though, was well connected. His publishers were influential people, and Aneurin Bevan, the Labour Minister for Health, was a previous editor of his journalism. The drug soon arrived. Orwell became the first man in Scotland to get it. To begin with, it made him feel better. Then things changed: