by David Healy
In cases like these, the word “works” becomes ambiguous. When a life is saved, we know where we stand and all observers can agree on what has been done. When there is an obvious and immediate benefit, such as the effects of Viagra on a penis, or the sedative effect of a sleeping pill or anesthetic, we can all agree these effects are present and many of us feel we can make up our own minds as to whether we want such an effect. But few of the currently best-selling drugs in medicine have benefits as obvious as these.
An early indicator of this new world into which medicine began moving in the blockbuster era comes from a 1982 English study in which Sanjeebit Jachuk and colleagues looked at the perceptions of seventy-five doctors, seventy-five patients, and seventy-five relatives of the effects of propranolol, the beta-blocker for which James Black won a Nobel Prize. All the doctors surveyed reported propranolol was an effective antihypertensive: they saw the column of mercury in the blood pressure apparatus falling from visit to visit when they examined the patients, which was what they were hoping to see. The patients split down the middle in their responses to the drug they were taking: half reported benefits, half reported problems. Aside from thinking the drug was working because their doctor was happy, it's difficult to know why the patients reported benefits: raised blood pressure (hypertension) has no symptoms, so no one will have felt better on that account. But the investigators also consulted relatives, and all bar one of these reported that treatment was causing more problems than benefits—patients were now either complaining of treatment side effects or the process of diagnosis had made them hypochondriacal.16
So who—doctor, patient, or relative—was right? Reducing blood pressure can save lives, by reducing the likelihood of heart attacks and strokes, although statistically it may require hundreds of patients to be treated to save a life compared to people not on the drug. Companies don't collect data on outcomes like quality of life, novel side effects or relatives' impressions of benefits, however. When we are not in a position to make up our own mind about a benefit on the basis of seeing people get up off their death bed and walk or seeing the obvious effects of a hypnotic or Viagra, we become more dependent on the interpretation and judgment of our doctors, who in turn have become ever more dependent on pharmaceutical companies to interpret the effects of the drugs they produce. At the heart of those drug-company interpretations lies their use of Fisher's second innovation, the idea of statistical significance, a technique used to hypnotize doctors into focusing only on the figures that suit companies.
HYPNOTIZING DOCTORS
Fisher was an unlikely ally for a pharmaceutical company. He was a skeptic. His basic approach was to assume a new fertilizer didn't work. This is called the null hypothesis. It was only when the yield from plots fertilized with the new agent beat the plots with no fertilizer in nineteen cases out of twenty that he thought we could rule out the play of chance in the results and should concede that the new agent plays some role. When the yield from the fertilized plot is greater nineteen times out of twenty, by Fisher's determination, the result is said to be “statistically significant.” All this means is that the higher yield of the fertilized fields is unlikely to be due to chance. When applied to a drug and a placebo, a meaningless difference may be significant in this sense, but calling it significant leads most people to assume that there is in fact a substantive difference.
Ironically, statistical significance was a side issue for Fisher, for whom the key issue was whether we could design good experiments or not, ones that would yield similar results time after time. As he put it, “No isolated experiment, however significant in itself, can suffice for the experimental demonstration of any phenomenon…. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment that will rarely fail to give us a statistically significant result.”17 Perhaps because they can design conclusive experiments, branches of science such as physics and chemistry rarely use the concept of statistical significance.
The idea of significance testing was picked up from the 1950s onward primarily in sociology, psychology, economics, and medicine, disciplines where designing conclusive experiments is much harder because of the complexities involved. It may have triumphed in these arenas because it appeared to offer scientific rigor.18 The procedure creates the impression that the scientists have been forced to stand back and let the testing procedure objectively bring out what the data demonstrate. This can send a powerful signal in arenas that are heavily contested, but the signal is a rhetorical maneuver rather than something that in fact does guarantee objectivity. In many instances significance testing in these sciences has become a mechanical exercise that substitutes for thought. Experiments are considered good, it seems, if they throw up “significant” findings, even if the findings are trivial and cannot be reproduced.
When drug trials throw up a “significant” finding on cholesterol levels or bone densities, say, companies rush out a story that their drug “works,” even though in 50 percent of the trials they've run, the drug may not beat the placebo, or there may be more dead bodies in the treatment group than in the placebo group. Companies can do this in part because regulators in the United States such as the FDA, and in Europe, have established an incredibly low bar for a drug to be allowed on the market: only two trials with statistically significant positive results are needed to let a pharmaceutical company put a drug on the market, even though there might be up to ninety-eight negative studies. Given that Fisher expected that five in a hundred studies might be positive by chance, this turns the idea of statistical significance inside out. Of the studies done with the antidepressants, for instance, 50 percent show no benefit for the drug compared with the placebo. Fisher almost certainly would have thought that those investigating such drugs simply did not know what they were doing scientifically.
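The arithmetic behind the two-in-a-hundred point can be checked directly. The sketch below assumes a drug with no effect at all, one hundred independent trials, and a 5 percent chance that any single trial comes out positive by chance. The function name and the independence assumption are illustrative, not a description of how any regulator counts trials.

```python
from math import comb

def prob_at_least(k: int, n_trials: int, p_chance: float) -> float:
    """Probability that at least k of n_trials independent trials come out
    'positive' when each has probability p_chance of doing so by chance."""
    # Binomial probability of seeing fewer than k positives.
    p_fewer = sum(
        comb(n_trials, i) * p_chance**i * (1 - p_chance) ** (n_trials - i)
        for i in range(k)
    )
    return 1 - p_fewer

# Assumed scenario: an inert drug, 100 trials, Fisher's 5 percent threshold.
p_two_positive = prob_at_least(2, 100, 0.05)
print(f"Chance of at least two positive trials: {p_two_positive:.3f}")
```

Under these assumptions the chance of collecting two positive trials for a drug that does nothing is roughly 96 percent, which is the sense in which a two-trial bar turns the idea of significance inside out.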
Significance testing also explains how companies are able to get away with claims that treatments work even when more people die who are given the drug in a trial than those given a placebo. Trials will be set up so that findings of lowered cholesterol levels with treatment, for example, will be statistically significant while the increase in dead bodies may not be. Doctors, like many others not well versed in mathematics, clutch at the illusory certainty offered by findings that are statistically significant, even if these are trivial, on the grounds that these results could not have arisen by chance. Fascinated with significance, they also tend mentally to dispose of any evidence of accumulating harms brought on by the same treatments, as we shall see in chapter 7, by denying they exist—on the basis that findings that are not statistically significant could have arisen by chance. They are hypnotized.
There has always been a deep-rooted tension between medical care and medical business. Good medical care once firmly embraced the idea that every remedy was a potential poison that inevitably produced side effects—the trick lay in knowing how and to whom to administer this poison in order to bring about a benefit that warranted these side effects. But insofar as they are looking after their business interests, medical practitioners have always been inclined to embrace new drugs, and most doctors want more of them if they can be convinced, or can convince themselves, that they have some positive effect. This produces a bias against seeing when these very same drugs shorten lives. In the era of evidence-based medicine, the marketing barrage of the pharmaceutical companies and the promise of statistical significance have led doctors into a world in which they regard treatments more as fertilizers or vitamins that can only do good if applied widely. As a profession, medicine is thereby losing any sense of the treatment as poison, and controlled trials, which began as a method to protect patients from the biases of doctors, have become instead a method to enhance business in great part because drug companies have managed to hook doctors to the crack pipe of statistical significance.19
TAMING CHANCE
Because it was fatal in minuscule doses, strychnine was a favorite of poisoners. In 1852, Pierre Touery claimed that activated charcoal could act as an antidote to strychnine, but his medical colleagues were not convinced. To prove his point, Touery, in front of a large audience on the floor of the French Academy of Medicine, ingested some coal tar and then drank ten times the lethal dose of strychnine, without any ill effects. Nobody is likely to think now that we need a randomized controlled trial to make a convincing case for using coal tar in a situation like this. Neither Worcester nor Cabot nor any of their colleagues in Massachusetts pressed for controlled trials for diphtheria antitoxin after its introduction in the 1890s, when the garroting membranes the illness produced could be seen to dissolve almost immediately after the treatment was administered.
When treatments are unambiguously effective, we don't need a controlled trial to tell us so. But randomized trials have become such a fetish within medicine that they are now a source of parody. A 2003 British Medical Journal article, for example, suggested that we should not be using parachutes because their efficacy hadn't yet been demonstrated in a placebo-controlled trial.20
The perverse humor here extends into almost every encounter between doctors and patients today. If tomorrow our doctor suggested putting us on a treatment that he said had not been shown to work in a controlled trial but that he had seen work with his own eyes, most people would likely refuse. In contrast, we would likely be reassured if he suggested he was only going to treat us with drugs that had been shown by controlled trials to work. We would be even more reassured if he told us that thousands of people had been entered into these trials, when almost by definition the greater the number of people needed in a trial, the more closely the treatment resembles snake oil—which contains omega-3 fatty acids and can be shown in controlled trials to have benefits if sufficiently large numbers of people are recruited.
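The link between trial size and the size of the benefit can be made concrete with a standard textbook approximation for a two-arm comparison of means. The sketch below is illustrative only: the effect sizes are hypothetical, the function name is invented here, and the conventional settings of a two-sided 5 percent threshold and 80 percent power are assumptions rather than figures from any trial discussed in the text.

```python
from math import ceil
from statistics import NormalDist

def patients_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed in each arm of a two-arm trial to detect a
    standardized mean difference (effect_size) with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # threshold for significance
    z_power = NormalDist().inv_cdf(power)          # allowance for desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Hypothetical effect sizes, from dramatic to barely detectable.
for d in (1.0, 0.5, 0.2, 0.1, 0.05):
    print(f"effect size {d:>4}: about {patients_per_arm(d)} patients per arm")
```

Because the required number grows with the inverse square of the effect, halving the benefit roughly quadruples the patients needed, so a trial that has to enroll thousands is usually one chasing a very small difference.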
How has it happened that our understanding of trials has in some sense been turned inside out? The story starts with the discovery of the antibiotics. The first real magic bullet that made the rest of modern therapeutics possible came in 1935 with the discovery of the sulfa drugs.21 Before the antibiotics were discovered a range of bacterial infections leading to conditions such as bacterial endocarditis (an infection of the lining of the heart), puerperal sepsis (an infection of mothers after childbirth), and septicemia (an infection of the blood stream) were commonly fatal. Sulfanilamide and, later, penicillin transformed this picture. Patients who were expected to die got up off their bed and walked out of the hospital. Neither doctors nor regulators required trials to show these drugs worked—the only onus on companies was to establish that their treatment was safe.
Wonderful though drugs like penicillin were, they failed when it came to the infection that terrified people at the time more than any other—tuberculosis—and it was this failure that led directly to the first randomized controlled trial in medicine. Part of the problem was that tuberculosis was a more chronic and insidious infection than bacterial endocarditis or puerperal infections, so it was harder to tell when the treatment was working. Where other infections came on dramatically and either cleared up or killed quickly, tuberculosis crept in on its victims, who might then have good spells and bad spells. Even sputum samples clear of the bacterium did not provide a foolproof answer to the state of the illness.
Every new treatment devised at the time was tested against tuberculosis—even the first antidepressants and antipsychotics—many of which worked in the test tube but not in patients. There had been endless claims for cures which had been shown repeatedly to be hollow in clinical care. So when Merck developed a novel class of antibiotic in 1945, of which streptomycin was the prototype, skepticism was called for. Austin Bradford Hill, who became the statistician at Britain's Medical Research Council (MRC) because tuberculosis had ruled him out of a career in medicine, suggested a controlled trial of the new drug using Fisher's ideas about randomization.
There were concerns about the ethics of what Hill was proposing. If the drug in fact turned out to be effective, were those who got the placebo being denied critical care? It was one thing to allow unfertilized plants in an agricultural field to languish but treatment trials had never left sick people untreated before. As it turned out, streptomycin wasn't as effective as penicillin for bacterial endocarditis, but it couldn't be said the drug didn't work for tuberculosis. The trial demonstrated that streptomycin made a difference to the patient clinically, reducing the amount of tubercle bacillus growing in sputum, and the number of tubercular holes visible on X-ray, and more patients on streptomycin survived. Running a randomized controlled trial in this case brought the effects of streptomycin into view in a way that would not have happened otherwise.
In the 1950s, even in the case of such clearly effective drugs as penicillin, the antipsychotic chlorpromazine, and the amphetamines, which, once developed, swept around the world, crossed frontiers and language barriers with little or no need for marketing, trials offered something useful. While there was no question penicillin worked on some bacteria, it was clear it did not work on all. And while chlorpromazine tranquilized, there could be real questions about which patients would most benefit from it. Similarly, the amphetamines obviously increased alertness and suppressed appetite, but did they make a difference in a medical condition? When controlled trials were conducted, it turned out that amphetamines were of benefit for narcolepsy, a condition where people abruptly fall asleep sometimes in mid-conversation, and possibly produced benefits in some neurotic states but did surprisingly little for severe depressions.
In assuming treatments don't work, controlled trials challenge therapeutic enthusiasm. Because surgery is such a clear physical stress to any body, many supposed that pre- and post-operative treatment with beta-blockers such as propranolol, which counter the effects of stress hormones on heart rate and blood pressure, could only be a good thing. This was so logical, it had become standard practice. But when the proper study was finally done, it was found that there were more deaths in the beta-blocker group.22 Similarly, it seemed obvious that treating the anemia that develops in the wake of renal failure would be helpful and likely prolong life expectancy, but when the first randomized trial was undertaken more patients died on the high-cost treatment (Aranesp) to relieve anemia than died in the placebo group.23
Demonstrations that treatments don't work typically come in trials sponsored by national health organizations or other not-for-profit research institutions rather than company trials. But there are exceptions. Given strong suggestions that anti-inflammatory drugs might help Alzheimer's disease, Merck decided to pursue a jackpot by investing millions of dollars in a trial to see if Vioxx would reduce the incidence of or slow the rate of progression of dementia. In fact on Vioxx (and later Celebrex), more patients developed Alzheimer's, the disease progressed more rapidly, and more patients died than on the placebo.24
If all medicines are poisons, an outcome like this is not surprising. Simply recognizing that biology is complex highlights the risk of intervening and the need to test our assumptions and practices, no matter how benign the rationale for a particular approach might sound. An insistence on testing is exactly the spirit that gave rise to randomized controlled trials. They began as a means to control therapeutic enthusiasm, whether this enthusiasm came from the good intentions of physicians or from the greed of hucksters. What is there, then, about these trials that makes companies so interested?
MIND THE GAP
In between treatments that are so obviously life-saving that trials are not needed and proposed remedies where trials save lives by demonstrating that the treatment doesn't work, there is the huge gap in which we have treatments that ease pain or restore function or promise some other benefit, even if a modest one. In the case of treatments that do not necessarily save lives but which equally cannot be dismissed as doing nothing, we are in much less certain waters than is usually realized. Controlled trials in these instances function primarily to bring to light both positive and negative associations between treatment and changes on a blood test or rating scale. It is in these waters that pharmaceutical companies have become adept at turning the evidence to their advantage.
Imagine an orthopedic department starting a trial on plaster casts for fractures of the left leg. As their placebo treatment they opted to have a cast put on the necks of the control group but in the active treatment group they randomly put casts on the right arm or leg, or left arm or leg of the patients, all of whom had broken left legs. The active treatment group in this case would do statistically significantly better than the placebo group but to advocate treating left leg fractures by indiscriminately putting a cast on any of four limbs on the basis that a randomized controlled trial had clearly shown this had worked would be nonsensical.25 Medicine in thrall to randomized controlled trials increasingly lets companies get away with just this, partly because an artful use of rating scales or blood tests conceals the fact that we don't know what we are doing. When we do know what is wrong the absurdity of simply practicing according to the figures becomes clear.
The plaster-cast example is not much more extreme than what in fact did happen in the case of the antidepressants.
When companies or their academics say today that a drug “works,” what is commonly meant is that there is at least a minimal difference that is “statistically significant” between the effects of an active drug and a placebo on a blood test or rating scale. Evidence like this, rather than evidence of lives saved or function restored, is all that the regulators need to let the drug on the market. Once approved for the market, the drug, be it for osteoporosis, cholesterol regulation, depression, or hypertension, is sold as though using it is the equivalent of being given penicillin or insulin. The problem is that increasingly, under the influence of company spin as to what the figures show, clinicians seem to prescribe drugs like the statins or antidepressants as though a failure to prescribe would leave them as open to a charge of clinical negligence as failing to prescribe insulin or penicillin would. The magic for companies lies in the fact that the numbers of patients recruited to the trials can be such that changes in rating scale scores or bone densities are statistically significant, whereas increased rates of death or other serious adverse events on treatment may not be.
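How the same trial can make a small rating-scale change significant while leaving an excess of deaths non-significant can be sketched with two simple z-tests. Every figure below is made up for illustration (2,000 patients per arm, a one-point difference on a scale with a standard deviation of eight, and twenty deaths on the drug against twelve on placebo), and the function names are invented here; none of this comes from a real trial.

```python
from math import erfc, sqrt

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2))

def p_mean_difference(diff: float, sd: float, n_per_arm: int) -> float:
    """z-test for a difference in mean scores between two equal-sized arms."""
    standard_error = sd * sqrt(2 / n_per_arm)
    return two_sided_p(diff / standard_error)

def p_event_difference(events_a: int, events_b: int, n_per_arm: int) -> float:
    """Pooled two-proportion z-test for event counts in two equal-sized arms."""
    pooled = (events_a + events_b) / (2 * n_per_arm)
    standard_error = sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    return two_sided_p((events_a - events_b) / n_per_arm / standard_error)

# Hypothetical trial: 2,000 patients in each arm.
p_scale = p_mean_difference(diff=1.0, sd=8.0, n_per_arm=2000)
p_deaths = p_event_difference(events_a=20, events_b=12, n_per_arm=2000)

print(f"rating scale difference: p = {p_scale:.5f}")
print(f"excess deaths on drug:   p = {p_deaths:.3f}")
```

With these invented figures the one-point shift on the rating scale is comfortably significant, while two-thirds more deaths on the drug fall short of the 5 percent threshold, simply because deaths are rare and the trial was sized for the scale rather than for mortality.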