Pharmageddon
The main way hazards may be hidden right under the noses of doctors centers around Ronald Fisher's efforts in the 1930s to investigate whether fertilizers worked or not, as outlined in chapter 3. Fisher, quite arbitrarily, set up a standard—if on nineteen occasions out of twenty plants in a fertilized patch did better than those in an otherwise equal but nonfertilized patch, this was said to be statistically significant and the fertilizer could be said to be effective. If the result came out indicating that the fertilized patch did better only seventeen or eighteen times out of twenty, in Fisher's view, proper scientists should conclude they either had not designed the experiment correctly or they were dealing simply with the play of chance.
Soon after Fisher outlined his view of statistical significance, Jerzy Neyman, another early statistician, argued Fisher was throwing common sense out the window and offering a recipe for scientific sterility, not scientific progress. For most people, having a drug prove its worth in a study in which thousands of patients participate sounds more impressive than demonstrating a benefit in a handful of people. But as was discussed in chapter 3, the drug to go for is the one that consistently shows up as working in a small sample. Snake oil, which contains omega-3 and other fatty acids, could be shown to have statistically significant effects on depression rating scales or for pain relief purposes and possibly for some other conditions as well—provided several hundred patients are recruited to the trial. As Neyman was first to point out, we can be fooled by the fact that having hundreds of patients in a study can make a trivial finding statistically significant. Fooled to the extent that these hints of effectiveness can be sold by drug companies as convincing evidence their drug works and should all but be put in the drinking water.
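Neyman's point about sample size can be made concrete with a little arithmetic. The sketch below is illustrative only: the response rates of 52 percent on drug and 50 percent on placebo are hypothetical figures chosen for the example, not data from any trial, and the two-proportion z-test is a standard textbook approximation, not a method named in this chapter.

```python
import math


def two_proportion_p_value(rate_a, rate_b, n_per_arm):
    """Two-sided p-value for a difference between two proportions (z-test).

    Assumes equal-sized arms and a normal approximation.
    """
    pooled = (rate_a + rate_b) / 2  # pooled rate when both arms are the same size
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = (rate_a - rate_b) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical figures: the same trivial 2-point difference at different trial sizes.
DRUG_RATE, PLACEBO_RATE = 0.52, 0.50
for n in (100, 1_000, 10_000):
    p = two_proportion_p_value(DRUG_RATE, PLACEBO_RATE, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{n:>6} patients per arm: p = {p:.4f} ({verdict})")
```

The size of the effect never changes in this example; only the number of patients does, and that alone is enough to move the result across the 0.05 line.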
Neyman's beef with Fisher becomes even clearer if we turn to drug hazards. To see this, let us move from fertilized fields to a hazard analogue, such as being faced with a loaded gun. If Fisher had said that a gun could be said to be loaded only when there were bullets in nineteen chambers of a twenty-chambered gun, this would make no sense to any of us. In the experiment that is real life, a lot of us wouldn't go near a gun with even one bullet in its twenty chambers. One bullet is significant enough for most of us. Fisher's approach makes sense in situations where skepticism is called for, as when faced with claims by a huckster or a company trying to make money out of a remedy for sick and vulnerable people. It might indeed be reasonable in this case to suspend skepticism if a treatment “worked” nineteen times out of twenty. But it makes no sense for doctors or patients to be skeptical when hazards are involved; only pharmaceutical companies have an interest in being skeptical about the existence of hazards.
But when it comes to drug hazards, pharmaceutical companies can now confidently bank on the medical community and the FDA following Fisher, not Neyman. The studies of Prozac, and later of other antidepressants, threw up double the number of suicidal acts found in patients on placebo, but the differences were not statistically significant. Merck was happy to report over four times more heart attacks on Vioxx than on the comparison drug, again because the numbers were not statistically significant. For Fisher, Merck, and Lilly, this essentially meant the suicides and heart attacks were not happening, whereas for Neyman the onus was on Merck and Lilly to show their gun wasn't loaded.
Following Fisher and the FDA, when it comes to a drug-induced injury, doctors now routinely line up to say the bullet through the head while on Zoloft or the heart attack while on Vioxx were just anecdotal events, and while anecdotes are regrettable, physicians have to deal in science, and if a finding is not statistically significant, the science doesn't support the existence of a risk. The doctors who line up in this way are not the rapidly vanishing doctors of yesteryear sticking to a regulatory line on things they know little or nothing about. Instead, they tend to be the most recently qualified doctors who, coached in the conventions of evidence-based medicine, are trained to follow Fisher and not Neyman when it comes to hazards.25 In this respect, we are all at greater risk from tomorrow's doctors than from yesterday's.
These claims may seem incredible, so it is perhaps worth taking a little time to view the emperor from a few different angles before deciding whether he is really wearing any clothes or not. When the first concerns about Prozac made headlines in 1990, Lilly, understandably, rushed to defend their blockbuster. The company examined their clinical trials, which included over three thousand patients, for suicidal acts and published the results of their analysis in the British Medical Journal, claiming it showed no increase in risk on Prozac compared to placebo.26 But smack in the middle of the article are the figures: six suicidal acts in 1,765 patients on Prozac versus one in 569 patients on placebo.
These figures indicate that while there is a chance there is no risk from Prozac, there is an equal chance that there is up to a sixteen-fold increase in the risk, and that the best guess is that there is roughly a doubling of risk on Prozac.27 But according to Fisher's test there was no significant difference between placebo and Prozac on this score. The conclusion the company drew from this, fully endorsed by the British Medical Journal with no subsequent objections recorded from any of the 100,000 readers of the journal, was that “data from these trials do not show that fluoxetine is associated with an increased risk of suicidal acts or emergence of substantial suicidal thoughts among depressed patients.”28 The six suicidal acts simply vanished.
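The arithmetic behind the "roughly a doubling" can be checked directly from the published counts. The sketch below computes the risk ratio and, as one common choice of significance test, a two-sided Fisher exact p-value; the text does not say which test Lilly actually applied, so that choice is an assumption, as are the function and variable names.

```python
from math import comb


def fisher_exact_two_sided(events_a, n_a, events_b, n_b):
    """Two-sided Fisher exact p-value for events in two groups of patients."""
    total_events = events_a + events_b
    total_n = n_a + n_b
    denom = comb(total_n, n_a)

    def prob(k):
        # Hypergeometric probability that k of the events fall in group A
        return comb(total_events, k) * comb(total_n - total_events, n_a - k) / denom

    lowest = max(0, total_events - n_b)
    highest = min(total_events, n_a)
    observed = prob(events_a)
    # Add up every possible table that is no more likely than the observed one
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Counts quoted in the text: six acts in 1,765 on Prozac, one in 569 on placebo.
prozac_events, prozac_n = 6, 1765
placebo_events, placebo_n = 1, 569

risk_ratio = (prozac_events / prozac_n) / (placebo_events / placebo_n)
p_value = fisher_exact_two_sided(prozac_events, prozac_n, placebo_events, placebo_n)

print(f"risk ratio: {risk_ratio:.2f}")
print(f"two-sided Fisher exact p-value: {p_value:.2f}")
```

The point estimate is close to two, yet the p-value sits well above 0.05. This is exactly the situation the chapter describes: a doubling of risk that a significance test reports as no difference.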
A twelve-year-old schoolchild could have told the British Medical Journal that a doubling of risk is an increase. Lots of the one hundred thousand readers of the British Medical Journal work in fields that have nothing to do with the pharmaceutical industry. Even in areas of medicine working closely with that industry, many of the brightest and the best have no conflicts of interest, so missing this doubling of risk cannot be put down to conflict of interest. Missing the problem in this instance cannot be put down to lack of access to the data either—the doubling of risk that the company denied was hidden like a boat in a harbor by being put in one of the most widely read journals in the world. We seem to have entered an Alice in Wonderland world, in which things are whatever the Red Queen says they are.
Emboldened by the complete lack of objections to their depression paper, Lilly went on to analyze the suicidal acts from a series of trials they had done with Prozac in eating disorders. In these cases, any increase in risk could not be attributed to depression. Again there was an excess of suicidal acts on Prozac, a 1.4-fold increase in risk, but because this increase was not deemed statistically significant, it too apparently didn't exist.29
When the FDA convened their public hearing in September 1991, there was room on the program for presentations from the public and for a presentation of the “science.” In the three-minute slots they were given, many wives and mothers offered convincing testimony of how husbands and children had been prescribed the drug for anxiety, weight loss, or smoking cessation—conditions not linked to suicide—and, having apparently never been suicidal before, had gone on to commit suicide. FDA officials and experts present acknowledged that the testimonies were striking but made it clear they had to go by what the scientific data showed.30
The scientific data from their trials were presented by Lilly. None of the experts convened by the FDA, nor any of the regulatory officials speaking that day, nor any of those whose tales of horror were weighed on a scientific balance against the clinical trial data and found wanting, appeared to notice that the trial data on increased suicidal acts were in fact entirely consistent with the personal tragedies.
But there is skullduggery to add to mystery here and a case for saying lack of access to the data also played a part in Lilly's getting away with their gamble. Unlike the readers of the British Medical Journal, the FDA had had a chance to see the real figures from the Prozac depression studies and plenty of opportunity to come to grips with the fact that there had in fact been NO suicidal acts on placebo: the real figures were six suicidal acts on Prozac versus zero on placebo. Technically, the risk on Prozac was infinitely greater than on placebo. Lilly had taken a suicide that had happened before the trial had started and filed it under the heading of placebo, in a manner that contravenes regulations and seems close to being fraudulent. These data are now in the public domain.31 What is not public is any account from the FDA, or other regulators worldwide faced with the same data, as to why they chose to overlook this clearly inappropriate manipulation.
This single placebo suicide was very important to the company, and maybe to the FDA, because its addition to the calculation meant the increased risk on Prozac would not be statistically significant, and if the increase was not statistically significant, the six suicidal acts on Prozac vanished. The company knew that medics across the board could be depended upon to agree with this view of significance.
Following Lilly's lead, GlaxoSmithKline in the case of Paxil and Pfizer in the case of Zoloft also took prestudy suicides and suicidal acts from Paxil and Zoloft trials, respectively—seven in the case of Paxil and three in the case of Zoloft—and dumped them into the placebo group, against regulatory rules. The FDA noted what was happening but did nothing, and again has offered no explanation since for their oversight.
Quite astonishingly, when the British regulator (the MHRA—Medicines and Healthcare products Regulatory Agency) caught up with this maneuver thirteen years later, in 2003, and asked GlaxoSmithKline for their Paxil suicide data, making it clear companies should not move prestudy suicides into the placebo group as they had been doing, GSK instead took suicides recorded after the trials of Paxil had concluded and coded them under placebo, even including in the placebo group a patient who had committed suicide after having started Prozac. The MHRA did not object.
When Vioxx ran into trouble with the FDA in the late 1990s, Merck did something similar. In a major trial comparing Vioxx to the older drug naproxen in the treatment of rheumatoid arthritis (the VIGOR trial), the company reported seventeen heart attacks in 4,047 patients taking Vioxx versus four in 4,029 patients taking naproxen.32 This more than fourfold increase in heart attacks could be made to vanish by breaking the patients up into groups and then reporting that when compared to naproxen, Vioxx did not increase the risk of a heart attack for patients without a previous history of cardiovascular disease. Creating groups such as those with previous history of cardiovascular disease and those without reduces the size of any one group compared to the overall group, thereby reducing the chance that a finding will be statistically significant.
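Both claims in this paragraph, the "more than fourfold" increase and the effect of splitting patients into smaller groups, can be checked with a short calculation. The VIGOR counts below are the ones quoted in the text. The subgroup counts are hypothetical, since the text does not give the real subgroup figures; they are chosen only to keep the same fourfold ratio at a quarter of the size. Fisher's exact test is again an assumed choice of test.

```python
from math import comb


def fisher_exact_two_sided(events_a, n_a, events_b, n_b):
    """Two-sided Fisher exact p-value for events in two groups of patients."""
    total_events = events_a + events_b
    total_n = n_a + n_b
    denom = comb(total_n, n_a)

    def prob(k):
        # Hypergeometric probability that k of the events fall in group A
        return comb(total_events, k) * comb(total_n - total_events, n_a - k) / denom

    lowest = max(0, total_events - n_b)
    highest = min(total_events, n_a)
    observed = prob(events_a)
    # Add up every possible table that is no more likely than the observed one
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Counts quoted in the text for the VIGOR trial.
vigor_risk_ratio = (17 / 4047) / (4 / 4029)
print(f"VIGOR risk ratio, Vioxx vs naproxen: {vigor_risk_ratio:.2f}")

# Hypothetical illustration: the same fourfold ratio in a full group and in a
# subgroup one quarter of the size. These counts are NOT from the trial.
p_full = fisher_exact_two_sided(16, 4000, 4, 4000)
p_subgroup = fisher_exact_two_sided(4, 1000, 1, 1000)
print(f"full group, 16 vs 4 events:  p = {p_full:.3f}")
print(f"subgroup,    4 vs 1 events:  p = {p_subgroup:.3f}")
```

In the hypothetical example the risk is four times higher in both cases, but only the full group crosses the 0.05 threshold. Cutting the data into subgroups leaves the hazard unchanged and removes only the label "significant."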
In fact there had been three further heart attacks on Vioxx, making twenty in total; this only came to light when plaintiffs later took legal actions against Merck. The company had to make these three heart attacks disappear or they would have had to report that Vioxx significantly increased the risk of heart attack for all patients in the VIGOR trial, including those without previous cardiac problems.33
For most readers, the invented placebo suicides and suicidal acts in the Prozac, Paxil, and Zoloft trials and the three deleted heart attacks in the Vioxx trials are likely to seem to be the major issue. These manipulations are both a real problem and possibly a criminal offence, but they are a lesser problem than the invisibility of the seventeen Vioxx heart attacks and six Prozac suicidal acts. Lesser in the sense that companies only occasionally have to move a dead body or two from an inconvenient spot in the dataset to a less problematic one, but at the click of a statistical button they hide far more deaths in academic articles that remain permanently in full public view.
Explaining why Fisher's ideas have such traction within medicine is not easy. Regulators have followed this line because the definition of statistical significance offers them a rule of thumb, an almost mechanical procedure that takes the place of a judgment call. For pharmaceutical companies, the issue is simple; Fisher's ideas mean that positive effects in a minority of clinical trials can transform a weak and inessential drug into one apparently certified by science, while at the same time airbrushing its hazards out of existence.
But why do doctors follow this line? A number of medical academics have attempted to grapple with this, pointing out that current dependence on statistical significance testing has created a “junk epidemiology” in the domain of therapeutics. Louis Lasagna, the first professor of clinical pharmacology in the United States, and later dean of medicine at Tufts University, who introduced randomized controlled trials into drug development, described the approach outlined above as “p-value madness” (to say something has a p-value less than 0.05 is another way to say a finding is statistically significant).34 For Sander Greenland, professor of epidemiology and statistics at UCLA, “statistical thinking [of this type] has produced a chronic psychosis,”35 by which he means that researchers relying on Fisher's ideas have lost touch with reality. Ezra Hauer, a professor of civil engineering from the University of Toronto and authority on analyzing road traffic accidents, explains that “in this manner good data are drained of real content. The direction of the empirical conclusions is reversed and ordinary human and scientific reasoning is turned on its head.”36 For Charlie Poole, professor of statistics and epidemiology at the University of North Carolina, “Statistical significance should be abandoned immediately and universally.”37 Ironically, when faced with this issue in 2011, with investors' dollars at stake, the US Supreme Court argued that statistical significance cannot be the arbiter of what an investor might deem a significant risk.38 But patients, it seems, do not have the same rights as investors.
The differences between much of clinical practice and other branches of science were starkly framed by Kenneth Rothman, professor of epidemiology at Harvard and editor of the journal Epidemiology, in a note about submissions to the journal:
When writing for Epidemiology, you can…enhance your prospects if you omit tests of statistical significance…. We would like to see the interpretation of a study based not on statistical significance, or lack of it, for one or more study variables, but rather a careful quantitative consideration of the data in the light of competing explanations…. Misleading signals occur when a trivial effect is found to be “significant,” as often happens in large studies, or when a strong relation is found “non-significant,” as often happens in small studies.39
Is Poole's term “junk epidemiology” too strong a term to use? Consider an analysis published in 2002 of the trials of Prozac, Paxil, Zoloft, Efexor, and other antidepressants submitted to the FDA by companies seeking to market their treatments for anxiety disorders.40 To get into these studies patients had to be anxious and not depressed. The perception at the time was that anxiety had a minimal effect on the risk of suicide. Having combined the data for all these drugs, the authors announced a surprising finding, which the American Journal of Psychiatry was glad to accept. “We found that suicide risk among patients with anxiety disorders is higher than in the general population by a factor of 10 or more. Such a finding was unexpected…. The sample of patients selected was considered at minimal risk of suicide.”41 Nobody reading the American Journal of Psychiatry registered an objection to this conclusion, which was odd, given that there were eleven suicides in 12,914 anxious patients on active treatment compared with zero suicides in 3,875 patients taking placebo. The conclusion should have been that in this group of patients at minimal risk of suicide, the increased risk that came from antidepressants was more clearly visible than it had been in depression trials.
The same group of investigators looked at suicides and suicidal acts as reported in depression studies on the effects of the post-Prozac group of antidepressants, involving almost fifty thousand patients. The percentage of suicidal acts on antidepressants ranged from 0.15 to 0.20 percent, while on placebo it was 0.10 percent. The findings were not considered statistically significant, leading the authors to make the astounding statement that “the only possible conclusion supported by the present data is that prescription of SSRI antidepressants is not associated with a greater risk of completed suicide.”42
In fact the situation was much worse. These investigators were not aware of ten suicides and suicidal acts miscoded by Glaxo and Pfizer under the heading of placebo. Taking these into account made the difference between antidepressants and placebo statistically significant, so that the only possible conclusion that the present data did not support was the conclusion that there was no increase in risk.
Where one might expect that bare hints that a drug might work would not be enough to get it approved for market, while a doubling or tripling of a hazard on treatment would lead to warnings, this is not what happens. Where one might expect the burden of proving that black was really white to fall on companies rather than patients, it instead falls on patients and any doctors who wish to look closely. This is the point where the world is turned upside down, where the baby is thrown out while the bathwater is carefully preserved.
Paul Anthony, speaking on behalf of the Pharmaceutical Manufacturers Association of America, makes the issues very clear:
I want you to think about it [the issue of hazards] in terms of your reputation. It is really the reputation of a brand that is being signaled. Imagine…someone reporting that they had early information that you may be a child molester. I know that sounds extreme but it is that type of thing…. It is just an allegation…[however] that is what people will remember, and that is the reason there is a lot of concern about presenting early signal information—when you don't really have any proof. It is very different than the kind of rigorous process we had in the past, where you had to do a trial and it had to be statistically significant before you presented that.43
One of the greatest ironies of our evidence-based medicine era is that, faced with evidence of an increased number of adverse events in clinical trial data for its drug, the ultimate company defense has been to fall back on anecdotes. The most astonishing example of this mode of defense followed concerns beginning in the mid-1990s that the antipsychotic Zyprexa might cause diabetes. Faced with growing concerns, Lilly wheeled out a view from Henry Maudsley, who in 1879 had supposedly recognized an association between psychosis and diabetes.44 Lilly authored or sponsored several series of articles that were liberally sprinkled with this quote as grounds to think it was schizophrenia after all, and not Zyprexa, that was causing the problem, with not a single published objection from the psychiatric profession in the United States or Europe.