Bad Science Page 4 Read online free by Ben Goldacre

Simes immediately recognised – as I hope you will too – that the question of whether one form of cancer treatment is better than another was small fry compared to the depth charge he was about to set off in the medical literature. Everything we thought we knew about whether treatments worked or not was probably distorted, to an extent that might be hard to measure, but that would certainly have a major impact on patient care. We were seeing the positive results, and missing the negative ones. There was one clear thing we should do about this: start a registry of all clinical trials, demand that people register their study before they start, and insist that they publish the results at the end.

That was 1986. Since then, a generation later, we have done very badly. In this book, I promise I won’t overwhelm you with data. But at the same time, I don’t want any drug company, or government regulator, or professional body, or anyone who doubts this whole story, to have any room to wriggle. So I’ll now go through all the evidence on missing trials, as briefly as possible, showing the main approaches that have been used. All of what you are about to read comes from the most current systematic reviews on the subject, so you can be sure that it is a fair and unbiased summary of the results.

One research approach is to get all the trials that a medicines regulator has a record of, from the very early ones done for the purposes of getting a licence for a new drug, and then check to see if they all appear in the academic literature. That’s the method we saw used in the paper mentioned above, where researchers sought out every paper on twelve antidepressants, and found that a 50/50 split of positive and negative results turned into forty-eight positive papers and just three negative ones. This method has been used extensively in several different areas of medicine:

Lee and colleagues, for example, looked for all of the 909 trials submitted alongside marketing applications for all ninety new drugs that came onto the market from 2001 to 2002: they found that 66 per cent of the trials with significant results were published, compared with only 36 per cent of the rest.18

Melander, in 2003, looked for all forty-two trials on five antidepressants that were submitted to the Swedish drug regulator in the process of getting a marketing authorisation: all twenty-one studies with significant results were published; only 81 per cent of those finding no benefit were published.19

Rising et al., in 2008, found more of those distorted write-ups that we’ll be dissecting later: they looked for all trials on two years’ worth of approved drugs. In the FDA’s summary of the results, once those could be found, there were 164 trials. Those with favourable outcomes were a full four times more likely to be published in academic papers than those with negative outcomes. On top of that, four of the trials with negative outcomes changed, once they appeared in the academic literature, to favour the drug.20

If you prefer, you can look at conference presentations: a huge amount of research gets presented at conferences, but our current best estimate is that only about half of it ever appears in the academic literature.21 Studies presented only at conferences are almost impossible to find, or cite, and are especially hard to assess, because so little information is available on the specific methods used in the research (often as little as a paragraph). And as you will see shortly, not every trial is a fair test of a treatment. Some can be biased by design, so these details matter.

The most recent systematic review of studies looking at what happens to conference papers was done in 2010, and it found thirty separate studies looking at whether negative conference presentations – in fields as diverse as anaesthetics, cystic fibrosis, oncology, and A&E – disappear before becoming fully-fledged academic papers.22 Overwhelmingly, unflattering results are much more likely to go missing.

If you’re very lucky, you can track down a list of trials whose existence was publicly recorded before they were started, perhaps on a register that was set up to explore that very question. From the pharmaceutical industry, up until very recently, you’d be very lucky to find such a list in the public domain. For publicly-funded research the story is a little different, and here we start to learn a new lesson: although the vast majority of trials are conducted by the industry, with the result that they set the tone for the community, this phenomenon is not limited to the commercial sector.

By 1997 there were already four studies in a systematic review on this approach. They found that studies with significant results were two and a half times more likely to get published than those without.23

A paper from 1998 looked at all trials from two groups of triallists sponsored by the US National Institutes of Health over the preceding ten years, and found, again, that studies with significant results were more likely to be published.24

Another looked at drug trials notified to the Finnish National Agency, and found that 47 per cent of the positive results were published, but only 11 per cent of the negative ones.25

Another looked at all the trials that had passed through the pharmacy department of an eye hospital since 1963: 93 per cent of the significant results were published, but only 70 per cent of the negative ones.26

The point being made in this blizzard of data is simple: this is not an under-researched area; the evidence has been with us for a long time, and it is neither contradictory nor ambiguous.
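If you want to see how far apart those publication rates are, the percentages quoted above can be turned into simple ratios. What follows is only a rough sketch in Python: the labels and the rate_ratio helper are illustrative names, not anything from the studies themselves, and dividing one percentage by the other is not necessarily how the systematic reviews arrived at their own summary figures.

```python
# Publication rates quoted in the text, as percentages:
# (significant or positive results published, the rest published).
quoted_rates = {
    "Lee (trials in marketing applications)": (66, 36),
    "Melander (Swedish regulator)": (100, 81),
    "Finnish National Agency": (47, 11),
    "Eye hospital pharmacy": (93, 70),
}


def rate_ratio(positive_pct, negative_pct):
    """How many times more likely a positive result was to be published."""
    return positive_pct / negative_pct


# Illustrative calculation only: a plain ratio of the two quoted percentages.
ratios = {name: rate_ratio(pos, neg) for name, (pos, neg) in quoted_rates.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} times more likely to be published")
```

On these figures the gap ranges from modest (the Swedish antidepressant trials) to more than fourfold (the Finnish notifications), but it always points the same way.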

Two French studies in 2005 and 2006 took a new approach: they went to ethics committees, and got lists of all the studies they had approved, and then found out from the investigators whether the trials had produced positive or negative results, before finally tracking down the published academic papers.27 The first study found that significant results were twice as likely to be published; the second that they were four times as likely. In Britain, two researchers sent a questionnaire to all the lead investigators on 101 projects paid for by NHS R&D: it’s not industry research, but it’s worth noting anyway. This produced an unusual result: there was no statistically significant difference in the publication rates of positive and negative papers.28

But it’s not enough simply to list studies. Systematically taking all the evidence that we have so far, what do we see overall?

It’s not ideal to lump every study of this type together in one giant spreadsheet, to produce a summary figure on publication bias, because they are all very different, in different fields, with different methods. This is a concern in many meta-analyses (though it shouldn’t be overstated: if there are lots of trials comparing one treatment against placebo, say, and they’re all using the same outcome measurement, then you might be fine just lumping them all in together).

But you can reasonably put some of these studies together in groups. The most current systematic review on publication bias, from 2010, from which the examples above are taken, draws together the evidence from various fields.29 Twelve comparable studies follow up conference presentations, and taken together they find that a study with a significant finding is 1.62 times more likely to be published. For the four studies taking lists of trials from before they started, overall, significant results were 2.4 times more likely to be published. Those are our best estimates of the scale of the problem. They are current, and they are damning.

All of this missing data is not simply an abstract academic matter: in the real world of medicine, published evidence is used to make treatment decisions. This problem goes to the core of everything that doctors do, so it’s worth considering in some detail what impact it has on medical practice. Firstly, as we saw in the case of reboxetine, doctors and patients are misled about the effects of the medicines they use, and can end up making decisions that cause avoidable suffering, or even death. We might also choose unnecessarily expensive treatments, having been misled into thinking they are more effective than cheaper older drugs. This wastes money, ultimately depriving patients of other treatments, since funding for health care is never infinite.

It’s also worth being clear that this data is withheld from everyone in medicine, from top to bottom. NICE, for example, is the National Institute for Health and Clinical Excellence, created by the British government to conduct careful, unbiased summaries of all the evidence on new treatments. It is unable either to identify or to access data that has been withheld by researchers or companies on a drug’s effectiveness: NICE has no more legal right to that data than you or I do, even though it is making decisions about effectiveness, and cost-effectiveness, on behalf of the NHS, for millions of people. In fact, as we shall see, the MHRA and EMA (the European Medicines Agency) – the regulators that decide which drugs can go on the market in the UK – often have access to this information, but do not share it with the public, with doctors, or with NICE. This is an extraordinary and perverse situation.

So, while doctors are kept in the dark, patients are exposed to inferior treatments, ineffective treatments, unnecessary treatments, and unnecessarily expensive treatments that are no better than cheap ones; governments pay for unnecessarily expensive treatments, and mop up the cost of harms created by inadequate or harmful treatment; and individual participants in trials, such as those in the TGN1412 study, are exposed to terrifying, life-threatening ordeals, resulting in lifelong scars, again quite unnecessarily.

At the same time, the whole of the research project in medicine is retarded, as vital negative results are held back from those who could use them. This affects everyone, but it is especially egregious in the world of ‘orphan diseases’, medical problems that affect only small numbers of patients, because these corners of medicine are already short of resources, and are neglected by the research departments of most drug companies, since the opportunities for revenue are thinner. People working on orphan diseases will often research existing drugs that have been tried and failed in other conditions, but that have theoretical potential for the orphan disease. If the data from earlier work on these drugs in other diseases is missing, then the job of researching them for the orphan disease is both harder and more dangerous: perhaps they have already been shown to have benefits or effects that would help accelerate research; perhaps they have already been shown to be actively harmful when used on other diseases, and there are important safety signals that would help protect future research participants from harm. Nobody can tell you.

Finally, and perhaps most shamefully, when we allow unflattering data to go unpublished, we betray the patients who participated in these studies: the people who have given their bodies, and sometimes their lives, in the implicit belief that they are doing something to create new knowledge, that will benefit others in the same position as them in the future. In fact, their belief is not implicit: often it’s exactly what we tell them, as researchers, and it is a lie, because the data might be withheld, and we know it.

So whose fault is this?

Why do negative trials disappear?

In a moment we will see more clear cases of drug companies withholding data – in stories where we can identify individuals – sometimes with the assistance of regulators. When we get to these, I hope your rage might swell. But first, it’s worth taking a moment to recognise that publication bias occurs outside commercial drug development, and in completely unrelated fields of academia, where people are motivated only by reputation, and their own personal interests.

In many respects, after all, publication bias is a very human process. If you’ve done a study and it didn’t have an exciting, positive result, then you might wrongly conclude that your experiment isn’t very interesting to other researchers. There’s also the issue of incentives: academics are often measured, rather unhelpfully, by crude metrics like the numbers of citations for their papers, and the number of ‘high-impact’ studies they get into glamorous well-read journals. If negative findings are harder to publish in bigger journals, and less likely to be cited by other academics, then the incentives to work at disseminating them are lower. With a positive finding, meanwhile, you get a sense of discovering something new. Everyone around you is excited, because your results are exceptional.

One clear illustration of this problem came in 2010. A mainstream American psychology researcher called Daryl Bem published a competent academic paper, in a well-respected journal, showing evidence of precognition, the ability to see into the future.* These studies were well-designed, and the findings were statistically significant, but many people weren’t very convinced, for the same reasons you aren’t: if humans really could see into the future, we’d probably know about it already; and extraordinary claims require extraordinary evidence, rather than one-off findings.

But in fact the study has been replicated, though Bem’s positive results have not been. At least two groups of academics have rerun several of Bem’s experiments, using the exact same methods, and both found no evidence of precognition. One group submitted their negative results to the Journal of Personality and Social Psychology – the very same journal that published Bem’s paper in 2010 – and that journal rejected their paper out of hand. The editor even came right out and said it: we never publish studies that replicate other work.

Here we see the same problem as in medicine: positive findings are more likely to be published than negative ones. Every now and then, a freak positive result is published showing, for example, that people can see into the future. Who knows how many psychologists have tried, over the years, to find evidence of psychic powers, running elaborate, time-consuming experiments, on dozens of subjects – maybe hundreds – and then found no evidence that such powers exist? Any scientist trying to publish such a ‘So what?’ finding would struggle to get a journal to take it seriously, at the best of times. Even with the clear target of Bem’s paper on precognition, which was widely covered in serious newspapers across Europe and the USA, the academic journal with a proven recent interest in the question of precognition simply refused to publish a paper with a negative result. Yet replicating these findings was key – Bem himself said so in his paper – so keeping track of the negative replications is vital too.
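The mechanism is easy to simulate. The sketch below is a toy model, not a reconstruction of Bem’s experiments or anyone else’s: it assumes 2,000 hypothetical studies, each made up of 100 yes-or-no guesses where only chance is at work, and a journal that publishes only the results passing a one-sided test at roughly the 5 per cent level. Every name and number in it is an assumption chosen for illustration.

```python
import random


def run_study(rng, n_guesses=100, true_hit_rate=0.5):
    """One hypothetical study: count correct guesses when only chance is at work."""
    return sum(rng.random() < true_hit_rate for _ in range(n_guesses))


def is_significant(hits, n_guesses=100):
    """One-sided test at roughly the 5 per cent level (normal approximation)."""
    expected = n_guesses * 0.5
    std_dev = (n_guesses * 0.25) ** 0.5
    return (hits - expected) / std_dev > 1.645


rng = random.Random(2010)  # fixed seed (arbitrary) so the sketch is repeatable
n_studies = 2000           # assumed number of studies, for illustration only

all_hits = [run_study(rng) for _ in range(n_studies)]
# The assumed journal policy: only 'significant' results are published.
published = [hits for hits in all_hits if is_significant(hits)]

print(f"Studies run: {n_studies}")
print(f"'Significant' by chance alone: {len(published)}")
print(f"Average hit rate, all studies: {sum(all_hits) / (100 * n_studies):.3f}")
if published:
    print(f"Average hit rate, published only: {sum(published) / (100 * len(published)):.3f}")
```

Because no real effect exists in this model, every ‘published’ study is a fluke, yet each one reports a hit rate of at least 59 per cent. Anyone reading only those papers would see consistent evidence for an ability that the full set of studies shows is not there.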

People working in real labs will tell you that sometimes an experiment can fail to produce a positive result many times before the outcome you’re hoping for appears. What does that mean? Sometimes the failures will be the result of legitimate technical problems; but sometimes they will be vitally important statistical context, perhaps even calling the main finding of the research into question. Many research findings, remember, are not absolute black-and-white outcomes, but fragile statistical correlations. Under our current system, most of this contextual information about failure is just brushed under the carpet, and this has huge ramifications for the cost of replicating research, in ways that are not immediately obvious. For example, researchers failing to replicate an initial finding may not know if they’ve failed because the original result was an overstated fluke, or because they’ve made some kind of mistake in their methods. In fact, the cost of proving that a finding was wrong is vastly greater than the cost of making it in the first place, because you need to run the experiment many more times to prove the absence of a finding, simply because of the way that the statistics of detecting weak effects work; and you also need to be absolutely certain that you’ve excluded all technical problems, to avoid getting egg on your face if your replication turns out to have been inadequate. These barriers to refutation may partly explain why it’s so easy to get away with publishing findings that ultimately turn out to be wrong.30

Publication bias is not just a problem in the more abstract corners of psychology research. In 2012 a group of researchers reported in the journal Nature how they tried to replicate fifty-three early laboratory studies of promising targets for cancer treatments: forty-seven of the fifty-three could not be replicated.31 This study has serious implications for the development of new drugs in medicine, because such unreplicable findings are not simply an abstract academic issue: researchers build theories on the back of them, trust that they’re valid, and investigate the same idea using other methods. If they are simply being led down the garden path, chasing up fluke errors, then huge amounts of research money and effort are being wasted, and the discovery of new medical treatments is being seriously retarded.

The authors of the study were clear on both the cause of and the solution for this problem. Fluke findings, they explained, are often more likely to be submitted to journals – and more likely to be published – than boring, negative ones. We should give more incentives to academics for publishing negative results; but we should also give them more opportunity.

This means changing the behaviour of academic journals, and here we are faced with a problem. Although they are usually academics themselves, journal editors have their own interests and agendas, and have more in common with everyday journalists and newspaper editors than some of them might wish to admit, as the episode of the precognition experiment above illustrates very clearly. Whether journals like this are a sensible model for communicating research at all is a hotly debated subject in academia, but this is the current situation. Journals are the gatekeepers, they make decisions on what’s relevant and interesting for their audience, and they compete for readers.

This can lead them to behave in ways that don’t reflect the best interests of science, because an individual journal’s desire to provide colourful content might conflict with the collective need to provide a comprehensive picture of the evidence. In newspaper journalism, there is a well-known aphorism: ‘When a dog bites a man, that’s not news; but when a man bites a dog…’ These judgements on newsworthiness in mainstream media have even been demonstrated quantitatively. One study in 2003, for example, looked at the BBC’s health news coverage over several months, and calculated how many people had to die from a given cause for one story to appear. 8,571 people died from smoking for each story about smoking; but there were three stories for every death from new variant CJD, or ‘mad cow disease’.32 Another, in 1992, looked at print-media coverage of drug deaths, and found that you needed 265 deaths from paracetamol poisoning for one story about such a death to appear in a paper; but every death from MDMA received, on average, one piece of news coverage.33