by Ben Goldacre
So when our homeopathy fan says that homeopathic treatment makes them feel better, we might reply: ‘I accept that, but perhaps your improvement is because of the placebo effect,’ and they cannot answer ‘No,’ because they have no possible way of knowing whether they got better through the placebo effect or not. They cannot tell. The most they can do is restate, in response to your query, their original statement: ‘All I know is, I feel as if it works. I get better when I take homeopathy.’
Next, you might say: ‘OK, I accept that, but perhaps, also, you feel you’re getting better because of ‘regression to the mean’.’ This is just one of the many ‘cognitive illusions’ described in this book, the basic flaws in our reasoning apparatus which lead us to see patterns and connections in the world around us, when closer inspection reveals that in fact there are none.
‘Regression to the mean’ is basically another phrase for the phenomenon whereby, as alternative therapists like to say, all things have a natural cycle. Let’s say you have back pain. It comes and goes. You have good days and bad days, good weeks and bad weeks. When it’s at its very worst, it’s going to get better, because that’s the way things are with your back pain.
Similarly, many illnesses have what is called a ‘natural history’: they are bad, and then they get better. As Voltaire said: ‘The art of medicine consists in amusing the patient while nature cures the disease.’ Let’s say you have a cold. It’s going to get better after a few days, but at the moment you feel miserable. It’s quite natural that when your symptoms are at their very worst, you will do things to try to get better. You might take a homeopathic remedy. You might sacrifice a goat and dangle its entrails around your neck. You might bully your GP into giving you antibiotics. (I’ve listed these in order of increasing ridiculousness.)
Then, when you get better—as you surely will from a cold—you will naturally assume that whatever you did when your symptoms were at their worst must be the reason for your recovery. Post hoc ergo propter hoc, and all that. Every time you get a cold from now on, you’ll be back at your GP, hassling her for antibiotics, and she’ll be saying, ‘Look, I don’t think this is a very good idea,’ but you’ll insist, because they worked last time, and community antibiotic resistance will increase, and ultimately old ladies die of MRSA because of this kind of irrationality, but that’s another story.*
* General practitioners sometimes prescribe antibiotics to demanding patients in exasperation, even though they are ineffective in treating a viral cold, but much research suggests that this is counterproductive, even as a time-saver. In one study, prescribing antibiotics rather than giving advice on self-management for sore throat resulted in an increased overall workload through repeat attendance. It was calculated that if a GP prescribed antibiotics for sore throat to one hundred fewer patients each year, thirty-three fewer would believe that antibiotics were effective, twenty-five fewer would intend to consult with the problem in the future, and ten fewer would come back within the next year. If you were an alternative therapist, or a drug salesman, you could turn those figures on their head and look at how to drum up more trade, not less.
You can look at regression to the mean more mathematically, if you prefer. On Bruce Forsyth’s Play Your Cards Right, when Brucey puts a 3 on the board, the audience all shout, ‘Higher!’ because they know the odds are that the next card is going to be higher than a 3. ‘Do you want to go higher or lower than a jack? Higher? Higher?’ ‘Lower!’
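As a rough check on the audience's instinct, the odds can simply be counted. This is a minimal sketch, assuming a standard 52-card deck with aces high and only one card showing so far; the function name `odds_after` is made up for illustration, and the real game's rules may differ in their details.

```python
from fractions import Fraction

RANKS = list(range(2, 15))  # 2..10, jack=11, queen=12, king=13, ace=14 (aces high: an assumption)
COPIES_PER_RANK = 4         # four suits in a standard deck (an assumption)


def odds_after(shown_rank):
    """Return (P(higher), P(lower), P(same rank)) for the next card,
    given that one card of `shown_rank` is already face up."""
    remaining = len(RANKS) * COPIES_PER_RANK - 1
    higher = sum(COPIES_PER_RANK for r in RANKS if r > shown_rank)
    lower = sum(COPIES_PER_RANK for r in RANKS if r < shown_rank)
    same = COPIES_PER_RANK - 1  # the other three cards of the same rank
    return (Fraction(higher, remaining),
            Fraction(lower, remaining),
            Fraction(same, remaining))


if __name__ == "__main__":
    for name, rank in [("3", 3), ("8", 8), ("jack", 11)]:
        higher, lower, same = odds_after(rank)
        print(f"{name}: higher {float(higher):.2f}, "
              f"lower {float(lower):.2f}, pair {float(same):.2f}")
```

Under these assumptions a 3 leaves 44 chances in 51 of a higher card, and a jack leaves 36 in 51 of a lower one. An extreme card is usually followed by a less extreme one, and that is all 'regression to the mean' amounts to.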
An even more extreme version of ‘regression to the mean’ is what Americans call the Sports Illustrated jinx. Whenever a sportsman appears on the cover of Sports Illustrated, goes the story, he is soon to fall from grace. But to get on the cover of the magazine you have to be at the absolute top of your game, one of the best sportsmen in the world; and to be the best in that week, you’re probably also having an unusual run of luck. Luck, or ‘noise’, generally passes, it ‘regresses to the mean’ by itself, as happens with throws of a die. If you fail to understand that, you start looking for another cause for that regression, and you find…the Sports Illustrated jinx.
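The jinx can be reproduced with nothing but random numbers. The following is a sketch under invented assumptions, not a model of any real sport: each hypothetical athlete has a fixed skill, each week's score is skill plus luck, and the 'magazine' puts the top scorers of week one on its cover. All names and figures here are illustrative.

```python
import random


def simulate_cover_jinx(n_athletes=1000, n_cover_stars=10, seed=0):
    """Return the cover stars' average score in the week they were picked
    and their average score in the following week."""
    rng = random.Random(seed)
    # Each athlete has a fixed underlying skill (illustrative numbers only).
    skill = [rng.gauss(0, 1) for _ in range(n_athletes)]
    # Weekly performance = skill + that week's luck ("noise").
    week1 = [s + rng.gauss(0, 1) for s in skill]
    week2 = [s + rng.gauss(0, 1) for s in skill]
    # The magazine picks whoever scored highest in week 1.
    ranked = sorted(range(n_athletes), key=lambda i: week1[i], reverse=True)
    stars = ranked[:n_cover_stars]
    before = sum(week1[i] for i in stars) / n_cover_stars
    after = sum(week2[i] for i in stars) / n_cover_stars
    return before, after


if __name__ == "__main__":
    before, after = simulate_cover_jinx()
    print(f"cover week: {before:.2f}, following week: {after:.2f}")
```

Nothing in the code punishes the cover stars, yet their average falls the following week, because the luck that helped put them on the cover is drawn afresh while their skill stays the same.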
Homeopaths increase the odds of a perceived success in their treatments even further by talking about ‘aggravations’, explaining that sometimes the correct remedy can make symptoms get worse before they get better, and claiming that this is part of the treatment process. Similarly, people flogging detox will often say that their remedies might make you feel worse at first, as the toxins are extruded from your body: under the terms of these promises, literally anything that happens to you after a treatment is proof of the therapist’s clinical acumen and prescribing skill.
So we could go back to our homeopathy fan, and say: ‘You feel you get better, I accept that. But perhaps it is because of ‘regression to the mean’, or simply the ‘natural history’ of the disease.’ Again, they cannot say ‘No’ (or at least not with any meaning—they might say it in a tantrum), because they have no possible way of knowing whether they were going to get better anyway, on the occasions when they apparently got better after seeing a homeopath. ‘Regression to the mean’ might well be the true explanation for their return to health. They simply cannot tell. They can only restate, again, their original statement: ‘All I know is, I feel as if it works. I get better when I take homeopathy.’
That may be as far as they want to go. But when someone goes further, and says, ‘Homeopathy works,’ or mutters about ‘science’, then that’s a problem. We cannot simply decide such things on the basis of one individual’s experiences, for the reasons described above: they might be mistaking the placebo effect for a real effect, or mistaking a chance finding for a real one. Even if we had one genuine, unambiguous and astonishing case of a person getting better from terminal cancer, we’d still be careful about using that one person’s experience, because sometimes, entirely by chance, miracles really do happen. Sometimes, but not very often.
Over the course of many years, a team of Australian oncologists followed 2,337 terminal cancer patients in palliative care. They died, on average, after five months. But around 1 per cent of them were still alive after five years. In January 2006 this study was reported in the Independent, bafflingly, as:
‘Miracle’ Cures Shown to Work
Doctors have found statistical evidence that alternative treatments such as special diets, herbal potions and faith healing can cure apparently terminal illness, but they remain unsure about the reasons.
But the point of the study was specifically not that there are miracle cures (it didn’t look at any such treatments, that was an invention by the newspaper). Instead, it showed something much more interesting: that amazing things simply happen sometimes: people can survive, despite all the odds, for no apparent reason. As the researchers made clear in their own description, claims for miracle cures should be treated with caution, because ‘miracles’ occur routinely, in 1 per cent of cases by their definition, and without any specific intervention. The lesson of this paper is that we cannot reason from one individual’s experience, or even that of a handful, selected out to make a point.
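To put a number on 'routinely', here is the arithmetic as a small sketch. The cohort size is the one given above; the rate is only 'around 1 per cent', so the result should be read as approximate, and the function name is made up for illustration.

```python
def expected_long_survivors(patients, survival_rate):
    """Expected number of patients still alive, given a survival rate."""
    return patients * survival_rate


if __name__ == "__main__":
    cohort = 2337   # patients followed, as reported in the text
    rate = 0.01     # "around 1 per cent", so an approximation
    survivors = expected_long_survivors(cohort, rate)
    print(f"Roughly {survivors:.0f} of {cohort} patients alive at five years")
```

That is roughly two dozen people from this one cohort, each of whom could be presented as a 'miracle' by anyone selecting cases to make a point.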
So how do we move on? The answer is that we take lots of individuals, a sample of patients who represent the people we hope to treat, with all of their individual experiences, and count them all up. This is clinical academic medical research, in a nutshell, and there’s really nothing more to it than that: no mystery, no ‘different paradigm’, no smoke and mirrors. It’s an entirely transparent process, and this one idea has probably saved more lives, on a more spectacular scale, than any other idea you will come across this year.
It is also not a new idea. The first trial appears in the Old Testament, and interestingly, although nutritionism has only recently become what we might call the ‘bollocks du jour’, it was about food. Daniel was arguing with King Nebuchadnezzar’s chief eunuch over the Judaean captives’ rations. Their diet was rich food and wine, but Daniel wanted his own soldiers to be given only vegetables. The eunuch was worried that they would become worse soldiers if they didn’t eat their rich meals, and that whatever could be done to a eunuch to make his life worse might be done to him. Daniel, on the other hand, was willing to compromise, so he suggested the first ever clinical trial:
And Daniel said unto the guard…‘Submit us to this test for ten days. Give us only vegetables to eat and water to drink; then compare our looks with those of the young men who have lived on the food assigned by the King and be guided in your treatment of us by what you see.’
The guard listened to what they said and tested them for ten days. At the end of ten days they looked healthier and were better nourished than all the young men who had lived on the food assigned them by the King. So the guard took away the assignment of food and the wine they were to drink and gave them only the vegetables.
Daniel 1:1-16.
To an extent, that’s all there is to it: there’s nothing particularly mysterious about a trial, and if we wanted to see whether homeopathy pills work, we could do a very similar trial. Let’s flesh it out. We would take, say, two hundred people going to a homeopathy clinic, divide them randomly into two groups, and let them go through the whole process of seeing the homeopath, being diagnosed, and getting their prescription for whatever the homeopath wants to give them. But at the last minute, without their knowledge, we would switch half of the patients’ homeopathic sugar pills, giving them dud sugar pills, that have not been magically potentised by homeopathy. Then, at an appropriate time later, we could measure how many in each group got better.
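The design just described can be sketched in a few lines. This is an illustration under stated assumptions, not a record of any real trial: the improvement probabilities are invented inputs, and setting them equal models the case where the pills add nothing beyond placebo. The function and field names are hypothetical.

```python
import random


def run_trial(n_patients=200, p_improve_real=0.4, p_improve_dud=0.4, seed=1):
    """Randomly split patients into two equal groups and count who improves.

    The two probabilities are made-up inputs, not measured values.
    """
    rng = random.Random(seed)
    patient_ids = list(range(n_patients))
    rng.shuffle(patient_ids)           # random allocation to groups
    half = n_patients // 2
    real_group = patient_ids[:half]    # keep their homeopathic pills
    dud_group = patient_ids[half:]     # pills secretly swapped for duds
    improved_real = sum(rng.random() < p_improve_real for _ in real_group)
    improved_dud = sum(rng.random() < p_improve_dud for _ in dud_group)
    return {
        "real_group": real_group,
        "dud_group": dud_group,
        "improved_real": improved_real,
        "improved_dud": improved_dud,
    }


if __name__ == "__main__":
    result = run_trial()
    print(f"improved on real pills: {result['improved_real']} of 100")
    print(f"improved on dud pills:  {result['improved_dud']} of 100")
```

Even with identical probabilities the two counts will rarely match exactly, which is one reason a small difference between groups is not, on its own, evidence that a treatment works.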
Speaking with homeopaths, I have encountered a great deal of angst about the idea of measuring, as if this was somehow not a transparent process, as if it forces a square peg into a round hole, because ‘measuring’ sounds scientific and mathematical. We should pause for just a moment and think about this clearly. Measuring involves no mystery, and no special devices. We ask people if they feel better, and count up the answers.
In a trial—or sometimes routinely in outpatients’ clinic—we might ask people to measure their knee pain on a scale of one to ten every day, in a diary. Or to count up the number of pain-free days in a week. Or to measure the effect their fatigue has had on their life that week: how many days they’ve been able to get out of the house, how far they’ve been able to walk, how much housework they’ve been able to do. You can ask about any number of very simple, transparent, and often quite subjective things, because the business of medicine is improving lives, and ameliorating distress.
We might dress the process up a bit, to standardise it, and allow our results to be compared more easily with other research (which is a good thing, as it helps us to get a broader understanding of a condition and its treatment). We might use the ‘General Health Questionnaire’, for example, because it’s a standardised ‘tool’; but for all the bluster, the ‘GHQ-12’, as it is known, is just a simple list of questions about your life and your symptoms.
If anti-authoritarian rhetoric is your thing, then bear this in mind: perpetrating a placebo-controlled trial of an accepted treatment—whether it’s an alternative therapy or any form of medicine—is an inherently subversive act. You undermine false certainty, and you deprive doctors, patients and therapists of treatments which previously pleased them.
There is a long history of upset being caused by trials, in medicine as much as anywhere, and all kinds of people will mount all kinds of defences against them. Archie Cochrane, one of the grandfathers of evidence-based medicine, once amusingly described how different groups of surgeons were each earnestly contending that their treatment for cancer was the most effective: it was transparently obvious to them all that their own treatment was the best. Cochrane went so far as to bring a collection of them together in a room, so that they could witness each other’s dogged but conflicting certainty, in his efforts to persuade them of the need for trials. Judges, similarly, can be highly resistant to the notion of trialling different forms of sentence for heroin users, believing that they know best in each individual case. These are recent battles, and they are in no sense unique to the world of homeopathy.
So, we take our group of people coming out of a homeopathy clinic, we switch half their pills for placebo pills, and we measure who gets better. That’s a placebo-controlled trial of homeopathy pills, and this is not a hypothetical discussion: these trials have been done on homeopathy, and it seems that overall, homeopathy does no better than placebo.
And yet you will have heard homeopaths say that there are positive trials in homeopathy; you may even have seen specific ones quoted. What’s going on here? The answer is fascinating, and takes us right to the heart of evidence-based medicine. There are some trials which find homeopathy to perform better than placebo, but only some, and they are, in general, trials with ‘methodological flaws’. This sounds technical, but all it means is that there are problems in the way the trials were performed, and those problems are so great that they mean the trials are less ‘fair tests’ of a treatment.
The alternative therapy literature is certainly riddled with incompetence, but flaws in trials are actually very common throughout medicine. In fact, it would be fair to say that all research has some ‘flaws’, simply because every trial will involve a compromise between what would be ideal, and what is practical or cheap. (The literature from complementary and alternative medicine—CAM—often fails badly at the stage of interpretation: medics sometimes know if they’re quoting duff papers, and describe the flaws, whereas homeopaths tend to be uncritical of anything positive.)
That is why it’s important that research is always published, in full, with its methods and results available for scrutiny. This is a recurring theme in this book, and it’s important, because when people make claims based upon their research, we need to be able to decide for ourselves how big the ‘methodological flaws’ were, and come to our own judgement about whether the results are reliable, whether theirs was a ‘fair test’. The things that stop a trial from being fair are, once you know about them, blindingly obvious.
Blinding
One important feature of a good trial is that neither the experimenters nor the patients know if they got the homeopathy sugar pill or the simple placebo sugar pill, because we want to be sure that any difference we measure is the result of the difference between the pills, and not of people’s expectations or biases. If the researchers knew which of their beloved patients were having the real and which the placebo pills, they might give the game away—or it might change their assessment of the patient—consciously or unconsciously.
Let’s say I’m doing a study on a medical pill designed to reduce high blood pressure. I know which of my patients are having the expensive new blood pressure pill, and which are having the placebo. One of the people on the swanky new blood pressure pills comes in and has a blood pressure reading that is way off the scale, much higher than I would have expected, especially since they’re on this expensive new drug. So I recheck their blood pressure, ‘just to make sure I didn’t make a mistake’. The next result is more normal, so I write that one down, and ignore the high one.
Blood pressure readings are an inexact technique, like ECG interpretation, X-ray interpretation, pain scores, and many other measurements that are routinely used in clinical trials. I go for lunch, entirely unaware that I am calmly and quietly polluting the data, destroying the study, producing inaccurate evidence, and therefore, ultimately, killing people (because our greatest mistake would be to forget that data is used for serious decisions in the very real world, and bad information causes suffering and death).
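The damage done by that innocent recheck can be shown with a simulation. This is a sketch with invented numbers: every hypothetical patient has the same true blood pressure, the pill does nothing, and the only difference between groups is that surprisingly high readings are repeated for patients known to be on the new drug. The threshold and noise level are assumptions chosen for illustration.

```python
import random


def mean_recorded_bp(n_patients, recheck_high, rng,
                     true_bp=150.0, noise_sd=10.0, suspicious_above=160.0):
    """Average recorded blood pressure for one group.

    Every patient has the same true pressure, so any difference between
    groups comes purely from how the readings are recorded.
    """
    total = 0.0
    for _ in range(n_patients):
        reading = rng.gauss(true_bp, noise_sd)
        if recheck_high and reading > suspicious_above:
            # "Just to make sure": discard the high reading, keep the repeat.
            reading = rng.gauss(true_bp, noise_sd)
        total += reading
    return total / n_patients


def simulate_unblinded_study(n_per_group=5000, seed=0):
    """Return (placebo mean, new-drug mean) when only the new-drug group
    has its high readings rechecked."""
    rng = random.Random(seed)
    placebo = mean_recorded_bp(n_per_group, recheck_high=False, rng=rng)
    new_drug = mean_recorded_bp(n_per_group, recheck_high=True, rng=rng)
    return placebo, new_drug


if __name__ == "__main__":
    placebo, new_drug = simulate_unblinded_study()
    print(f"placebo: {placebo:.1f}, new drug: {new_drug:.1f}")
```

In this sketch the drug is entirely inert, yet the recorded average for the new-drug group comes out lower, purely because only its alarming readings were given a second chance.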
There are several good examples from recent medical history where a failure to ensure adequate ‘blinding’, as it is called, has resulted in the entire medical profession being mistaken about which was the better treatment. We had no way of knowing whether keyhole surgery was better than open surgery, for example, until a group of surgeons from Sheffield came along and did a very theatrical trial, in which bandages and decorative fake blood squirts were used, to make sure that nobody could tell which type of operation anyone had received.
Some of the biggest figures in evidence-based medicine got together and did a review of blinding in all kinds of trials of medical drugs, and found that trials with inadequate blinding exaggerated the benefits of the treatments being studied by 17 per cent. Blinding is not some obscure piece of nitpicking, idiosyncratic to pedants like me, used to attack alternative therapies.
Closer to home for homeopathy, a review of trials of acupuncture for back pain showed that the studies which were properly blinded showed a tiny benefit for acupuncture, which was not ‘statistically significant’ (we’ll come back to what that means later). Meanwhile, the trials which were not blinded—the ones where the patients knew whether they were in the treatment group or not—showed a massive, statistically significant benefit for acupuncture. (The placebo control for acupuncture, in case you’re wondering, is sham acupuncture, with fake needles, or needles in the ‘wrong’ places, although an amusing complication is that sometimes one school of acupuncturists will claim that another school’s sham needle locations are actually their genuine ones.)
So, as we can see, blinding is important, and not every trial is necessarily any good. You can’t just say, ‘Here’s a trial that shows this treatment works,’ because there are good trials, or ‘fair tests’, and there are bad trials. When doctors and scientists say that a study was methodologically flawed and unreliable, it’s not because they’re being mean, or trying to maintain the ‘hegemony’, or to keep the backhanders coming from the pharmaceutical industry: it’s because the study was poorly performed—it costs nothing to blind properly—and simply wasn’t a fair test.