by Ben Goldacre
But the researchers of this paper also found a mystery: looking at the Scottish Health Survey, they found no increase in the prevalence of depression; and looking at the GP consultations dataset, they again found no evidence that people were presenting more frequently to their GP with depression, or that GPs were making more diagnoses of depression.
So why were antidepressant prescriptions going up? This puzzle received some kind of explanation in 2009. The BMJ paper above found the same increase in the number of prescriptions that the journalists reported this week, as I said. But they had access to more data: their analysis didn’t just look at the total number of prescriptions in the country, or even the total number of people diagnosed with depression: it also looked at the prescription records of individual patients, in a dataset of over three million patients’ electronic health records (with 200,000 people who experienced a first diagnosis of depression during this period).
They found that the rise in the overall number of antidepressant prescriptions was not due to increasing numbers of patients receiving antidepressants. It was almost entirely caused by one thing: a small increase in the small proportion of those patients who received treatment for longer periods of time. Numerically, people receiving treatment for long periods make up the biggest chunk of all the prescriptions written, so this small shift bumped up the overall numbers hugely.
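The arithmetic behind that last point can be sketched with invented numbers. Everything in the block below is hypothetical (the patient count, the prescriptions per patient, the size of the shift); none of it is taken from the BMJ paper. It only shows that, when long-term patients account for most prescriptions, moving a few percentage points of patients into long-term treatment can lift the total noticeably while the number of patients stays the same.

```python
# Hypothetical illustration: every number here is invented, not from the BMJ paper.

def total_prescriptions(patients, long_term_share, scripts_short, scripts_long):
    """Total prescriptions when a given share of patients is treated long term."""
    long_term = patients * long_term_share
    short_term = patients - long_term
    return short_term * scripts_short + long_term * scripts_long

PATIENTS = 100_000    # assumed: the same number of patients in both periods
SCRIPTS_SHORT = 3     # assumed prescriptions per short-term patient
SCRIPTS_LONG = 40     # assumed prescriptions per long-term patient

# Assumed shift: long-term share rises from 10 per cent to 13 per cent.
before = total_prescriptions(PATIENTS, 0.10, SCRIPTS_SHORT, SCRIPTS_LONG)
after = total_prescriptions(PATIENTS, 0.13, SCRIPTS_SHORT, SCRIPTS_LONG)

increase = (after - before) / before
print(f"before: {before:,.0f}  after: {after:,.0f}  rise: {increase:.0%}")
```

With these made-up inputs, total prescriptions rise by roughly a sixth although not one extra patient has been treated.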
I don’t know for certain if that phenomenon explains the increase in prescriptions from 2006 to 2010, as it does for the period 2000 to 2005 (although, in the absence of work examining that question, since the increase in scripts was so similar, it does seem fairly likely). And I’m not expecting journalists to go to academic research databases to conduct large, complex, descriptive studies.
But if they are going to engage in primary research, and make dramatic causal claims – as they have done in this story – to the nation, then they could also, usefully, read through the proper work that’s already been done, and consider alternative explanations for the numbers they’ve found.
Confound You!
Guardian, 5 August 2011
Fox News was excited: ‘Unplanned children develop more slowly, study finds’. The Telegraph was equally shrill in its headline: ‘IVF children have bigger vocabulary than unplanned children’. And the British Medical Journal press release drove it all: ‘Children born after an unwanted pregnancy are slower to develop’.
The last two, at least, made a good effort to explain that this effect disappeared when the researchers accounted for social and demographic factors. But was there ever any point in reporting the raw finding, from before this correction was made?
I will now demonstrate, with a nerdy table illustration, how you correct for things such as social and demographic factors. You’ll have to pay attention, because this is a tricky concept; but at the end, when the mystery is gone, you will see why reporting the unadjusted figures as the finding, especially in a headline, is simply wrong.
Correcting for an extra factor is best understood by doing something called ‘stratification’. Imagine you do a study, and you find that people who drink are three times more likely to get lung cancer than people who don’t. The results are in Table 1. Your odds of getting lung cancer as a drinker are 0.16 (that’s 366 ÷ 2,300). Your odds as a non-drinker are 0.05. So your odds of getting lung cancer are three times higher as a drinker (0.16 ÷ 0.05 is roughly 3, and that figure is called the ‘odds ratio’).
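For anyone who wants to check the sum, here is a minimal sketch of the odds and odds-ratio calculation. It uses only the figures quoted in the paragraph above: 366 drinkers with lung cancer, 2,300 without, and odds of 0.05 for non-drinkers. The function name `odds` is simply a label chosen for this illustration.

```python
def odds(cases, non_cases):
    """Odds = people with the outcome divided by people without it."""
    return cases / non_cases

odds_drinkers = odds(366, 2300)   # counts quoted in the text
odds_non_drinkers = 0.05          # quoted in the text as a finished figure
odds_ratio = odds_drinkers / odds_non_drinkers

print(round(odds_drinkers, 2))    # 0.16
print(round(odds_ratio, 1))       # 3.2, i.e. 'roughly 3'
```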
But then some clever person comes along and says: Wait, maybe this whole finding is confounded by the fact that drinkers are more likely to smoke cigarettes. That could be an alternative explanation for the apparent relationship between drinking and lung cancer. So you want to factor smoking out.
The way to do this is to chop your data in half, and analyse non-smokers and smokers separately. So you take only the people who smoke, and compare drinkers against non-drinkers; then you take only the people who don’t smoke, and compare drinkers against non-drinkers in that group separately. You can see the results of this in the second and third tables.
Now your findings are a bit weird. Suddenly, since you’ve split the data up by whether people are smokers or not, drinkers and non-drinkers have exactly the same odds of getting lung cancer. The apparent effect of drinking has been eradicated, and this means that the observed risk of drinking was entirely due to smoking: smokers had a higher chance of lung cancer – in fact their odds were 0.3 rather than 0.03, ten times higher – and drinkers were more likely to also be smokers. Looking at the figures in these tables, 203 out of 1,954 non-drinkers smoked, whereas 1,430 out of 2,666 drinkers smoked.
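The same stratification can be sketched in a few lines of code. The cell counts below are a reconstruction, chosen so that they agree with the figures quoted in the text (366 and 2,300 for drinkers; 203 of 1,954 non-drinkers and 1,430 of 2,666 drinkers smoking; odds of about 0.3 for smokers and 0.03 for non-smokers). They are an assumption made for this illustration and are not copied from the original tables.

```python
# Reconstructed cell counts (an assumption, see the note above).
# Keys are (drinker, smoker); values are (cases, non_cases).
counts = {
    (True, True): (330, 1100),
    (True, False): (36, 1200),
    (False, True): (47, 156),
    (False, False): (51, 1700),
}

def odds(cases, non_cases):
    return cases / non_cases

def pooled(drinker):
    """Add the smoking strata together, as in the unadjusted analysis."""
    rows = [v for (d, _smoker), v in counts.items() if d == drinker]
    return sum(c for c, _ in rows), sum(n for _, n in rows)

# Unadjusted comparison: ignore smoking altogether.
crude_or = odds(*pooled(True)) / odds(*pooled(False))

# Stratified comparison: drinkers against non-drinkers within each smoking group.
stratum_or = {
    smoker: odds(*counts[(True, smoker)]) / odds(*counts[(False, smoker)])
    for smoker in (True, False)
}

print(f"crude odds ratio: {crude_or:.2f}")
for smoker, value in stratum_or.items():
    label = "smokers" if smoker else "non-smokers"
    print(f"odds ratio among {label}: {value:.2f}")
```

The crude odds ratio comes out at about 3, while within each smoking group it is about 1: the apparent effect of drinking disappears once smokers and non-smokers are analysed separately.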
I explained all this with a theoretical example, where the odds of cancer apparently trebled before correction for smoking. Why didn’t I just use the data from the unplanned pregnancies paper? Because in the real world of research, you’re often correcting for lots of things at once. In the case of this BMJ paper, the researchers corrected for parents’ socioeconomic position and qualifications, sex of child, age, language spoken at home, and a huge list of other factors.
When you’re correcting for so many things, you can’t use old-fashioned stratification, as I did in this simple example, because you’d be dividing your data up among so many smaller tables that some would have no people in them at all. That’s why you calculate your adjusted figures using cleverer methods, such as logistic regression and likelihood theory. But it all comes down to the same thing. In our example above, alcohol wasn’t really associated with lung cancer. And in this BMJ paper, unplanned pregnancy wasn’t really associated with slower development. Pretending otherwise is just silly.
Bicycle Helmets and the Law
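To see how quickly the tables multiply, here is a rough sketch. The number of categories for each factor and the sample size are invented for illustration; they are not the BMJ paper's actual coding or cohort size.

```python
from math import prod

# Hypothetical number of categories per adjustment factor (assumed, not from the paper).
levels = {
    "socioeconomic position": 5,
    "parental qualifications": 6,
    "sex of child": 2,
    "age band": 4,
    "language at home": 3,
}
SAMPLE_SIZE = 12_000   # assumed number of children in the study

# Stratification needs one table for every combination of categories.
strata = prod(levels.values())
average_per_table = SAMPLE_SIZE / strata

print(strata)                        # 720
print(round(average_per_table, 1))   # 16.7
```

Even with only five factors, that is 720 separate tables averaging fewer than seventeen children each, before splitting each table into planned and unplanned pregnancies. Add the rest of the ‘huge list’ and many tables would be empty.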
Ben Goldacre and David Spiegelhalter, British Medical Journal, 12 June 2013
We have both spent a large part of our working lives discussing statistics and risk with the general public. We both dread questions about bicycle helmets. The arguments are often heated and personal; but they also illustrate some of the most fascinating challenges for epidemiology, risk communication and evidence-based policy.
With regard to the use of bicycle helmets, science broadly tries to answer two main questions. At a societal level, ‘What is the effect of a public health policy that requires or promotes helmets?’ and at an individual level, ‘What is the effect of wearing a helmet?’ Both questions are methodologically challenging and contentious.
The linked paper by Dennis and colleagues (doi:10.1136/bmj.f2674) investigates the policy question and concludes that the effect of Canadian helmet legislation on hospital admission for cycling head injuries ‘seems to have been minimal’. Other ecological studies have come to different conclusions, but the current study has somewhat superior methodology – controlling for background trends and modelling head injuries as a proportion of all cycling injuries.
This finding of ‘no benefit’ is superficially hard to reconcile with case-control studies, many of which have shown that people wearing helmets are less likely to have a head injury. Such findings suggest that, for individuals, helmets confer a benefit. These studies, however, are vulnerable to many methodological shortcomings. If the controls are cyclists presenting with other injuries in the emergency department, then analyses are conditional on having an accident and therefore assume that wearing a helmet does not change the overall accident risk. There are also confounding variables that are generally unmeasured and perhaps even unmeasurable. People who choose to wear bicycle helmets will probably be different from those who ride without a helmet: they may be more cautious, for example, and so less likely to have a serious head injury, regardless of their helmets.
People who are forced by legislation to wear a bicycle helmet, meanwhile, may be different again. Firstly, they may not wear the helmet correctly, seeking only to comply with the law and avoid a fine. Secondly, their behaviour may change as a consequence of wearing a helmet through ‘risk compensation’, a phenomenon that has been documented in many fields. One study – albeit with a single author and subject – suggests that drivers give larger clearance to cyclists without a helmet.
Even if helmets do have an effect on head-injury rates, it would not necessarily follow that legislation would have public health benefits overall. This is because of ‘second-round’ effects, such as changes in cycling rates, which may affect individual and population health. Modelling studies have generally concluded that regular cyclists live longer because the health effects of cycling far outweigh the risk of crashes. This trade-off depends crucially, however, on the absolute risk of an accident: any true reduction in the relative risk of head injury will have a greater impact where crashes are more common, such as for children.
The impact on all-cause mortality, and on head injuries, may be even further complicated if such legislation has varying effects on different groups. For example, a recent study identified two broad subpopulations of cyclist: ‘one speed-happy group that cycle fast and have lots of cycle equipment including helmets, and one traditional kind of cyclist without much equipment, cycling slowly’. The study concluded that compulsory cycle-helmet legislation may selectively reduce cycling in the second group. There are even more complex second-round effects if each individual cyclist’s safety is improved by increased cyclist density through ‘safety in numbers’, a phenomenon known as Smeed’s law. Statistical models for the overall impact of helmet habits are therefore inevitably complex and based on speculative assumptions. This complexity seems at odds with the current official BMA policy, which confidently calls for compulsory helmet legislation.
Standing over all this methodological complexity is a layer of politics, culture and psychology. Supporters of helmets often tell vivid stories about someone they knew, or heard of, who was apparently saved from severe head injury by a helmet. Risks and benefits may be exaggerated or discounted depending on the emotional response to the idea of a helmet. For others, this is an explicitly political matter, where an emphasis on helmets reflects a seductively individualistic approach to risk management (or even ‘victim blaming’), while the real gains lie elsewhere. It is certainly true that in many countries, such as Denmark and the Netherlands, cyclists have low injury rates, even though rates of cycling are high and almost no cyclists wear helmets. This seems to be achieved through interventions such as good infrastructure, stronger legislation to protect cyclists, and a culture of cycling as a popular, routine, non-sporty, non-risky behaviour.
In any case, the current uncertainty about any benefit from helmet wearing or promotion is unlikely to be substantially reduced by further research. Equally, we can be certain that helmets will continue to be debated, and at length. The enduring popularity of helmets as a proposed major intervention for increased road safety may therefore lie not with their direct benefits – which seem too modest to capture compared with other strategies – but more with the cultural, psychological and political aspects of popular debate around risk.
Screen Test
Guardian, 12 January 2008
So we’re all going to get screened for our health problems, by some businessmen who’ve bought a CT scanner and put an advert in the paper maybe, or perhaps by Gordon Brown: because screening saves lives, data is good, and it’s always better to do something rather than nothing.
Unfortunately, it’s a tiny bit more complicated than that.
Screening is a fascinating area, mainly because of the maths of rare events, but also because of the ethics. Screening isn’t harmless, as tests – inevitably – aren’t perfect. You might get a false alarm, causing stress and anxiety (‘the worst time in my life’ said women in one survey on breast screening). Or you might have to endure more invasive medical investigations to follow up the early warning: even something as innocuous as a biopsy can sometimes result in harmful adverse events, and if you do a lot of those, unnecessarily, in a population, then you’re hurting people, sometimes more than you’re helping. Lastly, people might get false reassurance from a false negative result, and ignore other niggles, which can in turn delay the diagnosis of genuine problems.
Then, there are the interesting ethical issues. One of the proposed screening programmes is intended to catch abdominal aortic aneurysms earlier. An AAA is a swelling of the main blood-vessel trunk in your belly: they can rupture without much warning, and when they do, people often die fast and frighteningly. But if you know the AAA is there, and do the repair operation at your leisure before it ruptures, then survival is far better. Screening and repairing have been shown to reduce mortality by around 40 per cent, looking at the whole population, which is a good thing.
But remember, you will operate on some people – as a preventive measure, because you picked up their aneurysm on screening – who would never have died from their aneurysm: it would have just ticked away quietly, not rupturing. And some of the people you operate on unnecessarily (and remember, there’s no crystal ball to identify these people) will die of complications on the operating table. They only died because of your screening programme. It saves lives overall, but Fred Bloggs – loving husband of Winona Bloggs – who would have lived, is now dead, thanks to you.
That’s Vegas, you could say. But it’s tricky, and the sums are often close. For example, mammogram screening for breast cancer every two years has been estimated to prevent two deaths per thousand women aged fifty to fifty-nine over ten years: that is good. But achieving this requires 5,000 screenings among those thousand women, resulting in 242 recalls, and sixty-four women having at least one biopsy. Five women will have cancer detected and treated. Again, this isn’t an argument against screening, we’re just walking through some example numbers.
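Those example numbers can be turned into ‘per death prevented’ figures with some simple division. The sketch below uses only the figures quoted above; the variable names are labels chosen for this illustration.

```python
# Figures quoted in the text: per 1,000 women aged fifty to fifty-nine,
# screened every two years for ten years.
WOMEN = 1000
screenings = 5000
deaths_prevented = 2
recalls = 242
women_biopsied = 64

women_per_death_prevented = WOMEN / deaths_prevented
screenings_per_death_prevented = screenings / deaths_prevented
recalls_per_death_prevented = recalls / deaths_prevented
biopsied_per_death_prevented = women_biopsied / deaths_prevented

print(women_per_death_prevented)        # 500.0
print(screenings_per_death_prevented)   # 2500.0
print(recalls_per_death_prevented)      # 121.0
print(biopsied_per_death_prevented)     # 32.0
```

So, on these figures, preventing one death means screening 500 women for ten years, with 2,500 mammograms, 121 recalls, and 32 women having at least one biopsy.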
Although, interestingly, that’s not something everybody is keen to do with screening. People in healthcare can be zealots, and enthusiasts, and we can often project our own values and preferences onto everyone else. Researchers have studied the invitation letters sent out for screening programmes, along with the websites and pamphlets, and they have repeatedly been shown to be biased in favour of participation, and lacking in information.
Where figures are given, they generally use the most dramatic and uninformative way of expressing the benefits: the ‘relative risk reduction’ is given, the same statistical form that journalists prefer – for example, ‘a 30 per cent reduction in deaths from breast cancer’ – rather than a more informative figure like the ‘number needed to screen’ – say, ‘two lives saved for every thousand women scanned’. Sometimes the leaflets even contain borderline porkies, like this one from Ontario: ‘There has been a 26 per cent increase in breast cancer cases in the last ten years,’ it said, in scary and misleading tones. This was roughly the level of over-diagnosis caused by screening over the preceding ten years during which the screening programme itself had been operating.
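The difference between the two ways of expressing a benefit can be sketched as follows. The 30 per cent relative risk reduction is the style of figure quoted above; the baseline risks are hypothetical values picked for illustration, not estimates for any real screening programme.

```python
def absolute_benefit(baseline_risk, relative_risk_reduction):
    """Return (absolute risk reduction, number needed to screen)."""
    arr = baseline_risk * relative_risk_reduction
    return arr, 1 / arr

RRR = 0.30   # 'a 30 per cent reduction in deaths'

# Hypothetical risks of dying from the disease without screening.
for baseline in (0.0005, 0.005, 0.05):
    arr, nns = absolute_benefit(baseline, RRR)
    print(f"baseline risk {baseline:.2%}: {arr * 1000:.2f} deaths avoided "
          f"per 1,000 screened, {nns:,.0f} screened per death avoided")
```

The relative figure is identical in every row, but the number of people who actually benefit varies a hundredfold with the baseline risk, which is why the ‘number needed to screen’ tells you more.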
These problems with clear information raise interesting questions around informed consent, although seductive letters do increase uptake, and so save lives. It’s tricky: on the one hand, you end up sounding like a redneck who doesn’t trust the gub’mint, because screening programmes are often valuable. On the other hand, you want people to be allowed to make informed choices.
And the amazing thing is, in at least one large survey of five hundred people, even when presented with the harsh realities of the tests, people made what many would still think are the right decisions. Thirty-eight per cent had experienced at least one false-positive screening test; more than 40 per cent of these individuals described the experience as ‘very scary’, or ‘the scariest time of my life’. But looking back, 98 per cent were glad they were screened. Most wanted to know about cancer, regardless of the implications. Two thirds said they would be tested for cancer even if nothing could be done. Chin up.
How Do You Know?
Guardian, 4 June 2011
Mobile phones ‘possibly’ cause brain cancer, according to a report this week from the IARC (International Agency for Research on Cancer), part of the WHO. This report has triggered over 3,000 news articles around the world. Like you, I’m not interested in marginal changes around small lifestyle risks for the risks themselves; but I am interested in the methodological issues they throw up.
First, transparency: science isn’t about authoritative utterances from men in white coats, it’s about showing your working. What does this report say? How do its authors reason around contradictory data? Nobody can answer those questions, because the report isn’t available. Nobody you see writing confidently about it has read it. There is only a press release. Nobody at the IARC even replied to my emails requesting more information.
This isn’t just irritating. Phones are a potential risk exposure where people can make a personal choice. People want information. It’s in the news right now. The word ‘possibly’ informs nobody. How can we put flesh on that with the research that is already published, and what are the limits of the research?
The crudest data you could look at is the overall rate of different brain cancers. This hasn’t changed much over time, despite an increase in mobile-phone use, but it’s a crude measure, affected by lots of different things.
Ideally, we’d look at individuals, to see if greater mobile use is correlated with brain cancer, but that can be tricky. These tumours are rare – about ten cases in every 100,000 people each year – and that affects how you research them.
For common things, such as heart disease, you can take a few thousand people and measure factors you think are relevant – smoking, diet, some blood tests – then wait a few years until they get the disease. This is a ‘prospective cohort study’, but that approach is much less useful for studying rare outcomes, like brain tumours, because you won’t get enough cases appearing in your study group to spot an association with your potential cause.
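A quick sum shows the problem. The incidence of ten cases per 100,000 people per year is the figure given above; the cohort sizes and the five-year follow-up are assumptions standing in for ‘a few thousand people’ and ‘a few years’.

```python
INCIDENCE = 10 / 100_000   # new cases per person per year, from the text

def expected_cases(people, years, incidence=INCIDENCE):
    """Expected number of new cases, ignoring deaths and drop-outs."""
    return people * years * incidence

# Assumed cohort: 5,000 people followed for 5 years.
print(round(expected_cases(5_000, 5), 1))       # 2.5
# For comparison, an assumed cohort of a million people over the same period.
print(round(expected_cases(1_000_000, 5), 1))   # 500.0
```

Two or three tumours in the whole study group is nowhere near enough to compare heavy and light phone users; you would need a cohort in the hundreds of thousands before the numbers became workable.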