by Matt Parker
Or were they? How would we know if the monks were bad at building bridges? All the terrible bridges have either collapsed or been replaced over the near-millennium since. In the 1200s people would have been building bridges all over the place, probably with all sorts of different-shaped supports. I assume almost all of them are now gone. We only know about this one because it survived. To conclude that monks were good at building bridges is an example of ‘survivor bias’. It’s like a manager throwing half the applications for a job into the bin at random because they don’t want to hire any unlucky people. Just because something survives does not mean it is significant.
Bridge over turbulent water.
I find that a lot of nostalgia about how things were manufactured better in the past is down to survivor bias. I see people online sharing pictures of old kitchen equipment which is still working: waffle-irons from the 1920s, mixers from the 1940s and coffee machines from the 1980s. And there is some truth to the statement that older appliances last longer. I spoke to a manufacturing engineer in the US who said that, with 3D-design software, parts can be designed with much smaller tolerances now, whereas previous generations of engineers were not sure where the line was and so had to over-engineer parts to make sure they would work. But there is also the survivor bias that all the kitchen mixers which broke over the years have long been thrown out.
The study which looked at how many heart attacks occurred after the daylight-saving-time clock change also had a problem with a kind of survivor bias. In this case, the researchers had data only on people who made it to a hospital and required an artery-opening procedure, so their investigation was limited to people who had a serious heart attack and survived long enough to be treated. Anyone who had a daylight-saving-induced heart attack but died before reaching hospital would have been missed by the study completely.
There are also sampling biases around how and where the data is collected. In 2012 the city of Boston released an app called Street Bump which seemed to be the perfect combination of smart data collection and analysis. City councils spend a lot of their time repairing potholes in streets, and the longer potholes exist, the more they grow and become dangerous. The idea was that a driver could load the Street Bump app on to their smartphone and, while they were driving, the accelerometers in the phone would listen for the tell-tale bump of the car driving over a pothole. This constantly updating map of potholes would allow city councils to fix new ones before they grew into car-eating canyons.
It got even more zeitgeisty when some crowdsourcing was thrown in. The first version of the app was not good at filtering out false positives: data which looks like the data you want but is actually something else. In this case, the app was picking up cars driving over kerbs or other bumps and registering them as potholes; even a driver moving the phone around in the car could register as a pothole. So version two was thrown open to the wisdom of the crowd. Anyone could suggest changes to the app’s code and the best ones would share in a $25,000 reward. The final Street Bump 2.0 app had contributions from anonymous software engineers, a team of hackers in Massachusetts and the head of a university mathematics department.
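If you want a feel for why false positives were such a problem, here is a toy sketch (my own invention for illustration, not the actual Street Bump code) of the simplest possible bump detector: flag anything where the vertical acceleration spikes past a threshold. A kerb, a pothole and a passenger grabbing the phone all look much the same to it.

    # Toy bump detector: flag a 'pothole' whenever the vertical acceleration
    # spikes above a threshold. The threshold and readings are made up.
    def detect_bumps(vertical_accel, threshold=3.0):
        """Return the indices of samples that look like bumps."""
        return [i for i, a in enumerate(vertical_accel) if abs(a) > threshold]

    # Three spikes, but only one of them might actually be a pothole.
    readings = [0.1, 0.2, 3.4, 0.1, -3.8, 0.2, 4.1]
    print(detect_bumps(readings))  # [2, 4, 6]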
The new version was much better at detecting which bumps came from potholes. But there was a sampling bias because it was only reporting potholes where someone had a smartphone and was running the app, which heavily favoured affluent areas with a young population. The method used to collect the data made a big difference. It’s like conducting a survey about what people think of modern technology but only accepting submissions by fax.
And of course there is a bias in terms of what data people choose to release. When a company runs a drug trial on some new medication or medical intervention they have been working on, they want to show that it performs better than either no intervention or other current options. At the end of a long and expensive trial, if the results show that a drug has no benefit (or a negative one), there is very little motivation for the company to publish that data. It’s a kind of ‘publication bias’. An estimated half of all drug-trial results never get published. A negative result from a drug trial is twice as likely to remain unpublished as a positive result.
Withholding any drug-trial data can put people’s lives at risk, possibly more so than any other mistake I’ve mentioned in this book. Engineering and aviation disasters can result in hundreds of deaths. Drugs can have far wider impacts. In 1980 a trial was completed testing the anti-arrhythmic heart drug Lorcainide: while the frequency of serious arrhythmias in patients who took the drug did drop, of the forty-eight patients given the drug, nine died, while only one of the forty-seven patients given a placebo died.
But the researchers struggled to find anyone to publish their work.fn3 The deaths were outside the scope of their original investigation (focused only on frequency of arrhythmias) and because their sample of patients was so small the deaths could have been random chance. Over the next decade, further study did reveal the risks associated with this type of drug, a finding which could have been reached sooner with their data. If the Lorcainide data had been released sooner, an estimated ten thousand people might not have died.
Ben Goldacre, physician and nerd warrior, tells the story of how he prescribed the antidepressant drug Reboxetine to a patient based on trial data which showed it was more effective than a placebo. It had a clear positive result from a trial involving 254 patients, which was enough to convince him to write a prescription. Sometime later, in 2010, it was revealed that six other trials had been carried out to test Reboxetine (involving nearly 2,500 patients) and they all showed that it was no better than a placebo. Those six studies had not been published. Goldacre has since started the AllTrials campaign to get all drug-trial data, future and past, released. Check out his book Bad Pharma for more details.
In general, it’s amazing what you can prove if you’re prepared to ignore enough data. The UK has been home to humans for thousands of years, and that has left its mark on the landscape: there are ancient megalithic sites all over the place. In 2010 there were reports in the press that someone had analysed 1,500 ancient megalithic sites and found a mathematical pattern which linked them together in isosceles triangles as a kind of ‘prehistoric satnav’. This research was carried out by author Tom Brooks and, apparently, these triangles were too precise to have occurred by chance.
The sides of some of the triangles are over 100 miles across on each side and yet the distances are accurate to within 100 metres. You cannot do that by chance.
– Tom Brooks, 2009, and again, 2011
Brooks had been repeating his findings whenever he had a book to sell and, it seems, had put out near-identical press releases in at least 2009 and 2011. The coverage I saw was in January 2010, and I decided to test his claims. I wanted to apply the same process of looking for isosceles triangles but in location data that would not have any meaningful patterns. A few years earlier, Woolworths, a major chain of UK shops, had gone bankrupt and their derelict shopfronts were still on high streets all across the country. So I downloaded the GPS coordinates of eight hundred ex-Woolworths locations and got to work.
My Woolworths alignments. Since then both Woolworths and my hair have become a lot scarcer.
I found three Woolworths sites around Birmingham which form an exact equilateral triangle (Wolverhampton, Lichfield and Birmingham) and, if the base of the triangle is extended, it makes a 173.8-mile line linking the Conwy and Luton stores. Despite the 173.8-mile distance involved, the Conwy Woolworths store was only 12 metres off the exact line and the Luton site was within 9 metres. On either side of the Birmingham Triangle I found pairs of isosceles triangles within the required precision. The area was a hotbed of eerie alignments, which makes the Birmingham Triangle a kind of Bermuda Triangle, only with much worse weather.
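If you want to play along at home, the test itself is not complicated. Here is a minimal sketch of the kind of check involved (the coordinates are rough placeholders, not the real store locations): work out the three great-circle distances and see whether two of them agree to within the chosen tolerance.

    from math import radians, sin, cos, asin, sqrt
    from itertools import combinations

    def distance_m(p, q):
        """Great-circle distance in metres between two (lat, lon) points."""
        lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
        h = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
        return 2 * 6371000 * asin(sqrt(h))

    def is_isosceles(a, b, c, tolerance_m=100):
        """True if at least two sides of triangle abc match within the tolerance."""
        sides = [distance_m(p, q) for p, q in combinations((a, b, c), 2)]
        return any(abs(s - t) <= tolerance_m for s, t in combinations(sides, 2))

    # Rough placeholder coordinates, not the actual Woolworths sites.
    print(is_isosceles((52.59, -2.13), (52.68, -1.83), (52.48, -1.90)))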
As is apparently the accepted practice with these sorts of things, I put out a press release outlining my findings. I claimed that, at last, this information could give us some insight into how the people of 2008 had lived. And, like Brooks, I claimed that the patterns were so precise that I could not rule out extraterrestrial help. The Guardian covered it with the headline ‘Did aliens help to line up Woolworths stores?’fn4
To find these alignments I had simply skipped over the vast majority of the Woolworths locations and chosen the few that happened to line up. A mere eight hundred locations gave me a choice of over 85 million triangles. I was not at all surprised when some of them were near-exactly isosceles. If none of them had been, then I would start believing in aliens. The 1,500 prehistoric sites that Brooks used gave him over 561 million triangles to pick-and-mix from. I suspect that he is completely genuine in his belief that ancient Britons placed their important sites in these locations: he had merely fallen victim to confirmation bias. Data that matched his expectations was focused on and the rest of it ignored.
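Those triangle counts are just the number of ways of choosing three locations from the full set, which takes a couple of lines to verify:

    from math import comb

    # Number of possible triangles = ways of choosing 3 locations from n.
    print(comb(800, 3))    # 85,013,600: 'over 85 million' from the Woolworths sites
    print(comb(1500, 3))   # 561,375,500: 'over 561 million' from Brooks's megalithic sites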
Brooks put out his ancient-satnav press release yet again in 2011. So I put out my own press release again, this time with some help from programmer Tom Scott. Scott wrote a website which would take any postcode in the UK and find three ancient megalithic alignments which go through that spot; one of the three had to be Stonehenge. Three such ley lines go through every address in the UK. It is a mathematical certainty that you can find any pattern you want, as long as you’re prepared to ignore enough data that does not match. I have not heard anything from Brooks in the press since and, as a fellow triangle-phile, I hope he is doing okay.
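I don’t have Scott’s actual code, but the underlying search is easy enough to sketch: for a given point, find every pair of sites whose connecting line passes close to it. With enough sites to choose from, some pair nearly always will. (The coordinates below are invented, treated as flat x, y positions in metres purely for illustration.)

    def point_to_line_m(p, a, b):
        """Perpendicular distance from point p to the line through a and b."""
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        # Area of the parallelogram divided by the base length gives the height.
        return abs(dx * (ay - py) - dy * (ax - px)) / (dx**2 + dy**2)**0.5

    def alignments_through(p, sites, tolerance_m=100):
        """All pairs of sites whose connecting line passes within tolerance of p."""
        return [(a, b) for i, a in enumerate(sites) for b in sites[i + 1:]
                if point_to_line_m(p, a, b) <= tolerance_m]

    # Invented site positions; a real data set this dense offers far more pairs.
    sites = [(0, 0), (1000, 2000), (2000, 4050), (5000, 1000), (3000, 3000)]
    print(alignments_through((1500, 3020), sites))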
Causation, correlation and mobile-phone masts
In 2010 a mathematician found that there was a direct correlation between the number of mobile-phone masts and the number of births in areas of the UK. For every additional mobile-phone mast in an area, 17.6 more babies were born compared to the national average. It was an incredibly strong correlation and would have warranted further investigation, had there been any causal link. But there wasn’t. The finding was meaningless. And I can say that because I was that mathematician.
This was a project I was doing with the BBC Radio 4 mathematics programme More or Less to look at how people respond to a correlation where there is no causal link. The sight of mobile-phone masts was not putting the citizens of the UK in a romantic mood. And decades of studies have revealed no biological impact from mobile-phone masts. In this case, both factors were dependent on a third variable: population size. Both the number of mobile-phone masts in an area and the number of births depend on how many people live there.
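You can reproduce the effect with entirely made-up data. In this sketch (invented numbers, not the figures from the article), the number of masts and the number of births in each area are both generated purely from its population, plus some noise, and the correlation between them still comes out strong:

    import random
    from statistics import correlation  # Python 3.10 or later

    random.seed(1)
    populations = [random.randint(5_000, 500_000) for _ in range(100)]

    # Both quantities depend only on population size, plus some noise.
    masts = [p / 10_000 + random.gauss(0, 2) for p in populations]
    births = [p / 80 + random.gauss(0, 50) for p in populations]

    print(correlation(masts, births))  # close to 1, despite no causal link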
I should make it very clear: in the article I explained that the correlation was because of population size. I explained in great detail that this was an exercise in showing that correlation does not mean causation. But it ended up also being an exercise in how people don’t read the article properly before commenting underneath. The correlation was too alluring and people could not help but put forward their own reasons. More than one person suggested that expensive neighbourhoods have fewer masts and young families with loads of kids cannot afford to live there, proving once again that there is no topic that Guardian readers cannot make out to be about house prices. And, of course, it attracted a few of the alternative-facts types.
If this study holds up, then it’s in strong support of the existing scientific evidence that low-level radiation from mobile-phone masts do cause biological effects.
– Someone who didn’t read beyond the headline
A correlation is never enough to argue that one thing is causing another. There is always the chance that something else is influencing the data, causing the link. Between 1993 and 2008 the police in Germany were searching for the mysterious ‘phantom of Heilbronn’, a woman who had been linked to forty crimes, including six murders; her DNA had been found at all the crime scenes. Tens of thousands of police hours were spent looking for Germany’s ‘most dangerous woman’ and there was a €300,000 bounty on her head. It turns out she was a woman who worked in the factory that made the cotton swabs used to collect DNA evidence.
And, of course, some correlations happen to be completely random. If enough data sets are compared, sooner or later there will be two which match almost perfectly, completely by accident. There is even a Spurious Correlations website which can search through publicly available data and find matches for you. I did a quick check against the number of people in the US who obtained a PhD in mathematics. Between 1999 and 2009 the number of maths doctorates awarded had an 87 per cent correlation with the ‘Number of people who tripped over their own two feet and died’. (Provided without comment.)
As a mathematical tool, correlation is a powerful technique. It can take a collection of data and provide a good measure of how closely linear changes in one variable match changes in the other. But it is only a tool, not the answer. Much of mathematics is about finding the correct answer but, in statistics, the numbers coming out of calculations are never the whole story. All of the Datasaurus Dozen have the same correlation values as well, but there are clearly different relationships in the plots. The numbers produced by statistics are the start of finding the answer, not the end. It takes a bit of common sense and clever insight to go from the statistics to the actual answer.
For the record, in the US the number of people awarded maths PhDs also has an above 90 per cent correlation over ten years or more with: uranium stored at nuclear-power plants, money spent on pets, total revenue generated by skiing facilities, and per capita consumption of cheese.
Otherwise, when you hear a statistic such as the fact that cancer rates have been steadily increasing, you could assume that people are living less healthy lives. The opposite is true: longevity is increasing, which means more people are living long enough to get cancer. For most cancers, age is the biggest risk factor and, in the UK, 60 per cent of all cancer diagnoses are for people aged sixty-five or older. As much as it pains me to say it, when it comes to statistics, the numbers are not everything.
TWELVE
Tltloay Rodanm
In 1984 ice-cream-van driver Michael Larson went on the US TV game show Press Your Luck and won an unprecedented $110,237: about eight times more than the average winner. He had such an extended winning streak that the normally fast-turnover game show had to split his appearance over two episodes.
On Press Your Luck the prizes were dished out via the Big Board, a screen with eighteen boxes detailing different cash amounts, physical prizes and a cartoon character known as a Whammy. The system rapidly flicked between the boxes in an apparently random order, and the player won the content of whichever box was selected when they hit their buzzer. Should a contestant land on a Whammy, the player lost all the prizes they had accumulated so far.
The system never lingered on a box for long enough for the player to see what it was, react and hit their buzzer. And because the movement seemed unpredictable it was theoretically impossible for the player to anticipate which box it would select in advance, so they were picking at random. Most players would win a few prizes before retiring for that round; others would press their luck and get whammied. At least, that was the theory.
The game starts normally enough. Michael answers enough trivia questions correctly to earn some spins on the Big Board, and on his first go he hits a Whammy. By the start of the second round Michael is coming last, but his trivia knowledge has earned him seven more spins on the Big Board. This time he does not hit a Whammy; he wins $1,250. Then $1,250 on the next spin. Then $4,000; $5,000; $1,000; a holiday to Kauai; $4,000; and so on. And most of these are prizes that also come with a ‘free spin’, so his Big Board reign seems to be everlasting.
At first the host Peter Tomarken goes through his normal patter, waiting for Michael to hit a Whammy. But he doesn’t. In a freak of probability, Michael keeps selecting prize after prize. The video is available online if you search for ‘Press Your Luck’ and ‘Michael Larson’. It is amazing to watch the range of emotions the host goes through. Initially he is excited that something unlikely is happening, but soon he is trying to work out what on earth is going on while maintaining his jovial-game-show-host persona.
Michael Larson pressing everything but his luck.
Instead of being truly random, the board had only five predetermined cycles, which it went through so fast they looked random. Michael Larson had taped the show at home and pored over the footage until he cracked those underlying patterns. Then he memorized them – which, ironically, was probably less effort than learning the answers to trivia questions, like other people did. And I certainly can’t make fun of him for memorizing long sequences of seemingly arbitrary values; my knowledge of the digits of pi has definitely not won me $110,237.
The designers of the Press Your Luck system hard-coded set cycles instead of making the board truly random because being random is difficult. It is far easier to use an already generated list of locations than to randomly pick a path on the fly. It’s not even a case of it being difficult for computers to do something randomly: it’s pretty much impossible.
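Here is a minimal sketch of that approach (the board’s real five sequences were never published, so these cycles are invented stand-ins): loop through one of a handful of fixed lists of box positions, fast enough that it looks random to anyone watching.

    import itertools

    # Invented stand-ins for the board's hard-coded sequences of box positions.
    CYCLES = [
        [0, 5, 11, 2, 17, 8, 14, 3],
        [7, 1, 16, 9, 4, 12, 6, 15],
        [10, 13, 2, 8, 0, 17, 5, 11],
    ]

    def board_positions(cycle_index):
        """Yield box positions forever by looping over one fixed cycle."""
        return itertools.cycle(CYCLES[cycle_index])

    board = board_positions(0)
    print([next(board) for _ in range(12)])  # after 8 steps the pattern repeats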
Robotic randomness
No computer can be random unaided: computers are built to follow instructions exactly; processors are built to predictably do the correct thing every time. Making a computer do something unexpected is a difficult feat. You can’t write a line of code which says ‘do something random’ and get a truly random number without a specialized component attached to the computer.
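The ‘random’ numbers an ordinary program spits out make the point: they come from a deterministic algorithm, so the same starting seed produces exactly the same ‘random’ sequence every single time.

    import random

    random.seed(42)
    first = [random.randint(1, 6) for _ in range(5)]

    random.seed(42)
    second = [random.randint(1, 6) for _ in range(5)]

    print(first == second)  # True: same seed, identical 'random' dice rolls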
The extreme version is to build a two-metre-high motorized conveyor belt which dips into a bucket of two hundred dice and lifts a random selection of them past a camera which the computer can then use to look at the dice and detect which numbers have been rolled. Such a machine, capable of 1,330,000 random dice rolls a day, would weigh around 50 kilograms, fill a room with the cacophony of moving motors and rolling dice and be exactly what Scott Nesin built for his GamesByEmail website.