By now, you won’t be surprised to learn that judges are also susceptible to the anchoring effect. They’re more likely to award higher damages if the prosecution demands a high amount,53 and hand down a longer sentence if the prosecutor requests a harsher punishment.54 One study even showed that you could significantly influence the length of a sentence in a hypothetical case by having a journalist call the judge during a recess and subtly drop a suggested sentence into the conversation. (‘Do you think the sentence in this case should be higher or lower than three years?’)55 Perhaps worst of all, it looks like you can tweak a judge’s decision just by having them throw a dice before reviewing a case.56 Even the most experienced judges were susceptible to this kind of manipulation.57
And there’s another flaw in the way humans make comparisons between numbers that has an impact on the fairness of sentencing. You may have noticed this particular quirk of the mind yourself. The effect of increasing the volume on your stereo by one level diminishes the louder it plays; a price hike from £1.20 to £2.20 feels enormous, but an increase from £67 to £68 doesn’t seem to matter; and time seems to speed up as you get older. It happens because humans’ senses work in relative terms rather than in absolute values. We don’t perceive each year as a fixed period of time; we experience each new year as a smaller and smaller fraction of the life we’ve lived. The size of the chunks of time or money or volume we perceive follows a very simple mathematical expression known as Weber’s Law.
Put simply, Weber’s Law states that the smallest change in a stimulus that can be perceived, the so-called ‘Just Noticeable Difference’, is proportional to the initial stimulus. Unsurprisingly, this discovery has also been exploited by marketers. They know exactly how much they can get away with shrinking a chocolate bar before customers notice, or precisely how much they can nudge up the price of an item before you’ll think it’s worth shopping around.
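In symbols (this is the standard textbook statement of the law, not a formula quoted from any particular study): if $I$ is the size of the initial stimulus and $\Delta I$ is the just noticeable difference, then

\[ \frac{\Delta I}{I} = k \]

where $k$ is a constant ‘Weber fraction’ that depends on which sense is being tested, but not on how strong the stimulus already is.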
The problem in the context of justice is that Weber’s Law influences the sentence lengths that judges choose. Gaps between sentences get bigger as the penalties get more severe. If a crime is marginally worse than something deserving a 20-year sentence, an additional 3 months, say, doesn’t seem enough: it doesn’t feel as though there’s enough of a difference between a stretch of 20 years and one of 20 years and 3 months. But of course there is: 3 months in prison is still 3 months in prison, regardless of what came before. And yet, instead of adding a few months on, judges will jump to the next noticeably different sentence length, which in this case is 25 years.58
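To check the arithmetic (assuming a Weber fraction of $k = 0.25$, a figure inferred from the 20-to-25-year jump above rather than one reported by any study): each noticeably different sentence is 25 per cent longer than the last, so

\[ 20 \text{ years} \times (1 + 0.25) = 25 \text{ years}, \qquad 25 \times 1.25 = 31.25 \text{ years}, \]

and so on, with the gaps stretching ever wider as the penalties grow.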
We know this is happening because we can compare the sentence lengths actually handed out to those Weber’s Law would predict. One study from 2017 looked at over a hundred thousand sentences in both Britain and Australia and found that up to 99 per cent of defendants deemed guilty were given a sentence that fits the formula.59
‘It doesn’t matter what type of offence you’d committed,’ Mandeep Dhami, the lead author of the study, told me, ‘or what type of defendant you were, or which country you were being sentenced in, or whether you had been given a custodial or community sentence.’ All that mattered was the number that popped into the judge’s mind and felt about right.
Sadly, when it comes to biased judges, I could go on. Judges with daughters are more likely to make decisions more favourable to women.60 Judges are less likely to award bail if the local sports team has lost recently. And one famous study even suggested that the time of day affects your chances of a favourable outcome.61 Although the research has yet to be replicated,62 and there is some debate over the size of the effect, there may be some evidence that taking the stand just before lunch puts you at a disadvantage: judges in the original study were most likely to award bail if they had just come back from a recess, and least likely when they were approaching a food break.
Another study showed that an individual judge will avoid making too many similar judgments in a row. Your chances, therefore, of being awarded bail drop off a cliff if four successful cases were heard immediately before yours.63
Some researchers claim, too, that our perceptions of strangers change depending on the temperature of a drink we’re holding. If you’re handed a warm drink just before meeting a new person, they suggest, you’re more likely to see them as having a warmer, more generous, more caring personality.64
This long list is just the stuff we can measure. There are undoubtedly countless other factors subtly influencing our behaviour which don’t lend themselves to testing in a courtroom.
Summing up
I’ll level with you. When I first heard about algorithms being used in courtrooms I was against the idea. An algorithm will make mistakes, and when a mistake can mean someone loses their right to freedom, I didn’t think it was responsible to put that power in the hands of a machine.
I’m not alone in this. Many (perhaps most) people who find themselves on the wrong side of the criminal justice system feel the same. Mandeep Dhami told me how the offenders she’d worked with felt about how decisions on their future were made. ‘Even knowing that the human judge might make more errors, the offenders still prefer a human to an algorithm. They want that human touch.’
So, for that matter, do the lawyers. One London-based defence lawyer I spoke to told me that his role in the courtroom was to exploit the uncertainty in the system, something that the algorithm would make more difficult. ‘The more predictable the decisions get, the less room there is for the art of advocacy.’
However, when I asked Mandeep Dhami how she would feel herself if she were the one facing jail, her answer was quite the opposite:
‘I don’t want someone to use intuition when they’re making a decision about my future. I want someone to use a reasoned strategy. We want to keep judicial discretion, as though it is something so holy. As though it’s so good. Even though research shows that it’s not. It’s not great at all.’
Like the rest of us, I think that judges’ decisions should be as unbiased as possible. They should be guided by facts about the individual, not the group they happen to belong to. In that respect, the algorithm doesn’t measure up well. But it’s not enough to simply point at what’s wrong with the algorithm. The choice isn’t between a flawed algorithm and some imaginary perfect system. The only fair comparison to make is between the algorithm and what we’d be left with in its absence.
The more I’ve read, the more people I’ve spoken to, the more I’ve come to believe that we’re expecting a bit too much of our human judges. Injustice is built into our human systems. For every Christopher Drew Brooks, treated unfairly by an algorithm, there are countless cases like that of Nicholas Robinson, where a judge errs on their own. Having an algorithm – even an imperfect algorithm – working with judges to support their often faulty cognition is, I think, a step in the right direction. At least a well-designed and properly regulated algorithm can help get rid of systematic bias and random error. You can’t change a whole cohort of judges, especially if they’re not able to tell you how they make their decisions in the first place.
Designing an algorithm for use in the criminal justice system demands that we sit down and think hard about exactly what the justice system is for. Rather than letting us just close our eyes and hope for the best, algorithms require a clear, unambiguous idea of exactly what we want them to achieve and a solid understanding of the human failings they’re replacing. This forces a difficult debate about precisely how a decision in a courtroom should be made. That’s not going to be simple, but it’s the key to establishing whether the algorithm can ever be good enough.
There are tensions within the justice system that muddy the waters and make these kinds of questions particularly difficult to answer. But there are other areas, slowly being penetrated by algorithms, where decisions are far less fraught with conflict, and the algorithm’s objectives and positive contribution to society are far more clear-cut.
Medicine
In 2015, a group of pioneering scientists conducted an unusual study on the accuracy of cancer diagnoses.1 They gave 16 testers a touch-screen monitor and tasked them with sorting through images of breast tissue. The pathology samples had been taken from real women, from whom breast tissue had been removed by a biopsy, sliced thinly and stained with chemicals to make the blood vessels and milk ducts stand out in reds, purples and blues. All the testers had to do was decide whether the patterns in the image hinted at cancer lurking among the cells.
After a short period of training, the testers were set to work, with impressive results. Working independently, they correctly assessed 85 per cent of samples.
But then the researchers realized something remarkable. If they started pooling answers – combining votes from the individual testers to give an overall assessment on an image – the accuracy rate shot up to 99 per cent.
What was truly extraordinary about this study was not the skill of the testers. It was their identity. These plucky lifesavers were not oncologists. They were not pathologists. They were not nurses. They were not even medical students. They were pigeons.
Pathologists’ jobs are safe for a while yet – I don’t think even the scientists who designed the study were suggesting that doctors should be replaced by plain old pigeons. But the experiment did demonstrate an important point: spotting patterns hiding among clusters of cells is not a uniquely human skill. So, if a pigeon can manage it, why not an algorithm?
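In fact, the leap from 85 to 99 per cent is roughly what elementary probability predicts when independent votes are pooled. Here is a minimal sketch of that calculation, assuming (as an idealization – the study’s actual pooling of the birds’ responses was a little more sophisticated) that each of 16 voters errs independently 15 per cent of the time and that tied votes are settled by a coin flip:

```python
from math import comb

def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a simple majority of independent voters is correct.

    Assumes each voter is right with probability p_correct, independently.
    An exact tie (possible when n_voters is even) is broken by a coin flip.
    """
    total = 0.0
    for k in range(n_voters + 1):
        # Probability that exactly k of the voters are correct
        prob_k = comb(n_voters, k) * p_correct**k * (1 - p_correct)**(n_voters - k)
        if 2 * k > n_voters:        # clear majority correct
            total += prob_k
        elif 2 * k == n_voters:     # tie: correct half the time
            total += 0.5 * prob_k
    return total

print(f"{majority_vote_accuracy(1, 0.85):.3f}")   # one tester:  0.850
print(f"{majority_vote_accuracy(16, 0.85):.3f}")  # 16 testers: ~0.999
```

Run on those numbers, the function returns roughly 0.999 for the pooled flock against 0.85 for a lone tester. Majority voting washes out independent errors, which is the same principle that makes ensembles of weak classifiers work in machine learning.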
Pattern hunters
The entire history and practice of modern medicine is built on the finding of patterns in data. Ever since Hippocrates founded his school of medicine in ancient Greece some 2,500 years ago, observation, experimentation and the analysis of data have been fundamental to the fight to keep us healthy.
Before then, medicine had been – for the most part – barely distinguishable from magic. People believed that you fell ill if you’d displeased some god or other, and that disease was the result of an evil spirit possessing your body. As a result, the work of a physician would involve a lot of chanting and singing and superstition, which sounds like a lot of fun, but probably not for the person who was relying on it all to stop them from dying.
It wasn’t as if Hippocrates single-handedly cured the world of irrationality and superstition for ever (after all, one rumour said he had a hundred-foot dragon for a daughter),2 but he did have a truly revolutionary approach to medicine. He believed that the causes of disease were to be understood through rational investigation, not magic. By placing his emphasis on case reporting and observation, he established medicine as a science, justly earning himself a reputation as ‘the father of modern medicine’.3
While the scientific explanations that Hippocrates and his colleagues came up with don’t exactly stand up to modern scrutiny (they believed that health was a harmonious balance of blood, phlegm, yellow bile and black bile),4 the conclusions they drew from their data certainly do.5 (They were the first to give us insights such as: ‘Patients who are naturally fat are apt to die earlier than those who are slender.’) It’s a theme that is found throughout the ages. Our scientific understanding may have taken many wrong turns along the way, but progress is made through our ability to find patterns, classify symptoms and use these observations to predict what the future holds for a patient.
Medical history is packed with examples. Take fifteenth-century China, when healers first realized they could inoculate people against smallpox. After centuries of experimentation, they found a pattern that they could exploit to reduce the risk of death from this illness by a factor of ten. All they had to do was find an individual with a mild case of the disease, harvest their scabs, dry them, crush them and blow them into the nose of a healthy person.6 Or take the medical golden age of the nineteenth century, when medicine adopted increasingly scientific methods and looking for patterns in data became integral to the role of a physician. One of these physicians was the Hungarian Ignaz Semmelweis, who in the 1840s noticed something startling in the data on deaths on maternity wards. Women who gave birth in wards staffed by doctors were five times more likely to fall ill with sepsis than those in wards run by midwives. The data also pointed towards the reason why: doctors were dissecting dead bodies and then immediately attending to pregnant women without stopping to wash their hands.7
What was true of fifteenth-century China and nineteenth-century Europe is true today of doctors all over the world. Not just when studying diseases in the population, but in the day-to-day role of a primary care giver, too. Is this bone broken or not? Is this headache perfectly normal or a sign of something more sinister? Is it worth prescribing a course of antibiotics to make this boil go away? All are questions of pattern recognition, classification and prediction. Skills that algorithms happen to be very, very good at.
Of course, there are many aspects of being a doctor that an algorithm will probably never be able to replicate. Empathy, for one thing. Or the ability to support patients through social, psychological, even financial difficulties. But there are some areas of medicine where algorithms can offer a helping hand. Especially in the roles where medical pattern recognition is found in its purest form and classification and prediction are prized almost to the exclusion of all else. Especially in an area like pathology.
Pathologists are the doctors a patient rarely meets. Whenever you have a blood or tissue sample sent off to be tested, they’re the ones who, sitting in some distant laboratory, will examine your sample and write the report. Their role sits right at the end of the diagnostic line, where skill, accuracy and reliability are crucially important. They are often the ones who say whether you have cancer or not. So, if the biopsy they’re analysing is the only thing between you and chemotherapy, surgery or worse, you want to be sure they’re getting it right.
And their job isn’t easy. As part of their role, the average pathologist will examine hundreds of slides a day, each containing tens of thousands – sometimes hundreds of thousands – of cells suspended between the small glass plates. It’s the hardest game of Where’s Wally? imaginable. Their job is to meticulously scan each sample, looking for tiny anomalies that could be hiding anywhere in the vast galaxy of cells they see beneath the microscope’s lens.
‘It’s an impossibly hard task,’ says Andy Beck,8 a Harvard pathologist and founder of PathAI, a company set up in 2016 to build algorithms that classify biopsy slides. ‘If each pathologist were to look very carefully at five slides a day, you could imagine they might achieve perfection. But that’s not the real world.’
It certainly isn’t. And in the real world, their job is made all the harder by the frustrating complexities of biology. Let’s return to the example of breast cancer that the pigeons were so good at spotting. Deciding if someone has the disease isn’t a straight yes or no. Breast cancer diagnoses are spread over a spectrum. At one end are the benign samples where normal cells appear exactly as they should be. At the other end are the nastiest kind of tumours – invasive carcinomas, where the cancer cells have left the milk ducts and begun to grow into the surrounding tissue. Cases that are at these extremes are relatively easy to spot. One recent study showed that pathologists manage to correctly diagnose 96 per cent of straightforward malignant specimens, an accuracy roughly equivalent to what the flock of pigeons managed when given a similar task.9
But in between these extremes – between totally normal and obviously horribly malignant – there are several other, more ambiguous categories. Your sample could have a group of atypical cells that look a bit suspicious, but aren’t necessarily anything to worry about. You could have pre-cancerous growths that may or may not turn out to be serious. Or you could have cancer that has not yet spread outside the milk ducts (so-called ductal carcinoma in situ).
Which particular category your sample happens to be judged as falling into will probably have an enormous impact on your treatment. Depending on where your sample sits on the line, your doctor could suggest anything from a mastectomy to no intervention at all.
The problem is, distinguishing between these ambiguous categories can be extremely tricky. Even expert pathologists can disagree on the correct diagnosis of a single sample. To test how much the doctors’ opinions varied, one 2015 study took 72 biopsies of breast tissue, all of which were deemed to contain cells with benign abnormalities (a category towards the middle of the spectrum), and asked 115 pathologists for their opinion. Worryingly, the pathologists only came to the same diagnosis 48 per cent of the time.10
Once you’re down to 50–50, you might as well be flipping a coin for your diagnosis. Heads and you could end up having an unnecessary mastectomy (costing you hundreds of thousands of dollars if you live in the United States). Tails and you could miss a chance to address your cancer at its earliest stage. Either way, the impact can be devastating.
When the stakes are this high, accuracy is what matters most. So can an algorithm do better?
Machines that see
Until recently, creating an algorithm that could recognize anything at all in an image, let alone cancerous cells, was considered a notoriously tricky challenge. It didn’t matter that understanding pictures comes so easily to humans; explaining precisely how we do it proved an unimaginably difficult task.
To understand why, imagine writing instructions to tell a computer whether or not a photo has a dog in it. You could start off with the obvious stuff: if it has four legs, if it has floppy ears, if it has fur and so on. But what about those photos where the dog is sitting down? Or the ones in which you can’t see all the legs? What about dogs with pointy ears? Or cocked ears? Or the ones not facing the camera? And how does ‘fur’ look different from a fluffy carpet? Or the wool of a sheep? Or grass?
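To see how quickly such hand-written rules fall apart, here is a toy sketch in Python. Everything in it is hypothetical – the attributes are invented for illustration, and in reality even computing a ‘leg count’ from raw pixels would be a research problem in itself:

```python
from dataclasses import dataclass

@dataclass
class PhotoFeatures:
    # Hypothetical hand-extracted attributes; in practice, deriving
    # these from raw pixels is itself the hard part.
    leg_count: int
    floppy_ears: bool
    furry: bool

def looks_like_a_dog(p: PhotoFeatures) -> bool:
    """The 'obvious stuff' rule set from the paragraph above."""
    return p.leg_count == 4 and p.floppy_ears and p.furry

# The rules immediately break down on perfectly ordinary photos:
sitting_dog = PhotoFeatures(leg_count=2, floppy_ears=True, furry=True)   # legs hidden
alsatian    = PhotoFeatures(leg_count=4, floppy_ears=False, furry=True)  # pointy ears
sheep       = PhotoFeatures(leg_count=4, floppy_ears=True, furry=True)   # woolly impostor

print(looks_like_a_dog(sitting_dog))  # False - a miss
print(looks_like_a_dog(alsatian))     # False - a miss
print(looks_like_a_dog(sheep))        # True  - a false alarm
```

Every exception demands another hand-written patch, and the rule set only grows longer and more brittle – which is exactly the difficulty the paragraph above describes.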