That means these state-of-the-art algorithms can also be pretty easily fooled. Since they work by detecting a statistical description of the patterns of light and dark on a face, you can trick them just by wearing funky glasses with a disruptive pattern printed on them. Even better, by designing the specific disruptive pattern to signal someone else’s face, you can actually make the algorithm think you are that person – as the chap in the image above is doing, wearing glasses that make him look ‘like’ actress Milla Jovovich.63 Using glasses as a disguise? Turns out Clark Kent was on to something.
But, targeted attacks with funky glasses aside, the recognition abilities of these statistical algorithms have prompted many admiring headlines, like those that greeted Google’s FaceNet. To test its recognition skills, FaceNet was asked to identify five thousand images of celebrities’ faces. Human recognizers had previously attempted the same task and done exceptionally well, scoring 97.5 per cent correct identifications (unsurprisingly, since these celebrity faces would have been familiar to the participants).64 But FaceNet did even better, scoring a phenomenal 99.6 per cent correct.
On the surface, this looks as if the machines have mastered superhuman recognition skills. It sounds like a great result, arguably good enough to justify the algorithms being used to identify criminals. But there’s a catch. Five thousand faces is, in fact, a pathetically small number to test your algorithm on. If it’s going to be put to work fighting crime, it’s going to need to find one face among millions, not thousands.
That’s because the UK police now hold a database of 19 million images of our faces, created from all those photos taken of individuals arrested on suspicion of having committed a crime. The FBI, meanwhile, has a database of 411 million images, in which half of all American adults are reportedly pictured.65 And in China, where the ID card database gives easy access to billions of faces, the authorities have already invested heavily in facial recognition. There are cameras installed in streets, subways and airports that will supposedly spot everything from wanted criminals to jaywalkers as they travel through the country’s cities.66 (There’s even a suggestion that a citizen’s minor misdemeanours in the physical world, like littering, will form part of their Sesame Credit score – attracting all of the associated punishments that we uncovered in the ‘Data’ chapter.)
Here’s the problem: the chances of misidentification multiply dramatically with the number of faces in the pile. The more faces the algorithm searches through, the more chance it has of finding two faces that look similar. So, once you try using these same algorithms on bigger catalogues of faces, their accuracy plummets.
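To get a feel for why, it’s worth running the numbers. Here’s a minimal sketch – my own illustration, not taken from any real system – which assumes every comparison carries the same small, independent chance of a false match. Both the simplification and the one-in-a-million rate are hypothetical, but the shape of the result isn’t: the chance of at least one false match races towards certainty as the database grows.

```python
# A toy illustration (not from any real system): how the chance of at
# least one false match grows with database size, assuming a fixed,
# independent false-match rate for every comparison.

per_comparison_false_match = 1e-6  # hypothetical: one in a million per pair

for n_faces in [5_000, 1_000_000, 19_000_000, 411_000_000]:
    # P(at least one false match) = 1 - P(no false match in n comparisons)
    p_any = 1 - (1 - per_comparison_false_match) ** n_faces
    print(f"{n_faces:>11,} faces -> {p_any:6.1%} chance of a false match")
```

On those toy numbers, a five-thousand-face celebrity test stays below a 1 per cent chance of throwing up a false match, while a database the size of the FBI’s makes one all but guaranteed.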
It would be a bit like getting me to match ID cards to ten strangers and – when I got full marks – claiming that I was capable of correctly identifying faces 100 per cent of the time, then letting me wander off into the centre of New York to identify known criminals. It’s inevitable that my accuracy would drop.
It’s just the same with the algorithms. In 2015, the University of Washington set up the so-called MegaFace challenge, in which people from around the world were invited to test their recognition algorithms on a database of 1 million faces.67 Still substantially smaller than the catalogues held by some government authorities, but getting closer. Even so, the algorithms didn’t handle the challenge well.
Google’s FaceNet – which had been close to perfect on the celebrities – could suddenly manage only a 75 per centfn2 identification rate.68 Other algorithms came in at a frankly pathetic 10 per cent success rate. At the time of writing, the world’s best is a Chinese offering called Tencent YouTu Lab, which can manage an 83.29 per cent recognition rate.69
To put that another way: if you’re searching for a particular criminal in a digital line-up of millions, those numbers mean that even in the best-case scenario you’ll fail to find the right person roughly one time in six (an 83.29 per cent hit rate leaves a 16.71 per cent miss rate).
Now, I should add that progress in this area is happening quickly. Accuracy rates are increasing steadily, and no one can say for certain what will happen in the coming years or months. But I can tell you that differences in lighting, pose, image quality and general appearance make accurately and reliably recognizing faces a very tricky problem indeed. We’re some way away from getting perfect accuracy on databases of 411 million faces, or being able to find that one-in-a-trillion doppelgänger match.
Striking a balance
These are sobering facts, but not necessarily deal-breakers. There are algorithms good enough to be used in some situations. In Ontario, Canada, for instance, people with a gambling addiction can voluntarily place themselves on a list that bars them from entering a casino. If their resolve wavers, their face will be flagged by recognition algorithms, prompting casino staff to politely ask them to leave.70 The system is certainly unfair on all those mistakenly prevented from a fun night on the roulette table, but I’d argue that’s a price worth paying if it means helping a recovering gambling addict resist the temptation of their old ways.
Likewise in retail. In-store security guards used to have offices plastered with Polaroids of shoplifters; now algorithms can cross-reference your face with a database of known thieves as soon as you pass the threshold of the store. If your face matches that of a well-known culprit, an alert is sent to the smartphones of the guards on duty, who can then hunt you out among the aisles.
There’s good reason for stores to want to use this kind of technology. An estimated 3.6 million offences of retail crime are committed every year in the UK alone, costing retailers a staggering £660 million.71 And, when you consider that in 2016 there were 91 violent deaths of shoplifting suspects at retail locations in the United States,72 there is an argument that a method of preventing persistent offenders from entering a store before a situation escalates would be good for everyone.
But this high-tech solution to shoplifting comes with downsides: privacy, for one thing (FaceFirst, one of the leading suppliers of this kind of security software, claims it doesn’t store the images of regular customers, but shops are certainly using facial recognition to track our spending habits). And then there’s the question of who ends up on the digital blacklist. How do you know that everyone on the list is on there for the right reasons? What about innocent until proven guilty? What about people who end up on the list accidentally: how do they get themselves off it? Plus, again, there’s the potential for misidentification by an algorithm that can never be perfectly accurate.
The question is whether the pros outweigh the cons. There’s no easy answer. Even retailers don’t agree. Some are enthusiastically adopting the technology, while others are moving away from it – including Walmart, which cancelled a FaceFirst trial in their stores after it failed to offer the return on investment the company were hoping for.73
But in the case of crime the balance of harm and good feels a lot more clear cut. True, these algorithms aren’t alone in their slightly shaky statistical foundations. Fingerprinting has no known error rate either,74 nor do bite mark analysis, blood spatter patterning75 or ballistics.76 In fact, according to a 2009 paper by the US National Academy of Sciences, none of the techniques of forensic science apart from DNA testing can ‘demonstrate a connection between evidence and a specific individual or source’.77 None the less, no one can deny that they have all proved to be incredibly valuable police tools – just as long as the evidence they generate isn’t relied on too heavily. But the accuracy rates of even the most sophisticated facial recognition algorithms leave a lot to be desired. There’s an argument that if there is even a slight risk of more cases like Steve Talley, then a technology that isn’t perfect shouldn’t be used to assist in robbing someone of their freedom. The only problem is that stories like Talley’s don’t quite paint the entire picture. Because, while there are enormous downsides to using facial recognition to catch criminals, there are also gigantic upsides.
The tricky trade-off
In May 2015, a man ran through the streets of Manhattan randomly attacking passers-by with a black claw hammer. First, he ran up to a group of people near the Empire State Building and smashed a 20-year-old man in the back of the head. Six hours later he headed south to Union Square and, using the same hammer, struck a woman sitting quietly on a park bench on the side of the head. Just a few minutes later he appeared again, this time targeting a 33-year-old woman walking down the street outside the park.78 Using surveillance footage from the attacks, a facial recognition algorithm was able to identify him as David Baril, a man who, months before the attacks, had posted a picture on Instagram of a hammer dripping with blood.79 He pleaded guilty to the charges stemming from the attacks and was sentenced to 22 years in prison.
Cold cases, too, are being re-ignited by facial recognition breakthroughs. In 2014, an algorithm brought to justice an American man who had been living as a fugitive under a fake name for 15 years. Neil Stammer had absconded while on bail for charges including child sex abuse and kidnapping; he was re-arrested when his FBI ‘Wanted’ poster was checked against a database of passports and found to match a person living in Nepal whose passport photo carried a different name.80
After the summer of 2017, when eight people died in a terrorist attack on London Bridge, I can appreciate how helpful a system that used such an algorithm might be. Youssef Zaghba was one of three men who drove a van into pedestrians before launching into a stabbing spree in neighbouring Borough Market. He was on a watch list for terrorist suspects in Italy, and could have been automatically identified by a facial recognition algorithm before he entered the country.
But how do you decide on that trade-off between privacy and protection, fairness and safety? How many Steve Talleys are we willing to accept in exchange for quickly identifying people like David Baril and Youssef Zaghba?
Take a look at the statistics provided by the NYPD. In 2015, it reported successfully identifying 1,700 suspects leading to 900 arrests, while mismatching five individuals.81 Troubling as each and every one of those five is, the question remains: is that an acceptable ratio? Is that a price we’re willing to pay to reduce crime?
As it turns out, algorithms without downsides, like Kim Rossmo’s geoprofiling, discussed at the beginning of the chapter, are the exception rather than the rule. When it comes to fighting crime, every way you turn you’ll find algorithms that show great promise in one regard, but can be deeply worrying in another. PredPol, HunchLab, Strategic Subject Lists and facial recognition – all promising to solve all our problems, all creating new ones along the way.
To my mind, the urgent need for algorithmic regulation is never louder or clearer than in the case of crime, where the very existence of these systems raises serious questions without easy answers. Somehow, we’re going to have to confront these difficult dilemmas. Should we insist on only accepting algorithms that we can understand or look inside, knowing that taking them out of the hands of their proprietors might mean they’re less effective (and crime rates rise)? Do we dismiss any mathematical system with built-in biases, or proven capability of error, knowing that in doing so we’d be holding our algorithms to a higher standard than the human system we’re left with? And how biased is too biased? At what point do you prioritize the victims of preventable crimes over the victims of the algorithm?
In part, this comes down to deciding, as a society, what we think success looks like. What is our priority? Is it keeping crime as low as possible? Or preserving the freedom of the innocent above all else? How much of one would you sacrifice for the sake of the other?
Gary Marx, professor of sociology at MIT, put the dilemma well in an interview he gave to the Guardian: ‘The Soviet Union had remarkably little street crime when they were at their worst of their totalitarian, authoritarian controls. But, my God, at what price?’82
It may well be that, in the end, we decide that there should be some limits to the algorithm’s reach. That some things should not be analysed and calculated. That might well be a sentiment that eventually applies beyond the world of crime. Not, perhaps, for lack of trying by the algorithms themselves. But because – just maybe – there are some things that lie beyond the scope of the dispassionate machine.
Art
JUSTIN WAS IN a reflective mood. On 4 February 2018, in the living room of his home in Memphis, Tennessee, he sat watching the Super Bowl, eating M&Ms. Earlier that week he’d celebrated his 37th birthday, and now – as had become an annual tradition – he was brooding over what his life had become.
He knew he should be grateful, really. He had a perfectly comfortable life. A stable nine-to-five office job, a roof over his head and a family who loved him. But he’d always wanted something more. Growing up, he’d always believed he was destined for fame and fortune.
So how had he ended up being so … normal? ‘It was that boy-band,’ he thought to himself. The one he’d joined at 14. ‘If we’d been a hit, everything would have been different.’ But, for whatever reason, the band was a flop. Success had never quite happened for poor old Justin Timberlake.
Despondent, he opened another beer and imagined what might have been. On the screen, the Super Bowl commercials came to an end. Music started up for the big half-time show. And in a parallel universe – virtually identical to this one in all but one detail – another 37-year-old Justin Timberlake from Memphis took the stage.
Many worlds
Why is the real Justin Timberlake so successful? And why did the other Justin Timberlake fail? Some people (my 14-year-old self included)fn1 might argue that pop-star Justin’s success is deserved: his natural talent, his good looks, his dancing abilities and the artistic merit of his music made fame inevitable. But others might disagree. Perhaps they’d claim there is nothing particularly special about Timberlake, or any of the other pop superstars who are worshipped by legions of fans. Finding talented people who can sing and dance is easy – the stars are just the ones who got lucky.
There’s no way to know for sure, of course. Not without building a series of identical parallel worlds, releasing Timberlake into each and watching all the incarnations evolve, to see if he manages success every time. Unfortunately, creating an artificial multiverse is beyond most of us, but if you set your sights below Timberlake and consider less well-known musicians instead, it is still possible to explore the relative roles of luck and talent in the popularity of a hit record.
This was precisely the idea behind a famous experiment conducted by Matthew Salganik, Peter Dodds and Duncan Watts back in 2006 that created a series of digital worlds.1 The scientists built their own online music player, like a very crude version of Spotify, and filtered visitors off into a series of eight parallel musical websites, each identically seeded with the same 48 songs by undiscovered artists.
In what became known as the Music Lab,2 a total of 14,341 music fans were invited to log on to the player, listen to clips of each track, rate the songs, and download the music they liked best.
Just as on the real Spotify, visitors could see at a glance what music other people in their ‘world’ were listening to. Alongside the artist name and song title, participants saw a running total of how many times the track had already been downloaded within their world. All the counters started off at zero, and over time, as the numbers changed, the most popular songs in each of the eight parallel charts gradually became clear.
Meanwhile, to get some natural measure of the ‘true’ popularity of the records, the team also built a control world, where visitors’ choices couldn’t be influenced by others. There, the songs would appear in a random order on the page – either in a grid or in a list – but the download statistics were shielded from view.
The results were intriguing. All the worlds agreed that some songs were clear duds. Other songs were stand-out winners: they ended up being popular in every world, even the one where visitors couldn’t see the number of downloads. But in between sure-fire hits and absolute bombs, the artists could experience pretty much any level of success.
Take 52Metro, a Milwaukee punk band, whose song ‘Lockdown’ was wildly popular in one world, where it finished up at the very top of the chart, and yet completely bombed in another world, ranked 40th out of 48 tracks. Exactly the same song, up against exactly the same list of other songs; it was just that in this particular world, 52Metro never caught on.3 Success, sometimes, was a matter of luck.
Although the path to the top wasn’t set in stone, the researchers found that visitors were much more likely to download tracks they knew were liked by others. If a middling song got to the top of the charts early on by chance, its popularity could snowball. More downloads led to more downloads. Perceived popularity became real popularity, so that eventual success was just randomness magnified over time.
There was a reason for these results. It’s a phenomenon known to psychologists as social proof. Whenever we haven’t got enough information to make decisions for ourselves, we have a habit of copying the behaviour of those around us. It’s why theatres sometimes secretly plant people in the audience to clap and cheer at the right times. As soon as we hear others clapping, we’re more likely to join in. When it comes to choosing music, it’s not that we necessarily have a preference for listening to the same songs as others, but that popularity is a quick way to insure yourself against disappointment. ‘People are faced with too many options,’ Salganik told LiveScience at the time. ‘Since you can’t listen to all of them, a natural short cut is to listen to what other people are listening to.’4
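You can watch that snowball rolling in a toy simulation. The sketch below is my own simplification, not the Music Lab’s actual mechanics: each listener either copies the crowd, downloading in proportion to the current counts, or follows their own taste at random. Every song is identical in ‘quality’, so any gap that opens up between parallel worlds is nothing but early luck being magnified.

```python
import random

# Toy model of social proof (my own illustration, not the Music Lab's
# actual setup): listeners mostly copy the crowd, occasionally picking
# at random. Small early leads snowball into runaway hits.

N_SONGS, N_LISTENERS, COPY_PROB = 48, 10_000, 0.8  # parameters are arbitrary

def run_world(seed):
    random.seed(seed)
    downloads = [0] * N_SONGS
    for _ in range(N_LISTENERS):
        if random.random() < COPY_PROB and sum(downloads) > 0:
            # Social proof: pick a song in proportion to its download count
            song = random.choices(range(N_SONGS), weights=downloads)[0]
        else:
            # Independent taste: pick uniformly at random
            song = random.randrange(N_SONGS)
        downloads[song] += 1
    return downloads

# The same 48 identical songs produce very different charts in parallel worlds
for world in range(4):
    counts = run_world(seed=world)
    top = max(range(N_SONGS), key=counts.__getitem__)
    print(f"world {world}: song {top:2d} tops the chart with {counts[top]:,} downloads")
```

Run it and a different song usually tops the chart in each world: a 52Metro in one universe, an also-ran in the next.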
We use popularity as a proxy for quality in all forms of entertainment. For instance, a 2007 study looked into the impact of an appearance in the New York Times bestseller list on the public perception of a book. By exploiting the idiosyncrasies in the way the list is compiled, Alan Sorensen, the author of the study, tracked the success of books that should have been included on the basis of their actual sales, but – because of time lags and accidental omissions – weren’t, and compared them to those that did make it on to the list. He found a dramatic effect: just being on the list led to an increase in sales of 13–14 per cent on average, and a 57 per cent increase in sales for first-time authors.