Hello World

by Hannah Fry


  Sure, you could work all of these in as extra instructions, running through every single possible type of dog ear, or dog fur, or sitting position, but your algorithm will soon become so enormous it’ll be entirely unworkable, before you’ve even begun to distinguish dogs from other four-legged furry creatures. You need to find another way. The trick is to shift away from the rule-based paradigm and use something called a ‘neural network’.11

  You can imagine a neural network as an enormous mathematical structure that features a great many knobs and dials. You feed your picture in at one end, it flows through the structure, and out at the other end comes a guess as to what that image contains. A probability for each category: Dog; Not dog.

  At the beginning, your neural network is a complete pile of junk. It starts with no knowledge – no idea of what is or isn’t a dog. All the dials and knobs are set to random. As a result, the answers it provides are all over the place – it couldn’t accurately recognize an image if its power source depended on it. But with every picture you feed into it, you tweak those knobs and dials. Slowly, you train it.

  In goes your image of a dog. After every guess the network makes, a set of mathematical rules works to adjust all the knobs until the prediction gets closer to the right answer. Then you feed in another image, and another, tweaking every time it gets something wrong; reinforcing the paths through the array of knobs that lead to success and fading those that lead to failure. Information about what makes one dog picture similar to another dog picture propagates backwards through the network. This continues until – after hundreds of thousands of photos have been fed through – it gets as few wrong as possible. Eventually you can show it an image that it has never seen before and it will be able to tell you with a high degree of accuracy whether or not a dog is pictured.
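
  To make the ‘knobs and dials’ picture concrete, here is a minimal Python sketch of that same training loop – a hypothetical toy, not the actual dog-classifier. It uses a single layer of weights (a real deep network stacks many such layers and propagates its corrections backwards through all of them), and the ‘pictures’ are invented feature vectors rather than real photos.

# Toy version of the training loop described above: the knobs start
# random, and every wrong guess nudges them towards better answers.
import numpy as np

rng = np.random.default_rng(0)

# Invented dataset: 200 'pictures', each summarized by 5 numbers.
# Label 1 = dog, 0 = not dog, set by a hidden pattern the network
# must discover for itself.
X = rng.normal(size=(200, 5))
hidden_pattern = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ hidden_pattern > 0).astype(float)

# The 'knobs and dials': one layer of weights plus a bias,
# all set to random at the start.
w = rng.normal(size=5)
b = 0.0
learning_rate = 0.1  # how hard each mistake tweaks the dials

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    guesses = sigmoid(X @ w + b)  # probability of 'dog' for each picture
    error = guesses - y           # how wrong each guess was
    # Nudge every knob in the direction that shrinks the error:
    w -= learning_rate * (X.T @ error) / len(y)
    b -= learning_rate * error.mean()

accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"accuracy after training: {accuracy:.0%}")  # should be close to 100%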

  The surprising thing about neural networks is that their operators usually don’t understand how or why the algorithm reaches its conclusions. A neural network classifying dog pictures doesn’t work by picking up on features that you or I might recognize as dog-like. It’s not looking for a measure of ‘chihuahua-ness’ or ‘Great Dane-ishness’ – it’s all a lot more abstract than that: picking up on patterns of edges and light and darkness in the photos that don’t make a lot of sense to a human observer (have a look at the image recognition example in the ‘Power’ chapter to see what I mean). Since the process is difficult for a human to conceptualize, the operators only know that they’ve tuned up their algorithm to get the answers right; they don’t necessarily know the precise details of how it gets there.

  This is another ‘machine-learning algorithm’, like the random forests we met in the ‘Justice’ chapter. It goes beyond what the operators program it to do and learns for itself from the images it’s given. It’s this ability to learn that endows the algorithm with ‘artificial intelligence’. And the many layers of knobs and dials also give the network a deep structure, hence the term ‘deep learning’.

  Neural networks have been around since the middle of the twentieth century, but until quite recently we’ve lacked the widespread access to really powerful computers necessary to get the best out of them. The world was finally forced to sit up and take them seriously in 2012 when computer scientist Geoffrey Hinton and two of his students entered a new kind of neural network into an image recognition competition.12 The challenge was to recognize – among other things – dogs. Their artificially intelligent algorithm blew the best of its competitors out of the water and kicked off a massive renaissance in deep learning.

  An algorithm that works without our knowing how it makes its decisions might sound like witchcraft, but it might not be all that dissimilar from how we learn ourselves. Consider this comparison. One team recently trained an algorithm to distinguish between photos of wolves and pet huskies. They then showed how, thanks to the way it had tuned its own dials, the algorithm wasn’t using anything to do with the dogs as clues at all. It was basing its answer on whether the picture had snow in the background. Snow: wolf. No snow: husky.13

  Shortly after their paper was published, I was chatting with Frank Kelly, a professor of mathematics at Cambridge University, who told me about a conversation he’d had with his grandson. He was walking the four-year-old to nursery when they passed a husky. His grandson remarked that the dog ‘looked like’ a wolf. When Frank asked how he knew that it wasn’t a wolf, he replied, ‘Because it’s on a lead.’

  An AI alliance

  There are two things you want from a good breast cancer screening algorithm. You want it to be sensitive enough to pick up on the abnormalities present in all the breasts that have tumours, without skipping over the pixels in the image and announcing them as clear. But you also want it to be specific enough not to flag perfectly normal breast tissue as suspicious.

  We’ve met the principles of sensitivity and specificity before, in the ‘Justice’ chapter. They are close cousins of false negatives and false positives (or Darth Vader and Luke Skywalker – which, if you ask me, is how they should be officially referred to in the scientific literature). In the context we’re talking about here, a false positive occurs whenever a healthy woman is told she has breast cancer, and a false negative when a woman with tumours is given the all-clear. A specific test will have hardly any false positives, while a sensitive one has few false negatives. It doesn’t matter what context your algorithm is working in – predicting recidivism, diagnosing breast cancer or (as we’ll see in the ‘Crime’ chapter) identifying patterns of criminal activity – the story is always the same. You want as few false positives and false negatives as possible.

  The problem is that refining an algorithm often means making a choice between sensitivity and specificity. If you focus on improving one, it often means a loss in the other. If, for instance, you decided to prioritize the complete elimination of false negatives, your algorithm could flag every single breast it saw as suspicious. That would score 100 per cent sensitivity, which would certainly satisfy your objective. But it would also mean an awful lot of perfectly healthy people undergoing unnecessary treatment. Or say you decided to prioritize the complete elimination of false positives. Your algorithm would wave everyone through as healthy, thus earning a 100 per cent score on specificity. Wonderful! Unless you’re one of the women with tumours that the algorithm just disregarded.
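
  To put numbers on that trade-off, here is a small sketch (the counts are invented for illustration). Sensitivity is the share of genuinely ill patients a test catches; specificity is the share of healthy patients it correctly clears. The two extreme screeners described above each score perfectly on one measure and zero on the other.

def sensitivity(tp, fn):
    # share of genuinely ill patients the test catches
    return tp / (tp + fn)

def specificity(tn, fp):
    # share of healthy patients the test correctly clears
    return tn / (tn + fp)

# Imagine 1,000 women screened, 50 of whom have tumours.
# A screener that flags everyone as suspicious: no false negatives,
# but every healthy woman becomes a false positive.
print(sensitivity(tp=50, fn=0))   # 1.0 -> 100% sensitive
print(specificity(tn=0, fp=950))  # 0.0 -> 0% specific

# A screener that waves everyone through: no false positives,
# but every tumour is missed.
print(sensitivity(tp=0, fn=50))   # 0.0 -> 0% sensitive
print(specificity(tn=950, fp=0))  # 1.0 -> 100% specific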

  Interestingly, human pathologists don’t tend to have problems with specificity. They almost never mistakenly identify cells as cancerous when they’re not. But people do struggle a little with sensitivity. It’s worryingly easy for us to miss tiny tumours – even obviously malignant ones.

  These human weaknesses were highlighted in a recent challenge that was designed to pit human against algorithm. Computer teams from around the world went head-to-head with a pathologist to find all the tumours within four hundred slides, in a competition known as CAMELYON16. To make things easier, all the cases were at the two extremes: perfectly normal tissue or invasive breast cancer. There was no time constraint on the pathologist, either: they could take as long as they liked to wade through the biopsies. As expected, the pathologist generally got the overall diagnosis right (96 per cent accuracy)14 – and without identifying a single false positive in the process. But they also missed a lot of the tiny cancer cells hiding among the tissue, only managing to spot 73 per cent of them in 30 hours of looking.

  The sheer number of pixels that needed checking wasn’t necessarily the problem. People can easily miss very obvious anomalies even when looking directly at them. In 2013, Harvard researchers secretly hid an image of a gorilla in a series of chest scans and asked 24 unsuspecting radiologists to check the images for signs of cancer. Eighty-three per cent of them failed to notice the gorilla, despite eye-tracking showing that the majority were literally looking right at it.15 Try it yourself with the picture above.16

  Algorithms have the opposite problem. They will eagerly spot anomalous groups of cells, even perfectly healthy ones. During CAMELYON16, for instance, the best neural network entered managed to find an impressive 92.4 per cent of the tumours,17 but in doing so it made eight false-positive mistakes per slide by incorrectly flagging normal groups of cells as suspicious. With such a low specificity, the current state-of-the-art algorithms are definitely leaning towards the ‘everyone has breast cancer!’ approach to diagnosis and are just not good enough to create their own pathology reports yet.

  The good news, though, is that we’re not asking them to. Instead, the intention is to combine the strengths of human and machine. The algorithm does the donkey-work of searching the enormous amount of information in the slides, highlighting a few key areas of interest. Then the pathologist takes over. It doesn’t matter if the machine is flagging cells that aren’t cancerous; the human expert can quickly check through and eliminate anything that’s normal. This kind of algorithmic pre-screening partnership not only saves a lot of time, it also bumps up the overall accuracy of diagnosis to a stunning 99.5 per cent.18

  Marvellous as all this sounds, the fact is that human pathologists have always been good at diagnosing aggressive cancerous tumours. The difficult cases are those ambiguous ones in the middle, where the distinction between cancer and not cancer is more subtle. Can the algorithms help here too? The answer is (probably) yes. But not by trying to diagnose using the tricky categories that pathologists have always used. Instead, perhaps the algorithm – which is so much better at finding anomalies hidden in tiny fragments of data – could offer a better way to diagnose altogether. By doing something that human doctors can’t.

  The Nun Study

  In 1986, an epidemiologist from the University of Kentucky named David Snowdon managed to persuade 678 nuns to give him their brains. The nuns, who were all members of the School Sisters of Notre Dame, agreed to participate in Snowdon’s extraordinary scientific investigation into the causes of Alzheimer’s disease.

  Each of these women, who at the beginning of the study were aged between 75 and 103, would take a series of memory tests every year for the rest of their lives. Then, when they died, their brains would be donated to the project. Regardless of whether they had symptoms of dementia or not, they promised to allow Snowdon’s team to remove their most precious organ and analyse it for signs of the disease.19

  The nuns’ generosity led to the creation of a remarkable dataset. Since none of them had children, smoked, or drank very much, the scientists were able to rule out many of the external factors believed to raise the likelihood of contracting Alzheimer’s. And since they all lived a similar lifestyle, with similar access to healthcare and social support, the nuns effectively provided their own experimental control.

  All was going well when, a few years into the study, the team discovered that this experimental group offered another treasure trove of data they could tap into. As young women, many of the now elderly nuns had been required to submit a handwritten autobiographical essay to the Sisterhood before they were allowed to take their vows. These essays were written when the women were – on average – only 22 years old, decades before any of them would display symptoms of dementia. And yet, astonishingly, the scientists discovered clues in their writing that predicted what would happen to them far in the future.

  The researchers analysed the language in each of the essays for its complexity and found a connection between how articulate the nuns were as young women and their chances of developing dementia in old age.

  For example, here is a one-sentence extract from a nun who maintained excellent cognitive ability throughout her life:

  After I finished the eighth grade in 1921 I desired to become an aspirant at Mankato but I myself did not have the courage to ask the permission of my parents so Sister Agreda did it in my stead and they readily gave their consent.

  Compare this to a sentence written by a nun whose memory scores steadily declined in her later years:

  After I left school, I worked in the post-office.

  The association was so strong that the researchers could predict which nuns might have dementia just by reading their essays. Ninety per cent of the nuns who went on to develop Alzheimer’s had ‘low linguistic ability’ as young women, while only 13 per cent of the nuns who maintained cognitive ability into old age got a ‘low idea density’ score in their essays.20
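
  As a toy illustration of what analysing language for complexity can involve, the sketch below scores the two extracts with two crude proxies: words per sentence, and connective words (‘but’, ‘so’, ‘and’ and the like) per sentence. This is not the researchers’ actual measure – their ‘idea density’ counts propositions per ten words and requires proper linguistic annotation – but it makes the gap between the two sentences concrete.

import re

# Crude stand-ins for linguistic complexity: sentence length and the
# number of clause-linking words per sentence.
CONNECTIVES = {"and", "but", "so", "because", "although", "while",
               "who", "which", "that", "when", "if"}

def complexity(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    links = sum(w in CONNECTIVES for w in words)
    return len(words) / len(sentences), links / len(sentences)

high = ("After I finished the eighth grade in 1921 I desired to become "
        "an aspirant at Mankato but I myself did not have the courage "
        "to ask the permission of my parents so Sister Agreda did it "
        "in my stead and they readily gave their consent.")
low = "After I left school, I worked in the post-office."

print(complexity(high))  # (44.0, 3.0): one long, multi-clause sentence
print(complexity(low))   # (10.0, 0.0): short and simple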

  One of the things this study highlights is the incredible amount we still have to learn about our bodies. Even knowing that this connection might exist doesn’t tell us why. (Is it that good education staves off dementia? Or that people with the propensity to develop Alzheimer’s feel more comfortable with simple language?) But it might suggest that Alzheimer’s can take several decades to develop.

  More importantly, for our purposes, it demonstrates that subtle signals about our future health can hide in the tiniest, most unexpected fragments of data – years before we ever show symptoms of an illness. It hints at just how powerful future medical algorithms that can dig into data might be. Perhaps, one day, they’ll even be able to spot the signs of cancer years before doctors can.

  Powers of prediction

  In the late 1970s, a group of pathologists in Ribe County, Denmark, started performing double mastectomies on a group of corpses. The deceased women ranged in age from 22 to 89 years old, and 6 of the 83 had died of invasive breast cancer. Sure enough, when the researchers prepared the removed breasts for examination by a pathologist – cutting each into four pieces and then slicing the tissue thinly on to slides – these 6 samples revealed the hallmarks of the disease. But to the researchers’ astonishment, of the remaining 77 women – who had died from completely unrelated causes, including heart disease and car accidents – almost a quarter had the warning signs of breast cancer that pathologists look out for in living patients.

  Without ever showing any signs of illness, 14 of the women had in situ cancer cells that had never developed beyond the milk ducts or glands. Cells that would be considered malignant breast cancer if the women were alive. Three had atypical cells that would also be flagged as a matter of concern in a biopsy, and one woman actually had invasive breast cancer without having any idea of it when she died.21

  These numbers were surprising, but the study wasn’t a fluke. Other researchers have found similar results. In fact, some estimate that, at any one time, around 9 per cent of women could be unwittingly walking around with tumours in their breasts22 – about ten times the proportion who actually get diagnosed with breast cancer.23

  So what’s going on? Do we have a silent epidemic on our hands? According to Dr Jonathan Kanevsky, a medical innovator and resident in surgery at McGill University in Montreal, the answer is no. At least, not really. Because the presence of cancer isn’t necessarily a problem:

  If somebody has a cancer cell in their body, the chances are their immune system will identify it as a mutated cell and just attack it and kill it – that cancer will never grow into something scary. But sometimes the immune system messes up, meaning the body supports the growth of the cancer, allowing it to develop. At that point cancer can kill.24

  Not all tumours are created equal. Some will be dealt with by your body, some will sit there quite happily until you die, some could develop into full-blown aggressive cancer. The trouble is, we often have very little way of knowing which will turn out to be which.

  And that’s why these tricky categories between benign and horribly malignant can be such a problem. They are the only classifications that doctors have to work with, but if a doctor finds a group of cells in your biopsy that seem a bit suspicious, the label they choose can only help describe what lies within your tissue now. That isn’t necessarily much help when it comes to giving you clues to your future. And it’s the future, of course, that worried patients are most concerned about.

  The result is that people are often overly cautious in the treatment they choose. Take in situ cancers, for instance. This category sits towards the more worrying end of the spectrum, where a cancerous growth is present, but hasn’t yet spread to the surrounding tissue. Serious as this sounds, only around one in ten ‘in situ’ cancers will turn into something that could kill you. None the less, a quarter of women who receive this diagnosis in the United States will undergo a full mastectomy – a major operation that’s physically and often emotionally life-changing.25

  The fact is, the more aggressively you screen for breast cancer, the more you affect women who otherwise would’ve happily got on with their lives, oblivious to their harmless tumours. One independent UK panel concluded that for every 10,000 women who’ll be invited to a mammogram screening in the next 20 years, 43 deaths from breast cancer will be prevented. And a study published in the New England Journal of Medicine concluded that for every 100,000 who attend routine mammogram screenings, tumours that could become life-threatening will be detected in 30 women.26 But – depending on which set of statistics you use – three or four times as many women will be over-diagnosed, receiving treatment for tumours that were never going to put their lives in danger.27
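
  A quick back-of-envelope calculation, using the New England Journal of Medicine figures quoted above, shows what that ratio means in absolute terms (a sketch only; the inputs are the rounded numbers from the studies):

screened = 100_000           # women attending routine screening
life_threatening_found = 30  # tumours caught that could have killed

# 'Three or four times as many' women are treated for tumours that
# would never have harmed them:
for ratio in (3, 4):
    overdiagnosed = ratio * life_threatening_found
    print(f"ratio {ratio}: ~{overdiagnosed} women over-treated "
          f"per {screened:,} screened")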

  This problem of over-diagnosis and over-treatment is hard to solve when you’re good at detecting abnormalities but not good at predicting how they’ll develop. Still, there might be hope. Perhaps – just as with the essay-writing nuns – tiny clues to the way someone’s health will turn out years in the future can be found hiding in their past and present data. If so, winkling out this information would be a perfect job for a neural network.

  In an arena where doctors have struggled for decades to discover why one abnormality is more dangerous than another, an algorithm that isn’t taught what to look for could come into its own. Just as long as you can put together a big enough set of biopsy slides (including some samples of tumours that eventually metastasized – spread to other parts of the body – as well as some that didn’t) to train your neural network, it could hunt for hidden clues to your future health blissfully free of any of the prejudice that can come with theory. As Jonathan Kanevsky puts it: ‘It would be up to the algorithm to determine the unique features within each image that correspond to whether the tumour becomes metastatic or not.’28

 
