The Rules of Contagion


by Adam Kucharski

We’ve already seen that for STIs, the reproduction number of an infection will be larger when there is a lot of variation in how many sexual partners people have. An infection that would fade away if everyone behaved identically can persist if some people have a lot more partners than others. Vespignani and Pastor-Satorras realised that something even more extreme can happen with computer networks.[33] Because there is huge variability in the number of links, even seemingly weak infections can survive. The reason is that in this kind of network, a computer is never more than a few steps from a highly connected hub, which can spread the infection widely in a superspreading event. It’s an exaggerated form of the problem that banks faced in 2008, with a few major hubs able to drive the entire outbreak.

When outbreaks are driven by superspreading events, the transmission process becomes extremely fragile. Unless an infection hits a major hub, it probably won’t go very far. Yet superspreading can also make an outbreak more unpredictable. Although most outbreaks won’t take off, those that do can stutter along for a surprisingly long time. This explains why a handful of computer viruses and worms have continued to spread, despite not being that transmissible at an individual level. The same is true of many trends on social media. If you’ve ever seen a strange meme spreading and wondered how it could have persisted for so long, it probably has more to do with the network itself than with the quality of the content.[34] Thanks to their structure, online networks are giving infections an advantage that they don’t have in other areas of life.
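Why can variation in links let weak infections survive? A standard result from network epidemiology, not spelled out in the text and used here as an assumption, says that under a simplified mean-field model of spread on a network, an infection persists only if its transmission-to-recovery ratio exceeds ⟨k⟩/⟨k²⟩, where k is the number of links per node. The sketch below compares two invented degree lists: one where everyone has four links, and one where most nodes have two links but five hubs have forty. The threshold drops from 0.25 to roughly 0.047, even though the hub network has a slightly lower average number of links (3.9 versus 4).

```python
# Sketch: mean-field epidemic threshold for spread on a network,
# lambda_c = <k> / <k^2>. The degree lists are invented for illustration.


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def epidemic_threshold(degrees):
    """Return <k>/<k^2>, the minimum transmission-to-recovery ratio
    an infection needs to persist under this simplified model."""
    k_mean = mean(degrees)
    k2_mean = mean([k * k for k in degrees])
    return k_mean / k2_mean


# Every node has exactly 4 links (average degree 4).
homogeneous = [4] * 100
# Most nodes have 2 links, but five hubs have 40 each (average degree 3.9).
heterogeneous = [2] * 95 + [40] * 5

print(epidemic_threshold(homogeneous))              # 0.25
print(round(epidemic_threshold(heterogeneous), 3))  # 0.047
```

The squared term is what does the work: a handful of hubs barely moves the average number of links, but it inflates ⟨k²⟩ enormously, pulling the threshold towards zero.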

On 22 March 2016, web developers around the world noticed that their apps weren’t working properly. From Facebook to Spotify, companies using the JavaScript programming language found that parts of their software had stopped working. User interfaces were broken, visuals wouldn’t load, updates wouldn’t install.

The problem? Eleven lines of computer code – which many people didn’t even know existed – had gone missing. The code in question had been written by Azer Koçulu, a developer based in Oakland, California. Those eleven lines formed a JavaScript program called ‘left-pad’. The program itself wasn’t particularly complicated; it just added some extra characters at the start of a segment of text. It was the sort of thing most coders could have created from scratch in a few minutes.[35]
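To give a sense of how little code was involved, here is a rough Python re-creation of what left-pad did. It is a hypothetical sketch, not Koçulu’s original JavaScript; the function name and parameters are our own.

```python
def left_pad(text, length, fill=" "):
    """Prepend `fill` to `text` until it is at least `length` characters.

    Hypothetical Python re-creation of the idea, not the original
    JavaScript package.
    """
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    missing = max(length - len(text), 0)
    return fill * missing + text


print(left_pad("7", 3, "0"))   # 007
print(left_pad("hello", 3))    # hello (already long enough)
print("7".rjust(3, "0"))       # 007, using Python's built-in equivalent
```

Python even ships the same behaviour as the built-in `str.rjust`, which underlines the point: this was a task most programmers could write, or find, in moments.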

Yet most coders don’t create everything from scratch. To save time, they use tools that others have developed and shared. Many of them do this by searching an online resource called ‘npm’, which collects together handy bits of code like left-pad. In some cases, people incorporate these existing tools into new programs, which they subsequently share. Some of these programs then feed into other new programs, creating a chain of dependency with each one supporting the next. Whenever someone installs or updates a program, they will also need to load everything in the dependency chain, otherwise they’ll get an error message. Left-pad lay deep within one of these chains. In the month before it disappeared, the code had been downloaded over two million times.

On that day in March, Koçulu had pulled his code from npm after a disagreement over a trademark. Npm had asked him to rename one of his software packages after another company complained; Koçulu protested and eventually responded by removing all of his code. That included left-pad, which meant that any chains of programs that relied on Koçulu’s tool were suddenly broken. And because some of the chains were so long, many developers hadn’t realised how reliant they were on those eleven lines of code.
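The way one deletion ripples up a dependency chain can be sketched as a search through a graph. In the toy example below, every package name except left-pad is hypothetical, and the graph is far smaller than anything on npm. The function reverses the “depends on” links and then walks outwards from the removed package to find everything that relied on it, directly or indirectly.

```python
from collections import deque

# Hypothetical package graph: each package maps to the packages it
# depends on directly. Only "left-pad" is a real name; the rest are made up.
DEPENDS_ON = {
    "web-app": ["ui-kit", "build-tool"],
    "ui-kit": ["text-format"],
    "build-tool": ["logger"],
    "text-format": ["left-pad"],
    "logger": [],
    "left-pad": [],
}


def broken_by_removal(graph, removed):
    """Return every package that depends on `removed`, directly or
    through a chain of other packages (breadth-first search over
    reversed dependency links)."""
    # Invert the graph: for each package, who depends on it directly?
    dependents = {name: [] for name in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(pkg)

    broken = set()
    queue = deque([removed])
    while queue:
        current = queue.popleft()
        for pkg in dependents.get(current, []):
            if pkg not in broken:
                broken.add(pkg)
                queue.append(pkg)
    return broken


print(sorted(broken_by_removal(DEPENDS_ON, "left-pad")))
# ['text-format', 'ui-kit', 'web-app']
```

Note that web-app never lists left-pad among its own dependencies, yet it breaks when left-pad disappears: the reliance is hidden two steps down the chain.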

Koçulu’s work is just one example of computer code that has spread much further than we might think. Soon after the left-pad incident, software developer David Haney noted that another tool on npm – which consisted of a single line of code – had become an essential part of seventy-two other programs. He listed several other pieces of software that were highly dependent on simple snippets of code. ‘I can’t help but be amazed by the fact that developers are taking on dependencies for single line functions that they should be able to write with their eyes closed,’ he wrote.[36] Borrowed pieces of code can often spread further than people realise. When researchers at Cornell University analysed articles written with LaTeX, a popular typesetting system for scientific writing, they found that academics would often repurpose each other’s code. Some files had spread through networks of collaborators for more than twenty years.[37]

As code spreads, it can also pick up changes. After those three students posted the Mirai code online at the end of September 2016, dozens of different variants emerged, each with subtly different features. It was only a matter of time before someone altered the code to launch a major attack. In early October, a few weeks before the Dyn incident, security company RSA noticed a remarkable claim on a dark net marketplace: a group of hackers was offering a way to flood a target with 125 gigabytes of activity per second. For $75,000, someone could buy access to a 100,000-strong botnet, which was apparently based on some adapted Mirai code.[38] However, it wasn’t the first time the Mirai code had changed. In the weeks before they published the code, Mirai’s creators made over twenty alterations, apparently in an attempt to increase the contagiousness of their botnet. These included features that made the worm harder to detect, as well as tweaks to fight off other malware that was competing for the same susceptible machines. Once out in the wild, Mirai would continue to change for years to come; new variants were still appearing in 2019.[39]

When Fred Cohen first wrote about computer viruses in 1984, he pointed out that malware might evolve over time, becoming harder to detect. Rather than settling down to a well-balanced equilibrium, the ecosystem of computer viruses and anti-virus software would continuously shift around. ‘As evolution takes place, balances tend to change, with the eventual result being unclear in all but the simplest circumstances,’ he noted.[40] ‘This has very strong analogies to biological theories of evolution, and might relate well to genetic theories of diseases.’

A common way of protecting against malware is to have anti-virus software look for known threats. Typically, this involves searching for familiar segments of code; once a threat is recognised, it can be neutralised.[41] Human immune systems can do something very similar when we get infected or vaccinated. Immune cells will often learn the shape of the specific pathogen we’ve been exposed to; if we get infected again, these cells can respond quickly and neutralise the threat. However, evolution can sometimes hinder this process, with pathogens that once looked familiar changing their appearance to evade detection.
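Signature-based detection can be illustrated with a toy scanner that looks for known byte patterns inside a file. The signature names and patterns below are invented, and real anti-virus engines are far more elaborate; this only shows the exact-matching idea and why a small change can defeat it.

```python
# Hypothetical signature database: the names and byte patterns are invented.
SIGNATURES = {
    "Example.WormA": bytes.fromhex("deadbeef"),
    "Example.TrojanB": bytes.fromhex("9090cc31"),
}


def scan(data):
    """Return the names of all known signatures that appear in `data`."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


infected = b"header" + bytes.fromhex("deadbeef") + b"payload"
# The same file with a single byte altered inside the known pattern.
variant = b"header" + bytes.fromhex("deadbeee") + b"payload"

print(scan(infected))  # ['Example.WormA']
print(scan(variant))   # []
```

Changing one byte is enough for the variant to slip past, which is the software equivalent of a pathogen altering its appearance to evade immune memory.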

One of the most prominent – and frustrating – examples of this process is influenza evolution. Biologist Peter Medawar once called the flu virus ‘a piece of nucleic acid surrounded by bad news’.[42] There are two particular types of bad news on the surface of the virus: a pair of proteins known as haemagglutinin and neuraminidase, or HA and NA for short. HA allows the virus to latch onto host cells; NA helps with the release of new virus particles from infected cells. The proteins can take several different forms, and the different flu types – like H1N1, H3N2, H5N1 and so on – are named accordingly.

Winter flu epidemics are mostly caused by H1N1 and H3N2. These viruses gradually evolve as they circulate, causing the shape of those proteins to change. This means our immune system may no longer recognise the mutated virus as a threat. We have annual flu epidemics – and annual flu vaccination campaigns – because our bodies are in essence playing a game of evolutionary cat-and-mouse with the infection.

Evolution can also help artificial infections persist. In recent years, malware has started to alter itself automatically to make identification harder. During 2014, for example, the ‘Beebone’ botnet infected thousands of machines worldwide. The worm behind the bots changed its appearance several times a day, resulting in millions of unique variants as it spread. Even if anti-virus software learned what the current versions of code looked like, the worm would soon shuffle itself around, distorting any known patterns. Beebone was finally taken offline in 2015, when police targeted the part of the system that wasn’t evolving: the fixed domain names used to co-ordinate the botnet. This proved far more effective than trying to identify the shapeshifting worms.[43] Similarly, biologists are hoping to develop more effective flu vaccines by targeting the parts of the virus that don’t change.[44]

Given the need to evade detection, malware will continue to evolve, while authorities attempt to keep up. The routes of transmission will also keep changing. As well as finding new targets – like household devices – infections are increasingly spreading through clickbait and tailored attacks on social media.[45] By sending customised messages to specific users, hackers can boost the chances they’ll click on a link and inadvertently let malware in. However, evolution isn’t just helping infections spread effectively from computer to computer or person to person. It’s also revealing a new way to tackle contagion.

7

Tracking outbreaks

The affair would end with a murder attempt. For over ten years, Richard Schmidt, a gastroenterologist in Lafayette, Louisiana, had been having a relationship with Janice Trahan, a nurse fifteen years his junior. She’d divorced her husband after the affair started, but despite his promises, Schmidt had not left his wife and three children. Trahan had tried to break off the affair before, but this time it would be for good.

She would later testify that a couple of weeks afterwards, on 4 August 1994, Schmidt had come to her home while she was asleep. Schmidt told her he was there to give her a shot of vitamin B12. He’d previously given her vitamin injections to boost her energy levels, but that night she told him she didn’t want one. Before she could stop him, he’d stuck a needle in her arm. None of the previous injections had hurt, but this time the pain spread right through the limb. At that point, Schmidt said he had to leave for the hospital.

The pain continued overnight, and in the weeks that followed, she became ill with flu-like symptoms. She made several trips to the hospital, but test after test came back negative. One doctor had suspected HIV, but didn’t test for it. He later said that his colleague – one Dr Schmidt – had told him that Trahan had already tested negative for the infection. Her illness continued, and eventually another doctor ordered a new set of tests. In January 1995, Trahan finally received the correct diagnosis: she was HIV positive.

Back in August, Trahan had told a colleague she’d suspected that the ‘shot in the dark’ wasn’t B12. There was no doubt that HIV was a recent infection: she’d given blood several times and her most recent donation – made in April 1994 – had tested negative for HIV. According to a local HIV specialist, the progression of her symptoms was consistent with an early August date of infection. When police searched Schmidt’s offices, they found evidence that blood had been drawn from an HIV patient on 4 August – just hours before he’d allegedly injected Trahan – and the procedure hadn’t been recorded in the usual way. However, Schmidt denied visiting her and giving her the injection.[1]

Perhaps the virus itself could provide a clue about what had happened? At the time, it was already common to use DNA testing to match suspects to crime scenes. However, the task was trickier in this case. Viruses like HIV evolve relatively quickly, so the virus found in Trahan’s blood wouldn’t necessarily be the same as the one in the blood that infected her. Faced with a charge of attempted second-degree murder, Schmidt argued that the HIV virus that infected Trahan was too different to the original patient’s virus; it just wasn’t plausible that this had been the source of her infection. Given all the other evidence pointing to Schmidt, the prosecution disagreed. They just needed a way to show it.

On 20 June 1837, the British crown passed down the royal family tree, from William IV to Victoria. Meanwhile, a short walk away in Soho, a young biologist was also thinking about family trees, albeit on a much grander scale. Back in England after his five-year voyage on HMS Beagle, Charles Darwin would end up outlining his theories in a new leather-bound notebook. To help clarify his thinking, he sketched out a simplified diagram of a ‘tree of life’. The idea was that the branches indicated the evolutionary relationships between different species. As in a family tree, Darwin suggested, closely related organisms would sit near each other, while distinct species would be much further apart. Tracing each of the branches back would lead to a shared root: a single common ancestor.

Darwin’s original tree of life sketch. Species A is a distant relative of B, C, and D, which are more closely related. In the diagram, all the species evolved from a single starting point, labelled (1)

Darwin started by drawing evolutionary trees based on things like physical traits. On his Beagle voyage, he categorised bird species by features such as beak shape, tail length, and plumage.[2] This field of research would eventually become known as ‘phylogenetics’, after the Ancient Greek words for ‘tribe’ or ‘race’ (phylon) and ‘origin’ (genesis).

Although early evolutionary analysis focused on the appearance of different species, the rise of genetic sequencing has made it possible to compare organisms in much more detail. If we have two genomes, we can see how related they are based on the amount of overlap in the lists of letters that make up their sequences. The more overlap there is, the fewer mutations are required to get from one sequence to the other. It’s a bit like waiting for tiles to appear in a game of Scrabble. Going from a sequence ‘AACG’ to ‘AACC’, for example, is easier than getting from ‘AACG’ to ‘TTGG’. And like Scrabble, we can estimate how long the evolutionary process has been running based on how much the letters have changed from their original sequence.
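The letter-counting idea can be made concrete with a short function. As a simplifying assumption, it treats the two sequences as already aligned to the same length and counts only single-letter substitutions (no insertions or deletions); this count is known as the Hamming distance.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned, equal-length sequences
    differ (the Hamming distance): the fewest single-letter
    substitutions needed to turn one into the other."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


print(count_differences("AACG", "AACC"))  # 1
print(count_differences("AACG", "TTGG"))  # 3
```

One change separates ‘AACG’ from ‘AACC’, while three are needed to reach ‘TTGG’, matching the intuition in the text.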

Using this idea – and plenty of computational power – it’s possible to arrange sequences into a phylogenetic tree, tracing out their historical evolution. We can also estimate when important evolutionary changes may have happened. This is useful if we want to know how an infection may have spread. For example, after SARS sparked a major outbreak in 2003, scientists identified the virus in palm civets, a small mongoose-like animal. Maybe the disease had been routinely circulating in civets before spilling over into the human population?

Analysis of different SARS viruses suggested otherwise. Human and civet viruses were closely related, indicating that both were relatively new hosts for the virus. SARS had potentially jumped from civets into humans a few months before the outbreak started. In contrast, the virus had been circulating in bats for much longer, making its way into civets sometime around 1998. Based on the evolutionary history of the different viruses, civets were probably just a brief stepping stone for SARS as it made its way into humans.[3]
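The text doesn’t say which tree-building method these studies used. As an illustrative assumption, the sketch below swaps in UPGMA, one of the simplest distance-based methods: it repeatedly joins the two closest groups of sequences, and it assumes every lineage evolves at the same constant rate. Real analyses generally rely on more sophisticated, model-based methods. The four sequences are toy data invented for illustration, not real SARS sequences, and the labels are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions where two aligned, equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def upgma(sequences):
    """Build a rooted tree with UPGMA (average-linkage clustering).

    `sequences` maps a label to an aligned sequence. Internal nodes are
    returned as (left, right, height) tuples, where height is half the
    distance between the two groups being joined.
    """
    # Each current cluster (a label or a subtree tuple) -> number of leaves.
    sizes = {name: 1 for name in sequences}
    dist = {
        frozenset(pair): hamming(sequences[pair[0]], sequences[pair[1]])
        for pair in combinations(sequences, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b, dist[frozenset((a, b))] / 2)
        # Distance from the new cluster to each other cluster is the
        # leaf-weighted average of the distances from its two parts.
        for c in sizes:
            if c not in (a, b):
                dist[frozenset((merged, c))] = (
                    sizes[a] * dist[frozenset((a, c))]
                    + sizes[b] * dist[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Toy sequences invented for illustration, not real SARS data.
toy = {
    "human_1": "AACGTTAC",
    "human_2": "AACGTTAA",
    "civet_1": "AACGTAAA",
    "bat_1": "TTCGAAGA",
}
print(upgma(toy))
# ('bat_1', ('civet_1', ('human_1', 'human_2', 0.5), 0.75), 2.5)
```

With these made-up sequences, the two human samples are joined first, the civet sample joins them next, and the bat sample branches off furthest from the rest. That mirrors the shape of the story above, but with toy data it only illustrates the mechanics of grouping sequences by similarity.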

During Richard Schmidt’s trial, the prosecution used similar phylogenetic evidence to show that it was plausible that Trahan’s infection had come from the HIV patient who’d visited Schmidt. Evolutionary biologist David Hillis and his colleagues compared the viruses isolated from the pair with other viruses found in HIV patients in Lafayette. In his testimony, Hillis said the viruses found in Schmidt’s patient and Trahan were ‘the most closely related sequences in the analysis, and as closely related to sequences isolated that two individuals could be’. Although it wasn’t conclusive proof that Trahan’s infection had come from Schmidt’s patient, it undermined the defence’s claim that the cases were unrelated. Eventually, Schmidt was found guilty and sentenced to fifty years in prison. As for Trahan, she remarried and continued to live with HIV, celebrating her twentieth wedding anniversary in 2016.[4]

Simplified phylogenetic tree for SARS viruses in different host species. Dashed lines show estimated times when viruses diverged from one another, finding their way into a new group of hosts. (Data: Hon et al., 2008)

Schmidt’s trial was the first time that phylogenetic analysis had been used in a US criminal case. Since then, the methods have appeared in other cases around the world. Following a surge in cases of hepatitis C in Valencia, Spain, police investigators linked many of the patients to an anaesthetist named Juan Maeso. Phylogenetic analysis confirmed he was the likely source of the outbreak, and in 2007 he was convicted of infecting hundreds of patients by reusing syringes.[5] Genetic data has also helped prove innocence. Shortly after the Maeso case, a group of medics were released from a prison in Libya. They’d been held for eight years after accusations that they’d deliberately infected children with HIV. The group were freed in part because of phylogenetic analysis, which showed that many of the infections had occurred years before the team had arrived in the country.[6]

As well as pointing to the likely source of an outbreak, phylogenetic methods can reveal when a disease arrived in a particular location. Suppose we are investigating a virus like HIV, which evolves relatively quickly. If the HIV viruses circulating in an area are relatively similar, it suggests they haven’t had long to evolve, so the outbreak is probably quite recent. In contrast, if there is a lot of diversity among current viruses, it means that there has been a lot of time for evolution, which suggests the original virus was introduced a while ago. These methods are now commonly used in public health. Recall how in earlier chapters, we looked at the arrival of Zika into Latin America and HIV into North America. In both cases, teams used genetic data to estimate the timing of the virus’s introduction. Researchers have also applied these same ideas to other infections, from pandemic influenza to hospital superbugs like MRSA.[7]
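The link between diversity and time can be turned into a back-of-envelope calculation. The sketch assumes a strict molecular clock, meaning every lineage mutates at the same constant rate. If two lineages split t years ago, each has been mutating for t years, so the fraction of sites where they differ should be roughly 2 × rate × t. The mutation rate and sequences below are invented so the numbers are readable; real analyses typically estimate rates from dated samples and fit full phylogenetic models.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def rough_years_since_divergence(samples, rate_per_site_per_year):
    """Back-of-envelope strict-clock estimate.

    Two lineages that split t years ago have each been mutating for t years,
    so their expected p-distance is roughly 2 * rate * t. Averaging the
    distance over all pairs of samples and solving for t gives a typical
    time since pairs of samples last shared an ancestor. Ignores repeated
    mutations at the same site and the shape of the tree.
    """
    pairs = list(combinations(samples, 2))
    mean_distance = sum(p_distance(a, b) for a, b in pairs) / len(pairs)
    return mean_distance / (2 * rate_per_site_per_year)


RATE = 0.05  # hypothetical substitutions per site per year, for readable numbers

low_diversity = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC"]
high_diversity = ["ACGTACGTAC", "TCGAACGTAA", "ACGTTCGAAC"]

print(round(rough_years_since_divergence(low_diversity, RATE), 2))   # 0.67
print(round(rough_years_since_divergence(high_diversity, RATE), 2))  # 3.33
```

With the same assumed rate, the more diverse set of samples implies a common ancestor several times further back, which is the reasoning the text describes: little diversity points to a recent introduction, lots of diversity to an older one.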

With access to genetic data, we can also work out whether an outbreak started with a single case or multiple introductions. When our team analysed Zika viruses isolated in Fiji during 2015 and 2016, we found two distinct groups of viruses in the phylogenetic tree. Based on the rate of evolution, one group of viruses had arrived in the capital, Suva, in 2013–14, spreading at low levels for the subsequent year or two, while a separate outbreak had later started in the west of the country.[8] I didn’t realise it at the time, but some of the mosquitoes I swatted away during my 2015 visit had probably been infected with Zika.
