The Naked Future


by Patrick Tucker


  How does better and faster field reporting of new flu strains change this map? It creates more certainty about where and when mutation events occur, bringing the lines that are up in the atmosphere closer to the earth’s surface. That allows for better prediction about where the flu is going next. One of the key drivers is the growth of computing power and cloud services. Virus paths and transgenic events can be simulated just like weather patterns or human interactions. As more simulations are run, confidence improves.

  For instance, since 1993 epidemiologists have suspected that the comparatively dangerous H5N1 influenza virus’s polymerase basic 2 protein (PB2) contains lysine at position 627.9 That factoid may not sound particularly important to your future health, but it is the presence of lysine in that position that allows the virus to reproduce in the upper airways of mammals, which, at about 33 degrees Celsius, are colder than the 41-degree body temperature of birds. When a virus can reproduce in the respiratory tract, it can go airborne: the host begins to spread it through coughing and sneezing. If you can figure out under what conditions that particular mutation arose, what the weather and air were like, you can determine when an already deadly flu strain becomes much more spreadable.

  In January 2012 a team of scientists from the University of Wisconsin successfully created a strain of H5N1 influenza with lysine in the 627 position of the PB2.10 They demonstrated that the influenza could be passed between ferrets through coughing and sneezing. The government supported the work through grants. But the researchers’ findings were placed under a moratorium for months, as many believed the research could serve as a manual for weaponizing H5N1. Following the publication of the finding, conservative members of the U.S. House of Representatives threatened legislation to restrict publication of similar research in the future. This presented a somewhat ironic situation: the researchers were working, in part, under government grant, and the government was trying to suppress the very research that it had funded.11

  Controversial and expensive lab experiments like the one described above will remain important in figuring out how flu moves from one animal to another. But the research isn’t cheap or easy, is potentially vulnerable to political infighting, and is certainly not fast. Incredibly, for decades epidemiologists suspected the link between lysine, the PB2 protein, and aerosol communicability of H5N1 in mammals, but it took until 2012 to show exactly how that mutation occurs.

  The United Nations has forecast that an H5N1 pandemic could kill between 5 million and 150 million people around the world. As researchers Tyler J. Kokjohn and Kimbal E. Cooper point out, “The large discrepancy between those two figures reflects the tremendous difference that preparation could play in facing a global pandemic. Fatality rates will vary depending on the strength of the resources that the national health institutions have in place.”12

  The best guard against the worst possible scenario isn’t just more data; it’s field data broadcast in real time. Databases like ProMED-mail and Global Infectious Diseases and Epidemiology Network (GIDEON) have made some disease clusters much easier to remotely detect, in part because they enable the spread of information not just between local health-care workers and global health organizations but also between clinic workers in the same area who may be on the front lines of a potential outbreak and not know it. This emerging capability is particularly important in Southeast Asia, where diagnosis is often very difficult but where the climate and population conditions are ideal for epidemics.13

  Genomic and RNA sequencing have become steadily cheaper and more ubiquitous, advancing even faster than computer information technology. As Scripps bioscientist Eric Topol points out in his book The Creative Destruction of Medicine, in the 1970s sequencing was limited to ten thousand base pairs at a cost of $10 per base. But in the 2000s, sequencing machines capable of reading “hundreds of thousands” of bits of genome code in parallel increased the speed and decreased the cost of sequencing “a thousandfold.”14

  Today, RNA sequencing of flu strains requires nucleic acid sequence-based amplification machines and polymerase chain-reaction devices that are usually about desktop-size and require a bit of technical training to operate.

  Now imagine these two capabilities merging. Hospitals around the country and around the world already use handheld diagnostic equipment, sometimes called point-of-care tests, or POCTs, to detect tuberculosis, influenza, and a handful of other illnesses. In the coming years POCTs will be the leapfrog technology that brings twenty-first-century diagnostics to remote village settings, little one-road towns where no one ever thought it affordable to even build a clinic, much less a diagnostics lab. Not surprisingly, the Gates Foundation (with Qualcomm) today offers tens of millions of dollars in grant money to support the design of better POCT machines. There’s no technological barrier that stands in the way of putting an influenza detector in the hands of poultry farmers. Turning these detectors into flu genome sequencers, transforming a big data stream into a sensed data stream, is also within our reach. Future breakthroughs in our understanding of microfluidics, better sensors (through nanotechnology), wider broadband coverage, and growth in cloud computing will enable anyone to take a flu sample, digitize the gleaned data immediately, upload it to the cloud for sequencing, and report flu mutations, all with the press of a button.
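
  The last step in that imagined pipeline, reporting a mutation, can be illustrated with the lysine-at-627 marker described earlier. What follows is a minimal, hypothetical sketch, not part of any real sequencing product: it assumes the sample has already been sequenced, assembled, and translated into a PB2 amino-acid string numbered from 1, which is where nearly all of the real work lies. Avian strains typically carry glutamic acid (E) at this position; the mammal-adapting change is usually written E627K. The function names and the synthetic sequences are invented for illustration.

```python
def residue_at(protein, position):
    """Return the one-letter amino acid at a 1-based position."""
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} outside sequence of length {len(protein)}")
    return protein[position - 1].upper()


def pb2_627_report(sample_id, pb2_protein):
    """Hypothetical report: does this PB2 sequence carry lysine (K) at position 627?"""
    residue = residue_at(pb2_protein, 627)
    return {
        "sample": sample_id,
        "residue_627": residue,
        # K at 627 is the mammalian-adaptation marker; avian strains usually have E.
        "mammalian_adapted_marker": residue == "K",
    }


if __name__ == "__main__":
    # Synthetic 759-residue placeholders (PB2's length), not real viral sequences.
    avian_like = "M" + "A" * 625 + "E" + "A" * 132
    mammal_like = avian_like[:626] + "K" + avian_like[627:]
    for name, seq in [("sample-1", avian_like), ("sample-2", mammal_like)]:
        print(pb2_627_report(name, seq))
```

A real field device would upload reads and let a server do alignment and translation; the check itself is the cheap part.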

  Imagine a naked future in which handheld devices and apps have brought not only diagnostic capability but advanced sequencing capability into the places where disease hot spots are most likely to flare up; where every poultry worker, school nurse, office or metro-station manager, mother, father, or student carried a handheld sequencer on their phone; a time when an army of people saw disease surveillance as part of their job, the same way we all feel a certain duty to report a suspicious package we see left on a train or in an airport.

  “I see a future where commodity sequencing will be deployed widely in doctors’ offices, in public health institutes, or maybe in the field in autonomous devices,” Janies told the NIH assembly. “Every night I would like to calculate a new map and show it to whomever needs to see it.”

  But flu is more than just chemical reactions occurring inside the bodies of ducks, geese, pigs, and people. Behind every person-to-person flu transmission is a story of two humans connecting. Those stories are essential to better predicting flu movement.

  Finding Your Flu Triangle

  Schools are to the seasonal flu what gasoline is to fire, yet very little is known about how people actually move in these sorts of enclosed, highly populated spaces. For instance, let’s say you wanted to slow the spread of influenza through a school. You could start by inoculating the most connected individuals, the people with the widest circle of associations. These would be teachers (connected by virtue of their official role) and popular kids (who earned their connectedness through blood conquest). If you could isolate these people at the start of an outbreak, you could halt the spread of flu in that population. Right?

  Marcel Salathe of Pennsylvania State University set out to test this notion. He outfitted 788 U.S. high school students with small sensors that recorded their movements at twenty-second intervals. Salathe and his team recorded 762,868 close personal interactions (an interaction at a distance of about ten feet, which is about the distance at which people mouth spray one another with flu). After he collected a day’s worth of data, he and his team ran one thousand simulations for each participant. The results: about 70 percent of the time, the infected will isolate themselves from the rest of the population and go home, resulting in no major outbreak. But the remaining 30 percent of simulations showed the contagion spreading. In some scenarios 1.3 percent of the school population went on to be infected; in others (for H1N1) the number was closer to 50 percent.15
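
  The idea of running many simulations per participant on a recorded contact network can be sketched in a few lines. This is a much simpler stand-in for Salathe’s model, which tracked disease stages and behavior over time: here every newly infected person gets one chance to infect each contact, with a per-pair probability derived from how many twenty-second intervals they spent together. The default per-interval probability of 0.003 is the figure this section cites; the "major outbreak" threshold, the function names, and the toy contact log are assumptions for illustration.

```python
import random
from collections import defaultdict


def edge_probability(intervals, p_per_interval=0.003):
    """Chance that at least one of `intervals` twenty-second contacts transmits flu."""
    return 1.0 - (1.0 - p_per_interval) ** intervals


def build_network(contacts, p_per_interval=0.003):
    """Turn (person_a, person_b, twenty_second_intervals) records into a
    neighbor list holding a per-pair transmission probability."""
    network = defaultdict(list)
    for a, b, intervals in contacts:
        prob = edge_probability(intervals, p_per_interval)
        network[a].append((b, prob))
        network[b].append((a, prob))
    return network


def simulate_outbreak(network, index_case, rng):
    """One stochastic run: each newly infected person gets a single chance to
    infect each still-susceptible contact. Returns the final number infected."""
    infected = {index_case}
    frontier = [index_case]
    while frontier:
        next_frontier = []
        for person in frontier:
            for other, prob in network[person]:
                if other not in infected and rng.random() < prob:
                    infected.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return len(infected)


def outbreak_share(contacts, runs_per_seed=1000, threshold=0.1, p_per_interval=0.003, seed=0):
    """Fraction of runs, across every possible index case, in which more than
    `threshold` of the network ends up infected (a hypothetical outbreak cutoff)."""
    network = build_network(contacts, p_per_interval)
    people = sorted(network)
    rng = random.Random(seed)
    large = 0
    for index_case in people:
        for _ in range(runs_per_seed):
            if simulate_outbreak(network, index_case, rng) / len(people) > threshold:
                large += 1
    return large / (len(people) * runs_per_seed)


if __name__ == "__main__":
    # Hypothetical one-day contact log: (student, student, twenty-second intervals).
    day = [("ann", "bo", 400), ("bo", "cy", 18), ("cy", "di", 300), ("di", "ann", 15), ("bo", "di", 12)]
    print(outbreak_share(day, runs_per_seed=1000, threshold=0.5))
```

Seeding every participant in turn, as the loop does, mirrors the "simulations for each participant" design; the outputs of a five-person toy network say nothing about a real school.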

  The popular kids and teachers didn’t spread the virus any faster than anyone else. The average close personal interaction in an American high school, according to Salathe’s research, is a bit less than five minutes. But it’s chopped up into an average of eighteen twenty-second intervals of hallway shoulder rubbing, lab partnering, cafeteria chatting. What was most surprising was that everyone rubs shoulders with everyone else. Separating or inoculating one group of seemingly more connected people had no major effect. So you aren’t more likely to get the flu from the popular girl simply because she has more hangers-on, thus more potential infectors. You simply don’t spend enough time with those folks. Salathe’s model shows that the chance of getting the flu from a random person in your high school is 1.35 percent because your exposure to random people is just about five minutes per person. You’re most likely to get sick from one of your two best friends, people with whom you share a lot of breathing and touching space.

  As Nicholas Christakis and James Fowler describe in their book, Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, there’s a 52 percent chance that any two people in a typical American’s social network know each other. Christakis and Fowler call this transitivity. Flu is often passed through contact with various surfaces, but when it’s passed from person to person, transitivity is the primary factor, specifically triangular transitivity. Kids in high school spend a lot of time in groups of three, or closed triangles.16
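
  Transitivity in this sense can be computed directly from a friendship list: for one person, take every pair of their friends and count the share of pairs who are also friends with each other. The sketch below does that and also lists the closed triangles a person belongs to. It is an illustration under stated assumptions: friendships are treated as mutual, the helper names and the tiny network are hypothetical, and averaging this per-person share across a population is only one way to arrive at a figure like 52 percent; Christakis and Fowler’s exact measure may be defined differently.

```python
from itertools import combinations


def build_friendships(pairs):
    """Undirected friendship map built from (person, person) pairs."""
    friends = {}
    for a, b in pairs:
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    return friends


def local_transitivity(friends, person):
    """Share of pairs of `person`'s friends who are also friends with each other."""
    contacts = sorted(friends.get(person, set()))
    pairs = list(combinations(contacts, 2))
    if not pairs:
        return 0.0
    closed = sum(1 for a, b in pairs if b in friends[a])
    return closed / len(pairs)


def triangles_with(friends, person):
    """Closed triangles that include `person`: trios in which all three know each other."""
    contacts = sorted(friends.get(person, set()))
    return [(person, a, b) for a, b in combinations(contacts, 2) if b in friends[a]]


if __name__ == "__main__":
    network = build_friendships([
        ("you", "ana"), ("you", "ben"), ("ana", "ben"),  # a closed triangle
        ("you", "cal"), ("cal", "dee"),                  # an open chain
    ])
    print(local_transitivity(network, "you"))
    print(triangles_with(network, "you"))
```

Here "you" has three friends, giving three friend pairs, and only ana and ben know each other, so one of the three pairs is closed.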

  If you know who is in your triangle, then you know one of the two people most likely to get you sick. You can calculate the probability of that person infecting you on the basis of the amount of time you spend with that person, or colocate with them. The probability of flu transmission between people in close contact rises by about 0.003 for every twenty seconds they spend together, something a group of Johns Hopkins researchers discovered in the 1970s after watching how flu spread on airplanes. If you know who is in your triangle, and you know the amount of time you’re going to spend with them, then you can put a number on the odds of getting their bug (assuming you don’t do anything foolish like touch something they’ve first touched). This score would be far from perfect. But it need not be perfect to change behavior and potentially limit the spread of influenza.17
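
  Putting a number on those odds takes one line of arithmetic if you read the 0.003 figure as a per-twenty-second transmission probability and assume, as a simplification, that each twenty-second interval is an independent chance. The probability of escaping infection over n intervals is then (1 − 0.003)^n, so the chance of catching the bug is one minus that; for short contacts this is close to 0.003 × n. The function name and the contact times below are hypothetical, and the outputs are illustrative rather than a reproduction of the percentages from Salathe’s model.

```python
import math

P_PER_20S = 0.003  # per-twenty-second transmission probability cited in the text


def infection_probability(contact_seconds, p=P_PER_20S):
    """Probability of at least one transmission over `contact_seconds` of close
    contact, treating each full twenty-second interval as an independent chance p."""
    intervals = math.floor(contact_seconds / 20)
    return 1.0 - (1.0 - p) ** intervals


if __name__ == "__main__":
    # Hypothetical contact times with an infectious person, in seconds.
    contacts = {
        "best friend": 3 * 60 * 60,
        "lab partner": 45 * 60,
        "random schoolmate": 5 * 60,
    }
    for name, seconds in sorted(contacts.items(), key=lambda item: -item[1]):
        print(f"{name}: {infection_probability(seconds):.1%}")
```

Ranking contacts by time spent together is what makes the triangle matter: the longest exposures dominate the score.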

  But having the formula isn’t enough; you want to be able to compute the probability of getting the flu from someone before you go out and colocate with them. This is where social networks are creating a naked future for communicable disease.

  The Figurative Flu

  For computer scientist Michael Paul, Twitter is much more than a social networking site. It’s a window into the physical states of millions of users around the world, and potentially a boon to future public health. Between May 2009 and October 2010, he and his research supervisor, Mark Dredze, created a program to classify “sick” tweets in terms of various symptoms being expressed. They called it Ailment Topic Aspect Model, or ATAM.18

  Every week, the CDC releases new flu numbers. The researchers discovered their ATAM model predicted official flu numbers before the CDC released them. They were able to work with the tweets daily and could have gone faster. The CDC method for collecting, analyzing, and releasing data is comparatively glacial. As Paul explained to me at his lab at Johns Hopkins, “The CDC takes two weeks because all of the CDC’s tools are limited. They survey hospitals. They literally call around and ask for numbers on positive specimens that week. Getting that data takes a while. The numbers for the flu rate in a week come out about two weeks after that week. If you want numbers for an outbreak of a novel epidemic, like SARS, two weeks is too long.”

  Paul and Dredze aren’t the first research team to attempt to use data on what people are typing into their computers and phones to forecast illness rates. Google Flu Trends famously can also predict influenza rates before official CDC reporting. But this flu trends program is query based. Search-engine queries reveal only what people are interested in, and the origin of that interest is a matter of guesswork, easier in some cases than in others.

  If a lot of people in a particular area are querying “flu shots,” that’s a fairly good indicator of a flu outbreak, but it’s also an indirect indicator. By definition, a query is a question, not a statement. We have a good but not perfect sense of why people are asking about flu shots. What if you wanted to measure something a bit more subjective, such as happiness? If I’m looking for flu remedies, Google can draw a reasonable inference that I’m in the market for flu symptom remedies because I’ve got a bug. But as Paul points out, during the SARS outbreak millions of people began searching for information about SARS simply because they were curious about it.

  Someone who types “I have the flu” is more likely to actually have it. To gain a sense of what people are feeling, what symptoms they may be experiencing, you need a platform that allows people to broadcast their current state of being, and you need a user base interested in doing exactly that. This is precisely what Twitter is about. Paul and Dredze’s system not only predicted flu but also revealed a wide assortment of illnesses and symptoms among the observed subjects.

  Writing an algorithm that can parse nuances in human language is no easy matter. Humans can easily differentiate words that clearly refer to illness, such as “101-degree fever,” “nausea,” and “seizure,” from terms that borrow the vocabulary of illness figuratively, such as “Bieber fever,” being “bored to death,” and “OMG that guy’s Fuchsia Vneck is giving me a seizure #hipsterfail.” We learn appropriate use of rhetorical, figurative, or simply nonliteral language based on a wide variety of feedback signals. A human’s education in semantics, the study of the multiple meanings of language, is lifelong. Scientists hoping to give computer programs even a shallow understanding of the slipperiness of words don’t have that luxury.

  What Paul and Dredze did have was a ridiculous amount of data: a stockpile of 2 billion tweets that they built over a period of a year and a half. “Two billion over a year is a small sample, a lot of data, but a small sample,” Paul explains.

  Once they compiled their 2-billion-tweet corpus, they soon found they had another problem. It was too big: not too large for a program, but too large for them to work with. If the program was going to learn, they would have to design lessons for it; more specifically, they would have to establish a set of rules that the program could use to separate health-related tweets from non-health-related ones. To write those rules, they needed to whittle the corpus down significantly.

  The next step was to figure out which illness-related words would be the most fruitful. “Words like the ‘flu’ are strong. But names of really specific drugs return so few tweets they’re not worth including,” Paul says. They checked which of the 2 billion tweets contained any of thirty thousand sickness-related keywords drawn from WebMD and WrongDiagnosis.com. This filtering and then classifying gave them a pile of just 11 million tweets. These contained such words as “flu,” “sick,” and various other terms that were related to illness but often used for other purposes (e.g., “Web design class gives me a huge headache every time”).
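
  A keyword pass of this kind is easy to sketch, and the sketch shows why it was only a first cut. The version below keeps any tweet containing a listed word as a whole word; it is a hypothetical illustration, and its seven-term vocabulary stands in for the roughly thirty thousand terms the researchers used, which are not reproduced here. Note that the headache complaint about web design class and "Bieber fever" both survive the filter, which is exactly the kind of false positive the next stage had to catch.

```python
import re

# Hypothetical, tiny illness vocabulary; the study drew on roughly thirty
# thousand terms, and its exact list is not reproduced here.
HEALTH_TERMS = {"flu", "sick", "fever", "headache", "nausea", "seizure", "cough"}

WORD = re.compile(r"[a-z0-9']+")


def mentions_health_term(tweet, terms=HEALTH_TERMS):
    """True if any whole word in the tweet (lowercased) is in the term list."""
    return not set(WORD.findall(tweet.lower())).isdisjoint(terms)


def filter_corpus(tweets, terms=HEALTH_TERMS):
    """Crude first pass: keep only tweets that mention at least one term."""
    return [tweet for tweet in tweets if mentions_health_term(tweet, terms)]


if __name__ == "__main__":
    sample = [
        "Home with the flu and a 101-degree fever",
        "Web design class gives me a huge headache every time",
        "Bieber fever!!!",
        "Great game tonight",
    ]
    for tweet in filter_corpus(sample):
        print(tweet)
```

Matching whole words rather than substrings means "influenza" is not caught by "flu"; a real vocabulary would list such variants explicitly.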

  These remaining 11 million tweets had to be classified, or annotated, by hand, a task monumental enough to be beyond the reach of a pair of researchers. But the emergence of crowd-sourcing platforms has reduced this sort of large-scale, highly redundant task to a chore that can be smashed up and instantaneously divided among thousands of people around the world through Amazon’s Mechanical Turk service. The cost, Paul recalls, was close to $100. Each of the 11 million tweets was labeled three times to guard against human error. If two of the Mechanical Turk labelers believed a somewhat ambiguous tweet was about health, the chances of the tweet’s not being related to health were small.
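
  Combining three labels per tweet is usually done with a simple majority vote, which matches the two-out-of-three reasoning above; whether the researchers combined labels exactly this way is an assumption. The helper below is a hypothetical sketch that returns the most common label and declines to decide when the top labels are tied, a case that cannot arise with three yes-or-no labels but can with other label sets.

```python
from collections import Counter


def majority_label(labels):
    """Label chosen by the most annotators, or None if there is no clear winner."""
    ranked = Counter(labels).most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical annotations: three crowd labels per tweet.
    annotations = {
        "Home with the flu and a fever": ["health", "health", "health"],
        "Web design class gives me a huge headache every time": ["other", "health", "other"],
        "Bieber fever!!!": ["other", "other", "health"],
    }
    for tweet, labels in annotations.items():
        print(majority_label(labels), "<-", tweet)
```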

  The point of this exercise was not to relieve the program of the burden of having to learn for itself but to create a body of correct answers that the program could check itself against. In machine learning, this is called a training set. It’s what the program uses to look up the answers to see if it’s right or wrong.
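
  The role of a training set can be shown with a deliberately small classifier. Paul and Dredze’s ATAM is a topic model, which this is not: the sketch swaps in a bag-of-words naive Bayes classifier with add-one smoothing, chosen only because it fits in a few dozen lines. The handful of labeled tweets is invented. The program learns word frequencies from the labeled examples and is then checked against answers it already has, which is the self-checking the paragraph describes.

```python
import math
import re
from collections import Counter, defaultdict

WORD = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return WORD.findall(text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (a stand-in, not ATAM)."""

    def fit(self, examples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # prior share of this label
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, labeled):
    """Share of labeled examples the model gets right."""
    hits = sum(model.predict(text) == label for text, label in labeled)
    return hits / len(labeled)


# Invented training set: the "correct answers" the program checks itself against.
TRAIN = [
    ("home sick with the flu and a fever", "health"),
    ("this fever and cough will not quit", "health"),
    ("flu shot today my arm hurts", "health"),
    ("bieber fever is real tonight", "other"),
    ("that concert was sick", "other"),
    ("bored to death in this meeting", "other"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN)
    print(accuracy(model, TRAIN))
    print(model.predict("fever and cough"), model.predict("that party was sick"))
```

Scoring only on the training examples flatters any model; in practice some labeled tweets are held back so the program is tested on answers it never saw.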

  Examples of machine learning used in practice go back to the 1950s, but only recently do we have enough material to train a computer model on virtually anything. This is a methodology breakthrough that is now possible only because of the Internet, where spontaneous data creation from users has taken the place of costly and laborious surveys.

  What Paul and Dredze’s program does is show health and flu trends in something closer to real time, telemetrically, rather than in future time. But remember, the future is a matter of perception, and perception on such matters as flu outbreaks is shaped by reporting. Predicting what official CDC results will reveal two weeks before those results become public is an example of a particular corner of the future becoming exposed where it had once been cloaked.

  But let’s return to our scenario in which Josh was told the identity of the person to whom he was going to give the flu. That scenario also involved fine-grained, actionable intelligence: a probability score for direct person-to-person flu transmission. We aren’t concerned with some big data problem; we’re concerned about Josh!

  A few years ago attempting to solve a problem so complex would have involved extremely expensive and elaborate socio-technical simulations carried out by people with degrees in nuclear physics. But that was before millions of people took up the habit of broadcasting their present physical condition, their location, and their plans, all at once.

  Working off Paul and Dredze’s research, Adam Sadilek published two papers in the spring of 2012 in which he showed how to use geo-tagged tweets to discern—in real time—which one of your friends has a cold, deduce where he got it, and predict the likelihood of his giving it to you.
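
  Deducing where someone picked up a cold starts with finding when two people were in the same place at the same time. A hedged sketch of that co-location test is below: two geo-tagged tweets count as a meeting if they fall within a distance and a time window of each other, with distance computed by the standard haversine great-circle formula. The 100-meter and ten-minute thresholds, the tuple layout, and the function names are illustrative assumptions, not values or code from Sadilek’s papers.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def colocated(tweet_a, tweet_b, max_meters=100, max_gap=timedelta(minutes=10)):
    """Treat two (time, lat, lon) tweets as a meeting if close in space and time.
    The default thresholds are assumptions chosen for illustration."""
    (time_a, lat_a, lon_a), (time_b, lat_b, lon_b) = tweet_a, tweet_b
    return (abs(time_a - time_b) <= max_gap
            and haversine_m(lat_a, lon_a, lat_b, lon_b) <= max_meters)


def colocations(user_a_tweets, user_b_tweets, **thresholds):
    """All pairs of tweets from two users that count as co-locations."""
    return [(a, b) for a in user_a_tweets for b in user_b_tweets
            if colocated(a, b, **thresholds)]


if __name__ == "__main__":
    noon = datetime(2012, 1, 1, 12, 0)
    friend = [(noon, 40.7580, -73.9855)]
    you = [(noon + timedelta(minutes=5), 40.7581, -73.9854),
           (noon + timedelta(hours=3), 40.7580, -73.9855)]
    print(len(colocations(friend, you)))
```

Pairing every tweet with every other tweet is fine for two users; across thousands of users one would bucket tweets by time slot and map grid cell first.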

  He applied Paul and Dredze’s program for separating sick tweets from benign ones to a real-world setting. Sadilek looked at 15 million tweets, about 4 million of which had been geo-tagged, from 632,611 unique Twitter users in the city of New York. About half of those tweets (2.5 million) were from people who posted geo-tagged tweets more than a hundred times per month, so-called geo-active users. There were only about 6,000 of these people, but there were 31,874 friendships between them. Each person had an average of about 5 friends in the group.
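
  The "geo-active" cut is a simple threshold rule, and it can be sketched along with the step of keeping only friendships where both people pass it. The threshold of more than a hundred geo-tagged tweets per month comes from the text; treating a friendship as any listed pair regardless of direction, and the function names and toy data, are assumptions for illustration, since Sadilek’s papers may define friendship differently.

```python
GEO_ACTIVE_THRESHOLD = 100  # geo-tagged tweets per month, per the text


def geo_active_users(geo_tweets_per_month):
    """Users who posted more than the threshold of geo-tagged tweets in a month."""
    return {user for user, n in geo_tweets_per_month.items() if n > GEO_ACTIVE_THRESHOLD}


def friendships_among(users, friendships):
    """Unordered friendship pairs in which both people are in `users`."""
    kept = set()
    for a, b in friendships:
        if a != b and a in users and b in users:
            kept.add(frozenset((a, b)))  # ("a", "b") and ("b", "a") collapse to one pair
    return kept


if __name__ == "__main__":
    # Hypothetical monthly counts of geo-tagged tweets and friendship pairs.
    monthly_geo_tweets = {"a": 240, "b": 130, "c": 100, "d": 12}
    follows = [("a", "b"), ("b", "a"), ("a", "c"), ("b", "d")]
    active = geo_active_users(monthly_geo_tweets)
    print(sorted(active), len(friendships_among(active, follows)))
```

The threshold is strict, so a user with exactly one hundred geo-tagged tweets in a month is left out.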

 
