Hacking Darwin


by Jamie Metzl


  New genetic tools make it possible to turn multiple genes on and off, but truly understanding how genes contribute to complex human traits requires a much more sophisticated process of integration. Genome-wide association studies, or GWAS, are starting to do just that.

  Even though all humans are a great deal more genetically alike than not, our relatively small number of genetic differences accounts for most of our diversity and disease, and so matters enormously. As opposed to the old days of looking for single-gene mutations among groups of people with the same genetic disease, the GWAS process pores through hundreds of thousands or even millions of known genetic variations to find differences and patterns that can be matched with different outcomes.

  Once genes are sequenced, the order of the G, A, T, and C bases is translated into a digital file. A GWAS involves a computer algorithm scanning the genomes of large groups of people, looking for genetic variations associated with specific diseases or traits. Each GWAS can examine thousands of these variations (which scientists call single-nucleotide polymorphisms, or SNPs). The more relevant variants are found, the more accurate future studies will become.
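
  To make the logic concrete, here is a minimal sketch of the statistical core of a GWAS, using tiny made-up data (the SNP IDs and genotypes are hypothetical). For each variant, it asks whether allele counts differ between people with a condition and people without it. Real studies use dedicated tools such as PLINK and must correct for population structure and for testing millions of variants at once.

```python
# Minimal GWAS-style association test: for each SNP, compare allele
# counts between cases and controls. Hypothetical toy data only.
from scipy.stats import chi2_contingency

# Genotypes coded as minor-allele counts (0, 1, or 2) per person.
cases    = {"rs0001": [2, 1, 2, 1, 2], "rs0002": [0, 1, 0, 0, 1]}
controls = {"rs0001": [0, 1, 0, 0, 1], "rs0002": [0, 1, 1, 0, 0]}

def allele_counts(genotypes):
    """Return [minor, major] allele totals for a group (2 alleles each)."""
    minor = sum(genotypes)
    return [minor, 2 * len(genotypes) - minor]

for snp in cases:
    table = [allele_counts(cases[snp]), allele_counts(controls[snp])]
    _, p, _, _ = chi2_contingency(table)
    print(f"{snp}: p = {p:.3f}")  # a small p-value hints at association
```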

  To better understand how GWAS and other processes for making sense of immense amounts of genetic data work, imagine trying to comprehend a forest. Imagine that other people have been traveling through the maze of trees and branches for years and have identified thousands of the most significant places in the forest where some of the most important things are happening—perhaps the waterfalls, animal feeding grounds, special plants, etc. We know that these types of sites are significant based on our experience traveling through many other forests. One way to better understand this forest would be to visit each of these high-impact sites and see what’s going on. A GWAS does the same within the vast expanse of the genome by seeing what specific genetic markers—ones already flagged as relevant to what we may be looking for—are doing.

  Beyond GWAS, newer next-generation sequencing (NGS) tools are making it possible for researchers to sequence all of the protein-coding genes and even the entire genome. Looking at the protein-coding genes is like finding a trail that links the most important sites in your forest and allows you to understand how all of the different points along the trail connect to and interact with one another. Sequencing the whole genome is like looking at the entire forest, a bigger and more complicated job but one that ultimately helps us understand the forest far better than just looking at the most important places.

  Focusing on such a huge data set as the entire forest or the entire genome is a far more daunting analytical task. It’s easier for us to get a sense of a few waterfalls and plants, or of a few target genes, than to understand the broader and more complicated ecosystem of the forest or the genome as a whole. But if we could understand these broader ecosystems, we’d know a great deal more about the forest and, in the case of the human genome, ourselves.

  The more we move from looking at how a single genetic mutation causes a disease or trait to how a complex pattern of genes and other systems creates a certain outcome, the less possible it becomes to establish causality using our limited human brains alone. That’s why the intersection between the genetics and biotechnology revolutions on the one hand and the artificial intelligence (AI) and big-data analytics revolutions on the other is so critical to our story.

  The ancient Chinese game of Go, considered by many the world’s most complicated board game, has long played a central role in China’s culture and strategic thinking. Invented more than 2,500 years ago, Go is played on a board of 361 intersections, onto which one player places black stones and the other white. Moving in turn, each player tries to encircle the other player’s stones to take them off the board. Whoever controls the most territory when the game ends is the winner. To put the complexity of Go into perspective, after each player’s first move a chess game has around 400 possible positions. A Go game has around 130,000.
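
  Those figures are easy to reconstruct with back-of-envelope arithmetic, assuming we count board positions after each player has moved once:

```python
# Rough arithmetic behind the chess/Go comparison: positions after
# each player has moved once.
chess_openings = 20 * 20     # 20 legal first moves per side in chess
go_openings = 361 * 360      # any empty intersection on a 19x19 board

print(chess_openings)        # 400
print(go_openings)           # 129960 -- roughly 130,000
```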

  Even after IBM’s Deep Blue computer defeated world chess champion Garry Kasparov in 1997, most observers believed it would be many decades before a computer could defeat the world’s Go champions because Go’s mathematical complexity rendered Deep Blue’s brute-force computational approach useless. But when Google DeepMind’s AlphaGo program deployed advanced machine-learning capabilities to trounce the Korean and Chinese Go world champions in a series of high-profile matches in 2016 and 2017, the world took notice.

  The AlphaGo program learned how to play Go in part by analyzing hundreds of thousands of digitally recorded human Go matches. In late 2017, Google’s DeepMind introduced a new program, AlphaGo Zero, which did not need to study any human games. Instead, the programmers provided the algorithm with only the basic rules of Go and instructed it to play against itself to learn the best strategies. After just three days of self-play training, AlphaGo Zero defeated the original AlphaGo program that had bested the leading human champions.
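
  The self-play idea can be illustrated on a far simpler game. The sketch below is a toy illustration, not DeepMind’s actual method: an agent given only the rules of a take-away game learns winning moves purely by playing against itself.

```python
# Toy self-play learner, in the spirit of AlphaGo Zero but vastly
# simplified. Game: 21 stones, each turn remove 1-3; whoever takes
# the last stone wins. The agent gets only the rules.
import random

Q = {}  # (stones_left, move) -> estimated value for the player to move

def best_move(stones, epsilon=0.1):
    moves = [m for m in (1, 2, 3) if m <= stones]
    if random.random() < epsilon:            # occasionally explore
        return random.choice(moves)
    return max(moves, key=lambda m: Q.get((stones, m), 0.0))

for _ in range(50_000):                      # self-play episodes
    stones, history = 21, []
    while stones > 0:
        move = best_move(stones)
        history.append((stones, move))
        stones -= move
    # The player who made the final move won; players alternate, so
    # the reward flips sign as we walk back through the game.
    reward = 1.0
    for state_move in reversed(history):
        old = Q.get(state_move, 0.0)
        Q[state_move] = old + 0.1 * (reward - old)
        reward = -reward

# Optimal play leaves a multiple of 4; with enough self-play this
# converges to the optimal first move from 21 stones, which is 1.
print(best_move(21, epsilon=0.0))
```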

  AlphaGo Zero can crush any human Go player because it recognizes layers of patterns in a massive field of data far beyond what any human could hope to achieve on his or her own. The rapid advance of AI has frightened many people, technology entrepreneur Elon Musk and the late Stephen Hawking among them, who are concerned that AI will supplant and someday potentially harm humans.4 These types of fears are theoretical at our current stage of technological development, but the idea of using AI to begin unlocking the secrets of our biology is not. Today, it is quite clear that AI technology is not supplanting us; it is enhancing us.

  Thousands of books have been written about how the information and computing revolutions are transforming the way we store and process information. In the 1880s, punch cards were a major innovation for processing what seemed then like large amounts of data. Magnetic tape, first developed in the 1920s, opened new ways to store information, and the electromechanical and electronic machines built to crack Nazi and Japanese secret codes helped win the Second World War. Soon after, the Hungarian-American genius John von Neumann laid the foundations of modern computing that underpinned the development of the mainframe, the personal computer, and the internet revolution. Now, the connected big-data and AI revolutions are allowing us to make ever more sense of the growing mountains of data being generated inside and around us.

  It is no coincidence that the first word in big-data analytics is big. More data has been generated in the past two years than in the entirety of human history before them, allowing us to do ever-bigger things in an ever-accelerating process.5 This data analytics revolution is massively expanding the problem-solving capacity of our species.

  When Thomas Edison, inventing the phonograph, lightbulb, electrical grid, motion picture camera, and much else in Menlo Park, New Jersey, faced a challenge he could not figure out himself, he could speak with a relatively small number of people, perhaps in the hundreds, or read a limited number of books and papers. Today, however, most of our species is networked through the internet; we can glide past problems others have already solved and focus on the new challenges we ourselves are best able to address.

  When brilliant people like Edison died, much of their knowledge went with them to the grave. Today, far more of our information and knowledge is captured in our accessible digital records, and the data processing and knowledge-amassing tools we are developing will live in perpetuity. Human death remains an individual and familial tragedy (and just generally sucks), but it has far less impact on the advance of our collective knowledge and our species more generally than it used to. Already today, most of us are in many ways smarter with our smartphones than even great thinkers of the past. We are functionally merging with our breathtaking and fast-improving tools and are, in many significant ways, better off for it.

  The big data and machine learning revolutions are helping us figure out all sorts of systems, from urban planning to autonomous vehicles to space travel, but among their most significant impacts will be on our understanding of and ability to manipulate biology. As difficult as Go dominance is, recognizing the patterns of human biology is a far more complex task. DeepMind’s AlphaZero algorithm, a more general version of AlphaGo Zero, isn’t just a champion at Go; it also bested the strongest existing computer programs in chess and the complex Japanese strategy game of shogi. The rules of these games fed to AlphaZero are simple and straightforward. The “rules” of our biology could well someday be known to us, but today we humans, even working with our AI tools, struggle to figure them out.

  To get there, cutting-edge scientists are deploying big-data analytical and deep-learning tools to help make more sense of the human genome. Deep-learning software is being used not just to read medical images of breast and other cancers more accurately than human radiologists can, but also to synthesize patients’ genomic information and electronic medical records to begin diagnosing and even predicting diseases.
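
  A stripped-down sketch of what such a predictive model involves, using synthetic data and a simple logistic regression standing in for the deep networks these systems actually use: genetic variants and medical-record features are combined into one input, and a model is trained to predict a disease label.

```python
# Sketch of predicting disease risk from combined genomic and
# medical-record features. Synthetic data only; real systems use
# deep networks over vastly richer inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
snps = rng.integers(0, 3, size=(n, 10))   # 10 SNPs, coded 0/1/2
age = rng.integers(30, 80, size=(n, 1))   # one record feature: age
X = np.hstack([snps, age])

# Synthetic ground truth: risk driven by one SNP plus age.
risk = 0.8 * snps[:, 0] + 0.04 * age[:, 0]
y = (risk + rng.normal(0, 0.5, n) > 2.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.score(X, y))  # accuracy on the synthetic cohort
```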

  Companies around the world are racing to accelerate this process. The innovative Canadian company Deep Genomics, for example, is bringing together AI and genomics to uncover patterns in how diseases work because, in its words, “the future of medicine will rely on artificial intelligence, because biology is too complex for humans to understand.”6 Google and a Chinese company, WuXi NextCODE, recently released highly sophisticated, cloud-based AI systems designed to help make sense of the massive amounts of data coming from genetic sequencing. Boston-based Biogen is actively exploring how quantum computing might supercharge the ability to find meaningful patterns in these massive data sets.7

  The intersection of AI and genomics will become more powerful as deep-learning techniques become more sophisticated, more and larger data sets of sequenced genomes become available, and our ability to decipher more of the underlying principles of our systems biology grows.

  As the genomic data set expands, scientists will use AI tools to better understand how complex genetic patterns can lead to specific outcomes. The real benefit will come not only from sequencing very large numbers of people but also from comparing their genotypes (their genetic makeup) to their phenotypes (how these genes are expressed over the course of their lives). The more sequenced genomes can be matched with detailed life records shared in a common database, the better able we will be to figure out what our genes and other biological systems are doing.
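
  In miniature, that genotype-to-phenotype matching is a data join. The sketch below uses hypothetical person IDs, a hypothetical SNP, and a made-up trait measurement to show the basic operation:

```python
# Genotype-phenotype matching in miniature: join each person's
# variant data to their health records, then compare outcomes by
# genotype. All IDs and values here are hypothetical.
import pandas as pd

genotypes = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "rs0001":    [0, 2, 1, 2],         # minor-allele count at one SNP
})
phenotypes = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "ldl_mg_dl": [96, 141, 118, 150],  # a trait from health records
})

cohort = genotypes.merge(phenotypes, on="person_id")
print(cohort.groupby("rs0001")["ldl_mg_dl"].mean())
```

  At population scale the same join runs across millions of variants and decades of records, which is why shared formats and shared databases matter so much.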

  “The world’s most valuable resource,” The Economist wrote in 2017, “is no longer oil, but data.”8 In the case of genomics, it is high-quality data, matching people’s biology with the most specific information possible about many other aspects of their lives.9

  Bringing together these vast sets of genetic data and life records will require relatively uniform electronic health and life records that AI algorithms can analyze. Today’s diversity of health, medical, and life records systems makes the sharing of large pools of genomic data more difficult than it needs to be. In an ideal world, everyone would have their full genome sequenced and all of their personal and medical data recorded accurately in a standardized electronic medical record shareable with researchers in an open network.
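
  What would such a standardized record even contain? The sketch below is a made-up minimal schema for illustration only; real interoperability efforts build on established standards such as HL7 FHIR rather than ad hoc formats.

```python
# Made-up minimal schema for a shareable research record; real
# efforts build on standards such as HL7 FHIR, not ad hoc formats.
from dataclasses import dataclass, field

@dataclass
class ResearchRecord:
    participant_id: str                # pseudonymous, not a name
    genome_vcf_uri: str                # pointer to the sequenced genome
    diagnoses: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    consented_uses: list[str] = field(default_factory=list)

record = ResearchRecord(
    participant_id="anon-00042",
    genome_vcf_uri="s3://example-biobank/anon-00042.vcf.gz",
    diagnoses=["type 2 diabetes"],
    consented_uses=["academic research"],
)
print(record.participant_id, record.consented_uses)
```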

  In the real world, however, the idea of our most intimate data being made available to people we don’t know in a searchable database frightens many of us—for good reason. But researchers, companies, and governments around the world are exploring different approaches to balancing our collective need for big data pools with our individual desire for data privacy.*

  Iceland is one of the world’s most genetically homogenous societies. Settled by a small number of common ancestors in the ninth century, with relatively few immigrants arriving since, and possessing detailed genealogical, birth, death, and health records going back centuries, the country is an ideal laboratory for genetic research. In 1996, Icelandic neurologist Kári Stefánsson cofounded deCODE Genetics, a company with the ambitious goal of mining the gene pool of Icelanders to better understand and find cures for a range of diseases. To get the data it needed, deCODE persuaded Iceland’s parliament to grant it access to national health records and convinced Icelanders, many of whom became company shareholders, to donate their blood.

  When Swiss pharmaceutical giant Hoffmann-La Roche signed a research deal with deCODE worth up to $200 million in 1998, many Icelanders felt betrayed. An ensuing lawsuit denied deCODE access to the national health record system and required each individual to consent to having their personal records shared. But after deCODE and Hoffmann-La Roche offered individual Icelanders free access to any drugs developed in the collaboration, many Icelanders signed back up. Today, deCODE holds 100,000 blood samples and has used its genetic and data pool to discover genes linked to various diseases and even to develop a novel treatment for heart attacks.10 Another pharmaceutical giant, AstraZeneca, announced in early 2018 that it planned to sequence half a million genomes from its own clinical trials by 2026.11 Governments are also very much involved in efforts to amass large pools of genetic data.

  Britain’s 100,000 Genomes Project, run by Genomics England and launched with great fanfare by Prime Minister David Cameron in 2012, is sequencing patients in the country’s National Health Service (NHS) with rare diseases and cancer, as well as their families. With the goal of matching genetic information with health records to better understand and advance the treatment of genetic disease, the 100,000 Genomes Project sought to “kick-start the development of a UK genomic industry.”12 Raising the ante, the NHS Genomic Medicine Service announced in October 2018 that all adults with certain cancers and rare diseases would be offered whole-genome sequencing, with the goal of sequencing five million Britons over the coming five years.

  Americans might think the absence of a unified national health system makes this kind of government-led effort more difficult in the United States, but the recently launched U.S. plan is also ambitious. After years of delays, in spring 2018 the U.S. National Institutes of Health began recruiting a target of one million Americans from all socioeconomic, ethnic, and racial groups to submit their sequenced genomes, health records, regular blood samples, and other personal information to the All of Us Research Program.13 Congress has authorized a $1.45 billion, ten-year budget for the program, and enrollment sites are being set up across the United States. If privacy concerns and bureaucratic inertia can be addressed, this initiative could do a lot to push genetic research forward. The U.S. Department of Veterans Affairs has also launched its own biobank, the Million Veteran Program, which plans to sequence a million veterans by 2025 and match their genotypes with their health and service records.14

  Creative private-sector models are also emerging that try to balance the societal interest in accessible, big-data pools of genetic information with the desire of many individuals to maintain some level of control over their genetic data. LunaDNA, a young San Diego–based company created by Illumina alumni, is seeking to bring the many small and disparate genetic data sets held by multiple companies and clinics together into a searchable collective by rewarding individuals willing to share their genetic information with cryptocurrency.15 This type of approach makes particular sense because people’s sequenced genomes, just like their internet search histories, will soon have significant commercial value whose benefits deserve to be shared with consumers. The Boston-based Personal Genome Project is trying to build an open-source coalition of national genetic data pools.16

  Perhaps not surprisingly, China has embarked on the most aggressive path toward building its big-data genetic pool on an industrial scale. Its recently announced $9 billion, fifteen-year investment to establish national leadership in precision medicine, for example, dwarfs similar initiatives around the world.17 The National Development and Reform Commission’s thirteenth Five-Year Plan for biotech industry development aims to sequence at least 50 percent of all newborns (including prepregnancy, prenatal, and newborn testing) in China by 2020 and to support hundreds of separate projects sequencing genomes and gathering clinical data in partnership with local governments and private companies.18 China is also moving aggressively toward establishing a single, shareable format for all electronic health records across the country and toward ensuring that privacy protections do not impede access to this data by researchers, companies, and the government.

  As a result of all of these types of efforts around the world, it is estimated that up to two billion human genomes might be sequenced within the coming decade.19 Making sense of this massive amount of data, correlating it to electronic health and life records, and integrating it with large data sets of other human biological systems will require significantly more computing power than we have today, but with the increase in supercomputing capacity around the world there is little doubt we’ll eventually get there.20
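
  Even the storage requirements alone hint at the scale. A rough, assumption-laden estimate: if a raw thirty-fold whole-genome sequence runs to something like 100 gigabytes, two billion genomes would approach 200 exabytes before compression.

```python
# Back-of-envelope storage estimate for two billion genomes, assuming
# ~100 GB of raw data per 30x whole genome (a common ballpark figure;
# compression and processing choices change this substantially).
GB_PER_GENOME = 100
GENOMES = 2_000_000_000

total_gb = GB_PER_GENOME * GENOMES
print(f"{total_gb / 1e9:.0f} exabytes")  # 1 exabyte = 1e9 GB -> 200
```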

  Bringing together so much genetic and personal information in shareable digital databases would be an almost impossible task if each person were making an individual decision about whether or not to have his or her genome sequenced. Instead, nearly everyone who is born through IVF and embryo selection or who visits a doctor’s office or hospital at any point in life will be sequenced as standard procedure—the way people routinely have their pulse taken today—in our collective shift from generalized medicine to the new world of personalized, a.k.a. precision, medicine.

  Our current medical world is based largely on averages. Not every drug, for example, works for every person, but if it works for even a moderate percentage of people, regulators will often approve it. If you show up in a standard doctor’s office with a condition that could be treated with the drug, generally there’s a very simple way to find out whether it works for you—by trying it. If you take the common blood thinner warfarin and it helps, it suits your biology. If you are among the one in a hundred people for whom warfarin causes internal bleeding and possibly death, you learn the opposite the hard way.
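
  Warfarin is in fact an early test case for doing better than trial and error: sensitivity to the drug is influenced by variants in the CYP2C9 and VKORC1 genes, and genotype-guided dosing tables already exist. The sketch below is a deliberately simplified illustration of that logic, not clinical guidance:

```python
# Simplified illustration of genotype-guided warfarin dosing logic.
# Variants in CYP2C9 (slower drug metabolism) and VKORC1 (higher
# drug sensitivity) shift the appropriate dose. Illustrative only;
# real dosing uses clinical tables and many more factors.
def warfarin_dose_hint(cyp2c9: str, vkorc1: str) -> str:
    reduced_metabolism = cyp2c9 in ("*1/*3", "*2/*2", "*2/*3", "*3/*3")
    high_sensitivity = vkorc1 == "A/A"
    if reduced_metabolism and high_sensitivity:
        return "much lower starting dose"
    if reduced_metabolism or high_sensitivity:
        return "lower starting dose"
    return "standard starting dose"

print(warfarin_dose_hint(cyp2c9="*1/*1", vkorc1="A/A"))
# -> lower starting dose
```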

  Generalized medicine was our only way to do things when our understanding of how each individual human being works was low. In the coming world of personalized medicine, this approach will seem the equivalent of leeches. Instead of just seeing a doctor, you will see a doctor paired with an AI agent. Your treatments for ailments from headaches to cancers will be chosen based on how well they work for people like you. Your individual biology—including your gender and age, the status of your microbiome, your metabolic indicators, and your genes—will be the foundation of your medical record and care.

 
