by David Reich
Figure 2. Ancient DNA labs are now producing data so fast that the time lag between data production and publication is longer than the time it takes to double the data in the field.
Much of the technology for the genome-wide ancient DNA revolution was invented by Svante Pääbo and his colleagues at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, who developed it to study extremely old samples such as archaic Neanderthals and Denisovans. My contribution has been to scale up the methods to study large numbers of relatively more recent samples, albeit still many thousands of years old. The traditional length of an apprenticeship is seven years, and I began mine in 2007 when I started working with Pääbo on the Neanderthal and Denisova genome projects. In 2013, Pääbo helped me to establish my own ancient DNA laboratory—the first in the United States focused on studying whole-genome ancient human DNA. My partner in this effort has been Nadin Rohland, who did her own seven-year apprenticeship in Pääbo’s laboratory before she came to mine. Our idea was to make ancient DNA industrial—to build an American-style genomics factory out of the techniques developed in Europe to study individual samples.
Rohland and I realized that a technique developed by Matthias Meyer and Qiaomei Fu in Pääbo’s laboratory could be the key to the industrial-scale study of ancient DNA. Meyer and Fu’s invention was born of necessity: the need to extract DNA from an approximately forty-thousand-year-old early modern human from Tianyuan Cave in China.16 When Meyer and Fu extracted DNA from Tianyuan’s leg bones, they found that only about 0.02 percent of it was from the man himself. The rest came from microbes that had colonized his bones after he died. This made direct sequencing too expensive, even using the hundred-thousand-times cheaper technology that had become available after around 2006. To get around this challenge, Meyer and Fu borrowed a page from the playbook of methods developed by medical geneticists. Just as medical geneticists had developed methods to isolate DNA from the 2 percent of the genome that is most interesting and to discard the other 98 percent, Meyer and Fu isolated a tiny fraction of sequences from the Tianyuan bone that were human and discarded the rest.
The method of DNA isolation that Meyer and Fu developed has been central to the success of the ancient DNA revolution. In the 1990s, molecular biologists learned how to adapt laser-etching techniques invented for printing electronic circuits to attach millions of DNA sequences of their choice to silicon or glass wafers. These sequences could then be cut off the wafers using molecular scissors (enzymes) and released into a watery mix. Meyer and Fu took advantage of this method to synthesize fifty-two-letter-long sequences of DNA that, overlapping like shingles on a roof, covered much of human chromosome 21. Exploiting DNA’s tendency to bind to highly similar sequences, they “fished” out the DNA sequences from Tianyuan that they were interested in by using as “bait” the sequences they had artificially synthesized. They found that a large fraction of the DNA they obtained was from Tianyuan’s genome. Not only that, but it was from the parts of Tianyuan’s genome that they wanted to study. They analyzed the data to show that Tianyuan was an early modern human, part of the lineage leading to present-day East Asians. He did not have a particularly large amount of ancestry from archaic human lineages that were diverged by hundreds of thousands of years from modern human lineages, contradicting earlier claims based on the shape of his skeleton.17
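The "shingles" and "fishing" ideas above can be pictured with a toy sketch in Python. It is only an illustration under loud assumptions: the 13-letter offset between baits, the 20-letter exact-match "seed," and all of the random sequences are hypothetical choices made here. Only the fifty-two-letter bait length comes from the text. Real hybridization tolerates mismatches and is a physical process, so exact substring matching is a crude stand-in for it.

```python
import random

BAIT_LEN = 52  # bait length stated in the text


def tile_baits(target, bait_len=BAIT_LEN, step=13):
    """Return overlapping baits covering `target`; `step` is a hypothetical offset."""
    if len(target) < bait_len:
        return []
    starts = list(range(0, len(target) - bait_len + 1, step))
    # Make sure the final stretch of the target is covered too.
    if starts[-1] != len(target) - bait_len:
        starts.append(len(target) - bait_len)
    return [target[s:s + bait_len] for s in starts]


def seed_index(baits, seed_len=20):
    """Collect every seed_len-letter substring that occurs in any bait."""
    return {b[i:i + seed_len] for b in baits for i in range(len(b) - seed_len + 1)}


def capture(fragments, seeds, seed_len=20):
    """Keep fragments that share at least one exact seed with some bait."""
    kept = []
    for frag in fragments:
        windows = range(len(frag) - seed_len + 1)
        if any(frag[i:i + seed_len] in seeds for i in windows):
            kept.append(frag)
    return kept


rng = random.Random(0)
# Hypothetical stand-ins: a 300-letter target region and 50 unrelated "microbial" fragments.
target = "".join(rng.choice("ACGT") for _ in range(300))
microbial = ["".join(rng.choice("ACGT") for _ in range(40)) for _ in range(50)]
# Three short fragments cut from the target play the role of the ancient individual's DNA.
endogenous = [target[10:50], target[120:165], target[250:290]]

baits = tile_baits(target)
seeds = seed_index(baits)
captured = capture(microbial + endogenous, seeds)
```

In this toy setup the captured list holds the three fragments cut from the target, while the unrelated fragments are left behind, which is the sense in which bait sequences enrich for the DNA of interest.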
Rohland and I adapted this technique to study the whole genome. We worked with our colleagues in Germany to synthesize fifty-two-letter-long DNA sequences covering more than a million positions at which people are known to vary. We used these bait sequences to enrich for human compared to microbial DNA, which in some cases increased the fraction of DNA that was of interest to us by more than a hundredfold. We gained another approximately tenfold jump in efficiency because we only targeted informative positions in the genome. We automated the whole approach, processing the DNA using robots that allowed a single person to study more than ninety samples at once in the span of a few days. We hired a team of technicians to grind powder out of ancient remains, to extract DNA from the powder, and then to turn the extracted DNA into a form that we could sequence. The laboratory work was only the beginning. An equally intricate task was sorting the billions of DNA sequences into the individuals to whom they belonged, analyzing the data and weeding out samples with evidence of contamination, and creating an easily accessible dataset. Swapan Mallick, a physicist who had joined my laboratory six years before, set up our computers to do all of this, and continually updated our strategy for processing the data as the nature of the data evolved and its volume increased.
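To give a feel for what these gains mean, here is a small back-of-the-envelope sketch in Python. It rests on assumptions that are not in the text: it borrows the 0.02 percent figure reported for the Tianyuan bone purely as an illustrative starting point, treats the enrichment as exactly a hundredfold and the targeting gain as exactly tenfold, and assumes the two gains simply multiply. The function names are hypothetical.

```python
from fractions import Fraction


def reads_per_human_read(human_fraction):
    """Average number of sequences read in order to see one human sequence."""
    return 1 / human_fraction


def enriched_fraction(human_fraction, fold):
    """Human fraction after a `fold` enrichment, capped at 100 percent."""
    return min(Fraction(1), human_fraction * fold)


# 0.02 percent human DNA, the Tianyuan figure, used here only as an example.
before = Fraction(2, 10000)
# Assumption: enrichment of exactly a hundredfold.
after = enriched_fraction(before, 100)

# Assumption: the hundredfold and tenfold gains are independent and multiply.
combined_gain = 100 * 10

print(reads_per_human_read(before))  # sequences per human sequence before enrichment
print(reads_per_human_read(after))   # sequences per human sequence after enrichment
print(combined_gain)
```

Under these assumptions a sample that starts at 0.02 percent human DNA goes from one human sequence in five thousand to one in fifty. Real samples vary widely, and the cost per sample also includes laboratory steps that this arithmetic ignores.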
The results were even better than we had hoped. The cost of producing genome-wide data dropped to less than five hundred dollars per sample. This was many dozens of times cheaper than brute-force whole-genome sequencing. Even better, our method made it possible to get genome-wide data out of around half of the skeletal samples we screened, although the success rate of course varied depending on the degree to which the skeletons we examined had been preserved. For example, we have obtained about 75 percent success rates for ancient samples from the cold climate of Russia, but only around 30 percent for samples from the hot Near East.
These advances mean that whole-genome study of ancient DNA no longer requires screening large numbers of skeletal remains before it is possible to find a few individuals whose DNA can be analyzed. Instead, a substantial fraction of screened samples dating to the last ten thousand years can now be converted to working genome-wide data. The new methods have made it possible to analyze hundreds of samples in a single study. With such data, it is possible to reconstruct population changes in exquisite detail, transforming our understanding of the past.
By the end of 2015, my ancient DNA laboratory at Harvard had published more than half of the world’s genome-wide human ancient DNA. We discovered that the population of northern Europe was largely replaced by a mass migration from the eastern European steppe after five thousand years ago18; that farming developed in the Near East more than ten thousand years ago among multiple highly differentiated human populations that then expanded in all directions and mixed with each other along with the spread of agriculture19; and that the first human migrants into the remote Pacific islands beginning around three thousand years ago were not the sole ancestors of the present-day inhabitants.20 In parallel, I initiated a project to survey the diversity of the world’s present-day populations, using a microchip for analyzing human variation that my collaborators and I designed specifically for the purpose of studying the human past. We used the chip to study more than ten thousand individuals from more than a thousand populations worldwide—a dataset that has become a mainstay of studies of human variation not just in my laboratory but also in other laboratories around the world.21
The resolution with which this revolution has allowed us to reconstruct events in the human past is stunning. I remember a dinner at the end of graduate school with my Ph.D. supervisor, David Goldstein, and his wife, Kavita Nayar, both of whom had been students of Cavalli-Sforza. It was 1999, a decade before the advent of genome-wide ancient DNA, and we daydreamed together, wondering how accurately events of the past could be reconstructed by traces left behind. After a grenade explosion in a room, could the exact position of each object prior to the explosion be reconstructed by piecing together the scattered remains and studying the shrapnel in the wall? Could languages long extinct be recalled by unsealing a cave still reverberating with the echoes of words spoken there thousands of years ago? Today, ancient DNA is enabling this kind of detailed reconstruction of deep relationships among ancient human populations.
These days, human genome variation has surpassed the traditional toolkit of archaeology—the study of the artifacts left behind by past societies—in what it can reveal of changes in human populations in the deep past.22 This has come as a surprise to nearly everyone. Carl Zimmer, a science journalist at The New York Times who has written frequently about this new field, told me that when he was assigned by his newspaper to cover the study of ancient DNA, he agreed to do it as a service to the science team, thinking it would be a minor sideshow to his main focus on evolution and human physiology. He imagined writing an article about the field every six months or so, and that the rush of discoveries would end after a year or two. Instead, Zimmer now finds himself dealing with a major new scientific paper every few weeks, even as developments are accelerating and the revolution intensifies.
This book is about the genome revolution in the study of the human past. This revolution consists of the avalanche of discoveries based on data taken from the whole genome—meaning, the entire genome analyzed at once instead of just small stretches of it such as mitochondrial DNA. The revolution has been made far more powerful by the new technologies for extracting whole genomes’ worth of DNA from ancient humans. I make no attempt to trace the history of the field of genetic studies of the past—the decades of scientific analysis of human variation that began with studies of skeletal variation and continued with studies of genetic variation in tiny snippets of the human genome. These efforts provided insights into population relationships and migrations, but those insights pale when compared to the dazzling information provided by the extraordinary tranches of data that began to be available after 2009. Before and after that year, studies of one or a few locations in the genome were occasionally the basis for important discoveries, providing evidence in favor of some scenarios over others. Yet genetic evidence before around 2009 was mostly incidental to studies of the human past in other fields, a poor handmaiden to the main business of archaeology. Since 2009, though, whole-genome data have begun to challenge long-held views in archaeology, history, anthropology, and even linguistics—and to resolve controversies in those fields.
The ancient DNA revolution is rapidly disrupting our assumptions about the past. Yet there is at present no book by a working geneticist that lays out the impact of the new science and explains how it can be used to establish compelling new facts. The findings needed to grasp the scope of the ancient DNA revolution are scattered among hard-to-read, jargon-filled scientific papers, sometimes supplemented by hundreds of pages of dense notes on methodology. In Who We Are and How We Got Here, I aim to offer readers a clear view through this extraordinary window into the past—to provide a book about the ancient DNA revolution intended for lay reader and specialist alike. My goal is not to present a synthesis—the field is moving too quickly. By the time this book reaches readers, some advances that it describes will have been superseded or even contradicted. In the three years since I began writing, many fresh findings have emerged, so that most of what I describe here is based on results obtained after I started. I hope that readers will take the topics I discuss as examples of the disruptive power of whole-genome studies, not as a definitive summary of the state of the science.
My approach is to take readers through the process of discovery, with each chapter serving as an argument that has as its goal to bring readers, who may have come with one perspective when they started, to another place when they finish. I try to make a virtue of my laboratory’s central role in the ancient DNA revolution by telling the story of my own work where it is relevant—as this is a subject on which I can speak with great authority—while also discussing work in which I was not involved when it is critical to the story. Because I take this approach, the book disproportionately highlights the work from my laboratory. I apologize that I have been able to mention by name only a tiny fraction of the people who made equally important contributions. My priority has been to convey the excitement and surprise of the genome revolution, and to take readers on a compelling narrative path through it, not to write a scientific review.
I also highlight some of the great themes that are emerging, especially the finding that mixture between highly differentiated populations is a recurrent process in the human past. Today, many people assume that humans can be grouped biologically into “primeval” groups, corresponding to our notion of “races,” whose origins are populations that separated tens of thousands of years ago. But this long-held view about “race” has just in the last few years been proven wrong—and the critique of concepts of race that the new data provide is very different from the classic one that has been developed by anthropologists over the last hundred years. A great surprise that emerges from the genome revolution is that in the relatively recent past, human populations were just as different from each other as they are today, but that the fault lines across populations were almost unrecognizably different from today. DNA extracted from remains of people who lived, say, ten thousand years ago shows that the structure of human populations at that time was qualitatively different. Present-day populations are blends of past populations, which were blends themselves. The African American and Latino populations of the Americas are only the latest in a long line of major population mixtures.
Who We Are and How We Got Here is divided into three parts. Part I, “The Deep History of Our Species,” describes how the human genome not only provides all the information that a fertilized human egg needs to develop, but also contains within it the history of our species. Chapter 1, “How the Genome Explains Who We Are,” argues that the genome revolution has taught us about who we are as humans not by revealing the distinctive features of our biology compared to other animals but by uncovering the history of migrations and population mixtures that formed us. Chapter 2, “Encounters with Neanderthals,” reveals how the breakthrough technology of ancient DNA provided data from Neanderthals, our big-brained cousins, and showed how they interbred with the ancestors of all modern humans living outside of Africa. The chapter also explains how genetic data can be used to prove that ancient mixture between populations occurred. Chapter 3, “Ancient DNA Opens the Floodgates,” highlights how ancient DNA can reveal features of the past that no one had anticipated, starting with the discovery of the Denisovans, a previously unknown archaic population that had not been predicted by archaeologists and that mixed with the ancestors of present-day New Guineans. The sequencing of the Denisovan genome unleashed a cavalcade of discoveries of additional archaic populations and mixtures, and demonstrated unequivocally that population mixture is central to human nature.
Part II, “How We Got to Where We Are Today,” is about how the genome revolution and ancient DNA have transformed our understanding of our own particular lineage of modern humans, and it takes readers on a tour around the world with population mixture as a unifying theme. Chapter 4, “Humanity’s Ghosts,” introduces the idea that we can reconstruct populations that no longer exist in unmixed form based on the bits of genetic material they have left behind in present-day people. Chapter 5, “The Making of Modern Europe,” explains how Europeans today descend from three highly divergent populations, which came together over the last nine thousand years in a way that archaeologists never anticipated before ancient DNA became available. Chapter 6, “The Collision That Formed India,” explains how the formation of South Asian populations parallels that of Europeans. In both cases, a mass migration of farmers from the Near East after nine thousand years ago mixed with previously established hunter-gatherers, and then a second mass migration from the Eurasian steppe after five thousand years ago brought a different kind of ancestry and probably Indo-European languages as well. Chapter 7, “In Search of Native American Ancestors,” shows how the analysis of modern and ancient DNA has demonstrated that Native American populations prior to the arrival of Europeans derive ancestry from multiple major pulses of migration from Asia. Chapter 8, “The Genomic Origins of East Asians,” describes how much of East Asian ancestry derives from major expansions of populations from the Chinese agricultural heartland. Chapter 9, “Rejoining Africa to the Human Story,” highlights how ancient DNA studies are beginning to peel back the veil on the deep history of the African continent drawn by the great expansions of farmers in the last few thousand years that overran or mixed with previously resident populations.
Part III, “The Disruptive Genome,” focuses on the implications of the genome revolution for society. It offers some suggestions for how to conceive of our personal place in the world, our connection to the more than seven billion people who live on earth with us, and the even larger numbers of people who inhabit our past and future. Chapter 10, “The Genomics of Inequality,” shows how ancient DNA studies have revealed the deep history of inequality in social power among populations, between the sexes, and among individuals within a population, based on how that inequality determined success or failure of reproduction. Chapter 11, “The Genomics of Race and Identity,” argues that the orthodoxy that has emerged over the last century—the idea that human populations are all too closely related to each other for there to be substantial average biological differences among them—is no longer sustainable, while also showing that racist pictures of the world that have long been offered as alternatives are even more in conflict with the lessons of the genetic data. The chapter suggests a new way of conceiving the differences among human populations—a way informed by the genome revolution. Chapter 12, “The Future of Ancient DNA,” is a discussion of what comes next in the genome revolution. It argues that the genome revolution, with the help of ancient DNA, has realized Luca Cavalli-Sforza’s dream, emerging as a tool for investigating past populations that is no less useful than the traditional tools of archaeology and historical linguistics. Ancient DNA and the genome revolution can now answer a previously unresolvable question about the deep past: the question of what happened—how ancient peoples related to each other and how migrations contributed to the changes evident in the archaeological record.
Ancient DNA should be liberating to archaeologists because with answers to these questions in reach, archaeologists can get on with investigating what they have always been at least as interested in, which is why the changes occurred.