Fifteen years after the genome was sequenced, the shift in focus from candidate genes to genome-wide analyses and from mutation to changes in standing variation has resulted in a large body of work documenting that recent evolutionary selection pressure has been extensive. And it has been mostly local.
Recent Selection Pressure Has Been Extensive
One of the unexpected findings since the sequencing of the genome has been how much evolution has taken place in the recent past, continuing up to the present. Most of it has occurred through changes in standing variation. Some of it has occurred through introgression.
Evolution Through Selection Within Homo Sapiens
Evolutionary geneticists wasted no time in using the new tools that became available after the sequencing of the genome. In a 2009 article, just six years after the final draft was published, Joshua Akey could assemble the results of 21 studies that had already used genome-wide scans for natural selection.46 Focusing on the seven studies that used the largest samples at that time, the HapMap and Perlegen databases, he constructed an integrated map of positive selection.47
The aggregate numbers from just those seven were startlingly large. A total of 5,110 distinct regions of the genome were identified by at least one study, encompassing 14 percent of the genome and 23 percent of all genes. A total of 722 regions containing 2,465 genes were identified in two or more studies. His conclusion:
Genomic maps of selection suggest widespread genetic hitchhiking… throughout the genome. Although the veracity of this statement is subject to the limitations described above, it is fair to say that the number of strong selective events thought to exist in the human genome today is considerably more than that imagined less than a decade ago. Again, restricting our attention to the 722 loci identified in two or more genome-wide scans, ∼245 Mb (∼8%) of the genome has been influenced by positive selection, and an even larger fraction may have been subject to more modest selective pressure.48
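As a rough check on the arithmetic in Akey's estimate, the sketch below divides 245 Mb by a haploid human genome size of about 3,100 Mb. That genome size is an assumption supplied for illustration and does not come from Akey's article. The result lands close to the ∼8 percent he reports.

```python
# Rough check of the genome share implied by Akey's 245 Mb figure.
# ASSUMPTION: a haploid human genome of about 3,100 Mb (not stated in the source).
GENOME_MB = 3_100
SELECTED_MB = 245  # from the 722 loci identified in two or more scans


def genome_fraction(selected_mb: float, genome_mb: float = GENOME_MB) -> float:
    """Return the selected length as a percentage of the whole genome."""
    return 100 * selected_mb / genome_mb


print(f"{genome_fraction(SELECTED_MB):.1f}%")
```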
Since then, the number of loci believed to be under selection has continued to grow. In 2012, a team predominantly from Stanford took a different approach, based on ROH (“runs of homozygosity”), and applied it to the combined data from the HGDP-CEPH and HapMap databases.49 They identified 69 regions that they labeled “ROH hotspots.” The top 10 included genes involving cell function, connective tissue development, the brain, vision, the central nervous system, and skin pigmentation. Five of the top 10 regions had not previously been identified as targets of recent selection pressure.
In 2013, Sharon Grossman of the Broad Institute of Harvard and MIT and her colleagues published the results for a method called CMS (Composite of Multiple Signals) designed to pinpoint specific variants within the genetic regions under selection.50 Initial applications of this method yielded an additional 86 regions showing a high probability of selection pressure, along with hundreds of specific genes that appear to have been under recent selection pressure. These include genes involving hearing, immunity, infectious disease, metabolism, olfactory receptors, pigmentation, hair and sweat, sensory perception, vision, and brain development.
In 2016, another international team made progress in identifying human adaptation within the last 2,000 years. Applying a new method, the Singleton Density Score, to the genomes of present-day Britons, they found strong signals of selection over that period favoring lactase persistence, blond hair, and blue eyes. Their conclusion: “Our results suggest that selection on complex traits has been an important force in shaping both genotypic and phenotypic variation within historical times.”51
Also in 2016, Daniel Schrider and Andrew Kern published a sophisticated new machine-learning technique called Soft/Hard Inference through Classification. They applied it to six populations that have low levels of historical admixture: three of sub-Saharan Africans (two from West Africa, one from Kenya), one of non-Latino whites in Utah, one of Japanese in Japan, and one of Amerindians in Peru. Their method identified 1,927 “distinct selective sweeps,” of which 1,408 had not been identified previously.52 The work by Schrider and Kern also substantiated the importance of changes in standing variation. Of the 1,927 sweeps they identified, 92.2 percent were soft, which accounts for the title of their article, “Soft Sweeps Are the Dominant Mode of Adaptation in the Human Genome.” Geneticists Rajiv McCoy and Joshua Akey summarized the implications this way:
This finding has potentially wide-ranging implications for the dynamics of neutral and slightly deleterious variation.… More generally, a widespread influence of selective sweeps challenges the long-standing neutral theory of molecular evolution, which states that most variation within and between species does not impact fitness and is largely governed by random genetic drift.… If a large proportion of genetic variation is in fact influenced by linked positive selection, null models may need to be updated to better reflect this complexity.53
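To make the Schrider and Kern percentages concrete, the sketch below converts them into approximate counts. Only the three inputs come from the text; the derived counts are rounded estimates, not figures reported by the authors, and the sketch assumes every sweep not classified as soft was classified as hard.

```python
# Approximate breakdown of the Schrider and Kern (2016) sweep counts.
# Inputs are from the text; outputs are rounded estimates, and the
# "hard" count ASSUMES every non-soft sweep was classified as hard.
TOTAL_SWEEPS = 1_927
NEW_SWEEPS = 1_408
SOFT_SHARE = 0.922  # 92.2 percent of sweeps classified as soft


def sweep_breakdown(total: int, new: int, soft_share: float) -> dict:
    """Return approximate soft, hard, and newly identified sweep figures."""
    soft = round(total * soft_share)
    return {
        "soft": soft,
        "hard": total - soft,
        "new_percent": round(100 * new / total, 1),
    }


print(sweep_breakdown(TOTAL_SWEEPS, NEW_SWEEPS, SOFT_SHARE))
```

On these inputs, roughly 1,777 sweeps were soft and about 150 hard, and about 73 percent of all the sweeps were new.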
I have restricted myself to a handful of the global and largest studies. The total body of work is far greater. In 2009, Joshua Akey had 21 studies to work with. A 2016 review article titled “Fifteen Years of Genomewide Scans for Selection” included an additional 52 studies.54
In response to such findings, a controversy has arisen. Are we really looking at natural selection or at genetic drift due to purifying and background selection?55 A team of researchers at the Max Planck Institute for Evolutionary Anthropology set out to answer that question through an analysis of contemporary allele frequencies in the 1000 Genomes Project combined with evidence from a high-quality genome of a 45,000-year-old anatomically modern human from Ust’-Ishim in Siberia. Their answer, published in 2016, was that many of the most strongly differentiated alleles between Africans and Eurasians had not risen in frequency after the dispersal from Africa. “Nevertheless, our results provide clear evidence that local adaptation contributed to these allele frequency changes in European populations, as strongly differentiated alleles in Europeans are enriched in likely functional variants.”56
“Evidence” is the correct word, not “proof.” Techniques for discriminating natural selection from other sources of change in standing variation are still being refined. The state of knowledge is still nowhere close to a firm number for specifying how much total evolutionary change there has been, how much of that total has been an adaptive response to natural selection, how much has been a nonadaptive response to selection on correlated traits, and how much has occurred by genetic drift.57 What can be said more confidently is that the regions under selective pressure since the dispersal from Africa are in fact extensive; that the methods for identifying these regions have steadily improved since the earliest studies at the beginning of the century; and that each new compilation shows that an additional, substantial share of the genome has been influenced, both directly and indirectly, by selection.
Evolution Through Introgression
Descendants of those who left Africa experienced significant introgression of genes from other hominins. Two hominins are definitely involved: Neanderthals and Denisovans. Others might eventually be identified.58
Scientists have known since the 1860s that a race of advanced hominins other than Homo sapiens once lived in Europe. More recently, they have established that Neanderthals lived in Asia as well. They descended from Homo erectus along a separate line from Homo sapiens. The timing of the split has been estimated as early as 800,000 and as late as 400,000 years ago. It is now established that anatomically modern humans bred with both Neanderthals and another, recently discovered archaic hominin, the Denisovans, whose discovery in a cave in the Altai Mountains of southern Siberia was announced in 2010. Introgression between Denisovans and Homo sapiens left traces in modern East Asians and in peoples from New Guinea and elsewhere in the Pacific. It has been argued that Denisovan gene variants also account for Tibetans’ ability to function at high altitudes.59
Interbreeding between Neanderthals and Homo sapiens occurred in several places and times in Europe and at least to some extent in East Asia, leaving traces of Neanderthal DNA in modern Europeans and East Asians amounting to about 1 to 2 percent of the genome. Neanderthal alleles may have helped humans adapt to non-African environments; Neanderthals are argued to have been not only cold-adapted but hyperarctic-adapted.60 Another study concluded that “the major influence of Neandertal introgressed alleles is through their effects on gene regulation.”61 The safest conclusion at the moment is that most of the story is yet to be told. The balance of probability says that most of the variants picked up from the Neanderthals were neutral or negative, but the Neanderthals had probably been adapting to conditions and pathogens not found in Africa for hundreds of thousands of years and are bound to have carried many variants that would have been advantageous to the newcomers. Chances are good that humans picked up some of them.
The new findings about recent evolution from natural selection and introgression have triggered a spirited debate that is of greater interest to population geneticists than to us. The neutral theory of molecular evolution has been an intellectual centerpiece of population genetics since the early 1980s. It continues to have vigorous defenders. Responding to an attack, an international team of seven population geneticists (first author was Jeffrey Jensen) concluded that “it is now abundantly clear that the foundational ideas presented five decades ago by Kimura and Ohta are indeed correct.”62 On the other side are geneticists who think that the debate is over. “We argue that the neutral theory was supported by unreliable theoretical and empirical evidence from the beginning, and that in light of modern, genome-scale data, we can firmly reject its universality,” wrote geneticists Andrew Kern and Matthew Hahn in 2018. “The ubiquity of adaptive variation both within and between species means that a more comprehensive theory of molecular evolution must be sought.”63 If history is a guide, the best bet is that some sort of theoretical synthesis will arise from this dialectic. Whatever form it takes, it will have to accommodate the evidence for far more recent evolution than was anticipated before the genome was sequenced.
Recent Selection Pressure Has Been Mostly Local
From the dawn of the genomics era, studies of recent selection have also found that “local adaptation” was widespread, with “local” meaning that the genes under selection varied by continent. An early analysis of local adaptation using the HapMap database was published in 2006 by a team of geneticists (first author was Benjamin Voight). They examined regions of the genome under selection pressure for three populations: Yoruba (a Nigerian tribe), Europeans from a mix of Northern and Western European countries, and East Asians (a mix of Chinese and Japanese). Of the 579 regions, 76 percent were unique to one of the three populations, 22 percent were shared by two of the three, and only 2 percent were shared by all three populations.64 In the authors’ judgment, the degree to which selection occurred independently is probably underestimated by these percentages.65 In any case, these events represent quite recent selection, with “average ages of ∼6,600 years and ∼10,800 years in the non-African and African populations respectively,” by the authors’ estimates.66
The same pattern has been found repeatedly. In Joshua Akey’s 2009 literature review of 21 early studies of recent selection, he found that “∼80% of the 722 loci observed in multiple scans show evidence of local adaptation.”67 In that same year, an international team funded by the Max Planck Society and the German government published results using the HGDP-CEPH database you encountered in chapter 7. This study employed its own distinctive method of identifying regions under selection pressure but obtained familiar results. With the 51 populations grouped by continental ancestral location (European, East Asian, Central/South Asian, Middle Eastern, Oceanian, and Amerindian), 68 percent of the regions under selection were under selection in only one of the six groups. Another 20 percent were under selection in just two of the six. Only 1 percent were under selection in all six.68
The six African ethnicities had little overlap with any of the non-African ethnicities.69 Of the 632 regions identified as under positive selection, at least one of the African populations was represented in 146 of them. Of those 146 regions, 82 percent were represented by a single African ethnicity. Just one was shared by two African populations. Only 18 percent were shared by an African population and one or more non-African populations.70
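The percentages for the African populations are easier to picture as counts. The sketch below converts them using only the figures given in the text; the results are rounded estimates rather than the study's exact tallies, and the single region shared by two African populations disappears into the rounding.

```python
# Approximate counts behind the African-population percentages in the
# HGDP-CEPH selection scan. Inputs come from the text; outputs are rounded
# estimates and may differ slightly from the study's exact tallies.
TOTAL_REGIONS = 632
AFRICAN_REGIONS = 146
SINGLE_AFRICAN_SHARE = 0.82       # represented by a single African ethnicity
SHARED_NON_AFRICAN_SHARE = 0.18   # shared with one or more non-African groups


def african_breakdown(total: int, african: int,
                      single_share: float, shared_share: float) -> dict:
    """Return the African share of all regions and approximate region counts."""
    return {
        "african_percent_of_total": round(100 * african / total, 1),
        "single_african": round(african * single_share),
        "shared_with_non_african": round(african * shared_share),
    }


print(african_breakdown(TOTAL_REGIONS, AFRICAN_REGIONS,
                        SINGLE_AFRICAN_SHARE, SHARED_NON_AFRICAN_SHARE))
```

On these inputs, African populations appear in about 23 percent of all regions under selection; roughly 120 of those 146 regions involve a single African ethnicity and about 26 are shared with non-African populations.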
Principal Component Analysis Revisited
Recall from chapter 7 the patterns shown when noncoding genetic markers were analyzed using principal component analysis. There’s no reason why the same analysis could not be applied to functional variants. In 2013, an international team of geneticists (first author was Xuanyao Liu) did so. Using yet another method of identifying signatures of positive selection, they analyzed 14 populations drawn from Phase 3 of the International HapMap Project and the Singapore Genome Variation Project and identified 405 regions under selection. The figure below shows the results.
Source: Adapted from Liu, Ong, Pillai et al. (2013): Fig. 5C.
The meaning of the abbreviations: LWK: Luhya in Kenya. YRI: Yoruba in Nigeria. ASW: African Americans in the Southwest. MKK: Maasai in Kenya. JPT: Japanese in Tokyo. CHB: Chinese in Beijing. CHS: Chinese in Singapore. CHD: Chinese in Denver. MAS: Malays in Singapore. INS: Indians in Singapore. GIH: Gujarati Indians in Houston. MXL: Mexicans in Los Angeles. CEU: Europeans in Utah. TSI: Tuscans in Italy. The Mexican population is an unknown mix of European and Amerindian ancestry.
The specifics are different from the profiles in the principal component analysis shown in chapter 7. That is to be expected, since the data entered in each cell of the matrix underlying the figure above are completely different from those entered in the earlier figure. Some of the subpopulations are different as well.71 But the clusters formed by the 14 populations are distinct and familiar. When geneticists use noncoding genetic variation from multiple populations, those populations are genetically distinctive in ways that broadly correspond to self-identified race and ethnicity. When geneticists use genetic variation that is not only functional but has been under selection pressure since the dispersal from Africa, the same correspondence usually appears.
Recapitulation
Much has changed since the 1980s, when it was still possible for Stephen Jay Gould to believe that evolution since humans left Africa couldn’t be more than skin deep. The main events were the sequencing of the genome and then the advent of genome-wide scans. Those analyses in turn shifted the center of attention from evolution through mutation to evolution through changes in allele frequencies. The same analyses uncovered unexpectedly large portions of the genome that have been under recent selection. Like the results of the cluster analyses of noncoding SNPs discussed in chapter 7, the new analyses showed that the regions of the genome under selection varied by geography and population ancestry. Or to summarize it in the words of Proposition #6, evolutionary selection pressure since humans left Africa has been extensive and mostly local.
9
The Landscape of Ancestral Population Differences
Proposition #7: Continental population differences in variants associated with personality, abilities, and social behavior are common.
This chapter is about raw ancestral population differences in SNPs that are statistically related to cognitive repertoires—“raw” meaning that a great deal of work remains to be done before the significance of such differences is understood.
Until a few years ago, this topic was still terra incognita. Only a handful of statistical relationships between specific SNPs and cognitive traits had been identified. But the growth in that number has been phenomenal, paralleling the growth in the number of SNPs associated with diseases and physiological traits. To illustrate what’s been happening, consider the GWAS Catalog. It was begun in 2008 by the National Human Genome Research Institute, part of the U.S. National Institutes of Health.1 The first year with a published GWAS was 2005, when two studies reported a grand total of two SNPs.2 In 2018 alone, the GWAS Catalog added 17,182 previously unidentified SNPs. Here’s what the history looks like:
Source: Author’s analysis, GWAS Catalog.
As of the end of May 2019, the catalog included 3,469 studies reporting 136,286 variants. The total number of unique variants was 89,544. And that’s just a fraction of the total number of variants that have been associated with phenotypic traits at lesser levels of statistical significance. That total is over a million, residing in databases maintained by university and private sector research centers scattered around the world.
Hardly any analyses of this burgeoning knowledge base have compared results for different continental ancestral populations (which for readability I will subsequently abbreviate to “continental populations”). Researchers have been wary of such comparisons because the results can’t be trusted. Paradoxically, the reason they can’t be trusted has indirectly become the reason that continental population differences will soon be studied intensively.
Human Diversity Page 22