16.The reactions in question are catalyzed by the enzymes HisA and TrpF. See Wierenga (2001).
   17.The E. coli enzyme is called L-ribulose-5-phosphate 4-epimerase. After the mutation, it becomes an aldolase. See O’Brien and Herschlag (1999).
   18.Several other changes occurred in its hemoglobin, but this one, a proline-to-alanine substitution, is especially important. See Liang et al. (2001), as well as Liu et al. (2001) and Golding and Dean (1998). A number of additional mechanisms facilitate high-altitude adaptation. See Liu et al. (2001), as well as Monge and Leonvelarde (1991).
   19.Gene duplications of the genes encoding opsins were also involved. Golding and Dean (1998) provide an overview of these and other adaptations.
   20.The phenomenon is also known as the Red Queen effect, a term coined by the American biologist Leigh Van Valen.
   21.These proteins do not appear out of nowhere. They are modifications of so-called ABC transporters, a very large and widespread class of proteins that transport all manner of molecules in and out of cells, in organisms ranging from bacteria to humans. See Putman, van Veen, and Konings (2000), as well as Gottesman et al. (1995). The modifications can affect the transporter’s amino acid sequence or the amount of protein itself, for example by changing the number of genes encoding a transporter in the genome. See Mrozikiewicz et al. (2007), as well as Stein, Walther, and Wunderlich (1994). For the rapid spreading of drug resistance see Tomasz (1997) and LeHello et al. (2013).
   22.One could say that these changes are not dramatic on the level of protein phenotypes, but they are dramatic for the (physiological) phenotypes of whole organisms. Thus whether a change constitutes an innovation depends on the level of organization at which one chooses to study the phenotype.
   23.Strictly speaking, this genotype is the DNA sequence encoding the protein, but the two are equivalent for my purpose, because a single DNA string uniquely specifies an amino acid string.
   24.As of this writing, experiments have determined the folds of more than seventy thousand proteins, and computational methods that infer the fold of one amino acid string from an experimentally determined fold of another, similar string, can infer the shapes of millions more. A central public repository for information about protein fold and function is the Protein Data Bank (http://www.pdb.org).
   25.See Maynard-Smith (1970).
   26.Many proteins are complexes of multiple polypeptides. Such complexes can be many times larger than any one polypeptide.
   27.More precisely, this space is called a generalized hypercube. See Reidys, Stadler, and Schuster (1997). One can walk away from each vertex of this hypercube in as many directions as the hypercube has neighbors. For a protein of one hundred amino acids, for example, which has nineteen hundred neighbors, nineteen hundred such directions exist.
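The neighbor count in this note can be checked with a short sketch. It assumes the twenty standard amino acids and defines a neighbor as a string differing at exactly one position; the function names are illustrative and not taken from the cited work.

```python
# Count and enumerate the one-mutant neighbors of a protein in sequence space.
# Assumptions: 20 standard amino acids; a neighbor differs at exactly one position.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def neighbor_count(length, alphabet_size=len(AMINO_ACIDS)):
    """Each of the `length` positions can change to (alphabet_size - 1) other letters."""
    return length * (alphabet_size - 1)


def neighbors(sequence):
    """Yield every amino acid string that differs from `sequence` at one position."""
    for i, original in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield sequence[:i] + aa + sequence[i + 1:]


if __name__ == "__main__":
    # A protein of one hundred amino acids has 100 * 19 neighbors.
    print(neighbor_count(100))
```

Each neighbor corresponds to one direction in which one can walk away from a vertex of the hypercube.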
   28.A variety of distance measures exists in sequence space. Several of them take into account that some amino acids are more similar in their chemical properties than others. How far a genotype network reaches through sequence space may vary somewhat with the distance measure used.
   29.See Eco (1977) and Putnam (1975) for a discussion of basic semiotic concepts and ambiguities about the meaning of “meaning.”
   30.Estimates of the fraction of foldable proteins vary broadly between 0.01 and 10 percent. See Keefe and Szostak (2001), as well as Finkelstein (1994) and Davidson and Sauer (1994). For the purpose of this section, I equated meaningful proteins with foldable proteins, because function requires folding in most proteins, with the caveat that some unstructured proteins may also perform useful functions.
   31.I use the notion of “work” here in the physical sense.
   32.See Keefe and Szostak (2001).
   33.Bacteriophages that can go dormant like this are called lysogenic. Their DNA becomes part of the host genome until the host experiences severe stress, at which time the viral DNA starts to express its genes and viral particles are made. See Ptashne (1992).
   34.See Reidhaar-Olson and Sauer (1990) and Taylor et al. (2001). Note that even though the total number of sequences adopting a given function may be large, the fraction of sequence space they occupy may be vanishingly small.
   35.Although these solutions may differ in their amino acid sequence, they may have other commonalities, for example a particular spatial arrangement of specific amino acids that allows catalysis of a reaction.
   36.Our genomes encode more than one globin. The hemoglobin protein itself is made up of four globin polypeptides, two so-called alpha chains and two beta chains, each of which is encoded by different genes. Other globin genes in our genome include one that is mostly expressed during development in the womb, and yet another that is important for binding oxygen in muscles.
   37.This is an estimated rate of mutation per human generation, not per round of DNA replication, which would be even lower. See, for example, Nachmann and Crowell (2000).
   38.Hemoglobin-related diseases are well studied and known as hemoglobinopathies. Sickle-cell anemia is one of them. Not all of these diseases are caused by alterations of single DNA letters. They can also be caused by deletions of DNA and other genetic changes. Some mutations in the DNA letter sequence of a gene may not affect the amino acid sequence of the encoded protein at all, because the genetic code is redundant, such that some nucleotide combinations encode the same amino acid, a fact that I briefly mentioned in chapter 1.
   39.They are taken from the beta chain of hemoglobin.
   40.Assuming a human generation time of twenty-five years.
   41.The estimates of times to most recent common ancestry I provide are approximate, as these times can only be estimated with substantial error. See, for example, Hedges and Kumar (2004), as well as Hedges and Kumar (2003).
   42.Even globins from organisms as different as plants and animals probably were not independent inventions but derive from a common ancestor. See Hardison (1996).
   43.A subtle philosophical question is what constitutes different solutions to the same problem. A chemist might argue that two proteins differing in their amino acid sequence but cleaving a small molecule with the same reaction mechanism are similar solutions, whereas two proteins that use a different reaction mechanism are different solutions. From an evolutionary perspective, however, it is sensible to view all genotypes that serve the same function as different solutions to the same problem, because each of these genotypes can, in principle, be discovered independently from other such genotypes.
   44.See Kapp et al. (1995) and Goodman et al. (1988). To this day, proteins may keep diverging further and further from their common ancestor. See Povolotskaya and Kondrashov (2010).
   45.I emphasize the role of globins in nitrogen fixation here, but globins can also help distribute oxygen in plants. See Hardison (1996).
   46.See Rizzi et al. (1994).
   47.See Wierenga (2001). Proteins with this fold can actually have different functions, but even proteins with this fold and the same function can be highly divergent. TIM barrels may have originated multiple times independently in the history of life.
   48.The argument is analogous to the one from chapter 3 about the exploration of the metabolic library: A few nonillion organisms exploring a new protein every second since life’s origins would yield only a vanishingly small fraction of all proteins. It would not even make a difference if this estimate were off by several orders of magnitude.
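The order of magnitude behind this argument can be sketched as follows. The inputs are assumptions made for illustration only: one nonillion (10^30) organisms, roughly 3.8 billion years since life's origins, and proteins of one hundred amino acids drawn from twenty letters. None of these specific values appear in the note itself.

```python
import math

# Illustrative assumptions, not figures from the note:
ORGANISMS = 10**30                     # one nonillion; the note says "a few"
YEARS_OF_LIFE = 3.8e9                  # assumed time since life's origins
SECONDS_PER_YEAR = 365.25 * 24 * 3600
PROTEIN_LENGTH = 100                   # amino acids per protein
ALPHABET_SIZE = 20                     # standard amino acids


def log10_fraction_explored(organisms=ORGANISMS, years=YEARS_OF_LIFE,
                            length=PROTEIN_LENGTH, alphabet=ALPHABET_SIZE):
    """Return log10 of (proteins tried) / (all proteins of the given length)."""
    log10_tried = math.log10(organisms) + math.log10(years * SECONDS_PER_YEAR)
    log10_space = length * math.log10(alphabet)
    return log10_tried - log10_space


if __name__ == "__main__":
    # Under these assumptions the explored fraction is on the order of 10^-83.
    print(f"log10(fraction explored) = {log10_fraction_explored():.1f}")
```

Working in logarithms avoids floating-point overflow and makes the robustness claim easy to see: shifting the number of trials by several orders of magnitude changes the result by the same number of units, which leaves the fraction vanishingly small.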
   49.Other factors, such as gene duplication and phenotypic plasticity, can also facilitate innovation in proteins. For an overview of such factors see Wagner (2011).
   50.Most of the protein pairs he analyzed were far apart in genotype space, but not so far apart as to suggest that they had originated independently rather than from a common ancestral protein. See Ferrada and Wagner (2010).
   51.RNA can also carry out other functions inside cells, such as regulating genes through a process called RNA interference. Here, RNA can have an advantage over proteins, because the principle of base complementarity allows it to bind other nucleic acids with high specificity, such as parts of a messenger RNA transcribed from a gene. Among other functions of RNAs, their role in protein transport is especially noteworthy. It involves the signal recognition particle, an RNA-protein complex that helps proteins enter a part of the cell called the endoplasmic reticulum.
   52.We know the folds of some very well studied molecules, such as the ribosomal RNA that catalyzes the key reaction of protein synthesis in the ribosome, but such information is lacking for many other RNAs.
   53.Together with Manfred Eigen, Schuster showed theoretically how heterogeneous populations of RNA molecules that can catalyze each other’s production can form self-sustaining systems they called hypercycles. See Eigen and Schuster (1979).
   54.These are described in multiple publications beginning with Hofacker et al. (1994).
   55.The base pairs that can form in the secondary structure are A-U, C-G, and G-U. (RNA contains the base uracil, abbreviated by the letter U, instead of the base thymine of DNA.) One difference between the helices of proteins and those of RNA is that the helices of protein structures are formed by a contiguous amino acid strand, whereas the helices of RNA are formed by different, generally noncontiguous parts of the same molecule. Many RNA molecules also require interactions with metal ions to form stable tertiary structures.
   56.As reported by Schuster et al. (1994), the number S of RNA secondary structures scales exponentially with sequence length L, as $S \propto L^{-1.5}\,(1.85)^{L}$.
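A small sketch can make this scaling concrete. It takes the relation in this note to be S ∝ L^(-1.5) · 1.85^L; that reading of the exponent and the omitted proportionality constant are assumptions, so only ratios between lengths are meaningful here. It also compares the result with the 4^L possible RNA sequences of the same length.

```python
import math

GROWTH_BASE = 1.85      # base of the exponential term, as given in the note
POWER_EXPONENT = -1.5   # assumed reading of the power-law prefactor


def log10_relative_structures(length):
    """log10 of L**-1.5 * 1.85**L (proportionality constant omitted)."""
    return POWER_EXPONENT * math.log10(length) + length * math.log10(GROWTH_BASE)


def growth_ratio(length):
    """Factor by which the structure count grows when L increases by one."""
    return 10 ** (log10_relative_structures(length + 1)
                  - log10_relative_structures(length))


def log10_sequences(length):
    """log10 of the number of RNA sequences of a given length (4 letters)."""
    return length * math.log10(4)


if __name__ == "__main__":
    for length in (30, 100):
        gap = log10_sequences(length) - log10_relative_structures(length)
        print(length, f"sequences outnumber structures by about 10^{gap:.0f} "
                      "(up to the omitted constant)")
```

Because 4^L grows faster than 1.85^L, the average number of sequences per structure increases with length, which is consistent with the many-sequences-per-shape picture described in these notes.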
   57.An important early paper from Schuster’s research group is Schuster et al. (1994), and a broader range of later work is summarized in Schuster (2006). Although based only on secondary structure, this work provides the most comprehensive characterization of a genotype space to date. On a historical note, the first work that provided potential evidence for the existence of genotype networks came before Schuster’s, and used simple models of protein folding. See Lipman and Wilbur (1991), as well as Lau and Dill (1989). Like RNA secondary structure models, these models tell us little about the evolution of protein function, and more about the evolution of structure. Schuster’s group coined the term “neutral networks” for genotype networks. Although widely used, the term “neutrality” has a specific meaning to most students of molecular evolution. Namely, it implies changes that do not affect fitness in any way. The kinds of changes that distinguish neighboring genotypes on a genotype network are not necessarily of this nature, as I discuss in Wagner (2011). Thus it is best to use this term sparingly, and here I avoid it altogether for this reason.
   58.All these observations refer to typical shapes. There may be shapes formed by only a single RNA sequence, but these shapes would be very hard to find in a blind evolutionary search. The vast majority of RNA sequence space is filled with structures that are formed by many sequences. Moreover, the shapes of multiple biologically important RNA molecules are also formed by many sequences, as we were able to show in Jörg, Martin, and Wagner (2008).
   59.Schultes and Bartel (2000).
   60.To be precise, in these walks they changed two residues at a time to preserve RNA secondary structures. One can think of such changes as a combination of a nucleotide change that disrupts secondary structure, followed by another one that restores it. Such pairs of mutations where one compensates for another are observed quite frequently in nature, and thus occur—although perhaps not simultaneously—in naturally evolving RNA molecules. See Kern and Kondrashov (2004). It is also relevant that these researchers had some inkling that their effort might be successful: They had managed to design a sequence that was intermediate between the starting enzymes and that had both activities.
   61.Their work also showed that a sequence that is intermediate between the starting fuser and splitter sequences can catalyze both reactions. See Schultes and Bartel (2000). Such phenotypic plasticity or promiscuity of RNA molecules further helps innovation, because it can make transitions between two genotype networks even easier. See Wagner (2011), chapter 13.
   62.The RNA polymerase that copies information in genes is a DNA-dependent RNA polymerase. (It uses a DNA template.) The RNA polymerase that replicates RNA is an RNA-dependent RNA polymerase.
   63.It is the so-called group I intron of a transfer RNA gene for isoleucine in the bacterium Azoarcus. See Tanner and Cech (1996), as well as Reinhold-Hurek and Shub (1992).
   64.The best-known form of the second process involves RNA and not DNA. It is called splicing and occurs when eukaryotes delete parts of a messenger RNA and splice the rest together to create a contiguous stretch of RNA that encodes a single polypeptide.
   65.See Hayden, Ferrada, and Wagner (2011). I simplified the description of several aspects of this experiment for brevity. Because of how the experiment was designed, the numbers of molecules fluctuated during different stages of the experiment between one hundred million molecules (after selection) and more than a trillion (10^12) molecules. Also, during each generation, each molecule may replicate not just once but multiple times. An important aspect of the experiment’s first part was that the activity of the enzyme did not improve, nor did it deteriorate. The population only spread through genotype space without changing its phenotype. The population showed what geneticists call cryptic variation, variation that one cannot normally detect on the level of phenotypes, but that can become visible in a new environment, which in our experiment involved a change in the chemical target molecule of the RNA enzyme. In other words, if our experiment had generated phenotypic variation and this variation had fed the evolutionary process, we would not have been surprised (that is the standard Darwinian view), but the fact that cryptic variation can help evolutionary adaptation is more surprising, and best explained through the genotype network framework. The new substrate in the experiment’s second part could also be transformed by the original enzyme, but at a much slower rate. In other words, the experiment focused on how fast the average reaction rate of ribozymes in the two populations increases during laboratory evolution. In the first population, the rate increased up to eight times faster than in the second population.
   66.See Keats (1994).
   67.See also Dawkins (1998), where the author points out that there can be wonder and awe even in an unwoven rainbow.
   CHAPTER FIVE: COMMAND AND CONTROL
   1.See Swallow (2003), as well as Bersaglieri et al. (2004) and Tishkoff et al. (2007).
   2.Any genetics textbook, such as Lewin (1997), describes some of their ingenious experimental tricks.
   3.There are several kinds of polymerases. The one I am discussing here is a DNA-dependent RNA polymerase.
   4.Strictly speaking, the RNA sequence is complementary to one strand of the gene and identical to the other strand, because DNA is a double-stranded molecule.
   5.β-galactosidases are enzymes that cleave the sugar β-galactose from larger sugars. Because lactose is one such sugar, lactase is a kind of β-galactosidase. For historical reasons, the gene encoding E. coli’s β-galactosidase is called the lacZ gene. See Lewin (1997), chapter 12.
   6.More precisely, the complete word is TGTGTGGGAATTGTGAGCGATAACAATTTCACACA, and the regulator does not make specific contact with all the letters in this sequence. See also Lewin (1997), chapter 12. The word is very similar to a palindrome, a DNA word that when read in one direction on one strand gives the same letter sequence as read in the reverse direction on the opposite strand.
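The palindrome definition in this note amounts to saying that a word equals its own reverse complement. The sketch below checks this for the word quoted in the note (letters as printed, with any line-break hyphen dropped) and for GAATTC, a well-known perfectly palindromic site that is not mentioned in the note and is used here only as a control. The function names are illustrative.

```python
# A DNA palindrome reads the same on one strand as the opposite strand
# read in the reverse direction, i.e. it equals its reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Word as quoted in the note; any line-break hyphen is dropped (assumption).
LAC_WORD = "TGTGTGGGAATTGTGAGCGATAACAATTTCACACA"


def reverse_complement(word):
    """Return the opposite strand read in the reverse direction."""
    return word.translate(COMPLEMENT)[::-1]


def is_palindrome(word):
    """True if the word is identical to its reverse complement."""
    return word == reverse_complement(word)


def palindrome_match_fraction(word):
    """Fraction of positions at which the word agrees with its reverse complement."""
    partner = reverse_complement(word)
    return sum(a == b for a, b in zip(word, partner)) / len(word)


if __name__ == "__main__":
    print(is_palindrome("GAATTC"))   # control: a perfect palindrome
    print(is_palindrome(LAC_WORD))   # the quoted word is only approximately one
    print(f"{palindrome_match_fraction(LAC_WORD):.2f}")
```

A word with an odd number of letters can never be a perfect palindrome in this sense, because its central letter would have to be its own complement.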
   7.For example, there may be up to twenty thousand elementary protein shapes or so-called domains. See Levitt (2009).
   8.The details of regulation are more complicated than I describe. For example, the regulator (called the lac repressor) is actually a complex of four polypeptides. And it regulates the expression of not just one but three adjacent genes, the so-called lac operon. But all these details leave the principles of regulation unchanged. See Lewin (1997), chapter 12.
   9.See Russell (2002), chapter 16. Yet another cost factor is that synthesizing a useless protein ties up ribosomes, the large complexes of molecules that translate proteins from RNA, which are thus not available to synthesize other, necessary proteins.
   10.See Dekel and Alon (2005).
   11.The interaction between activator and polymerase need not be direct. For example, an activator bound to DNA may change the conformation of DNA to open its double helix and thus make it easier for polymerase to start transcription. Nonetheless, the principle of complementarity is also important in transcriptional activation.
   12.With the exception of a few types of cells, such as red blood cells, which have shed their genome.
   13.See Poole et al. (2001), as well as Piatigorsky (1998) and Morano (1999).
   14.I am referring to type II collagen, encoded by the human COL2A1 gene. Other tissues, such as skin or hair, contain other types of collagen, made by different genes. Regarding motor proteins, I am referring to myosins, which are encoded by a large family of closely related genes in the human genome. Different tissues express different members of this family. Not all of them serve to contract muscles. Some, for example, transport molecules inside a cell.
   
 
 Arrival of the Fittest: Solving Evolution's Greatest Puzzle Page 25