Eden pointed out in his Wistar presentation that the combinatorial space corresponding to an average-length protein (which he assumed to be about 250 amino acids long) is 20^250, or about 10^325, possible amino-acid arrangements. Did the mutation and selection mechanism have enough time, since the beginning of the universe itself, to generate even a small fraction of the total number of possible amino-acid sequences corresponding to a single functional protein of that length? For Eden, the answer was clearly no.
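Eden's figure can be checked with a few lines of arithmetic. The sketch below is only an illustration, not anything Eden presented: the function name and constants are hypothetical, and it assumes the standard alphabet of 20 amino acids with each position chosen independently.

```python
import math

ALPHABET_SIZE = 20     # assumed: the 20 standard amino acids
PROTEIN_LENGTH = 250   # Eden's assumed average protein length


def log10_sequence_space(alphabet_size: int, length: int) -> float:
    """Return log10(alphabet_size ** length) without building the huge integer."""
    return length * math.log10(alphabet_size)


# Exponent of the power of ten closest to 20^250.
exponent = log10_sequence_space(ALPHABET_SIZE, PROTEIN_LENGTH)
print(f"20^{PROTEIN_LENGTH} is roughly 10^{exponent:.1f}")
```

Working with logarithms keeps the numbers manageable; the exponent comes out a little above 325, in line with the rounded figure in the text.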
For this reason, Eden thought mutations had virtually no chance of producing new genetic information. He likened the probability of producing the human genome by relying on random mutations to that of generating a library of a thousand volumes by making random changes or additions to a single phrase in accord with the following instructions: “Begin with a meaningful phrase, retype it with a few mistakes, make it longer by adding letters [at random], and rearrange subsequences in the string of letters; then examine the result to see if the new phrase is meaningful. Repeat this process until the library is complete.”7 Would such an exercise have a realistic chance of succeeding, even granting it billions of years? Eden thought not.
In addition, Schützenberger emphasized that randomly cutting and pasting larger blocks of text, as evolutionary biologists often envision, would not make any appreciable difference to the efficacy of a random search of sequence space. Imagine a computer “mutating” at random the text of the play Hamlet either by individual-letter substitutions or by duplicating, swapping, inverting, or recombining whole sections of Shakespeare’s text. Would such a computer simulation have a realistic chance of generating a completely different and equally informative text such as, say, The Blind Watchmaker by Richard Dawkins, even granting multiple millions of undirected mutational iterations?
Schützenberger didn’t think so. He noted that making random changes “at the typographic level” in a computer program inevitably degrades its function whether those changes are made “by letters or by blocks, the size of the unit really does not matter.”8 Thus, he thought that a process of randomly shuffling blocks of text in any “typographic typology” would inevitably degrade meaning in much the same way that a series of individual-letter substitutions will.
Schützenberger insisted that the evolutionary process faced similar limitations. To him, it seemed extremely unlikely that random mutations of whatever sort would produce significant amounts of novel and functionally specified information within the time available to the evolutionary process.
Subsequent to the confirmation of Crick’s sequence hypothesis, all present at Wistar understood that the entities that confer functional advantages on organisms—new genes and their corresponding protein products—constitute long linear arrays of precisely sequenced subunits, nucleotide bases in the case of genes and amino acids in the case of proteins. Yet, according to neo-Darwinian theory, these complex and highly specified entities must first arise and provide some advantage before natural selection can act to preserve them. Given the number of bases present in genes, and amino acids present in functional proteins, a large number of changes in the arrangement of these molecular subunits would typically have to occur before a new functional and selectable protein could arise. For even the smallest unit of functional innovation—a novel protein—to arise, many improbable rearrangements of nucleotide bases would need to occur before natural selection had anything new and advantageous to select.
Eden and others questioned whether mutations provided an adequate explanation for the origin of the genetic information necessary to build new proteins, let alone whole new forms of life. As physicist Stanislaw Ulam explained at the conference, the evolutionary process “seems to require many thousands, perhaps millions, of successive mutations to produce even the easiest complexities we see in life now. It appears, naïvely at least, that no matter how large the probability of a single mutation is, should it be even as great as one-half, you would get this probability raised to a millionth power, which is so very close to zero that the chances of such a chain seem to be practically nonexistent.”9
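Ulam's back-of-the-envelope estimate can be reproduced in the same spirit. The sketch below takes his deliberately naive framing at face value, treating each of a million successive mutations as an independent event with probability one-half; the function name is hypothetical, and the independence assumption is Ulam's simplification rather than a biological model.

```python
import math


def log10_chain_probability(p_single: float, steps: int) -> float:
    """log10 of p_single ** steps, assuming every step is independent."""
    return steps * math.log10(p_single)


# Ulam's illustrative case: probability one-half, a million successive steps.
log10_p = log10_chain_probability(0.5, 1_000_000)
print(f"(1/2)^1,000,000 is roughly 10^{log10_p:.0f}")

# A direct floating-point calculation cannot represent a number this small
# and underflows to zero, which is why the logarithm is used above.
direct = 0.5 ** 1_000_000
print(direct)
```

The exponent is on the order of minus three hundred thousand, which is the sense in which Ulam called the chance "practically nonexistent."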
Looking for a Loophole
In his presentation at the conference, Eden himself acknowledged a possible way of resolving this dilemma. He suggested that it was at least possible that “functionally useful proteins are very common in this [combinatorial] space so that almost any polypeptide one is likely to find [as the result of mutation and selection] has a useful function.”10 Many neo-Darwinian biologists subsequently came to favor this possible solution. The solution was this: even though the size of the combinatorial space that mutations needed to search was enormous, the ratio of functional to nonfunctional base or amino-acid sequences in their relevant combinatorial spaces might turn out to be much higher than Eden and others had assumed. If that ratio turned out to be high enough, then the mutation and selection mechanism would frequently stumble onto novel genes and proteins and could easily leapfrog from one functional protein island to the next, with natural selection discarding the nonfunctional outcomes and seizing upon the rare (but not too rare) functional sequences.
As an electrical engineer who was used to working with computer code, Eden was intuitively disinclined to embrace this possibility. He noted that all codes and language systems can convey information precisely because they have rules of grammar and syntax. These rules ensure that not just any arrangement of characters will convey functional information. For this reason, functional sequences in working communications systems are typically surrounded in the larger combinatorial space by a multitude of nonfunctional sequences—sequences that don’t obey the rules.
In known codes and language systems, functional sequences do indeed typically represent tiny islands of meaning amid a great sea of gibberish. Geneticist Michael Denton has shown that in English meaningful words and sentences are extremely rare among the set of possible combinations of letters of a given length, and they become proportionally rarer as sequence length grows.11 The ratio of meaningful 12-letter words to 12-letter sequences is 1/10^14; the ratio of meaningful 100-letter sentences to possible 100-letter strings has been estimated as 1/10^100. Denton used these figures in 1985 to explain why random letter substitutions inevitably degrade meaning in English text after only a few changes and why the same thing might be true of the genetic text.
Given the alphabetic or “typographic” character of genetic information stored in DNA, Murray Eden and others at Wistar suspected that the same kind of problems would affect random mutational changes in DNA. It seemed logical that functional genes and proteins were also surrounded in their relevant combinatorial spaces by vast numbers of nonfunctional sequences—and, further, that the ratio of functional to nonfunctional sequences would also be exceedingly small.
Yet in 1966 none of the scientists on either side of the debates at Wistar knew how rare or common functional gene and amino-acid sequences are among the corresponding space of total possibilities. Do they occur with a frequency of 1 in 10, 1 in a million, or 1 in a million billion trillion? At the time, these questions could not be answered.
Most evolutionary biologists remained optimistic that the answer to this question would vindicate the neo-Darwinian model.12 And some developments supported their confidence. During the late 1960s, molecular biologists learned that most of the functional roles performed by proteins are performed not just by one precise kind of protein, but by a wide variety, each with its own amino-acid sequence. This is unlike a bike lock, which has only one functional combination. Indeed, molecular biologists learned that though some amino acids at certain sites are absolutely essential for any particular protein to work, most sites tolerate amino-acid substitutions without loss of protein function. For many biologists, this suggested that mutation and selection had a reasonable chance of generating functional sequences of nucleotide bases or amino acids after all—that the ratio of functional to nonfunctional sequences was much higher than skeptics had anticipated.
How much higher? How much variability is allowed in the amino-acid sequences in proteins? Are there enough functional proteins within a relevant combinatorial space of possibilities to render a random mutational search for new proteins plausible?
When Denton compared linguistic and genetic text to explain the potential severity of the combinatorial inflation facing the neo-Darwinian mechanism, he noted that biologists still didn’t know enough “to calculate with any degree of certainty the actual rarity of functional proteins.” He concluded, however, that since future experiments surely would continue to deepen molecular biology’s fund of knowledge, “it may be that before long quite rigorous estimates may be possible.”13
In Search of the Ratio
Denton’s prediction of imminent progress proved correct. During the late 1980s and early 1990s, Robert Sauer, a molecular biologist at MIT, performed a series of experiments that first attempted to measure the rarity of proteins within amino-acid sequence space.
Sauer’s work exploited, for the first time, new technology that allowed for the systematic manipulation of gene sequences. Before the late 1970s, scientists typically used radiation and chemicals to produce mutant forms of DNA. Though these techniques sometimes paid off with dramatic results, such as mutant fruit flies with legs growing out of their heads (the famed Antennapedia mutation), they did not allow scientists to dictate or target any specific change to a sequence of bases in DNA. The treatments used simply replicated the conditions under which mutations occur naturally.
During the late 1970s and early 1980s, however, molecular biologists developed technologies for making customized synthetic DNA molecules. Robert Sauer used these techniques to make site-directed changes to DNA sequences of specific genes of known function and then to insert those variants into bacterial cells. He could then evaluate the effect of various targeted alterations to a DNA sequence on the function of their protein products within a bacterial cell culture.
Sauer’s technique allowed him to begin to evaluate how many of the variant sequences, as a percentage of the total, still produced a functional form of the relevant protein (see Fig. 9.3). His initial results confirmed that proteins could indeed tolerate a variety of amino-acid substitutions at many of the sites in the protein chain. Yet his experiments also suggested that functional proteins might be incredibly rare among the space of all possible amino-acid sequences. Based on one set of mutagenesis experiments, Sauer and his colleagues estimated the ratio of functional to nonfunctional amino-acid sequences at about 1 to 10^63 for a short protein of 92 amino acids in length.14
This result was in rough agreement with an earlier estimate by information theorist Hubert Yockey.15 Yockey did not perform experiments to derive his estimate of the rarity of proteins in combinatorial sequence space. Instead, he used already published data to compare variants of the similar cytochrome c proteins (proteins involved in the biochemical pathways that generate energy in cells) in different species. He did this to see how much variability existed at each amino-acid site in molecules performing the same function with the same basic structure. Using this data about the allowable variability at each site, he estimated the probability of finding one of the allowable sequences among the total number of sequences corresponding to a cytochrome c protein 100 residues in length. He determined the ratio of functional to nonfunctional sequences to be about 1 to 10^90 for amino-acid chains of this length.16 So, although Sauer’s experimentally derived results were numerically different from Yockey’s, both approaches gave extremely low ratios suggesting that functional proteins are indeed rare in sequence space, even if proteins do admit significant variability in the specific amino acids present at various positions.
FIGURE 9.3
Figure 9.3a (top) depicts the problem of combinatorial inflation as it applies to proteins. As the number of amino acids necessary to produce a protein or protein fold grows, the corresponding number of possible amino-acid combinations grows exponentially. Figure 9.3b (bottom) poses graphically the question of the rarity of proteins in that vast amino-acid sequence space.
Taken at face value, Sauer’s experiments appeared to yield contradictory conclusions. On the one hand, his results showed that many arrangements of amino acids could produce the same protein structure and function, and thus that numerous functional sequences populated amino-acid sequence space. On the other hand, the ratio of functional sequences to the total number of possible sequences corresponding to a sequence of roughly 100 amino acids appeared to be incredibly low, just 1 to 10^63.
Nevertheless, it’s not hard to see how both of Sauer’s seemingly contradictory conclusions could be true. Recall the locks confronting my hypothetical bike thief. Commercially manufactured bike locks typically have only one combination of digits that will allow them to be opened. The combination that will open a typical bike lock specifies one digit on each dial. No variability at any dial is allowed.
Now imagine a new kind of lock with three crucial differences from an ordinary lock. First, with this new alternative lock, there are four positions on every dial that may, in combination with other positions on other dials, open the lock. My bike thief would like this feature of this kind of lock, since it seems to allow more wiggle room at each dial. But he doesn’t like the two other features of this lock. Second, each dial displays one of 20 letters rather than one of 10 numeric digits. Third, instead of 5 dials, there are 100 dials. On the upside, because 4 of the 20 letters on each of the 100 dials might work, there are 4^100, or a whopping 10^60, correct combinations that will open the lock. That’s an astronomically large number of correct combinations. But, on the downside, there are 20 possible settings at each of 100 dials, which computes to 20^100, or 10^130, possible combinations, a number that totally dwarfs the number of correct combinations of dial settings.
The bike thief would be happy to learn that every dial has “only” four possible correct positions, but in a trial-and-error process, he still has only 1 chance in 5 (4 out of 20) of landing on a possibly functional dial position for any dial; and this 1-in-5 chance must be negotiated 100 times. In other words, the 1-in-5 chance must be multiplied by itself once for each of the 100 dials on the monster lock in order to arrive at the probability of the thief stumbling upon a functional combination on a given try. The odds are 1 chance in 5^100 or, if we want to convert that to base 10, roughly 1 chance in 10^70. The odds are that slender because the functional combinations, numerous as they are, are dwarfed by the number of total combinations.
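The arithmetic behind the monster lock can be laid out explicitly. The following is a minimal sketch using only the numbers given in the analogy (100 dials, 20 letters per dial, 4 working letters per dial); the variable names are hypothetical, and the calculation assumes each dial is set independently and at random.

```python
import math
from fractions import Fraction

DIALS = 100             # dials on the hypothetical lock
SETTINGS_PER_DIAL = 20  # letters shown on each dial
WORKING_PER_DIAL = 4    # letters per dial that can be part of a solution

# Python integers are arbitrary precision, so these counts are exact.
working_combinations = WORKING_PER_DIAL ** DIALS
total_combinations = SETTINGS_PER_DIAL ** DIALS

# Exact chance that one random setting of all dials opens the lock.
p_open = Fraction(working_combinations, total_combinations)

print(f"working combinations: about 10^{math.log10(working_combinations):.0f}")
print(f"total combinations:   about 10^{math.log10(total_combinations):.0f}")
print(f"chance per try:       about 1 in 10^{math.log10(p_open.denominator):.0f}")
```

The exact fraction reduces to 1 over 5^100, so the per-try odds follow directly from the 1-in-5 chance at each dial compounded over 100 dials.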
In the same way, Sauer established that though many different combinations of amino acids will produce roughly the same protein structure and function, the sequences capable of producing these functional outcomes are still extremely rare. He showed that for every functional 92-amino-acid sequence there are roughly another 10^63 nonfunctional sequences of the same length. To put that ratio in perspective, the probability of attaining a correct sequence by random search would roughly equal the probability of a blind spaceman finding a single marked atom by chance among all the atoms in the Milky Way galaxy, on its face clearly not a likely outcome.17
Uncertain Situation
Nevertheless, during the 1990s in the immediate wake of the publication of Sauer’s results, the implications of his work for evolutionary theory were not entirely clear. Even in the scientific paper in which Sauer reported his work, the abstract summarizing his results emphasized the tolerance to amino-acid substitution that proteins allow. Consequently, scientists on both sides of the discussion about Darwinian evolution seized on different aspects of Sauer’s findings either to support or challenge the plausibility of the neo-Darwinian account of the origin of genes and proteins.
Scientists sympathetic to neo-Darwinism emphasized the tolerance of proteins to amino-acid substitution; critics of the theory emphasized the rarity of proteins in sequence space. One scientist, Ken Dill, a biophysicist at University of California, San Francisco, cited Sauer’s work to suggest that nearly any amino acid would work at any site in a protein chain, provided the amino acids in question exhibited the correct hydrophobicity (water-repelling) or hydrophilicity (water-attracting) properties.18
Yet, at least one scientist, Lehigh University biochemist Michael Behe, cited Sauer’s quantitative estimate of the rarity of proteins as a decisive refutation of the creative power of the mutation and selection mechanism altogether.19 So by the mid-1990s, though Sauer and his group had initiated a program of experimental research that addressed the key question that Murray Eden raised at Wistar, that question had still not been completely settled. Did the mutation and natural selection mechanism have a realistic chance of finding the new genes and proteins necessary to build, for example, a new Cambrian animal? Answering that would await an even more systematic and comprehensive experimental regime.
10
The Origin of Genes and Proteins
As a Ph.D. student in chemical engineering at the California Institute of Technology in the late 1980s, Douglas Axe (see Fig. 10.1) became interested in evolutionary theory after several fellow graduate students read the then-bestselling book by Richard Dawkins, The Blind Watchmaker. Axe’s compatriots were quickly converted to zealous advocates of Dawkins’s arguments and urged him to read the book for himself. Axe was impressed by the clarity of Dawkins’s writing and illustrations, but he found Dawkins’s case for the creative power of natural selection and random mutations unpersuasive. Whether in the analogies he drew to animal breeding or in the computer simulations he used to demonstrate the supposed ability of mutation and selection to generate new genetic information, Dawkins repeatedly smuggled in the very thing he insisted the concept of natural selection expressly precluded: the guiding hand of an intelligent agent.
Axe found Dawkins’s computer simulation particularly interesting. In The Blind Watchmaker, Dawkins described how he had programmed a computer to generate the Shakespearean phrase “Methinks it is like a weasel.”1 Dawkins did this in order to simulate how random mutations and natural selection could generate new functional information.