Arrival of the Fittest: Solving Evolution's Greatest Puzzle
Proteins do not just perform existing jobs. The economy of living organisms, like that of the human world, is constantly changing, and in response, evolution brings forth new protein shapes, innovations that take on new jobs. These jobs open whenever life needs to solve a new problem, like that of surviving the menacing knives of growing ice crystals.
And just as in the human economy, where inventions from blast furnaces to smartphones are often made several times independently, the innovations that fill these jobs are often discovered more than once. Antifreeze proteins are a case in point: They originated not just in the Arctic cod but also in Antarctic fish, and from different proteins in their ancestors.12 They even originated more than once in the Arctic.13 What’s more, some fish evolved more than one kind of antifreeze protein. The winter flounder, a flatfish from the North Atlantic, manufactures one antifreeze protein that prevents its bloodstream from freezing, and another that protects its skin.14 And some of these proteins arose very quickly in evolutionary terms—in less than three million years.
Dozens of amino acids had to change from proteins in some frost-sensitive ancestor to create antifreeze proteins, but protein innovations often require much less change.15 Alter as little as one amino acid in the enzyme needed to manufacture the amino acid histidine, and the result is a new enzyme that helps manufacture the amino acid tryptophan.16 Mutate a specific amino acid in an E. coli enzyme that helps extract energy from the sugar arabinose—its name comes from gum arabic, a natural gum from acacia trees—and this enzyme transmogrifies from a rearranger of atoms to a cleaver of molecules.17
Such minimal changes can have dramatic consequences for life, as the bar-headed goose from Central Asia could tell us. It is one of the world’s highest-flying birds. It has to be, since its migratory route takes it across the Himalayas at altitudes that exceed five miles, where the air is not only thinner—requiring birds to flap harder—but contains only a third as much oxygen by volume as sea-level air. At that altitude, the mountaineers who struggle up Mount Everest use oxygen tanks, and the passengers on jet airliners require pressurized cabins. The goose can’t benefit from either technology, but no problem, it has an even better trick. Its hemoglobin—the protein that shuttles oxygen from lungs to muscles—harbors an amino acid change that helps it bind oxygen much more tightly than our hemoglobin. It allows the goose to scavenge oxygen molecules from thin air, and keeps this bird flying where others are grounded.18
Molecular innovations like the Arctic cod’s antifreeze or the bar-headed goose’s oxygen-binding hemoglobin are valuable because they expand an organism’s habitat, which means more food, better survival, and more offspring. Other innovations confer a different kind of advantage, such as the ability to discriminate between one kind of food and another, to choose a nutritious rather than a poisonous plant for dinner. They depend on improving perception, rather than mobility, and they are why the retina in the back of our eye contains three kinds of opsins. These are highly specialized proteins that detect light and are tuned to the different wavelengths of blue, red, and green light. Thanks to them, we see the world in color. This was not always the case. Our most distant ancestor among the vertebrates probably had only one opsin. Theirs was a black-and-white world. Most mammals have two different opsins, those for red and blue. They can see in two colors. But we and some of our relatives like chimpanzees can see in three colors, perhaps because color vision helped our distant ancestors forage: It lets fruits stand out from the background of green foliage. Whatever the reason, the innovation of color vision takes very little change, as little as three altered amino acids that retune an opsin from red to green.19
Innovations like color vision benefit us, but others harm us—those of deadly bacteria that resist the antibiotics your doctor prescribes. They are the unfortunate side effect of our continual improvement of antibiotics, the result of a biological arms race of bacteria against biotechnologists. This race evokes the Red Queen of Lewis Carroll’s Through the Looking Glass, who famously told Alice, “Now, here, you see, it takes all the running you can do, to keep in the same place.”20 Through it, bacteria have discovered various protein innovations, some of which destroy antibiotic molecules, whereas others, known as efflux pumps, force antibiotics out of the cell like some bacterial rescue squad pumping toxic gas out of a contaminated house. (Horizontal gene transfer, combined with human travel, can spread such innovations throughout the world within months.) Especially sinister are proteins that pump not just one but many kinds of antibiotics, and thus render bacteria resistant against multiple antibiotics. Curiously, when our own body cells go rogue, proliferate wildly, and evolve rapidly—in cancer—they often use similar efflux pumps to rid themselves of unwanted cancer drugs. These are not only independent solutions to a similar problem but also one of many reasons why the war on cancer is hard to win.21
The proteins behind these innovations were not created from scratch. They are modified transporters, proteins that are essential in a cell’s daily life, because they ship thousands of molecules—nutrients, waste, building materials—to various destinations within the cell. So should we really call them innovations? The same question arises for the goose’s improved hemoglobin and primates’ color vision. Nature just fiddled with hemoglobin to tighten its binding to oxygen, and it tinkered with opsins to tune their color sensitivity. Neither was a qualitatively new protein. But consider the impact of these changes. Consider the millions of square miles of new habitat opening up to a bird that can traverse any mountain range. Consider how much duller our world would be in black and white. And consider the life-and-death change that drug resistance can make to a bacterium. For their dramatic consequences alone, these small changes deserve to be called innovations. They show how minute alterations of no more than a few atoms can have effects that percolate through an organism that is a million times as large and alter the life of its descendants forever.22
In chapter 3, we saw how nature continues to create ever-novel sequences of chemical reactions, by combining and recombining metabolic enzymes through horizontal gene transfer. But that is not how metabolic enzymes themselves first appeared. As the last few examples showed, nature creates new proteins, including every one of the known five-thousand-plus enzymes, by altering the amino acid sequence of their protein ancestors. That’s also how it created the countless proteins that regulate genes, ferry materials, contract muscles, transport oxygen, import nutrients, export waste, communicate between cells, and perform a thousand other tasks. Entire books could be written—have been written—that describe a few such innovations in great detail.
This book is not among them.
You cannot understand what made all these innovations possible through anecdotes (an antifreeze protein here, an opsin there) any more than you can draw a map of the United States with satellite images of a few counties. The task requires comparing many old proteins and the new ones they brought forth. Thousands of them.
This task is made easier if one can read the DNA of genes or the amino acid strings they encode, the genotypes of proteins.23 Among the first to learn to read both was the British biochemist Frederick Sanger, one of the few scientists to win two Nobel Prizes: the first for deciphering the amino acid sequence of insulin, the second for techniques to read the letter sequence of DNA. His discoveries came decades earlier than our ability to read the genotypes of metabolisms, and we therefore know many more protein genotypes and phenotypes than metabolic ones.24 They hail from organisms that live in Arctic wastelands and tropical jungles, on mountaintops and in ocean depths, in our gut and in boiling hot springs, in barren deserts and in fertile soil, in filthy sewers and in pristine rivers.
Without organization, this giant heap of protein facts would be like a million shuffled words in a madman’s dictionary, but once organized, it becomes part of a library just like the gigantic metabolic library from chapter 3. The volumes in this universal library are protein genotypes, texts written in a twenty-letter alphabet, where each letter corresponds to one amino acid. The universal protein library is the collection of all proteins that life has created, and all proteins that it could create. It is sometimes also called a protein space or a sequence space, because each text corresponds to a single sequence of amino acids.25
The size of this library is no less staggering than that of the metabolic library, as an already familiar calculation helps us see. Recall that there are 20 × 20, or 400, possible two-letter texts using one of twenty possible amino acids. Similarly, there are 20 × 20 × 20, or 8,000, texts of three amino acids, 160,000 texts of four amino acids, and so on. Short texts like this are called peptides, but most proteins comprise much longer texts—polypeptides—and the number of possible amino acid texts explodes with their length, such that the number of proteins with merely a hundred amino acids is already greater than a 1 with 130 trailing zeroes. But the library is larger than even this unimaginably large number, because proteins like sucrase have more than a thousand amino acids, and some human proteins are many times longer. (Among them is a behemoth called titin, a 30,000-amino-acid-long protein spring in our muscles.)26 The universal library of proteins is another library of hyperastronomical size.
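The counting argument above can be checked directly. What follows is a minimal Python sketch rather than anything from the original text; the function name `count_texts` is made up for illustration, and the only assumption is the twenty-letter amino acid alphabet described above.

```python
# Illustrative sketch: count the possible amino acid texts of a given length.
ALPHABET_SIZE = 20  # twenty amino acids, as stated in the text


def count_texts(length: int) -> int:
    """Number of distinct amino acid texts with `length` letters."""
    return ALPHABET_SIZE ** length


for n in (2, 3, 4):
    print(n, count_texts(n))  # 400, 8000, 160000

# Python integers have arbitrary precision, so this comparison is exact.
hundred = count_texts(100)
print(hundred > 10 ** 130)  # True: more than a 1 with 130 trailing zeroes
print(len(str(hundred)))    # 131 digits
```

Because the count multiplies by twenty with every added letter, each extra amino acid adds about 1.3 decimal digits to the size of the library.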
The similarity to metabolism does not end with the size of this library. Like the metabolic library, the protein library is a high-dimensional cube, with similar texts near one another. Each protein text perches on one vertex of this hypercube, and just like in the metabolic library, each protein has many immediate neighbors, proteins that differ from it in exactly one letter and that occupy adjacent corners of the hypercube.27 If you wanted to change the first of the amino acids in a protein comprising merely a hundred amino acids, you would have nineteen other amino acids to choose from, yielding nineteen neighbors that differ from the protein in the first amino acid. By the same process, the protein has nineteen neighbors that differ from it in the second amino acid, nineteen neighbors that differ from it in the third, the fourth, the fifth, and all the way through the hundredth amino acid. So all in all, our protein has 100 × 19 or 1,900 immediate neighbors. A neighborhood like this is already large, and it would be even larger if you changed not one but two or more amino acids. Clearly, this can’t be bad for innovation: With one or a few amino acid changes, evolution can explore many proteins.
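The neighborhood count can be sketched in a few lines of Python. The names below (`count_neighbors`, `one_letter_neighbors`) and the three-letter example peptide are hypothetical, chosen only for illustration; the alphabet uses the standard one-letter amino acid codes.

```python
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard one-letter codes, 20 in all


def count_neighbors(length: int, changes: int = 1) -> int:
    """Proteins that differ from a given one at exactly `changes` positions."""
    # Choose which positions change, then one of 19 alternatives at each.
    return comb(length, changes) * (len(AMINO_ACIDS) - 1) ** changes


def one_letter_neighbors(protein: str):
    """Yield every text that differs from `protein` in exactly one letter."""
    for i, original in enumerate(protein):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield protein[:i] + aa + protein[i + 1:]


print(count_neighbors(100))                    # 1900, i.e. 100 x 19
print(count_neighbors(100, 2))                 # 1786950 two-letter neighbors
print(len(list(one_letter_neighbors("MKV"))))  # 57, i.e. 3 x 19
```

The jump from 1,900 one-letter neighbors to nearly 1.8 million two-letter neighbors shows how quickly the neighborhood grows with each additional change.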
In another parallel to the metabolic library, you would get lost wandering through this library’s maze unless you had an unrolling skein of wool to gauge how far you traveled. Once again, a notion of distance serves this purpose. It is the number of amino acids by which two proteins differ. It tells you how far you need to walk—how many amino acids you need to change—to travel from one protein text to any other.28
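This notion of distance is straightforward to compute. The sketch below assumes that the two proteins have the same length, a restriction the text does not spell out, and the two five-letter sequences are invented for illustration.

```python
def distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length protein texts differ."""
    if len(a) != len(b):
        # Assumption of this sketch: only equal-length texts are compared.
        raise ValueError("proteins must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical five-letter texts that differ at positions 3 and 5.
print(distance("MKVLA", "MKILG"))  # 2
print(distance("MKVLA", "MKVLA"))  # 0
```

Immediate neighbors in the library are exactly the pairs of texts at distance 1.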
The texts in this library are important, but even more important is the meaning each one carries. Our eyes cannot read this meaning, the words, sentences, and paragraphs of a protein’s chemical language, but life is fluent in this language. And it can tell whether a protein is meaningful or embodies jumbled chemical ramblings.
Cells take a hard-nosed view on which proteins are meaningful: those that help them live. A protein is meaningful only if it is useful, and defective mutant proteins that do not fold properly have lost their meaning. If a protein’s “meaning” feels too anthropocentric a word, it is worth reminding ourselves how “meaning” is defined by semiotics, an offshoot of linguistics that explores the meaning of meaning: whatever a sign—which could be anything from a road sign to a book’s text—points to. If that sign is a protein’s amino acid text, then the meaning it encodes is the protein’s phenotype and the function it serves inside a cell.29
We still do not know how many meaningful books a universal library of books would contain, but decades of research allow us to estimate this number for proteins, because most useful proteins fold into a specific shape. If you blindly took a random protein from a random shelf in the library, the odds that it folds are at least one in ten thousand. That may not seem much, but keep in mind how vast the library is, containing more than 10^130 proteins of a hundred amino acids. Even if only one in ten thousand of them folds, you are still left with 10^126 proteins, a 1 trailed by 126 zeroes, much greater than the number of hydrogen atoms in the universe. The number of meaningful proteins is itself large beyond imagination.30
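The arithmetic behind this estimate can be sketched as follows. The variable names are illustrative, and the one-in-ten-thousand figure is simply the lower bound quoted above, taken at face value.

```python
ALPHABET_SIZE = 20
LENGTH = 100
FOLD_ODDS = 10_000  # "one in ten thousand", the lower bound quoted in the text

total = ALPHABET_SIZE ** LENGTH  # all proteins of a hundred amino acids
folding = total // FOLD_ODDS     # those expected to fold, at these odds

print(total > 10 ** 130)      # True
print(folding > 10 ** 126)    # True
print(len(str(folding)) - 1)  # 126, the order of magnitude
```

Dividing by ten thousand removes only four of the 130 zeroes, which is why the number of folding proteins remains so large.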
Evolution explores the protein library through huge populations of organisms. Their proteins change, one amino acid at a time, with the occasional copying errors that alter a DNA string’s letters—A to C, T to G, or in any other way—as this string replicates generation after generation. To understand how such change creates texts with new and useful meaning, we need to map the protein library like we mapped the metabolic library. This is less difficult than it seems: Thanks to decades of work by armies of protein scientists, we know the folds and functions of tens of thousands of proteins and their place in the library. What’s more, the technologies of twentieth-century molecular biology allow us to take any volume off its shelf—to manufacture any protein—and study its fold and function in the laboratory.
The simplest question about innovability in proteins is one we encountered before. How hard is it to find a protein with any one meaning, one whose function helps an organism to survive? If there were only one such protein in the library, even the eons elapsed since the Big Bang would not suffice to find it. Since meaningful proteins exist in huge numbers, just about every problem that life solved with a protein innovation must have more than one solution. But how many?
In 2001, Anthony Keefe and Jack Szostak from Harvard University set out to answer this question for a family of proteins whose invention was as crucial as any in life’s history: the proteins that can bind the ATP that we encountered as life’s battery in chapter 2. Proteins that carry out work—they transport materials, contract muscles, build new molecules—cleave ATP, and in doing so, harness its energy for this work.31
To use ATP’s energy, a protein first needs to bind ATP. If only one protein in the vast protein library were able to bind ATP, then searching blindly for it would be futile. Its discovery would require a miracle. To find out how rare ATP binding proteins are in the library, Keefe and Szostak used a chemical technology that can create many different proteins, each one with a different and completely random amino acid string, a process equivalent to pulling random volumes from the shelves of the protein library. The random proteins these researchers generated were all eighty amino acids long. Because there are more than 10^104 such proteins, no experiment could create all of them, but this one created an impressive number, about 6 trillion, or 6 × 10^12 random proteins.
Keefe and Szostak found that four of them, unrelated to one another, can bind ATP. Four new ATP-binding proteins out of six trillion doesn’t sound like too many, but when the proportions are extrapolated to the number of potential candidates, the number is much larger. It comes out to more than 10^93 proteins (a 1 with 93 zeroes) that can bind ATP. The problem of binding ATP has astronomically many solutions.32
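A rough version of this extrapolation can be sketched in Python. It scales the raw ratio of four hits in six trillion up to the whole library of eighty-letter proteins, which is an assumption of this sketch and not necessarily how the researchers derived their own figure. The naive ratio lands near 8 × 10^91, somewhat below the 1 with 93 zeroes quoted above, so the published estimate presumably rests on a hit frequency that differs from the raw count. The sketch does not try to reproduce that.

```python
LENGTH = 80               # amino acids per random protein
SAMPLED = 6 * 10 ** 12    # random proteins created in the experiment
HITS = 4                  # unrelated ATP binders found

total = 20 ** LENGTH      # every possible eighty-letter protein

# Naive assumption: the library as a whole has the same hit rate as the sample.
estimate = total * HITS // SAMPLED

print(len(str(total)) - 1)     # 104, so the library exceeds 10^104
print(len(str(estimate)) - 1)  # 91, the order of magnitude of the naive estimate
```

Either way, the conclusion holds: even a vanishingly small hit rate, applied to a library of this size, leaves astronomically many ATP binders.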
John Reidhaar-Olson and Robert Sauer from the Massachusetts Institute of Technology took a different tack on the same problem. They focused on a regulator protein that can shut down genes in a virus that infects bacteria. The DNA of this virus, bacteriophage lambda, encodes proteins that help it replicate and kill its host bacterium. But this virus can also remain dormant inside the bacterium, using this off switch to shut down its genes until the time is ripe to replicate and kill the host. This time usually comes when the host falls on hard times: starved, poisoned by antibiotics, or irradiated with too much ultraviolet light. The virus then starts to replicate, and its children abandon the cell, rats scurrying from the proverbial sinking ship.33
Reidhaar-Olson and Sauer explored a neighborhood of the protein library near this viral off switch, creating many random amino acid sequences in this neighborhood, and asked which of them produced a switch that works, one that can shut down the viral genes. From this information, they calculated that more than 10^50 texts in the library encode a working off switch. When they tried a similar approach on a different protein, an enzyme needed to synthesize amino acids, they found that some 10^96 amino acid strings can do this enzyme’s job.34
Nature’s antifreeze proteins gave us a hint, and laboratory experiments like these prove it: Problems like that of binding ATP, shutting down a virus, or catalyzing a chemical reaction don’t have just one solution. Or even a million solutions. They have astronomically many solutions, each embodied by a different volume in the protein library.35 To imagine the sheer number of these solutions is difficult, but that says more about the limits of our imagination than about life’s innovability.
It’s not enough to know, of course, that a library contains a virtually inexhaustible supply of books describing solutions to a particular problem. We also need to find out where these solutions are and how they are organized—in meticulous stacks or thrown together in unruly piles. And for that we need to move beyond laboratory experiments. Even though such experiments can create and test impressive numbers of different proteins, these numbers vanish into insignificance compared with those found in nature, which churns out new proteins every day in countless trillions of live organisms. Every one of these organisms harbors thousands of proteins, and each is only the last link in an unbroken chain of protein creation that goes back billions of years.
Protein scientists have been aware of this bounty for many years. And they throw themselves at it with gusto, like kids in a warehouse-sized candy store. And what these scientists have learned about protein creation in thousands of different organisms goes far beyond laboratory experiment. The oxygen-ferrying hemoglobin we already encountered in the bar-headed goose illustrates how far.