Arrival of the Fittest: Solving Evolution's Greatest Puzzle
Evolution explores this circuit library through the familiar crowd of randomly browsing readers, populations of organisms in which circuits get modified through occasional DNA copying errors that arise as genes get passed from parents to children with some corrupted letters. Any one such mutation can have two kinds of effects. It can deform a regulator’s shape and prevent it from recognizing DNA. Or it can alter one of the DNA “words” a regulator recognizes, either clipping a single wire of the circuit—actually disrupting the regulator’s effect on a gene—or creating a new wire, a new molecular word recognizable by some regulator.
The first kind of change often results in disaster, because each regulator affects so many other genes. Destroying a regulator’s ability to recognize DNA is akin to scrambling a complex recipe’s ingredients and thus destroying the entire dish. It can lead to organisms with terrible malformations or to embryos that die before they are born. The second sort of copying error, however, is more like a typo in a recipe. By changing the activity of merely one gene and the amount of protein it expresses—one among thousands of protein ingredients—it is less likely to cause serious damage. One might think that changes of this second kind are more tolerable and could thus steadily accumulate on evolutionary time scales. If so, they could slowly transform a circuit’s wiring diagram.
When one compares circuits that have evolved separately over millions of years, like those in some of the more than a thousand different fruit fly species, one finds indeed that most tolerable changes have occurred in the wires and not in the circuit genes themselves. Evolution alters most circuits one wire at a time, because messing with the circuit genes invites disaster. What is more, these small wiring changes indeed accumulate to transform circuits, and this process is far from slow.47 The reason is that a regulator’s DNA keyword can be as short as five letters and occur thousands of letters away from a gene. By chance alone, random mutations can easily create new keywords and thus new wires in a circuit.48
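The chance argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that DNA letters are independent and equally likely, which real genomes only approximate. The function name and the 5,000-letter stretch are hypothetical choices, not figures from the text.

```python
def expected_keyword_hits(keyword_length, region_length, alphabet_size=4):
    """Expected number of places where one specific keyword occurs by chance
    in a random sequence of `region_length` letters.

    Assumes independent, equally likely letters (an idealization).
    """
    # Number of positions at which a keyword of this length could start.
    positions = max(region_length - keyword_length + 1, 0)
    return positions / alphabet_size ** keyword_length


if __name__ == "__main__":
    # A five-letter keyword in a hypothetical 5,000-letter stretch near a gene.
    print(expected_keyword_hits(5, 5000))
```

Under these assumptions a given five-letter word is expected about once in every thousand letters, since there are 4^5 = 1,024 possible five-letter words. A stretch of several thousand letters will therefore often hold a few chance copies, along with near misses that a single mutation could complete.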
If only one and no other circuit in the hyperastronomical library of 10^700 texts expressed the code to create a specific innovation, evolution might as well pack up and go home, because this code would be a needle in a haystack many times the size of the universe.49 The question of why it had not thrown in the towel had called to me since the early 1990s, but I had ignored it: too many other projects. My procrastination ended in 2004 when I spent a research sabbatical at the Institute for Higher Studies near Paris in France.50
Set in a bucolic park with plenty of old trees, sculpted shrubbery, overflowing flowerbeds, and footpaths to explore while pondering life’s questions, the institute is a monastic refuge from the endless fund-raising, networking, and community service of an academic’s life. Its few resident researchers are highly decorated scientists, among them several recipients of the Fields Medal, widely known as the Nobel Prize for mathematicians. The institute focuses on mathematics and physics, but its leaders were aware that a seed long dormant in molecular biology, the insight that the whole is more than the sum of its parts, had germinated and flowered into an enormous subdiscipline known as systems biology. This emerging research field joins experimental data with mathematics and computation to find out how molecular parts like a fly’s regulators cooperate to shape whole biological systems, that is, organisms.51 Mathematicians and physicists have many tools to crack problems like this, and so the institute invited biologists like me for extended visits to see what we could do together.
Luckily for me, I accepted the invitation. Because it was in Paris that I met Olivier Martin.
Olivier is an internationally respected professor of statistical physics at the University of Orsay near Paris. Statistical physicists like him deal with huge collections of things like the molecules in a pressurized container of propane gas, and how they create properties like the pressure of that gas. To predict this pressure is important—we don’t want our gas tanks to explode—but also impossibly complex, because trillions of molecules bounce into the container walls every instant. Statistical physicists love to think about wholes with trillions of parts—too many to track individually—and they develop clever ways to describe those wholes, employing sophisticated statistical methods that share little beyond the name with the statistics that pollsters use to predict the outcome of elections in the United States.52
Olivier had a problem, though. Statistical physics is like a buffet where a hungry mob has devoured the choice dishes and left only small morsels behind: Most of its big questions are answered, and the remaining ones are either too hard or too trifling—not altogether surprising since scientists like James Clerk Maxwell and Ludwig Boltzmann had solved thermodynamical problems with statistical methods since the nineteenth century. Like most scientists in his situation, Olivier wanted to make a bigger contribution than physics would allow him. His problem was to find a question in systems biology that was new and challenging enough for him to chip away at.
I had a library of 10^700 regulation circuits to map. Boy, could I help Olivier out.
As Olivier Martin and I began to collaborate, I first came to appreciate him as a scientist whose intuition and technical skills prevented us from getting lost in the library. But he turned out to be much more than a surefooted travel companion. He was a kind and generous teacher who would patiently explain how the tools of his trade could help us find our way.53
We started with small steps whose purpose was to answer a question you will recognize. Is there only one text in the circuit library that expresses any one meaning? To find out, we started out with a single circuit in the library and computed its expression code. Then we changed one wire, asked whether this mutation altered this expression phenotype, went back to our starting circuit, changed another wire, and so on, until we had created all neighbors of our circuit and knew their phenotypes. And to make sure that the neighborhood of this one circuit was not unusual, we explored the neighborhoods of many different starting circuits, circuits with different numbers of genes, different numbers of wires, a different arrangement of wires, and different phenotypes.
They all gave the same answer. Circuits typically have dozens to hundreds of neighbors with the same phenotype.54 In other words, the phenotypes of these circuits remain unchanged even after encountering mutations that alter individual wires. They are not quite as delicate as those acrobat-formed human sculptures, where a body’s shifting by a few millimeters can spell disaster. Regulatory genotype circuits can tolerate such changes, because not all individual wires are critical to their function.
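The neighborhood survey described above can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not necessarily the model used in the research: a circuit is a table of wires valued +1 (activate), -1 (repress), or 0 (absent), a gene switches on when its summed input is positive, and the phenotype is the stable expression pattern reached from a fixed starting pattern. All function names are hypothetical.

```python
def phenotype(wires, initial, max_steps=50):
    """Stable expression pattern reached from `initial`, or None if the
    circuit has not settled within `max_steps` updates.

    wires[i][j] is +1 (gene j activates gene i), -1 (represses), 0 (no wire).
    Toy rule (an assumption): a gene is on (+1) if its summed input is
    positive, otherwise off (-1).
    """
    n = len(initial)
    state = tuple(initial)
    for _ in range(max_steps):
        nxt = tuple(
            1 if sum(wires[i][j] * state[j] for j in range(n)) > 0 else -1
            for i in range(n)
        )
        if nxt == state:
            return state
        state = nxt
    return None


def neighbors(wires):
    """Yield every circuit that differs from `wires` in exactly one wire."""
    n = len(wires)
    for i in range(n):
        for j in range(n):
            for value in (-1, 0, 1):
                if value != wires[i][j]:
                    mutant = [list(row) for row in wires]
                    mutant[i][j] = value
                    yield mutant


def survey_neighborhood(wires, initial):
    """Return (neighbors with unchanged phenotype, total neighbors)."""
    target = phenotype(wires, initial)
    all_neighbors = list(neighbors(wires))
    same = sum(1 for m in all_neighbors if phenotype(m, initial) == target)
    return same, len(all_neighbors)


if __name__ == "__main__":
    # Hypothetical three-gene circuit in which each gene activates itself.
    circuit = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(survey_neighborhood(circuit, (1, -1, 1)))
```

Repeating the survey from many different starting circuits, as the text describes, amounts to calling `survey_neighborhood` in a loop over those circuits.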
This first step away from a circuit already told us something very important: No one expression code, be it the one segmenting a fruit fly, dissecting a leaf, or shaping a vertebral column, has only one, special, unique circuit producing it. Each expression code can be produced by many circuits that differ in how their genes are wired. Finding out how many was trickier, because the number is so large that we could not even compute it, at least for circuits of forty or more genes. All we knew was that the number had to be enormous, since we had been able to calculate it for smaller circuits: Those with ten genes already had more than 10^40 circuits, and those with twenty genes had more than 10^160 circuits able to produce a given expression code. Producing any one expression code is another problem with more solutions than one can count.55
To find out how far apart different solutions to the same gene expression problems are in the library, we took the same random walks we had used when investigating metabolisms and proteins. Starting with a circuit, we computed its expression code, altered a wire—adding or eliminating regulation of one gene—and thus stepped to a random neighbor with the same expression code, and from that to the neighbor’s neighbor, and so on, until we could not go further without changing the expression code.
Once again, we could walk almost all the way through the library. Circuits that differed in more than 90 percent of their wires could still produce the same expression code. Looking at their wiring diagram, you would never guess that one arose from the other in many tiny steps. Yet each one was a different solution to the same problem: how to produce a specific pattern of gene expression that can shape a cell’s identity.
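A random walk of this kind can be sketched as follows. The code is a simplified illustration under assumptions, not the procedure actually used: it proposes one random wire change at a time and keeps the change only if the expression code is unchanged, whereas the walks described above also pushed as far from the start as possible. The circuit model (wires valued +1, -1, or 0, and a gene on when its summed input is positive) and all names are hypothetical.

```python
import random


def phenotype(wires, initial, max_steps=50):
    """Stable expression pattern reached from `initial`, or None if the
    circuit has not settled within `max_steps` updates (toy model)."""
    n = len(initial)
    state = tuple(initial)
    for _ in range(max_steps):
        nxt = tuple(
            1 if sum(wires[i][j] * state[j] for j in range(n)) > 0 else -1
            for i in range(n)
        )
        if nxt == state:
            return state
        state = nxt
    return None


def neutral_walk(wires, initial, steps, rng):
    """Propose `steps` random single-wire changes, keeping only those that
    leave the phenotype unchanged. Returns the final circuit."""
    target = phenotype(wires, initial)
    if target is None:
        raise ValueError("starting circuit has no stable phenotype")
    current = [list(row) for row in wires]  # work on a copy
    n = len(current)
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        old = current[i][j]
        current[i][j] = rng.choice([v for v in (-1, 0, 1) if v != old])
        if phenotype(current, initial) != target:
            current[i][j] = old  # step would change the meaning: undo it
    return current


def fraction_rewired(a, b):
    """Fraction of wire positions at which circuits a and b differ."""
    n = len(a)
    differing = sum(1 for i in range(n) for j in range(n) if a[i][j] != b[i][j])
    return differing / (n * n)


if __name__ == "__main__":
    # Hypothetical four-gene starting circuit: each gene activates itself.
    start = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    pattern = (1, -1, 1, -1)
    end = neutral_walk(start, pattern, steps=500, rng=random.Random(0))
    print(fraction_rewired(start, end))
```

Every circuit visited along the way has the same expression code as the start, so the printed fraction measures how far a purely neutral path has drifted in this toy setting.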
To make sure that the starting circuit—and its expression code—was not unusual, we started to explore the library from many different shelves, circuits with different numbers of genes, different numbers of wires, different arrangement of wires, and different expression patterns. It did not make a big difference. Some circuits with the same expression code differed in every single wire, whereas others differed in “only” 75 percent of their wiring. But even these would not be recognizably related when examined side by side.
Our explorations also taught us that all circuits with the same expression code are typically connected in the library. We can start from any one of them, change one wire at a time, and transform the circuit step by step into any other circuit with the same meaning, such that each step leaves the meaning unchanged.56 Once again, we could find a path from nearly every point in the library to nearly every other one, without ever getting stuck in a morass of regulatory nonsense.
All this means that circuits with the same phenotype form a vast network in the library of circuits, a genotype network like those we found in the metabolic library and the protein library. The library is filled with these networks, each of them containing more circuits than you can count, each of them reaching far through the library. All circuits in the same network are solutions to the same problem: how to produce a specific expression code that helps shape a cell, a tissue, or an organ. Small wonder that innovations like dissected leaves could evolve dozens of times independently, if vast numbers of circuits have the expression code that can get them there.
To map the millions of circuits needed to understand the library would have been impossible with any available technology aside from computation—hundreds of researchers had to experiment on millions of fruit flies over several decades to understand the single circuit segmenting a fly. However, some intrepid scientists are beginning to map circuits in simpler organisms, such as bacteria and yeasts. One of them is Mark Isalan, a researcher in Barcelona, who rewired a transcriptional regulation circuit in E. coli by adding new wires—regulation between pairs of genes—and created hundreds of circuits in its neighborhood. And he found, as we had, that regulation circuits are sturdy enough to be rewired.57 Ninety-five percent of his rewired circuits function normally.
Other researchers compare regulator circuits among different species of the yeasts we use to brew beer, to see how far they have to travel through the circuit library. One such circuit activates genes that allow yeasts to digest the sugar galactose. You might think that there must be one best way to wire this circuit, and that the yeast species that has discovered this way would have passed it on, unchanged, to others. Not so. In two yeast species that split many million years ago, this circuit not only has become completely rewired but even uses different regulators.58 Neither of these circuits is inferior, otherwise it would not have survived. Nature has solved the same regulation problem in two different but equally adequate ways. Not only that, but a path of small mutational steps connects these solutions, because the species shared a common ancestor.
Genes for the ribosome, the complex multiprotein machine that translates all RNA into proteins, tell the same story. A cell must manufacture its dozens of proteins in precisely balanced amounts, or else it will disappear like those wasteful E. coli cells overproducing beta-gal. Achieving this balance might seem a delicate affair with only one best solution, but again, two different species of yeast have come up with equally successful solutions that regulate these genes in completely different ways.59
Examples like these show that organisms can indeed travel far through the circuit library. But when searching for rare nuggets of new and useful expression codes on their journey, they face a problem similar to that of innovating metabolisms and proteins: There are many trillion possible expression codes, but the immediate neighborhood of any one circuit contains at most thousands of other circuits—those differing in one wire—too few to find all possible expression codes nearby. To discover myriad new expression codes, evolving circuits need to venture out of their neighborhood. Such expeditions yield many discoveries only if different neighborhoods contain different expression codes. To find out whether they do, we asked our computers to draw two arbitrary circuits from the same genotype network—call them A and B, they produce the same expression code but have different wiring—identify all circuits near them, and compile a list of expression codes of all these circuits. We found that most expression phenotypes in the neighborhood of A are different from expression phenotypes in the neighborhood of B—regardless of A and B’s phenotype, number of genes, or wiring. Different neighborhoods contain different phenotypes.
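The comparison of two neighborhoods can be sketched in code as well. As before, this is a toy model under assumptions rather than the method actually used: wires are valued +1, -1, or 0, a gene is on when its summed input is positive, and the phenotype is the stable pattern reached from a fixed starting pattern. Circuits A and B below are hypothetical examples chosen by hand to share a phenotype.

```python
def phenotype(wires, initial, max_steps=50):
    """Stable expression pattern reached from `initial`, or None if the
    circuit has not settled within `max_steps` updates (toy model)."""
    n = len(initial)
    state = tuple(initial)
    for _ in range(max_steps):
        nxt = tuple(
            1 if sum(wires[i][j] * state[j] for j in range(n)) > 0 else -1
            for i in range(n)
        )
        if nxt == state:
            return state
        state = nxt
    return None


def neighbors(wires):
    """Yield every circuit that differs from `wires` in exactly one wire."""
    n = len(wires)
    for i in range(n):
        for j in range(n):
            for value in (-1, 0, 1):
                if value != wires[i][j]:
                    mutant = [list(row) for row in wires]
                    mutant[i][j] = value
                    yield mutant


def novel_phenotypes(wires, initial):
    """Stable phenotypes found one wire away that differ from the circuit's own."""
    own = phenotype(wires, initial)
    found = {phenotype(m, initial) for m in neighbors(wires)}
    found.discard(own)
    found.discard(None)  # ignore neighbors that never settle
    return found


def fraction_unique_to_first(a, b, initial):
    """Share of the new phenotypes near `a` that do not occur near `b`."""
    near_a = novel_phenotypes(a, initial)
    near_b = novel_phenotypes(b, initial)
    if not near_a:
        return 0.0
    return len(near_a - near_b) / len(near_a)


if __name__ == "__main__":
    pattern = (1, -1, 1)
    circuit_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    circuit_b = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
    print(fraction_unique_to_first(circuit_a, circuit_b, pattern))
```

In circuits this small the two neighborhoods may overlap heavily. The finding reported above concerns much larger circuits, where most phenotypes near A were absent near B.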
So we are back to a familiar story. The regulatory circuit library has the same layout as the metabolic and the protein libraries. Circuits with the same gene expression phenotype are organized in vast and far-reaching genotype networks. And that has consequences for a crowd of readers wandering aimlessly along such a network, figuratively seeking something new to read, actually propelled only by the steady if directionless force of mutation that slowly changes circuits, one regulatory interaction at a time: Even though some steps garble a circuit’s expression code, many others preserve it and thus allow readers to move along the genotype network. While the readers wander, they reach ever-new neighborhoods that contain texts with ever-new meanings, ever-new expression phenotypes, one of which may seed the next big thing in life’s architectural contests. Once again, genotype networks and their diverse neighborhoods create innovability.60
These similarities among different libraries are mysterious. How could innovability in metabolism, in proteins, and in regulation circuits have the same source, a library full of chemical meaning with a common cataloging system? The answer is held by an invisible hand that guided the world long before life’s origin—self-organization, a peculiar kind of it. We will turn to it next.
CHAPTER SIX
The Hidden Architecture
In 1944, the Nobel Prize–winning theoretical physicist Erwin Schrödinger published a series of his lectures under the title What Is Life? The brief book was an attempt to reconcile physics with what was then known, in the days before Watson and Crick, about evolution. The book is brimming with ideas, and one of them has spilled into the mainstream of popular science culture: It is the idea that evolution increases order and decreases disorder—what Schrödinger called “negative entropy.” Four years later, the American electrical engineer Claude Shannon connected the thermodynamic concept of entropy to the problem of transferring information through a telegraph line. The concepts of evolution and information have been linked ever since, usually in a fairly primitive way. Disorder: bad. Order: good. Positive entropy: bad. Negative entropy—now also known as information—good.
In the years since Schrödinger’s book, we have become more sophisticated in thinking about entropy. Order and information remain central to evolution, but in recent years we have also learned, thanks to genotype networks, that perfect order is as hostile to innovation as total disorder. Nature doesn’t just tolerate disorder. It needs some disorder to discover new metabolisms, regulatory circuits, and macromolecules—in short, to innovate.
Let’s put Lego blocks to another metaphorical use, and consider the difference between a disorderly jumble of those familiar plastic tiles and an arrangement in which every tile has been presorted into a “proper” place, and where a child must assemble them in a specific sequence to build a pirate ship following a plan helpfully provided by Lego. The disordered jumble of Lego tiles has greater potential for innovation than the carefully organized one, and not just because it stimulates a child’s natural creativity to find new ways for building pirate ships. A deeper reason is that there are many more ways to build a pirate ship than those contained in Lego’s instruction book.
In biology this simple fact is manifest in the multiple solutions that nature found, courtesy of genotype networks, for problems like protecting organisms against freezing. And it is also deeply connected to a biological phenomenon little appreciated until the end of the twentieth century, but in fact so widespread that it deserves to be called a hallmark of life: robustness, the persistence of life’s features in the face of change.
The meaning of robustness is best illustrated with the difference between typographical mistakes in a traditional book and in a computer program. A book containing the letter sequence
N smll stp fr mn, n gnt lp fr mnknd
would raise eyebrows, but the meaning of this sentence remains understandable. However, a single misplaced letter or as little as a missing comma in a thousand pages of computer code can bring a million-dollar software package to a crash. Software bugs like this cause billions of dollars in economic loss every year. Human language is robust. Programming language, not so much.
The suspicion that life is robust arose at least as far back as the 1940s, when the biologist and philosopher C. H. Waddington studied flies with different genotypes and discovered that they had indistinguishable bodies, down to the minutest details of their wings’ venation and the numbers of bristles that cover their backsides. He called the phenomenon through which development can produce “one definite end-result regardless of minor variations in conditions” canalization—another word for robustness.1 And although his research hinted that the body plans of flies are robust to genetic change—there are many ways to build a fly’s body—research into robustness remained a backwater for another half century.
But almost overnight in the 1990s, robustness entered center stage with a flourish when molecular biologists were baffled by a discovery superficially unrelated to Waddington’s: Many genes apparently serve no purpose.2