From DNA to RNA to protein, reading the code is, on one level, a process of refined simplicity. First, DNA’s four letters are read into a message, then the transfer RNAs bind to this message with their amino acids, themselves having a striking preference for a three-letter code. Finally, a string of amino acids is fabricated, the protein itself. On another level, the process seems a thing of tortuous particularity. Four oddly self-binding chemicals come together to make a code from many millions of natural chemicals in the environment. To cap it all, the proteins that emerge from this code are made of just twenty amino acids, while many hundreds of these molecules exist in the natural environment.
Let us recap what we need in our information-of-life toolbox. Between RNA and DNA, we have five major bases (A, T, C, and G in DNA, and A, U, C, and G [the T swapped with U] in RNA), a backbone made of just phosphate groups and ribose sugars, some transfer RNAs (actually, the cell requires at least thirty-one such RNA molecules to read the code), and twenty amino acids. (Some cells use two other amino acids, selenocysteine and pyrrolysine, bringing the total in life to twenty-two.) We have an entire information storage system that can go from a code to production of functional molecules with fewer than sixty molecules. There are two ways you can look at this. Either this system is a chance event, an accident of absurd proportions that could have happened many other ways, or evolution has been very selective; very few paths, perhaps even just one, were possible. Perhaps these sixty or so molecules are special in the pantheon of organic molecules that exist in the known universe? Answering this question is perhaps one of the most profound challenges that has confronted biologists since the code was unraveled. Its answers would establish whether the structure of life’s code and its products are pure chance or whether deeper physical principles have shaped them.
Having built a four-letter code, we now have the mystery of how that code was assigned to the different amino acids. As described, three consecutive bases in DNA code for each amino acid. Now as each of these bases could be one of the four letters, A, C, G, or T, that means we have 4 × 4 × 4 = 64 possible combinations. However, life needs codons for only 20 amino acids and sometimes 22. So what gives? Many of the amino acids used in life have more than one codon assigned to them. This degeneracy, as it is called, leads to a table of all 64 three-letter codes assigned to different amino acids. Within its ranks are two kinds of punctuation mark, a start codon and the stop codons, that tell the code when to begin and end reading, the markers that define the start and end of a gene. Each gene codes for a part of a protein or a full protein.
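The counting in this paragraph can be checked with a short sketch. It assumes the standard codon table, written in DNA letters with the bases ordered T, C, A, G and one-letter amino acid symbols, where `*` marks a stop codon. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard codon table in compact form: bases ordered T, C, A, G,
# with the third base varying fastest. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 x 4 x 4 three-letter combinations, in the same order as AMINO_ACIDS.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# Degeneracy: how many codons are assigned to each amino acid (or to "stop").
degeneracy = Counter(codon_table.values())

print(len(codons), "codons in total")
for amino_acid, n_codons in sorted(degeneracy.items(), key=lambda item: -item[1]):
    print(amino_acid, n_codons)
```

In the standard table the start signal, ATG, doubles as the codon for methionine, which is why it does not appear as a separate symbol here.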
This table of codes, showing the association of each three-letter code to its respective amino acid, is a little like a Rosetta stone. Besides variations here and there, amounting to about twenty modifications of this core table, the layout is universal across life. This universality speaks of an ancient origin for the assignment of the three-letter codes to their respective amino acids. It suggests that an ancestor to all life on Earth today contained this code, which was then propagated during evolution to all living things. Fathoming out how this table came about and whether it too might be a freak accident has occupied the minds of inquisitive scientists. Regardless of who is right, most scientists, at their core, propose that the table is not a random accident, but that it is the result of quite specific selection.
One crucial thing a life form wants to do is prevent too many errors in the reading of its genetic code, either when the code is being copied or turned into useful proteins. Perhaps the organization of the amino acids with particular sequences of the code is a way to minimize the chances of these errors creeping into proteins. Curiously, codons for the same amino acid tend to be bunched together. The codons for the amino acid alanine, for instance, are GCU, GCC, GCA, and GCG, where only the third place varies. This same pattern applies to other amino acids such as glycine and proline. By bunching codons together like this, the chances are that a small error in the code will not alter the amino acid, leaving the final protein unchanged. These accidental revisions in the code might come from a mutation in the code itself, maybe caused by radiation or a chemical modification to the DNA, or a mistranslation of the code when the messenger RNA was being read. Either way, the code, so manifested, reduces the impact of errors from wherever they may come. Furthermore, amino acids with similar chemical properties seem to share similar codons, which has been explained as a way to minimize the impact of a mutation or misreading in the DNA on the protein eventually made.
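The bunching described above can be made concrete with a small sketch. It assumes the standard codon table in RNA letters and counts, for each of the three codon positions, how many single-letter changes leave the amino acid untouched. Stop codons are left out of the count, and a change that creates a stop is counted as altering the protein. These counting conventions are choices made for this illustration.

```python
from itertools import product

# Standard codon table in RNA letters, bases ordered U, C, A, G. "*" is stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_counts(position):
    """Return (unchanged, total) over all single-base changes at one
    codon position (0, 1 or 2), starting from every non-stop codon."""
    unchanged = 0
    total = 0
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # only start from codons that encode an amino acid
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if CODON_TABLE[mutant] == amino_acid:
                unchanged += 1
    return unchanged, total

for position in range(3):
    unchanged, total = synonymous_counts(position)
    print(f"position {position + 1}: {unchanged} of {total} changes are silent")
```

Under these conventions, roughly two-thirds of changes at the third position are silent, against a handful at the first position and none at the second, which is the pattern the alanine example illustrates.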
If you run a computer program that compares the natural genetic code with alternative codes to see which are best at reducing the effects of mistranslation, the natural code appears to be very unusual. Out of a million alternative codes that nature might have made, our own code was the best at reducing these errors.
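A comparison of this kind can be sketched in a few lines. What follows is an illustration under stated assumptions, not the published analysis. Alternative codes are made by shuffling the twenty amino acids among the natural blocks of synonymous codons. The "cost" of a code is the mean squared change in one chemical property over all single-letter changes. The property used here is the Kyte-Doolittle hydropathy scale, chosen for convenience as a stand-in. The function names are hypothetical.

```python
import random
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
NATURAL = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Kyte-Doolittle hydropathy values (an assumed stand-in property).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def code_cost(code):
    """Mean squared hydropathy change over every single-base change
    between two codons that both encode an amino acid."""
    total = 0.0
    count = 0
    for codon, amino_acid in code.items():
        if amino_acid == "*":
            continue
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                other = code[mutant]
                if other == "*":
                    continue  # changes to a stop codon are ignored here
                total += (HYDROPATHY[amino_acid] - HYDROPATHY[other]) ** 2
                count += 1
    return total / count

def shuffled_code(rng):
    """Alternative code: permute the 20 amino acids among the natural
    synonymous blocks, leaving the stop codons where they are."""
    names = sorted(HYDROPATHY)
    permuted = names[:]
    rng.shuffle(permuted)
    swap = dict(zip(names, permuted))
    swap["*"] = "*"
    return {codon: swap[amino_acid] for codon, amino_acid in NATURAL.items()}

def fraction_better(n_codes, seed=0):
    """Fraction of shuffled codes with a lower cost than the natural code."""
    rng = random.Random(seed)
    natural_cost = code_cost(NATURAL)
    better = sum(code_cost(shuffled_code(rng)) < natural_cost for _ in range(n_codes))
    return better / n_codes

if __name__ == "__main__":
    print("natural code cost:", round(code_cost(NATURAL), 2))
    print("fraction of 1000 shuffled codes that do better:", fraction_better(1000))
```

The result depends on the property chosen, on how alternative codes are generated, and on how errors are weighted, so the number printed here should not be read as a reproduction of the one-in-a-million figure.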
There is another tantalizing clue behind nature’s choices in that table of codes. The amino acid arginine happens to be able to bind to the codons to which its transfer RNAs can also bind. The same is found for the amino acid isoleucine. This suggests to some people that the codon table results from ancient attractions between amino acids and little strands of RNA, perhaps even before transfer RNAs became the intermediary between them. Perhaps amino acids bound directly to the messenger code without all that complex machinery we see in operation today. Those affinities would have laid the groundwork for the decoding of RNA into proteins.
It is easy to get sucked into polarized arguments, but when all the possibilities are considered, we might expect that maybe elements of all these theories are built into life. When the first code emerged, it seems logical to suspect that certain amino acids bound to bits of RNA and these affinities may well have something to tell us about how certain codons wound up coding for particular amino acids. And perhaps, parallel to these developments, evolutionary selection would have favored codes in which errors were minimized or at least reduced to levels sufficiently non-deleterious to allow for reliable reproduction. The fewer the errors, the more likely that offspring molecules would function properly and be propagated in the environment. Later, mutations may have led to reassignments of the codons to optimize the code further.
There does seem to be a little conundrum in all this. If the table is so crucial to life, such a core part of the genetic apparatus and its translation, surely once it got stuck into the very earliest life, it would remain there as, in Crick’s words, a frozen accident? Surely then, we should expect it to be a highly imperfect thing, full of idiosyncrasies that are a shadow of its early and manifestly vital part in the information storage system of life, a system whose later alteration would spell death for an organism? However, the adeptness of synthetic biologists to reassign codons to completely new amino acids in the laboratory shows that life may have more opportunity to experiment than once was thought. There exists room for change. In the natural environment, there are ways these swaps in the table of codons could have happened even after the fundamental architecture of the genetic code was established. A cell might stop using a certain codon. Perhaps a mutation made it lose the gene for a transfer RNA and thus its associated amino acid. Then later, through a duplication of another transfer RNA gene and its mutation, the codon could be reassigned to a new amino acid entirely. Through such genetic reassignments, the table may be changeable. Like metabolic pathways, it seems that life can traverse new paths and try new experiments in codes.
This flexibility in the biochemistry of life has a much more fundamental general implication: that historical chance, the contingent rolls of a pair of dice, may not get locked into life as frozen accidents and immutable legacies of history as rigidly as has always been assumed. If life is somewhat malleable in the way it can change its molecular machinery, then life can also be shaped by physical principles, to be optimized against the laws of physics, and it is not inescapably imprisoned in a molecular straitjacket wrapped around it at the dawn of life.
The enduring question is to what extent this flexibility of life’s biochemistry leads to predictability. Would some alien, with no prior knowledge of Earth, but with some rudimentary knowledge of biological information storage, be able to predict a priori what we now observe: four particular bases and a codon table that translates the code?
We probably still have much to learn about the flexibility and evolution of the code to answer this question. Synthetic biologists will take us closer to a better understanding. However, that the code is just an accident, a contingent fluke of history that would never be repeated elsewhere, looks unlikely. We find good reasons for having four bases in the code; in the landscape of chemical possibilities, the four bases have certain properties that optimize an information storage molecule and its flexibility and ability to replicate. We also find that the codon table has nonrandomness. Although the exact events and selection pressures that yielded the coding table we observe today are still not fully unraveled, many conditions, from the chemical affinity of amino acids to RNA and the drive to minimize errors, are not mere contingencies, but they emerge from physical and chemical properties, the latter ultimately linked to the physics of atoms.
Like much about biology, before all this knowledge was available, it was almost impossible to predict what the genetic code looked like. No one could have written down the details of DNA in 1950, before the discovery of its structure. This observation leads some people to claim that this is a difference between biology and physics, that physics has laws and equations that are used to make predictions, but biology does not. But this comparison may not be completely fair. We do need to understand the genetic code and its chemistry before we can make predictions, and this knowledge has come only relatively recently in scientific history. In the same way, physicists did need to have some rudimentary knowledge about the behavior of gases under different temperatures and pressures before they could conceive of the ideal gas law, for instance. Indeed, a basic grasp of the genetic code has allowed people to run computer models of error minimization in alternative codes and to predict which ones work and which do not. By running laboratory experiments alongside models using different bases, scientists have been able to explore and predict the efficacy of alternative genetic codes. Synthetic biology, in its quest to make new codes and incorporate them into life, demands better predictive capabilities. The success and accomplishments of synthetic biologists rely on their ability to make predictions about the new compounds or creatures they plan to design.
The sheer complexity of the code compared with, say, a box with helium inside might stymie our ability to use simple equations to predict the code’s behavior and puts the code in a very different category for study. The code’s complexity, however, does not make the code any less a slave to physical principles, and neither does this history imply that the code is an unlikely product of chance. Although physicists who predict behaviors of gases no doubt face a more constrained problem than do scientists trying to predict the complexity of the genetic apparatus, separating these investigations into two entirely different problems seems misguided. Much about the code opens itself to more simple physical, and thus chemical, principles than was once assumed.
Further down the line from the genetic code to the proteins that it encodes, we find this same apparent lack of chance. Churning out from the RNA, the last step in the code, are long strings of amino acids, proteins that will fold to make the working molecules of life: the enzymes and structural parts to build a cell.
Curious researchers have long wondered whether the number and type of amino acids used in proteins are random, particularly given that in the nonbiological world, there are hundreds of amino acids. Initial attempts to discover if, given a set of random alternatives, evolution would select the twenty amino acids predominantly found in life were inconclusive but pointed tantalizingly to the possibility of nonrandomness. But then in 2011, Gayle Philip and Stephen Freeland published a refined study in the journal Astrobiology. They began with the assumption that of all the properties of amino acids essential to protein structure, three are of special importance.
First, the size of amino acids determines how the long amino acid chain, which constitutes a protein, folds and whether it can be properly bundled together into an active molecule. Second, the charge of an amino acid also plays a key role in a protein. Negatively and positively charged amino acids can be attracted to one another and form a bridge that holds a protein together. These bonds, dotted through the structure, are one of the most important means by which the whole necklace of amino acids can be brought together into a well-defined, ordered structure able to carry out a useful function. Third, repelling water (hydrophobicity) is yet another very useful feature of amino acids. As proteins are dissolved in water or sometimes in membranes where there is no water at all, different affinities for water among the amino acids turn out to be crucial for molding the behavior of whole proteins or parts of them. These affinities alter how proteins attach to other proteins and whether they have an attraction for regions of the cell that lack water, such as the deep interior of the cell membrane.
Philip and Freeland chose some amino acids and then ran a program that selected a group of them according to whether they had a wide range of sizes, charges, and hydrophobic nature. Their program also chose amino acids that would not only have a wide biochemical range, but also have an even distribution across that range so that their biochemical properties did not overlap too much in one area. This distribution, they assumed, would provide the best tool kit for life since a wide range of properties for the proteins was available. If they are evenly distributed, then life can also choose amino acids that have a high chance of being close to what it would ideally like to use. It is a sort of pick and mix of characteristics like the varied wrenches you might have in a DIY toolbox. You do not want all the wrenches to be very large or small; rather, you want an even distribution, a variety of sizes that have a good chance of including the one you need to undo that bolt on the old door you are trying to remove.
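The idea of a wide and even distribution can be put into numbers. The sketch below is one simple way to do so for a single property, and it is an assumption made for illustration, since the text does not give the formula the researchers used. Width is measured as the largest value minus the smallest. Evenness is measured as the variance of the gaps between neighbouring values, with zero meaning perfectly even spacing. The wrench sizes are invented.

```python
import statistics

def coverage(values):
    """Return (spread, unevenness) for one property across a set.

    spread:     largest value minus smallest value (bigger is wider)
    unevenness: population variance of the gaps between sorted
                neighbours (0 means perfectly even spacing)
    """
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return ordered[-1] - ordered[0], statistics.pvariance(gaps)

# Two invented toolboxes spanning the same range of sizes (in millimetres).
even_wrenches = [8, 10, 12, 14, 16]
clustered_wrenches = [8, 9, 9, 10, 16]

print("even:     ", coverage(even_wrenches))
print("clustered:", coverage(clustered_wrenches))
```

Both toolboxes cover the same range, but the clustered one leaves a large gap between 10 and 16, and that shows up as a higher unevenness. The same two numbers could be computed for size, charge, and hydrophobicity in turn.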
The first set of amino acids Philip and Freeland included in a study of “coverage” (their term for a wide, even distribution of properties) were the amino acids found in the Murchison meteorite. On the assumption that amino acids like these would have been raining down on the Earth when life first emerged, it seemed reasonable to test them first. They tested a selection of amino acids among the fifty found in the meteorite. These fifty included eight actually used in life and another forty-two amino acids that as far as we know are not found in living things. Philip and Freeland ignored some of the branched amino acids (sixteen in total) that were deemed too large and obstructing to be plausibly used by life to make proteins.
What they found was astonishing.
When they compared the twenty amino acids used by life with a million alternative bundles of amino acids randomly chosen from the fifty in the meteorite, the twenty used by life had better coverage and combinations of all three of the key factors than did any other set. The amino acids used by life appeared to be anything but random. Instead, they seemed to be selected by evolution to give a wide range and even distribution of properties that might be useful in proteins—what one might expect from a versatile and flexible tool kit.
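The logic of this comparison, a reference set ranked against many randomly drawn sets, can be sketched as follows. Everything here is a placeholder. The pool is just the numbers 0 to 49 standing in for one property of fifty molecules, the reference sets are invented, and "better" is taken to mean both a wider spread and a more even spacing. None of these choices is taken from the study itself.

```python
import random
import statistics

def coverage(values):
    """Return (spread, unevenness): range of the values and the
    population variance of the gaps between sorted neighbours."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return ordered[-1] - ordered[0], statistics.pvariance(gaps)

def fraction_beating(pool, reference, n_samples, seed=0):
    """Fraction of random same-sized sets drawn from the pool that have
    both a wider spread and a more even spacing than the reference."""
    rng = random.Random(seed)
    ref_spread, ref_unevenness = coverage(reference)
    wins = 0
    for _ in range(n_samples):
        spread, unevenness = coverage(rng.sample(pool, len(reference)))
        if spread > ref_spread and unevenness < ref_unevenness:
            wins += 1
    return wins / n_samples

# Placeholder data: one made-up property value per molecule in the pool.
pool = list(range(50))
well_spread = [0, 7, 14, 21, 28, 35, 42, 49]   # full range, even gaps
lopsided = [0, 1, 2, 3, 4, 5, 6, 20]           # narrow and uneven

print("beating the well-spread set:", fraction_beating(pool, well_spread, 10_000))
print("beating the lopsided set:   ", fraction_beating(pool, lopsided, 10_000))
```

No random set can beat the well-spread reference, because it already spans the whole pool with perfectly even gaps. A result of that kind for real amino acid properties is what would mark the natural set as unusual.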
Nevertheless, only eight amino acids used by life were found in the meteorite, and the remaining twelve that life uses are derivatives of the first set of eight primitive amino acids. These twelve derivatives were made possible by new synthetic pathways in the cell. So the researchers reran their analysis, again using the fifty primitive meteorite amino acids, but this time searching only for optimum groups of eight to compare with the eight very primitive amino acids used in life. Less than 1 percent of these groups were better than the natural eight used in life, and less than 0.1 percent were better across all three characteristics. Again, the results were uncanny.
In these last calculations, however, maybe we have some enticing evidence of groups of amino acids that might be better than those used in life? In life, we seem to have nonrandomness, but might there be sets even more promising than the chance selection of the eight used by life? We should be cautious asking such questions. As Philip and Freeland themselves recognized, they chose only three features of amino acids, and there may be other factors that decide how useful amino acids are, such as their ability to move around in a protein chain (steric or structural factors).
In their final test, the researchers made an even larger group of amino acids. Fifty came from the meteorite, but they were augmented with the other twelve that life uses. Philip and Freeland also threw into the mix another fourteen amino acids that are made in the cell as intermediate compounds in the synthesis of the twelve made in the cell and used in proteins; these fourteen are not actually encoded in DNA. From this much-expanded set of seventy-six amino acids, taking random groups of twenty amino acids, not a single group out of a million possible alternatives outperformed the natural set.
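It helps to see how large these search spaces are. The short calculation below takes the pool sizes quoted in the text at face value (groups of twenty from seventy-six, and groups of eight from fifty) and compares them with a sample of one million. It is simple counting, not part of the original analysis.

```python
import math

# Number of distinct groups that could be drawn, ignoring order.
groups_of_20_from_76 = math.comb(76, 20)
groups_of_8_from_50 = math.comb(50, 8)

sample_size = 1_000_000

print(f"groups of 20 from 76: {groups_of_20_from_76:.3e}")
print(f"groups of 8 from 50:  {groups_of_8_from_50:,}")
print(f"share of the 20-from-76 space covered by a million samples: "
      f"{sample_size / groups_of_20_from_76:.1e}")
```

A million random groups is a vanishingly small slice of the twenty-from-seventy-six space, so "not a single group out of a million" is a statement about a random sample, not an exhaustive search.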
Philip and Freeland’s results are provocative. There is still much we do not know. Which amino acids were in abundance on early Earth to be commandeered by life? Are other properties of amino acids of importance for life in deciding which ones get picked? Doubtless, as knowledge of early Earth and proteins converges, these sorts of studies will be improved. Barring some strange coincidence and bad luck in running the program that has taken us down an egregious blind alley that will one day be corrected, Philip and Freeland’s research does strongly suggest that the twenty amino acids predominantly used to construct living things are not random. They have been selected for their collective versatility in providing a range of properties that life can sample to build the huge array of proteins from which the earliest life was assembled.
The Equations of Life Page 16