What Mad Pursuit


by Francis Crick

Gamow’s “code” was unusual in several ways. Each amino acid was coded by a triplet of bases (actually several triplets, related by symmetry), but the triplets standing for successive amino acids overlapped. For example, if a small part of the sequence was . . . GGAC . . . , then GGA stood for one amino acid, and GAC for the next one. Naturally this imposed restrictions on the amino acid sequence. Certain sequences could not be coded for by Gamow’s code. The matter was not completely straightforward since Gamow did not know which of his triplets stood for which amino acid. This was left open, and would have had to be discovered by experiment. At that time, although the amino acid composition of many proteins had been determined, at least approximately, only fragments of sequence were known (Fred Sanger’s complete sequences of the two chains of insulin were still in the works) so there was not much data with which to test Gamow’s theory.
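The overlapping reading described above can be sketched in a few lines of Python. This is only an illustration of the reading scheme: the function names are hypothetical, and nothing here assigns triplets to amino acids, which Gamow's proposal left open.

```python
def overlapping_triplets(bases: str) -> list[str]:
    """Read a base sequence as fully overlapping triplets, advancing one base at a time."""
    return [bases[i:i + 3] for i in range(len(bases) - 2)]


def non_overlapping_triplets(bases: str) -> list[str]:
    """Read the same sequence three bases at a time, for comparison."""
    return [bases[i:i + 3] for i in range(0, len(bases) - 2, 3)]


# The fragment from the text: GGA codes one amino acid, GAC the next.
print(overlapping_triplets("GGAC"))        # ['GGA', 'GAC']
print(non_overlapping_triplets("GGACTT"))  # ['GGA', 'CTT']
```

Because successive triplets share two bases, the choice of one triplet limits which triplets can come next, and that is the source of the restrictions on amino acid sequences.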

Jim and I had several objections to Gamow’s ideas. We rather doubted whether the cavities in DNA were capable of doing the job. We worried about his symmetry assumptions, and we didn’t like the idea of DNA coding directly for proteins. RNA seemed a more likely candidate, but perhaps RNA could fold up into a structure that could form the necessary cavities. Gamow had put in, implicitly, one restriction that seemed natural enough. When joined together in a chain, one amino acid is quite close to the next one—only about 3.7 Å apart (the distance between strongly bonded atoms is typically between 1 and 1½ Å). By contrast, a group of three bases spreads over a much larger distance. For this reason an overlapping code, which reduces this distance, seemed more likely, in spite of the restrictions it put on the possible amino acid sequences.

Gamow had made another contribution. We eventually realized that solving the code could be viewed as an abstract problem, divorced from the actual biochemical details. Perhaps by studying the restrictions on the amino acid sequences, as they became available, and by watching how mutants affected a particular sequence, one could crack the code without having to know all the intervening biochemical steps. Such an approach seems natural to a physicist, confronted by the complexities of chemistry and biochemistry, though in fairness to Gamow one must concede that his ideas were originally based on our model of the double helix, not just on abstract ideas.

That winter (1953-54), while I was working at the Brooklyn Polytechnic—it was my first visit to the States—I managed to disprove all possible versions of Gamow’s code, by using the small amount of sequence data then available and by assuming (a quite unsupported assumption) that the code was “universal”—that is, was the same in all living organisms.

During the next summer Jim and I spent three weeks together at Woods Hole. Gamow and his wife were there, staying at Albert Szent-Györgyi’s cottage by the water. (Szent-Györgyi, a Hungarian, was awarded a Nobel Prize in 1937 mainly for discovering vitamin C.) By that time Gamow had come to know a number of people interested in the coding problem, in particular Martynas Yčas and Alex Rich. On most afternoons Jim and I went out to the cottage and sat on the shore with Gamow, discussing all the different aspects of the coding problem, idly chatting or just watching Gamow showing some of his card tricks to any pretty girl who happened to be around. The pace of scientific life in those days was less hectic than it is now.

By this time we knew Gamow well enough to call him Joe. His first name was George, but he signed his letters “Geo.” He was under the impression that this was pronounced Joe, so that was what his friends called him. We were familiar with his boyish handwriting, his very Russian omission of articles (a and the), and his erratic spelling. We assumed that the latter was due to his writing in a foreign language, but later we learned that in his native Russian his spelling was just as bad. We were also impressed by his automobile, a large white convertible with red seats. He told me that a third of his income came from his academic salary, a third from writing, and a third from consulting, which partly explained his somewhat expensive car. He was fun to be with, and friendly, in spite of being older and more senior than we were. He was the champion of the Big Bang theory of the origin of the universe—among other things he predicted the existence of the background radiation, which had yet to be discovered. The Catholic Church preferred his theory to the rival theory of Continuous Creation, proposed by Gold, Bondi, and Hoyle. Even so, I was mildly surprised when he told me that he had exchanged reprints with the Pope, by way of the Holy Office.

Gamow enjoyed his glass of whiskey. Although I didn’t realize it at the time, he was probably already on the slippery path to alcoholism. I was not at all surprised to receive by mail an invitation, in his own characteristic handwriting, to a “whiskey, twisty RNA Party” to be held at the cottage in a few days’ time. The next time I went there I thanked Joe for his invitation, but he knew nothing about it. To his puzzlement letters of acceptance kept pouring in, brought down from the main house by Albert Szent-Györgyi. Naturally Joe suspected that Szent-Györgyi was the culprit, but he denied this. “On my heart,” he said, “it is not me.” Joe was embarrassed so I realized something had to be done. It did not take me long to discover that Jim was one of the perpetrators of the hoax. He did not usually play practical jokes, but his mentor, Max Delbrück, was notorious for them. The other hoaxer turned out to be Szent-Györgyi’s nephew, Andrew Szent-Györgyi. I negotiated a treaty. Jim and Csuli, as he was known, would provide the beer and Joe would provide the whiskey. The party turned out to be a great success, with almost everyone invited turning up for it.

Meanwhile Joe, in his typical way, had founded that unusual organization, the RNA Tie Club. This was a very select club—Gamow decided who was to be a member. There were to be only twenty members, one for each amino acid, and not only did each member receive a tie, made to Gamow’s design by a haberdasher in Los Angeles (Jim Watson and Leslie Orgel arranged this), but also a tie pin with the short form of his own amino acid on it. I think I was Tyr but I’m not sure I ever got the tie pin. The club never met, but it had notepaper that listed its officers. Geo Gamow was described as Synthesizer, Jim Watson as Optimist, and I as Pessimist. Martynas Yčas was denoted Archivist and Alex Rich as Lord Privy Seal. As it turned out the club served as a mechanism for circulating speculative manuscripts to the few people interested. After I returned to England in the fall of 1956 I wrote a paper for it analyzing Gamow’s ideas, generalizing them, and suggesting what turned out to be an important idea, the adaptor hypothesis.

The paper was called “On Degenerate Templates and the Adaptor Hypothesis.” The main idea was that it was very difficult to consider how DNA or RNA, in any conceivable form, could provide a direct template for the side-chains of the twenty standard amino acids. What any structure was likely to have was a specific pattern of atomic groups that could form hydrogen bonds. I therefore proposed a theory in which there were twenty adaptors (one for each amino acid), together with twenty special enzymes. Each enzyme would join one particular amino acid to its own special adaptor. This combination would then diffuse to the RNA template. An adaptor molecule could fit in only those places on the nucleic acid template where it could form the necessary hydrogen bonds to hold it in place. Sitting there, it would have carried its amino acid to just the right place it was needed.

There were several implications of this idea. The one I want to stress here was that it meant that the genetic code could have almost any structure, since its details would depend on which amino acid went with which adaptor. This had probably been decided very early in evolution and possibly by chance. Because of this pessimistic conclusion the paper led off with a quotation from an obscure Persian writer of the eleventh century: “Is there anyone so utterly lost as he that seeks a way where there is no way?” and ended with the remark, “In the comparative isolation of Cambridge, I must confess there are times when I have no stomach for the coding problem.”

The paper was circulated to members of the RNA Tie Club but was never published in a proper journal. It is my most influential unpublished paper. Eventually I did publish a short remark briefly outlining the idea and tentatively suggesting that the adaptor might be a small piece of nucleic acid. It soon turned out that a biochemist at the Harvard Medical School, Mahlon Hoagland, had quite independently obtained some experimental evidence that supported my proposal. As every molecular biologist now knows, the job is done by a family of molecules now called transfer RNA. Ironically, I did not immediately recognize that these transfer RNA molecules were the predicted adaptor because they were considerably bigger than I had expected, but I soon saw that there were no grounds for my objection. A little later Mahlon came to Cambridge for a year and we did experiments together on transfer RNA. We worked in a small upstairs room in the Molteno Institute that the director graciously allowed us to use since it was temporarily vacant.

Much theoretical effort during this period was put into attempts to solve the coding problem, especially by Gamow, Yčas, and Rich. Gamow and Yčas suggested a “combination code” in which the order of the bases in a triplet did not matter, only its combination of bases. While this was structurally implausible it had some appeal because it so happens there are just twenty combinations of four things taken three at a time. Again there was no hint as to how to allocate each amino acid to its own combination.
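The count of twenty can be checked directly. The sketch below assumes that "combinations of four things taken three at a time" allows a base to repeat (so AAA and AAC both count), which is the reading that yields twenty. The variable names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations_with_replacement, product

BASES = "ACGT"

# Unordered selections of three bases, repeats allowed: C(4 + 3 - 1, 3) = 20.
combos = ["".join(c) for c in combinations_with_replacement(BASES, 3)]
print(len(combos))  # 20

# Equivalent view: sort the letters of each of the 64 ordered triplets and
# group together the triplets that share the same sorted form.
classes = defaultdict(list)
for triplet in product(BASES, repeat=3):
    classes["".join(sorted(triplet))].append("".join(triplet))

print(len(classes))                                 # 20
print(sorted(len(v) for v in classes.values()))     # four 1s, twelve 3s, four 6s
```

The grouping shows that the classes are of unequal size: a combination such as AAA corresponds to a single ordered triplet, while one with three different bases corresponds to six.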

For a time it was still thought that the code would have to be an overlapping one, and so the search for restrictions on the amino acid sequence continued. As new sequences became available they were added to those we had already collected, but there was little hint of any forbidden sequences, although the data were so sparse that at first we could not be sure that some sequences were missing. The hunt was mainly restricted to adjacent amino acids. There are 400 (20 × 20) possible amino acid doublets. Any overlapping triplet code could code for only 256 (64 possible triplets × 4) of these, so there had to be restrictions if the code were of this type. Sydney Brenner realized that one could sharpen this argument. Any one triplet would have only four other triplets as its neighbors on one side. For example, if the triplet in question was AAT, then the only triplets that could precede it were TAA, CAA, AAA, and GAA, while only ATT, ATC, ATA, and ATG could follow it, assuming as always that the code was overlapping. Thus if in the known sequences one particular amino acid had been shown to have at least nine neighbors following it, then it would have to have at least three triplets allocated to it, since two triplets could have only eight neighbors following them. Sydney was able to show that the number of triplets needed easily exceeded sixty-four and thus that all overlapping triplet codes were impossible. This proof assumed that the code was “universal”—that is, was the same in all the organisms from which the experimental data had come—but this was sufficiently plausible to make us almost certain that the idea of an overlapping code was wrong.
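The counting in this argument can be made concrete with a small Python sketch. The function names are hypothetical, and the sketch reproduces only the arithmetic given in the text, not Brenner's actual tabulation of sequence data.

```python
from math import ceil

BASES = "ACGT"


def predecessors(triplet: str) -> list[str]:
    """Triplets that can precede `triplet` in a fully overlapping code."""
    return [b + triplet[:2] for b in BASES]


def followers(triplet: str) -> list[str]:
    """Triplets that can follow `triplet` in a fully overlapping code."""
    return [triplet[1:] + b for b in BASES]


def min_triplets_needed(distinct_following_neighbors: int) -> int:
    """Lower bound on the triplets an amino acid needs, since each
    triplet allows at most four different following neighbors."""
    return ceil(distinct_following_neighbors / 4)


print(predecessors("AAT"))     # ['AAA', 'CAA', 'GAA', 'TAA']
print(followers("AAT"))        # ['ATA', 'ATC', 'ATG', 'ATT']
print(64 * 4, "<", 20 * 20)    # 256 < 400 possible doublets
print(min_triplets_needed(9))  # 3
```

Summing `min_triplets_needed` over all twenty amino acids, using the observed neighbor counts, gives a lower bound on the total number of triplets; the argument succeeds once that total exceeds sixty-four.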

This still left the geometrical dilemma. In the process of protein synthesis, how could one amino acid get near enough to the next one to enable them to be joined together, since their triplets would have to be some distance apart as they were not overlapping? Sydney suggested that the postulated adaptors might each have a small flexible tail, to the end of which the appropriate amino acid was joined. Sydney and I did not at the time take this idea very seriously, referring to it as a “don’t worry” theory, meaning that we could see at least one way that nature might have solved the problem, so why worry at this stage what the correct answer actually was, especially as we had more important problems to tackle. In this case it has turned out that Sydney was correct. Each transfer RNA does indeed have a small flexible tail to which the amino acid is joined.

In parenthesis let me say that the English school of molecular biologists, when they needed a word for a new concept, usually use a common English word such as “nonsense” or “overlapping,” whereas the Paris school like to coin one with classical roots, such as “capsomere” or “allosterie.” Ex-physicists, such as Seymour Benzer, enjoyed inventing new words ending in “-on,” such as “muton,” “recon,” and “cistron.” These new words often obtained rapid currency. I was once persuaded by the molecular biologist François Jacob to give a talk to the physiology club in Paris. It was then the rule that all such talks had to be given in French. As I hardly speak French I did not warm to his suggestion at all, but François pointed out to Odile (who is bilingual in French and English) that if I gave the talk she also could have a trip to Paris, so my opposition was soon worn down. I decided to talk on the problem of the genetic code, thinking, quite incorrectly, that I could do most of it by simply writing on the blackboard. It soon became clear that I would have to speak some French in order to get the ideas across, so I started by dictating the whole talk to a secretary (normally I speak from notes). I then deleted all the jokes, since even when giving a talk to a secretary I found that my ad lib jokes intruded, and I felt I could hardly read them out in cold blood. Odile then translated the talk into French, and a typed version of her manuscript was produced, with various stress marks added to make it easier for me to read. There was a problem, however, about the translation of “overlapping.” What could be the French for that? Odile eventually remembered a suitable word, and we set off for Paris. I was sufficiently mistrustful of this strange word that on arrival I asked François what word they used for “overlapping.” “Oh,” he said, “we simply say ‘oh-ver-lap-pang.’”

I would like to report that the talk was a success. I started off fairly well, reading carefully, but as I warmed up my pronunciation got gradually wilder and wilder. The discussion, mainly in French, taxed me greatly. After the talk I asked François how it went. “It was not too bad,” he said tactfully, “but it was not you.” With no spontaneity and no jokes I saw just what he meant. I have never since attempted to give a talk in a foreign language, even though my French accent has improved a little over the years.

It was now clear that the code was not overlapping, but this immediately raised a new problem. If the code was read as a sequence of non-overlapping triplets, how did we know where the triplets began? Put another way, if we were to imagine that the correct triplets were marked by commas (for example, ATC,CGA,TTC,…), how did the cell know exactly where to put the commas? The obvious idea, that one started at the beginning (whatever that was) and went along three at a time, seemed too simple, and I thought (quite wrongly) that there must be another solution. It occurred to me to try to construct a code with the following properties. If read in the right phase, all the triplets would be “sense” (that is, stand for one amino acid or another), whereas all the out-of-phase triplets (those that bridged the imaginary commas) would be “nonsense”—that is, there would be no adaptor for them and thus they would not stand for any amino acid. I mentioned this idea to Leslie Orgel, who immediately pointed out that for such a code the maximum number of sense triplets was twenty. A triplet such as AAA must be nonsense since otherwise the sequence AAA, AAA could be read out of phase. (We tacitly assumed by now that any amino acid could follow any other amino acid.) That eliminated four of the sixty-four triplets. If the XYZ triplet was sense, then the cyclic permutations YZX and ZXY would have to be nonsense, so the maximum number of sense triplets was 60/3 = 20. The problem was: Did a set of twenty triplets exist that had this property? I was confined to bed with a nasty cold but found I could easily get up to seventeen. Leslie mentioned the problem to John Griffith, who found a set of twenty with the right properties. We soon found several other solutions (plus numerous permutations) so there was no doubt that such a code could exist. We even invented a plausible argument why it could be useful.
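Both halves of this argument, the upper bound of twenty and the existence of a set that reaches it, can be checked with a short Python sketch. The set built here uses one well-known construction (all triplets XYZ with X < Y ≥ Z under an arbitrary ordering of the bases); it is an assumption for illustration and is not claimed to be the particular set Griffith found. All names are hypothetical.

```python
from itertools import product

BASES = "ACGT"  # the ordering A < C < G < T is an arbitrary choice


def is_comma_free(code: set[str]) -> bool:
    """True if no out-of-phase triplet in any pair of adjacent
    sense triplets is itself a sense triplet."""
    for first, second in product(code, repeat=2):
        joined = first + second
        if joined[1:4] in code or joined[2:5] in code:
            return False
    return True


def cyclic_classes() -> list[set[str]]:
    """Group the 60 triplets other than AAA, CCC, GGG, TTT into
    classes of cyclic permutations {XYZ, YZX, ZXY}."""
    seen: set[str] = set()
    classes = []
    for t in product(BASES, repeat=3):
        s = "".join(t)
        if s[0] == s[1] == s[2] or s in seen:
            continue
        rotations = {s, s[1:] + s[0], s[2] + s[:2]}
        seen |= rotations
        classes.append(rotations)
    return classes


# Upper bound: at most one sense triplet per cyclic class, and 60 / 3 = 20.
print(len(cyclic_classes()))  # 20

# One construction that reaches the bound: middle base strictly greater
# than the first and at least as great as the last.
code = {x + y + z for x, y, z in product(BASES, repeat=3) if x < y >= z}
print(len(code), is_comma_free(code))  # 20 True
```

The construction works because an out-of-phase triplet always either begins with the last two bases of a sense triplet (which are non-increasing) or ends with the first two bases of one (which are strictly increasing), and neither pattern fits the rule X < Y ≥ Z.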

The problem of finding a solution having twenty sense triplets is actually not an especially difficult one. A little later I was booked on a night flight from the States to England. Waiting to board I found myself chatting to Fred Hoyle, the cosmologist. He asked what I was doing and I explained to him the idea of the comma-free code. The next morning, as the plane approached the English coast, he came back to where I was sitting with a solution he had worked out overnight.

Naturally Orgel, Griffith, and I were excited by the idea of a comma-free code. It seemed so pretty, almost elegant. You fed in the magic numbers 4 (the 4 bases) and 3 (the triplet) and out came the magic number 20, the number of the amino acids. Without more ado we wrote it up for the RNA Tie Club. Nevertheless I was hesitant. I realized that we had no other evidence for the code, other than the striking emergence of the number twenty. But then if some other number had come up we would have discarded the idea and looked around for some other code that led to twenty amino acids, so the number twenty by itself was not confirmatory evidence.

In spite of my worries, the new code attracted some attention. After four people had asked if they could quote our paper (an RNA Tie Club note was not equivalent to publication), we decided to write it up for the Proceedings of the U.S. National Academy of Science, where it duly appeared in 1957. An account of it even appeared in a book for the general reader called The Coil of Life written by Ruth Moore, though this was not published till 1961, by which time we had ceased to believe in the idea.

Since in the comma-less code each amino acid had just one triplet it would have been possible, knowing which amino acid went with each triplet, to deduce the base composition of the DNA, assuming it all coded for protein, from the average amino acid composition of all its proteins. Because the latter was pretty similar in all organisms (though we knew now there were small variations), this would imply that the DNA molecules in all species had much the same composition. As more measurements were made, especially on different types of bacteria, it became clear that this was very far from the case. Of course in all cases the amount of A was the same as the amount of T (A=T) since the base pairing demanded this, and for the same reason G=C, but the structure of DNA itself put no restrictions on the ratio of A+T to G+C, and this ratio was found to vary a lot from one organism to another. This made it likely that the comma-free code must be wrong.
