by Matthew Cobb
It was a Saturday afternoon in the Indian summer of 1961, and almost no one was around. Marshall was sitting alone at a table with his head bowed and his eyes glassy, obviously upset and depressed. … How could Marshall and Heinrich keep up with a lab that had nearly 20 scientists?65
After a cordial discussion with Ochoa in which it became obvious that collaboration was not an option, Nirenberg decided to fight back (he later said, ‘to my horror, I found that I enjoyed competing’), and he enlisted the help of Bob Martin and Bill Jones at NIH in synthesising polynucleotides for cracking the rest of the code.66 The race was on.
*
Francis Crick was immensely impressed by Nirenberg’s breakthrough – four months later, he described it on the BBC as ‘spectacular’ – but he did not shift his focus at all.67 Refreshed from a month-long holiday in Morocco just before the Moscow congress, excited by the new phase of discovery that had just been opened, Crick returned to Cambridge determined to settle the vexed question of whether the genetic code was a triplet code or not. Together with Brenner, he dreamt up the idea of studying mutants in the rII region of the T4 phage, which had been induced by a chemical that deleted single bases. According to Crick and Brenner’s thinking, a single base deletion would alter how the information was read, potentially rendering the message after that point nonsensical, because what they called the reading frame of the message would now be out of sync. For example, if there were a triplet code sequence such as ATG CAT CCC TGA … and the first C were deleted, then the sequence would become ATG ATC CCT GA … The first codon would be the same but the remaining codons would be altered. As Crick put it:
The simplest postulate to make is that the shift of the reading frame produces some triplets the reading of which is ‘unacceptable’; for example, they may be ‘nonsense’, or stand for ‘end the chain’, or be unacceptable in some other way to the complications of protein structure.68
They found that some point deletions in the rII region did indeed stop the phage from functioning. The trick was to combine a number of these deletions and thereby put the reading frame back in line. If three deletions restored viral function to the manipulated phage and it was able to infect bacteria (this would be detected by the plaques that infected bacteria formed in the Petri dish), that would provide very strong evidence that the code was based on triplets.
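The reading-frame argument can be illustrated with a minimal sketch in Python. It assumes a triplet code read from a fixed starting point; the 12-base sequence is the example given above, while the 21-base sequence, the deletion positions and the function names are hypothetical and chosen purely for illustration.

```python
def codons(seq, size=3):
    """Split seq into consecutive groups of `size` bases, read from a fixed start."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def delete_bases(seq, positions):
    """Return seq with the bases at the given 0-based positions removed."""
    drop = set(positions)
    return "".join(base for i, base in enumerate(seq) if i not in drop)


# The example from the text: deleting the first C shifts every later triplet.
example = "ATGCATCCCTGA"
shifted = delete_bases(example, [3])
print(codons(example))  # ['ATG', 'CAT', 'CCC', 'TGA']
print(codons(shifted))  # ['ATG', 'ATC', 'CCT', 'GA']

# A hypothetical longer message: three spaced single-base deletions
# garble the region between them but put the rest back in frame.
message = "ATGCATCCCTGAGGTACCAAA"
restored = delete_bases(message, [3, 7, 11])
print(codons(message))   # ['ATG', 'CAT', 'CCC', 'TGA', 'GGT', 'ACC', 'AAA']
print(codons(restored))  # ['ATG', 'ATC', 'CTG', 'GGT', 'ACC', 'AAA']
```

With one or two deletions the tail of the message stays out of step; only when the number of deleted bases is a multiple of the coding ratio do the final triplets read as they did originally.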
In the autumn of 1961, Brenner went to Paris, leaving Crick to do the experiment with Leslie Barnett, a 42-year-old microbiologist. Crick recalled the moment:
And all we had to do was look at one plate. And see if it had any plaques on it. So we came in late at night, ten o’clock at night, or something, and there were plaques on the plate! So I said to Leslie, ‘Let me check; we may have got the plates mixed up,’ and she checked it, and then I told her, ‘We’re the only two to know it’s a triplet code!’69
These findings were published in Nature at the end of December 1961, with a title that was full of Crick’s flair: ‘General nature of the genetic code for proteins’. The paper combined the experimental detail of the study of rII mutants and a summary of research on the coding problem (including Nirenberg and Matthaei’s work), and was infused with Crick and Brenner’s theoretical insight. The article contained four fundamental conclusions that are now taught in schoolrooms and university lecture theatres all around the world:
(a)A group of three bases … codes one amino acid.
(b)The code is not of the overlapping type.
(c)The sequence of the bases is read from a fixed starting point. …
(d)The code is probably ‘degenerate’; that is, in general, one particular amino-acid can be coded by one of several triplets of bases.70
Strictly speaking, not all of these conclusions had been proven. First, as the paper explained, it was technically possible that the number of bases in each group was six, or some other multiple of three, although this was highly unlikely. Second, although they did not yet have evidence that the code was ‘degenerate’, this ‘could also account for the major dilemma of the coding problem, namely, that while the base composition of the DNA can be very different in different micro-organisms, the amino-acid composition of their proteins only changes by a moderate amount.’71
Crick was later dismissive of the significance of the article, pointing out that ‘it was pretty obvious it was likely to be a triplet code … the fact is, if we’d shown that the code was a quadruplet code, that would have been a discovery.’ He even suggested, ‘I think you could have deleted the whole work and the issue of the genetic code would not have been very different.’ Nirenberg did not agree. In a letter to Crick written in January 1962, he described the article as ‘beautiful’. More than 700 papers have subsequently cited it, and in 2004 the original manuscript sold at Christie’s for £13,145. Crick’s conclusion was audacious, conveying the optimism that swept through the scientific community after Nirenberg and Matthaei’s transformation of the field:
If the coding ratio is indeed 3, as our results suggest, and if the code is the same throughout Nature, then the genetic code may well be solved within a year.72
This view was shared by Nirenberg, who shortly afterwards claimed that ‘within another six months or so most of the genetic code will be cracked.’73 Both men severely underestimated the difficulties ahead.
*
In January 1962, Crick gave a talk on the BBC that outlined the importance of Nirenberg’s discovery. He concluded by putting it into context, and posing some questions that we now know the answers to, and others that are still unanswered today:
We still don’t know whether the code is universal. The same 20 amino acids are used in proteins throughout nature, from virus to man, but it is not yet certain that the same triplets code them in all organisms, although preliminary evidence suggests this is probable. If so, we shall have the key to the molecular organisation of all living things on Earth.
But on Mars, I wonder? Will there be life, or the remains of life, on Mars? And will it be DNA and RNA and protein all over again? The same languages perhaps, with the same code connecting them? Who knows?74
In the coming years, Crick continued to be generous towards Nirenberg and Matthaei, recognising that their discovery had altered the course of history and paying glowing tribute to the importance of the work of the two outsiders. As he put it in 1962:
We are coming to the end of an era in molecular biology. If the DNA structure was the end of the beginning, the discovery of Nirenberg and Matthaei is the beginning of the end.75
* Lily Kay, the principal historian of this subject and someone not at all sympathetic to the ‘great man’ view of history, nevertheless wrote: ‘the breaking of the code by Nirenberg and Matthaei was one of the most stunning events in the history of modern science. It represented a victory of material ingenuity over the Pythagorean ideals and is a David versus Goliath tale of an obscure young scientist defeating the eminent gray matter of physicists, mathematicians, biochemists, and geneticists, some of them Nobel laureates.’ (Kay, 2000, pp. 254–55).
* In response to this claim, the joker Seymour Benzer sent Crick a photo of the congress in which the audience looked utterly bored (Crick, 1988, p. 131). I have been unable to locate any photo of the symposium – Benzer may well have deliberately sent a picture of a different meeting.
* In 1914, the veteran German chemist Emil Fischer had effectively suggested exactly the experiment that Nirenberg and Matthaei carried out. Writing at a time when nucleic acids were known to be an important component of the nucleus, but when their role was still unclear, Fischer discussed recent developments in the synthetic creation of nucleic acids: ‘we are now capable of obtaining numerous compounds that resemble, more or less, natural nucleic acids. How will they affect various living organisms? Will they be rejected or metabolized or will they participate in the construction of the cell nucleus? Only the experiment will give us the answer. I am bold enough to hope that, given the right conditions, the latter may happen and that artificial nucleic acids may be assimilated without degradation of the molecule. Such incorporation should lead to profound changes of the organism, resembling perhaps permanent changes or mutations as they have been observed before in nature’ (cited in McCarty, 1996). Nothing came of Fischer’s insight.
–ELEVEN–
THE RACE
The weeks after Nirenberg’s dramatic revelation saw the beginning of a frenetic scientific race to crack the rest of the genetic code. Nirenberg and Matthaei’s papers appeared in Proceedings of the National Academy of Sciences in October, quickly followed by a paper from the Ochoa lab, also in PNAS. This was the first of nine articles by Ochoa’s group that were collectively entitled ‘Synthetic polynucleotides and the amino acid code’.1 The pace with which the Ochoa lab produced their material was extraordinary. The first five papers in the series were submitted in the space of about eighteen weeks – today it is hard to imagine this level of productivity when a group begins work in a new field. Irrespective of the fact that the articles would not have gone through any form of peer review – at the time, Academy members like Ochoa could treat the PNAS more or less as a private publishing service – this represented a remarkable avalanche of activity, and it nearly crushed Nirenberg, who was also distraught at the death of his parents around this time.2
Ochoa’s dominance was expressed both in the data and in the tone and detail of his group’s publications. Nirenberg and Matthaei’s contribution was downplayed – they were not even cited in several of the papers – and the opening salvo in the nine-paper bombardment closed with a statement that seemed to claim the breakthrough as Ochoa’s own:
These and other results reported in this paper would appear to open up an experimental approach to the study of the coding problem in protein biosynthesis.3
Faced with the aggressive output of the Ochoa group, Nirenberg became more savvy about the presentation of his work – his last paper in 1961 was entitled ‘Ribonucleotide composition of the genetic code’, a far clearer indication of what his work implied than the technical titles he initially employed.4
This wave of discovery, coupled with Crick’s Nature paper on the ‘General nature of the genetic code for proteins’, which appeared in December 1961, finally attracted the attention of the world’s press. For nearly six months, the scientific world had been buzzing with excitement – in a reference to the shooting down in 1960 of an American U-2 spy plane over the Soviet Union, Rollin Hotchkiss quipped that ‘The U-2 incident started the cold war, the U3 incident started the code war.’5 Now the general public got to hear about it. On Christmas Eve 1961, the New York Herald Tribune announced ‘The code of life finally cracked’, explaining the four-month press silence since Nirenberg’s Moscow announcement by claiming that ‘the news did not leak into the newspapers until last week’. A few days later, the Sunday Times took a similar line – ‘Scientists have cracked the code of life’ – but chose to emphasise the role of British scientists and the importance of Crick’s recent Nature paper.6 Crick was embarrassed by the coverage, writing to Nirenberg to explain that he had done his best to set the record straight.7 Nirenberg’s response was typically relaxed. It also shows that the way in which the media treat scientific breakthroughs has not changed that much:
I haven’t seen the English newspapers but the American press has been saying that this type of work may result in (1) the cure of cancer and allied diseases (2) the cause of cancer and the end of mankind, and (3) a better knowledge of the molecular structure of God. Well, it’s all in a day’s work.8
*
The widespread excitement was stoked further by what seemed to be rapid progress in the race to crack the code. With the new focus on manipulating RNA rather than DNA, there was an unstated shift in the way that the code was now thought of. It was no longer located solely in the double helix, but equally in the molecular transcription of the gene, in the shape of messenger RNA. As a result, the letters of the code increasingly became those of the RNA molecule – A, C, G and U. Today, as researchers study genomes and their DNA content, the code tends to be presented in terms of DNA bases.
Nirenberg and Ochoa’s groups both used the same method: they synthesised RNA polynucleotides with known ratios of bases but unknown sequence, put them into the cell-free protein synthesis system and interpreted what came out in terms of the ratios of bases that had been put in. For example, if the ‘coding unit’ was a triplet, a randomly assembled polynucleotide composed of five parts U to one part C – ‘poly(5U1C)’ – would contain the coding sequences UUU, UUC, UCU, UCC, CUU, CUC, CCU and CCC in varying proportions. The CCC combination would be present in much smaller amounts than UUC, because there was far less C present in the mixture than there was U. However, the three ‘2U1C’ triplets – UUC, CUU and UCU – would be expected to be present in the same proportions, so their effects could not be distinguished.
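The arithmetic behind this reasoning can be sketched in a few lines of Python. The sketch assumes, as a simplification, that each base is incorporated independently and strictly in proportion to the input ratio; the function name `triplet_frequencies` is hypothetical and does not come from any of the original papers.

```python
from fractions import Fraction
from itertools import product


def triplet_frequencies(ratios):
    """Expected frequency of each triplet in a randomly assembled polynucleotide.

    `ratios` maps each base to its relative amount in the mixture,
    e.g. {"U": 5, "C": 1} for poly(5U1C). Assumes independent incorporation.
    """
    total = sum(ratios.values())
    base_prob = {base: Fraction(amount, total) for base, amount in ratios.items()}
    freqs = {}
    for triplet in product(sorted(base_prob), repeat=3):
        p = Fraction(1)
        for base in triplet:
            p *= base_prob[base]
        freqs["".join(triplet)] = p
    return freqs


freqs = triplet_frequencies({"U": 5, "C": 1})
# List the triplets from most to least frequent.
for triplet, p in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
    print(triplet, p)
```

Under these assumptions UUU accounts for 125/216 of all triplets, each of the three 2U1C triplets for 25/216, each of the three 1U2C triplets for 5/216 and CCC for 1/216. Triplets of the same composition have identical expected frequencies, so composition alone cannot tell them apart.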
When Ochoa’s laboratory reported that with poly(5U1C) they got high levels of phenylalanine and lower levels of proline and of serine, this was interpreted in terms of the relative proportions of the possible coding triplets. Starting with the known fact that UUU coded for phenylalanine, they claimed that proline was coded by 1U and 2Cs (either CCU, UCC or CUC) and serine was coded by 2Us and 1C (either UCU, UUC or CUU).9 Although this was not too far from the truth – proline is indeed coded by CCU, and serine by UCU – in both cases the method used by the Ochoa group could not distinguish between the three alternative triplets. Furthermore, if one of those alternatives coded for phenylalanine, this would not be detectable, because it would be assumed that the phenylalanine was encoded by UUU, about the only part of the code that everyone agreed on. In fact, this was exactly what was happening – UUC, like UUU, codes for phenylalanine. Comparing the theoretical frequencies of different triplets and the proportions of amino acids they seemed to code for could only get you so far in cracking the code.10
By the beginning of 1962, the two competing laboratories had identified the potential base compositions of the RNA code for nearly all twenty naturally occurring amino acids.11 They had no idea of the sequential order of the nucleotides, nor of how many nucleotides there were in each coding unit; although most people assumed that the code was based on triplets, there was still no absolute proof that this was true. In February 1962, Ochoa told Crick that he expected that fewer than ‘30 triplets would stand for amino acids’ – it was widely thought that the code generally involved a one-to-one correspondence between a single RNA coding unit and a single amino acid, with the other forty-odd possible triplets coding for ‘punctuation’ or for nothing.12 But within weeks it was evident that many amino acids were coded by more than one RNA coding unit, suggesting that the code was degenerate or, as we now put it, redundant.13
And something else was odd. As Ochoa’s group noted in March 1962: ‘A striking feature of the code triplets is that they all contain U.’14 A month later, Nirenberg’s lab made the same observation: ‘a surprisingly high proportion of U has been found in coding units thus far’.15 Attempts to get amino acid incorporation with polynucleotides that did not contain U failed repeatedly, leading some researchers to conclude that the twenty-seven possible triplets that did not contain U were ‘nonsense’ triplets, with no meaning. But when the composition of virus RNA was studied, U was not present in particularly high levels, implying that coding units not containing U must exist somewhere in nature, or that something strange was going on in the test tube. Confusion was increased when both Ochoa and Nirenberg’s labs reported that poly(U), which everyone agreed led to the incorporation of phenylalanine, also seemed to code for leucine and valine.16 Everyone assumed that this must be an experimental error (it was), but until that could be explained, there was a real danger: if a triplet coded for more than one amino acid, then all existing ideas about coding would have to be scrapped. The whole code edifice could come crashing down.
*
The cascade of new data, and the lack of clarity about what it all meant, encouraged the theoreticians to return to the coding problem. As Crick put it in 1966, during this period there was ‘a flurry of theoretical papers, most of which are best forgotten’.17 Forgettable they may have been, but they reveal the thinking of scientists at the time and highlight how they were groping their way towards the right answer. In September 1961, Richard Eck re-raised the possibility that the code might be overlapping, with the bases in one coding unit also forming part of the subsequent unit (so, for example, the sequence CACGU would contain three triplets – CAC, ACG, and CGU).18* Brenner had disproved this idea to most people’s satisfaction in 1957; his criticism had been reinforced by the fact that mutational studies of viruses had shown that changes to a single base only ever altered a single amino acid – if the code were overlapping, then two or more amino acids should be altered.19 Nevertheless, as Crick later put it, it was possible with ingenuity to come up with a complex overlapping code.20 Another theoretician took as his starting point the complementary coding groups that would be found on the two strands of the DNA molecule, for example TAC and ATG, and argued they must code for the same amino acid, to avoid the problem of the cell having to know which strand of DNA to use.21 This bold and mistaken vision underestimated the skill of the cell, which can indeed distinguish between the two strands.
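The mutational argument against an overlapping code can be made concrete with a short Python sketch. It assumes the simplest, fully overlapping scheme, in which each successive triplet starts one base after the previous one; the nine-base sequence, the substituted position and the function names are hypothetical illustrations.

```python
def overlapping_triplets(seq):
    """Read a triplet starting at every base (fully overlapping code)."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def non_overlapping_triplets(seq):
    """Read consecutive, non-overlapping triplets from a fixed start."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def changed_units(read, before, after):
    """Count the coding units that differ between two equal-length sequences."""
    return sum(a != b for a, b in zip(read(before), read(after)))


print(overlapping_triplets("CACGU"))  # ['CAC', 'ACG', 'CGU']

# Hypothetical message with a single base substitution (U to A at position 4).
before = "CACGUUACG"
after = "CACGAUACG"
print(changed_units(overlapping_triplets, before, after))      # 3
print(changed_units(non_overlapping_triplets, before, after))  # 1
```

In the overlapping reading a single altered base falls inside up to three coding units, whereas in the non-overlapping reading it falls inside exactly one, which is what the virus studies observed.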
The most sophisticated attempt to crack the code by theoretical means was made by Carl Woese (pronounced ‘Wose’). His starting point was that any code had to be compatible with the known nucleotide composition of RNA in a variety of organisms, which was known not to be rich in U. After a tortuous set of calculations, Woese produced a code that used only twenty-four potential triplets out of sixty-four, with most amino acids being coded by both a triplet that was low in G + C and one that was high in these two bases. Like all the other theoretical schemes, this one was ingenious, but wrong.22