by Matthew Cobb
Other less complicated theoretical codes were dreamt up. Richard Roberts found an easy solution to the problem of the U-rich data coming from Ochoa and Nirenberg’s labs – the Us were simply not relevant, he argued, because the code was really composed of only two bases. Although this fitted with the known RNA base composition of viruses, which was not U-rich, it had the disadvantage of providing only sixteen possible combinations, when at least twenty were needed to code for the naturally occurring amino acids. To get out of this dilemma, Roberts suggested that the code was composed of both doublets and triplets, with some doublets, such as AA or GG, indicating the start of a triplet. There was no evidence to support this, but as Roberts cunningly pointed out, there was no evidence against it, either.23 At the beginning of 1963, Thomas Jukes put forward a variant of this idea, suggesting that each triplet had what he called a ‘pivotal’ base, which could change without altering the amino acid that was coded for. He announced that U was the ‘pivotal base’ in all triplets containing U (it is not).
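A quick bit of arithmetic, sketched below in Python purely as an illustration (the names and numbers are mine, not part of the historical record), shows why the doublet idea ran into trouble: with four bases, doublets yield only sixteen combinations, fewer than the twenty amino acids that needed to be encoded, whereas triplets yield sixty-four.

```python
# Illustrative arithmetic only: with four bases, how many code words do
# doublets and triplets provide, compared with the twenty amino acids?
BASES = 'ACGU'
AMINO_ACIDS = 20

doublets = len(BASES) ** 2   # 4^2 = 16, too few
triplets = len(BASES) ** 3   # 4^3 = 64, more than enough

print(f'doublets: {doublets} combinations (need {AMINO_ACIDS})')
print(f'triplets: {triplets} combinations (need {AMINO_ACIDS})')
```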
In a return to the numerology that had dominated the coding problem in the 1950s, Eck published a four-page article in Science in which he claimed to detect a symmetrical pattern in the attribution of triplets to amino acids – four amino acids were coded by four triplets, with the remaining sixteen each being coded by two.24 Eck said all he had to do was to tabulate the known distribution of triplets ‘and the puzzle practically solved itself’. But the solution was based entirely on conveniently fitting as-yet unallocated triplet/amino acid combinations into the schema. The pattern was in Eck’s head, not in the data.
Finally, the pioneer of the biological application of information theory, Henry Quastler, came up with a schema based on data from amino acid changes induced by mutations. He was unimpressed by the cell-free studies, arguing that they did not necessarily measure protein synthesis, and above all he emphasised that in most cases the precise nature of the polynucleotides was unknown.25 Crick was scornful of Quastler’s paper, claiming that it consisted of ‘a rather poor fit to some very doubtful data’ and was based on ‘an unspecified technique’. All of the triplets predicted by Quastler were wrong, with the exception of UUU = phenylalanine, which was hardly a prediction.
The real answer to the conundrum of the predominance of U nucleotides in the cell-free data was inadvertently provided by Solomon Golomb.26 He performed various calculations and concluded that it was not possible to deduce anything about the role of non-U sequences without doing an experiment. Which is what the biochemists did, and by the middle of 1962, RNA with no U bases had been shown to encode amino acids.27 The bewildering preponderance of U-rich polynucleotides was an artefact due to the solvents that were initially employed in the cell-free system.28
*
With the smell of competition in everyone’s nostrils, two meetings took place in the summer of 1962 at which progress on cracking the code was discussed. In July, a ‘Colloquium on Information in Contemporary Science’ was held in the glorious surroundings of the thirteenth-century Royaumont Abbey to the north of Paris. As one of the participants recalled, ‘the gardens, the musical evenings, and supper by candlelight’ were almost as significant as the discussions.29 This meeting involved philosophers, mathematicians, sociologists and biologists and was one of the last attempts to explore the usefulness of information theory across scientific disciplines.
In his brief introduction on ‘the concept of information in molecular biology’, André Lwoff of the Pasteur Institute set out his position quite trenchantly, unwittingly repeating the critique of information theory as applied to biology that had been made at similar meetings in the US a few years earlier. Lwoff argued that it was not useful to calculate the information contained in a DNA sequence, using either Shannon’s equations or Wiener’s negative entropy (following Brillouin, Lwoff called it negentropy), because such calculations did not deal with the meaning or function of that information in the organism. As he put it: ‘the calculation of negentropy using Shannon’s formulas cannot in any way be applied to an organism.’ It would be like trying to calculate the information content of a tragedy by Racine, he said. For a biologist, argued Lwoff, the only meaning of information was ‘a sequence of small molecules and the set of functions they carry out’.30 Wiener and the philosophers who were present could not see what the problem was, thereby inadvertently illustrating the gulf between the information theoreticians and the biologists.
Similar mutual incomprehension was revealed in the other sessions, which were often fractious. The mathematician Benoît Mandelbrot suggested that such information-focused cross-disciplinary meetings were pointless:
The implications of the strict meaning of information have been sufficiently explored for its consequences to be quite clear. What remains is so difficult that it can usefully be discussed only in private … we must consider that its scientific usefulness has ceased, at least for the time being.31
Alongside these rather sterile plenary discussions there were workshops in which experts in the various fields explored their topic in more detail. The workshop on information theory in biology included a session on the genetic code, chaired by Delbrück, with contributions from Crick, Nirenberg, Woese, Jacob and Ochoa’s student Peter Lengyel. During the discussion, Crick introduced the term ‘codon’ to describe the group of bases that codes for an amino acid – the word had been invented by Brenner, apparently partly as a spoof on the other ‘-on’ words that had been coined by Benzer and by Jacob and Monod.32 It stuck, and is still in use today.
Much of the discussion at the workshop focused on the uncertainty of the results from the cell-free system: some participants questioned whether the polynucleotides truly contained the proportions of bases that Ochoa and Nirenberg’s groups assumed they did. Woese outlined his proposed code, framed in terms of the informational content of the different bases, and Jacob described protein synthesis in terms of a ‘theory of informational transfer and regulation’. Whatever insights these presentations may have contained, they were not published and left no trace on subsequent research. As most people agreed, the influence of information theory on molecular biology had passed its peak. Information had now become a vague but essential metaphor, rather than a precise theoretical construct.
This was reflected in the ‘Symposium on Informational Macromolecules’, which took place at Rutgers University in New Jersey, at the beginning of September 1962. Despite the title, there was very little direct exploration of the informational content of the macromolecules that were the subject of the meeting – DNA, RNA and proteins. When Ochoa opened the conference, he nodded in the direction of the new vocabulary, referring to ‘information coded into the DNA molecule’ that was ‘transferred to an RNA tape’, but his real position was made abundantly clear in his very first sentence: ‘This symposium deals essentially with the molecular mechanisms concerned with the genetic control and regulation of protein synthesis.’33 The focus was biochemistry, not information.
The first of two sessions on the genetic code was chaired by the veteran geneticist Ed Tatum, who referred back to his discovery of the ‘one gene, one enzyme’ principle with George Beadle, twenty years earlier:
I think back to the time when we started our work, so many years ago. I think we would not have been able to anticipate that we would, in this relatively short time, be present at a symposium on informational macromolecules. This is something that most of you take for granted, but I can assure you – and I think I speak for Dr Beadle too – that this is really an extraordinary phenomenon in the development of molecular biology.34
Around 250 people attended the meeting, but only 13 were from outside the US; neither Crick nor Brenner, nor anyone from Watson’s group, attended, and only François Gros was there from the Pasteur Institute. The stars of the show were the new kings of the code: Nirenberg and Ochoa.
Dozens of speakers summarised data from a range of species (including bacteria, mice and algae) that indicated that the genetic code was universal; they outlined the growing conviction that only one of the two strands in the DNA double helix was used to make protein via RNA; and they described the recent discovery that non-U-containing polynucleotides could code for amino acids. But on the question of questions – the nature of the genetic code – there was no agreement. During the coffee breaks and at mealtimes, attendees argued about whether the genetic code had been cracked or not.35 Nothing was certain, beyond the fact that UUU coded for phenylalanine, AAA for lysine and CCC for proline.
During the meeting, both Ochoa and Nirenberg toyed with Roberts’s combined doublet–triplet code. Nirenberg assumed that the code was based on triplets, but warned, ‘it is not possible at this time to distinguish between triplet and doublet codes’.36 As he put it with disarming clarity: ‘Almost all amino acids tested can be coded by polynucleotides containing only two bases.’ Ochoa was even clearer – during the discussion of Nirenberg’s paper he said:
I must say I have been very impressed by Dick Roberts’ ingenious doublet code idea. … It almost looks as if that third base does not matter and, in this regard, I cannot help but think of the possible significance of Roberts’ proposal.37
In a summary of the meeting that appeared in a book collecting the talks, the organisers of the symposium suggested that the status of the genetic code at the time was something like that of the periodic table first published by Mendeleev – it was fragmentary and not all of its predictions were correct, but ‘nevertheless, a fundamental system had been discovered!’38
*
Francis Crick was frustrated by the mixture of unclear experimentation, loosely argued theory and guesswork that had begun to infest studies of the genetic code. In the summer of 1962, as scientists involved in the coding race were either recovering from Royaumont, preparing to go to Rutgers, or both, Crick wrote a long, highly critical review article on the topic. In typical patrician style it was entitled ‘The recent excitement in the coding problem’. Crick summarised the work of the Ochoa and Nirenberg labs, praising their results as being ‘of very considerable interest’ before changing gear and pulling no punches:
There are so many criticisms which can be brought against this type of experiment that one hardly knows where to begin.39
Crick’s critique was rock solid: the composition of the polynucleotides apart from poly(U), poly(A), etc. was completely unknown (it was not even certain that the incorporation of the different bases into synthetic RNA molecules was truly random), the levels of amino acid incorporation in the ‘cell-free’ experiments were often worryingly low, and even the strongest effect – poly(U) coding for phenylalanine – was weakened by the fact that poly(U) sometimes seemed to code for leucine. Having grudgingly accepted that two codons could be reliably identified (only one – the inevitable UUU = phenylalanine – was in fact correct), Crick concluded that the methodological problems he had outlined ‘make the allocation of further triplets very precarious’.40 He continued:
although not one single codon can be said to be known with certainty we do know something: one codon for phenylalanine contains Us, one for proline contains Cs, and so on. The coding problem has moved out of the realm of rather abstract speculation into the rough and tumble of experimentation.41
Crick was not even convinced by the evidence that the code possessed redundancy – the evidence, he said, ‘is of two types, direct and indirect, and with one exception, none of it is satisfactory’.42 Having surveyed the various options, including Roberts’s code, Crick put his finger on what was the true situation: ‘if the code is really a pure triplet code its degeneracy makes it look at times more like a doublet one’.43
Crick was not scornful of theory – after all, it was theory that had underpinned most studies of the coding problem over the previous nine years, including many of his own contributions – but ultimately, more precise experimentation was needed. No matter how elegant a theoretical solution might be, the data would determine whether it was correct. Crick recognised that his own pursuit of theoretical models, such as the commaless code, had not led to any breakthroughs (he subsequently described the commaless code as ‘one of those nice ideas which is, nevertheless, completely wrong’44). He gave his readers a clear outline of how he thought research on the topic should proceed:
In the long run we do not want to guess the genetic code, we want to know what it is. … The time is rapidly approaching when the serious problem will not be whether, say, UUC is likely to stand for serine, but what evidence can we accept which establishes this beyond reasonable doubt. What, in short, constitutes proof of a codon? Whether theory can help by suggesting the general structure of the code remains to be seen. If the code does have a logical structure there is little doubt that its discovery would greatly help the experimental work. Failing that, the main use of theory may be to suggest novel forms of evidence and to sharpen critical judgement. In the final analysis it is the quality of the experimental work which will be decisive.45
About a week after sending off his article, Crick heard that he, Jim Watson and Maurice Wilkins had won the 1962 Nobel Prize in Physiology or Medicine, for their work on the structure of nucleic acids and its significance for information transfer in living material.46 The deliberations of the Nobel Prize committee are secret, but it seems probable that the renewed interest in the significance of the sequential structure of DNA produced by the cracking of the code convinced the committee that Watson, Crick and Wilkins’s time had come.
*
In early June 1963, just two years after Nirenberg and Matthaei’s discovery, Cold Spring Harbor Laboratory held its annual meeting under the title ‘Synthesis and structure of macromolecules’. This time, not only was Nirenberg allowed to attend, he had pride of place on the programme. In the ten years since gangly Jim Watson had presented the double helix structure of DNA in a stifling Cold Spring Harbor lecture theatre, the field of molecular biology had been utterly transformed – this was the largest ever meeting held at Cold Spring Harbor, with more than 300 scientists attending, about one-fifth of them from outside the US.
The seventy-four presentations at the meeting were focused on DNA, on various forms of RNA and on protein synthesis, and the framework was resolutely biochemical. Nirenberg’s talk was entitled ‘On the coding of genetic information’, but after the introductory paragraph he immersed himself in the biochemical details, even reverting to the old vague language of specificity rather than giving any content to the idea of information. Nirenberg’s talk revealed that, as Crick had pointed out, the race to crack the code had hit an experimental bottleneck. The techniques that were employed – a combination of synthetic RNA of unknown sequence and data from the effects of mutations on viruses – could not crack the code. Worse, they could not even settle the question of whether the code was composed of groups of two, three or more bases. Furthermore, Nirenberg sounded a new note of caution about the technique that had made his name: it was possible, he argued, that natural messenger RNA found in cells might not use all sixty-four potential triplet codons; as a result, the randomly ordered synthetic molecules might ‘test the cell’s potential to recognise code words’.47
Progress was certainly being made – increasingly detailed experiments and more accurately assembled synthetic RNA allowed Nirenberg’s group to suggest that a number of amino acids, including proline and phenylalanine, were coded by more than one codon, but there was still no absolute proof. The actual code remained out of reach, because the sequence of bases on the RNA molecules used in the cell-free system remained unknown. At the end of his talk, Nirenberg described how his group and that of Indian-born biochemist Gobind Khorana, who was based at the University of Wisconsin, were separately using two techniques for synthesising short bits of DNA, or oligodeoxynucleotides (oligo- is a Greek prefix meaning few). When these pieces of DNA were transcribed into RNA, it was shown that molecules composed of only four bases could still direct detectable levels of amino acid incorporation in the cell-free system. Nirenberg’s understated conclusion pointed the way forward:
It is possible that defined oligodeoxynucleotides may be useful in the determination of nucleotide sequence and polarity of RNA code words, and also in the study of control mechanisms related to DNA-directed protein synthesis.48
The next talk was by Joe Speyer from Ochoa’s laboratory, who summarised their two-year-long attempts to correlate the theoretical frequency of different triplets in a synthetic RNA molecule with the levels of incorporation of the different amino acids.49 There was little new there, beyond a summary of research from a variety of species indicating that the code was universal. The Ochoa group had made a substantial impact in the field, recovering the initiative from Nirenberg, and showing what focused, large-scale molecular research could achieve, but the limits of their techniques were now apparent. Two weeks later, Ochoa gave a talk in Switzerland in which he inadvertently outlined the impasse his group was in; unlike Nirenberg, he had no solution to the problem.50
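For readers who want a feel for the frequency-matching logic behind the Ochoa group’s approach, the sketch below works through the arithmetic for a hypothetical random U/C copolymer with an assumed 5:1 input ratio; both the ratio and the assumption that bases are incorporated at random are illustrative, not taken from any particular experiment (indeed, that randomness was exactly what Crick doubted). Under those assumptions, the expected frequency of a triplet is simply the product of the probabilities of its bases, and it was expected frequencies of this kind that were compared with the relative levels of amino acid incorporation.

```python
from itertools import product

# Illustrative only: a random U/C copolymer made with an assumed 5:1 U:C
# input ratio, with bases assumed to be incorporated at random.
base_prob = {'U': 5 / 6, 'C': 1 / 6}

# Expected frequency of each triplet = product of its base probabilities.
triplet_freq = {
    ''.join(t): base_prob[t[0]] * base_prob[t[1]] * base_prob[t[2]]
    for t in product('UC', repeat=3)
}

# Express each frequency relative to UUU, for comparison with the relative
# amino acid incorporation measured in the cell-free system.
for triplet, freq in sorted(triplet_freq.items(), key=lambda kv: -kv[1]):
    print(f'{triplet}: {freq:.3f} ({freq / triplet_freq["UUU"]:.2f} x UUU)')
```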
The summer of 1963 represented a double shift in the race to crack the code. New techniques had been developed for creating small RNA molecules of known sequence, while the competing laboratories had changed. Ochoa’s group effectively bowed out of trying to determine which triplets coded for which amino acids; Gobind Khorana, the expert in RNA synthesis, took their place.
Over the next two years, Khorana’s group refined its technique for creating small RNA molecules of a known sequence, and in 1964 Nirenberg’s laboratory solved the problem from the other direction – they worked out how to identify the nucleotide sequence on a piece of RNA that had just led to the incorporation of a particular amino acid into a protein chain. This bit of heavy-duty biochemistry involved trapping a complex of molecules – radioactive transfer RNA (tRNA; this was Crick and Brenner’s adaptor molecule), nucleotides and ribosomes – on a Millipore filter. Using this technique, Nirenberg and his colleague Phil Leder were able to show that a UUU triplet led to the binding of phenylalanine tRNA, whereas a UU doublet did not.51 There was no evidence for any of the fancy doublet-based codes that had been suggested in the previous couple of years – a codon was composed of three bases.