
Life's Greatest Secret


by Matthew Cobb


  What Crick called ‘the magic twenty’ came to be the criterion against which all potential codes were measured.14 What was called ‘the coding problem’ began to have a whiff of numerology – coding schemes were developed, with the objective always being to come up with twenty possible combinations that might correspond to the twenty widely occurring amino acids. Neither Crick nor Watson had previously thought much about the relationship between a base sequence in DNA and the amino acid sequence – they were more focused on the problem of how the double helix might unwind during gene replication. They suggested that the sequence was ‘the code that carries the genetical information’, but they had not thought beyond that fundamental insight. As Watson told the Cold Spring Harbor meeting in the summer of 1953, he and Crick could not explain how the gene controlled the activity of the cell.15 Gamow’s intervention brought the question of coding to the forefront of everyone’s mind. For Crick, it occupied an important portion of his life for the next fifteen years.16

  *

  In the months following his letter to Watson and Crick, Gamow organised informal discussions involving biologists and a small gang of physicists and mathematicians – including Edward Teller (‘the father of the H-Bomb’) and the future Nobel Prize winner Dick Feynman – who were seduced by Gamow’s infectious enthusiasm and by his zany and intellectually provocative correspondence, which came complete with jokes, comments and cartoons. Gamow spent his time flitting from laboratory to laboratory, writing his slightly mad letters on headed notepaper from hotels or railway companies, scattering forwarding addresses like confetti.17 Gamow was larger than life – he was more than six feet tall, with a taste for hard liquor, practical jokes, magic tricks and women, and he spoke in a thick Russian accent that was often hard to understand.18 Having him around could be hard work – in February 1955, Watson wrote to Crick: ‘Gamow was here for 4 days – rather exhausting as I do not live on whisky.’19 Crick recalled:

  And he was what is called good company, was Gamow. I wouldn’t quite say a buffoon, but – yes, a bit of that, in the nicest possible way. You always knew, if you were going to spend the evening with Gamow that you would have a ‘jolly time’. You know. And yet there was something behind it all.20

  When it came to discussing coding, Gamow was irrepressible; as soon as one of his schemes crashed into the hard wall of biochemical fact, he simply came up with another. So when it became evident that his DNA-based diamond model had little traction, Gamow, undaunted, switched to thinking about coding in RNA. This molecule was not so amenable to the diamond model – its precise structure was unclear, but it was probably some kind of single helix, meaning that Gamow’s original scheme would not work.

  In March 1954, Gamow and Watson came up with the wheeze of creating an informal group in which a select few scientists – the number twenty again – could bat around ideas related to the coding problem. An entirely male body, this jokey group was known as the ‘RNA Tie Club’ after the hand-embroidered ties bearing the single helix of the RNA molecule they were each given.21 Each member was also given a name corresponding to one of the twenty amino acids and a small gold tie-pin carrying the three-letter abbreviation of ‘their’ amino acid (so Gamow was ALA for alanine, Crick was TYR for tyrosine, etc.). The members of the ‘Club’ never actually met all together, but small groups of them gathered to drink and chat, with Gamow generally acting as master of ceremonies. The outstanding feature of the Club was that it allowed its members to present half-formed ideas without the pressure of publication or presentation at a conference – Crick later drew a parallel with the modern circulation of e-mails among groups of scientists.22 Reading the slow correspondence between members of the RNA Tie Club who were scattered around the world, with letters taking days to arrive at their destination – sometimes on the other side of the planet – you cannot be sure whether they would have benefited or suffered from the rapidity of modern electronic communication.23 The correspondence would have bounced back and forth far more quickly, but the participants would have had less time to think and develop their ideas.

  More than half of the RNA Tie Club members were either physicists, chemists or mathematicians, but none of them were involved with the two main points at which mathematics and biology were intersecting – cybernetics and what had become known as information theory.24 Although Gamow did attend a 1956 conference on information theory in biology, there were no direct interactions of any significance with scientists working in either cybernetics or information theory, with the exception of von Neumann.

  Gamow’s role in this phase of work on the genetic code was fundamental. He gave the project a shape – in fact, he made it a project – by pulling together a group of disparate individuals. As RNA Tie Club member Alexander Rich later recalled:

  What Gamow did was to bring a kind of enthusiasm to the problem, and an intensity and a focus.25

  What Gamow also brought was an emphasis on the role of information. And with his showman’s instinct, he made sure that his influence was not restricted to the twenty or so members of the RNA Tie Club. In October 1955 he published an article in Scientific American that gave the scientific community and the public a glimpse into the thinking of the elite group.

  Entitled ‘Information transfer in the living cell’, Gamow’s article began by describing the fundamental unit of life in a radically new light, written in a way that was sure to grab the reader: ‘The nucleus of a living cell is a storehouse of information,’ he wrote. Gamow summarised the various coding schemes that had been devised, and showed readers how the work on the genetic code was linked with the other scientific sea-change that was taking place – the development of computers and cybernetics. The cell was described as ‘a self-activating transmitter which passes on very precise messages that direct the construction of identical new cells’, a reference to von Neumann’s work on self-replicating automata. In fact, Gamow thanked von Neumann for helping with the calculations involved as RNA Tie Club members looked for some kind of pattern in the frequency of amino acids found in different proteins. Gamow explained that the code was involved in transmitting information, drawing a rather laboured analogy between the cell and a factory, in which genes were blueprints stored in filing cabinets (chromosomes) which were used by workers (enzymes) on the factory floor (in the cytoplasm of the cell). For Gamow, information was at the heart of the cell. What that information was, he did not say.

  Over the next year, Gamow, Teller and other members of the RNA Tie Club devised a number of theoretical models of the genetic code; each time their aim was to devise a scheme that produced twenty unique combinations corresponding to the widely occurring amino acids.26 Gamow even managed to rope in a colleague from the Los Alamos laboratory, who used precious time on one of the rare computers available for scientific work – it was called MANIAC – to crank through the myriad possible combinations.27 There was no direct way of testing any of these models – even rudimentary DNA sequencing was two decades ahead.

  After a year of excited chatter, Crick’s enthusiasm for the coding mania began to wane, and in January 1955 he produced the first of the RNA Tie Club’s informal documents, surveying the various ideas that had been tossed about ‘to subject them to the silent scrutiny of cold print’.28 Crick tested Gamow’s model against the first protein amino acid sequence data, which had been obtained thanks to some brilliant and complex chemistry by Fred Sanger, who was also based in Cambridge. In 1951, Sanger published a paper describing the sequence of amino acids in the insulin molecule. First he described the B chain, which was thirty amino acids long; two years later, he published the sequence of the shorter, but technically more demanding, A chain (twenty-one amino acids).29 This work proved so significant that Sanger won the Nobel Prize in Chemistry a mere seven years later.

  Crick used amino acid sequence data for cow and sheep insulin to show that Gamow’s diamond code could not work; the two insulin sequences differed by a single amino acid in the middle of the chain, but in the diamond code a minimum of two amino acids would be altered by any base change. The diamond code was wrong. As Crick put it acidly:

  it is surprising how quickly, with a little thought, a scheme can be rejected. It is better to use one’s head for a few minutes than a computing machine for a few days!30

  Despite this jibe, Crick’s view of Gamow’s contribution was overwhelmingly positive – ‘without our President, the whole problem would have been neglected and few of us would have tried to do anything about it,’ he wrote. Crick praised Gamow for his three main contributions: he introduced the idea of ‘degeneracy’ (several different base sequences may code for the same amino acid – this is now generally called redundancy), he suggested that the words in the code might overlap (Crick admitted that neither he nor Watson had thought of this possibility), and he focused thoughts on coding as an abstract idea, ‘independent as far as possible of how things might fit together’.31

  Crick had a better grasp than Gamow of ‘how things might fit together’, and realised that he had to come up with a coding mechanism that fitted biological reality. But the exact details of protein synthesis – which was what coding ultimately related to – were not yet apparent. In a flash of brilliance, Crick described an idea he had discussed with a young South African researcher and junior member of the RNA Tie Club, Sydney Brenner, which they called ‘the adaptor hypothesis’. Crick argued that the physicochemical structure of nucleic acids showed that neither DNA nor RNA was used as a direct template for building proteins – there was simply no physical correspondence between nucleic acid structure and the shape of proteins. Crick therefore hypothesised the existence of small ‘adaptor’ molecules, each specific to a particular amino acid, which would bind onto the DNA or RNA chain on one part of its structure and to an amino acid on another. Crick recognised that there was no evidence for the existence of such molecules, but he breezily dismissed this problem by supposing that there would be very few of them in the cell, so they would be difficult to detect.

  If the adaptor hypothesis were true (and it was), this meant that the role of DNA and RNA would be reduced to simply carrying genetic information: they had no direct structural role as templates; they were merely a medium. If large nucleic acid molecules did not act as a physical template for the production of proteins, it became possible to imagine more complex forms of overlapping codes in which each word included some letters from the previous word. One of the consequences of such codes was that some pairs of amino acids should be found together in a protein sequence more often than expected by chance – each word in the code was not strictly speaking independent, because it shared some letters with the words on either side in the DNA sequence. Despite – or rather because of – this rash of new codes which were marked by what Crick called ‘bewildering variety’, his conclusion was not optimistic: ‘In the comparative isolation of Cambridge I must confess that there are times when I have no stomach for decoding’, he wrote.32

  *

  Crick’s appetite might have been flagging, but other thinkers from outside the RNA Tie Club were still keen. Drew Schwartz from Oak Ridge Laboratory produced a complex model that involved tweaking Watson and Crick’s DNA structure and then allowing only particular kinds of amino acid to be bound in ‘holes’, as in Gamow’s diamond code. Schwartz’s hypothesis, which he admitted was speculative, was a way of linking the idea of a code (a word that Schwartz did not use) with previous models of protein synthesis, in which the gene – whatever it was made of – was thought to determine the shape of a protein molecule, by acting as a template.33 The fact that most people realised that proteins were not synthesised on DNA did not seem to bother Schwartz.

  For his part, Gamow remained as determined as ever. Together with Alex Rich and Martynas Yčas, Gamow summarised all the various coding alternatives that had been developed and explained why they had all failed.34 Then, in an ingenious attempt at predicting what the code might be, the trio took data on the frequency of the different amino acids found in tobacco mosaic virus proteins and predicted what the frequency of each of the sixty-four possible ‘triads’ (AGC, etc.) might be, given the observed proportions of each of the four bases in viral RNA. But applying this method to different viruses gave completely different results: alanine was supposedly coded either by GCU or AAG according to the tobacco mosaic virus data, but by AAC or GGC if the turnip yellow virus was used (both viruses are based on RNA, so have U in place of T). If the genetic code were universal, which everyone accepted, it seemed likely there was a problem with this clever approach. In the end, all that Gamow and his colleagues could conclude was that overlapping codes looked unlikely and that there were at least three bases in each ‘word’. This was hardly a great step forward from Watson and Crick’s initial vague suggestion that the sequence of bases was a code.

  The final blow to the idea of overlapping codes was delivered by Sydney Brenner, who had recently returned to Witwatersrand University. In the summer of 1956, Brenner gathered all the known amino acid sequence data and tried to see whether it would fit the coding model that was most popular at the time. This code was again based on groups of three letters or triplets (such as ATC), it was overlapping in that the final two letters in each triplet formed the first two letters of the next triplet (so, for example, the base sequence ATCCG would contain three triplet words – ATC, TCC and CCG), and, in the language of the time, it was ‘degenerate’ – an amino acid could be represented by more than one triplet.35 Because of the overlapping code, any given triplet could be followed by only four different overlapping triplets (so ATC could be followed by only TCA, TCC, TCG or TCT) and preceded by only four different triplets (AAT, CAT, GAT or TAT). Everyone assumed that each triplet corresponded to an amino acid, so that would mean that any triplet coding for, say, cysteine, could only have four different amino acids on either side of it in the sequence. Brenner’s collection of sequence data showed that the actual sequence variability was much greater – cysteine, for example, was preceded by fifteen different amino acids and followed by fourteen different amino acids. Brenner calculated that if the code were overlapping, more than seventy triplets would be required to code the few dozen sequences thus far assembled, whereas only sixty-four combinations were available. Given that the full range of amino acid sequence combinations was probably as high as it could be (in other words, any of the twenty amino acids could precede or follow any other), Brenner’s conclusion was simple: ‘all overlapping triplet codes are impossible’. ‘It seems clear that non-overlapping equivalence between nucleic acids and proteins must exist’, he wrote, but it was not at all obvious what this might be.36 Brenner had shown what the code could not be, not what it was.
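
  The counting behind this argument can be made concrete with a short sketch. The Python below is purely illustrative and is not taken from Brenner’s paper: the function names are hypothetical, and the only figures used are the ones quoted above (four bases, sixty-four triplets, and cysteine’s fifteen preceding and fourteen following neighbours). It assumes the fully overlapping scheme described in the text, in which each triplet shares two letters with the next.

```python
from itertools import product
from math import ceil

BASES = "ACGT"  # DNA alphabet, as in the ATC example in the text

def overlapping_triplets(seq):
    """Read a base sequence as fully overlapping triplets (shift of one base)."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

def successors(triplet):
    """Triplets that may follow: they must start with the last two letters."""
    return [triplet[1:] + b for b in BASES]

def predecessors(triplet):
    """Triplets that may precede: they must end with the first two letters."""
    return [b + triplet[:2] for b in BASES]

def min_triplets_needed(n_before, n_after):
    """Lower bound on the triplets one amino acid needs (illustrative).

    Each triplet allows at most four neighbours on either side, so an amino
    acid seen next to n different neighbours needs at least n / 4 triplets,
    rounded up.
    """
    return max(ceil(n_before / len(BASES)), ceil(n_after / len(BASES)))

all_triplets = ["".join(p) for p in product(BASES, repeat=3)]

print(overlapping_triplets("ATCCG"))
print(successors("ATC"))
print(predecessors("ATC"))
print(len(all_triplets))
print(min_triplets_needed(15, 14))  # cysteine's neighbour counts from the text
```

  Summing such lower bounds over all twenty amino acids is the kind of tally that, on Brenner’s data, came to more than the sixty-four triplets available.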

  Gamow took Brenner’s demolition job in good part and ensured that a slightly expanded version of the younger man’s article appeared the following year in Proceedings of the National Academy of Sciences. Brenner recalled: ‘I’m proud of that paper because that was just really spurred by just knowing how to divide by four!’37 In rewriting the article for a public audience, Brenner framed the problem in the most abstract way – given it was still not certain how protein synthesis took place, and what exactly were the relations between DNA and RNA and protein, he carefully avoided this issue in the article and above all added a new subtitle, taken from some of Gamow’s articles, which focused on the role of triplet codes ‘in information transfer from nucleic acids to proteins’.38

  Here was the great advantage and the fatal weakness of the three years that the RNA Tie Club had spent trying to come up with theoretical answers to the coding problem. The work of the Club had elevated coding to a completely abstract level, far away from the molecular jiggery-pokery that was going on every second in every cell all over the planet. That meant that the code was not explicitly about the mechanism of protein synthesis – it was about information transfer, although nobody had spelt out what that information was. But all this abstraction would remain at the level of talk unless it could be turned into biochemistry. In the end, experimental data had the last word, not the theoretical schemes of Gamow and his clever drinking pals in the RNA Tie Club.

  *

  Twice, Boris Ephrussi asked Francis Crick a question that he could not answer, and both times it was the same question. The first occasion was at Woods Hole in 1954, the second was in a Paris café a year later. Each time, Ephrussi wanted to know why Crick was so certain that the DNA sequence encoded the sequence of amino acids.39 This pointed question went to the heart of everything that had been done since Watson and Crick had discovered the double helix – they had been avoiding the central issues of what the sequence of bases did, or to put it another way, what information they contained. Everyone assumed that there was what was called a colinearity between the sequence of bases in a DNA molecule and the sequence of amino acids in the corresponding protein, but Ephrussi’s question underlined that, in the light of the available evidence, that information could equally be linked to something as mysterious as the overall form of the protein, or it could be something else altogether. As Crick later recalled:

  There was no evidence, you see; there was absolutely no evidence that a gene made a difference in the sense that you could actually determine the amino-acid sequence and show that with a mutation it was changed.40

  Goaded and intrigued by Ephrussi’s challenge, in the spring of 1955 Crick began searching for an example of protein variation that he could tie to genetic variation. He looked for a protein that was widely available and was from an organism that could be studied genetically. The protein he chose was an enzyme called lysozyme, which is found in chicken eggs and in human tears. Together with the German-born biochemist Vernon Ingram, Crick tried to crystallise lysozyme from the eggs of various bird species, and even used onions to make himself cry to provide a source of tears. The project was a failure – they could get lysozyme from chickens but not from other species, meaning that a comparison between species would be impossible. So they tried looking for variation in lysozymes from different individual chickens, hoping to eventually relate such variation to genetic differences. Again they found nothing. As Crick wrote to Brenner:

 
