by Matthew Cobb
Attempt by Vernon Ingram and myself to find two hens with different lysozymes so far completely negative … It is all rather discouraging. Even if we find a difference we shall still have to show it’s due to amino acid composition, and also do the genetics.41
But then Ingram came up with something much more dramatic, which provided the answer to Ephrussi’s question and showed the potential applications of the work that Crick and his colleagues were doing. In 1949, two articles had appeared in Science reporting new findings on sickle-cell anaemia, a genetic disease that predominantly affects people of African origin and can lead to debilitating weakness or death. People with sickle-cell disease have strange sickle-shaped red blood cells – hence the name. It had first been described as a genetic disease in 1917, but it was only in July 1949 that James Neel of the University of Michigan showed that the best explanation of the pattern of inheritance of sickle cells was that the trait was caused by a single gene.42 Four months later, in November 1949, Linus Pauling and his group published a study of the haemoglobin molecule found in the blood of patients with sickle-cell anaemia, with the dramatic title ‘Sickle cell anemia, a molecular disease’.43 They described two forms of haemoglobin, one associated with sickle-cell anaemia (later called the S form) and the other with normal individuals (the A form). These two forms could be distinguished by electrophoresis – when the two haemoglobins were placed in a gel and subjected to an electric field, they moved at different rates, so after a certain time they were found at different points. Pauling’s group concluded that the way the two forms moved under electrophoresis suggested that there was a difference in the shape of the two molecules, with the S form having more positive charges than the A form. For the first time, a disease had been shown to have a molecular basis.
Max Perutz’s group at Cambridge had been studying haemoglobin for years, and had begun to look at the structure of the S form. Encouraged by Crick and Perutz, in the summer of 1956 Vernon Ingram looked at some samples of the S form of haemoglobin that had been left unused by another researcher, Tony Allison. Allison’s field work in Uganda had led him to suggest why sickle-cell anaemia still exists despite having potentially lethal consequences in patients with two copies of the sickling gene: individuals who have one normal and one sickling gene have a lower load of malaria parasites.44 Having one sickling gene is actually a good thing in malaria-ridden regions.
Ingram’s aim was to detect the precise molecular difference between the S and A forms that had been described in general terms by Pauling. The haemoglobin molecule was too long for Fred Sanger’s sequencing method, which remained relatively primitive. First, Ingram had to snip the molecules into smaller bits by using an enzyme; then he divided the components on filter paper, first separating the two forms by applying an electrical current just as Pauling had done and then applying a solvent at 90° to the direction of the current. Once a chemical had been sprayed on the paper to reveal the invisible components of the haemoglobin molecule by turning them purple, the result was a two-dimensional ‘finger print’ of the molecule.45
The differences in electrical charge shown by the two forms of haemoglobin had two potential explanations. Pauling thought it was most likely that the overall shape of the molecule was the source of the differences, and that in turn was produced by the way in which the protein was folded and shaped during synthesis. As he argued in 1954:
The interesting possibility exists that the gene responsible for the sickle-cell abnormality is one that determines the nature of the folding of polypeptide chains, rather than their composition.46
The other possibility was that the difference in charge was simply to do with a difference in the sequence of amino acids in the two forms.
Ingram’s ‘finger prints’ of the two forms of haemoglobin each produced about thirty marks on the filter paper; these were identical in both cases, except for one small blob that was present in the S form and absent in the A form. He therefore analysed this particular component in great detail, and was able to show that the difference between the two molecules was due to a small part of the protein. In an article in Nature published in 1956, Ingram concluded:
there is a difference in the amino-acid sequence in one small part of the polypeptide chains. This is particularly interesting in view of the genetic evidence that the formation of haemoglobin S is due to a mutation in a single gene. It remains to be seen how large a portion of the chains is affected and how the sequences differ.47
Shortly before the article appeared, Ingram presented his findings to the meeting of the British Association in Sheffield, in August 1956. The science correspondent of The Times was there and immediately sniffed out the story, describing Ingram’s work in some detail and highlighting its significance:
He described how in the past six weeks he had shown for the first time how a mutation in a single gene, the unit of heredity, can modify chemical structure in a substance in the body for which that gene is responsible.48
Within a year, Ingram was in the pages of The Times again, following publication of another article in Nature that this time revealed the exact cause of the peptide difference between the two forms: it was all due to a single amino acid.49 In the S form, a valine molecule replaced the glutamic acid found in the normal A form. This minor change, in the most fundamental component of a protein, caused the changes in the behaviour of the molecule that led to the debilitating disease. Although the genetic code was still a mystery, Ingram’s work had shown that there was a relation between a mutation in a gene, which was made of DNA, and a change in the amino acid sequence of a protein. Ephrussi’s question had been answered. For The Times, this was a discovery that was on a par with Mendel’s observations that led to the foundation of genetics. The article concluded:
Dr Ingram’s demonstration of a single, identified difference between genetically determined haemoglobins is thus the nearest that has been got to a direct view of one of Mendel’s genes in action. This is indeed a landmark.50
Ingram’s discovery was a brilliant confirmation of the new understanding of gene function. Previously it had been thought that the gene shaped the protein, like a three-dimensional mould. Ingram, inspired by Crick, had now shown that a gene could alter a single amino acid in a one-dimensional protein sequence, and that in turn would alter the function of the protein. A new vision of protein synthesis was appearing, and it had consequences for how the gene and the genetic code were understood.51 If the gene contained information and a change in that information led to a change in an amino acid sequence, this suggested that genetic information corresponded to nothing more than the amino acid sequence in a protein produced by a gene.
* As Crick later recognised, from a strict cryptographical point of view, the genetic code is a cipher, not a code. Crick had in mind something like the Morse Code, which is equally not a code from a technical point of view. And as he later put it, ‘“genetic code” sounds a lot more intriguing than “genetic cipher”’ (Crick, 1988, pp. 89–90). The term genetic code was first used in print in 1958, by Geoffrey Zubay (Zubay, 1958).
–EIGHT–
THE CENTRAL DOGMA
In September 1957, Francis Crick gave a lecture at University College London. He had been invited by the Society for Experimental Biology to speak at a symposium entitled ‘The Biological Replication of Macromolecules’.1 For nearly a year, Crick had been musing about what genes actually do, thinking about the mechanisms of protein synthesis, trying to tease out the biochemical steps and their theoretical consequences. The London conference was his opportunity to present his ideas about protein synthesis. It was the most influential lecture he ever gave.
The French molecular geneticist François Jacob was in the audience and recalled his impression of Crick:
Tall, florid, with long sideburns, Crick looked like the Englishman seen in illustrations to nineteenth-century books about Phileas Fogg or the English opium eater. He talked incessantly. With evident pleasure and volubly, as if he was afraid he would not have enough time to get everything out. Going over his demonstration again to be sure it was understood. Breaking up his sentences with loud laughter. Setting off again with renewed vigour at a speed I often had trouble keeping up with. … Crick was dazzling.2
Crick’s recollection was not quite so positive – ‘I ran overtime, and didn’t get it over very well’, he felt.3
Crick’s lecture led to two articles – one in Scientific American that appeared at the same time, and another more detailed piece that was published in the symposium collection in 1958. This second paper has been cited more than 750 times and is still cited frequently.4 The bold proposals Crick made in his lecture continue to play a fundamental role in modern debates over the nature of the genetic code and the evolutionary process.
Behind its dull title, ‘On protein synthesis’, Crick’s talk explained some of the new ideas about what genes do and how they do it, all wrapped up in an elegant, conversational style that still beguiles the reader. At every step he admitted the limits of his knowledge, and distinguished clearly between established fact and logical conjecture. It was these conjectures that proved so influential. Crick’s starting point was his assumption that the role of genes is to control protein synthesis, even though, as he put it with disarming simplicity, ‘the actual evidence for this is rather meagre’:
I shall … argue that the main function of the genetic material is to control (not necessarily directly) the synthesis of proteins. There is a little direct evidence to support this, but to my mind the psychological drive behind this hypothesis is at the moment independent of such evidence. Once the central and unique role of proteins is admitted there seems little point in genes doing anything else.5
Crick claimed that the involvement of nucleic acids in protein synthesis was ‘widely believed (though not by every one)’.6 The caveat was significant – even at this date, there were still those who found it hard to accept that all genes were made of DNA. A year earlier, at a symposium on ‘The Chemical Basis of Heredity’ held in June 1956, Bentley Glass noted the reluctance of some geneticists to abandon the protein part of the old nucleoprotein theory of heredity, but he was nonetheless sure that ‘most persons’ accepted that DNA (or RNA in some viruses) was the primary genetic material.7 This obviously left open the possibility that there might be other, secondary and non-nucleic acid forms of heredity. At the same meeting, George Beadle focused on this problem in his opening address, underlining that there was no experimental evidence that DNA was the genetic material in organisms other than viruses and bacteria. In his talk, Steven Zamenhof, who had worked with Chargaff and had been an early supporter of Avery’s claim that DNA was the genetic material in bacteria, accepted that although ‘extensive evidence’ supported the argument that the pneumococcal transforming principle was DNA, and that ‘no evidence to the contrary had ever been presented’, there was nevertheless ‘no absolute proof’.8 With his fellow-scientists still haunted by doubt, Beadle argued that ‘it is assumed as a working hypothesis that the primary genetic material is DNA rather than protein.’9 What now appears evident was merely a ‘working hypothesis’ in 1956.
As Crick outlined in both his symposium talk and in his Scientific American article, the direct evidence for the involvement of nucleic acids in protein synthesis came from two recent sources – Ingram’s discovery of the molecular basis of sickle-cell anaemia, and work on the tobacco mosaic virus (TMV), which was now known to use RNA as its genetic material. In 1956, Heinz Fraenkel-Conrat, a biochemist working at Berkeley, had shown that it was possible to separate the RNA and protein components of the TMV and then reassemble them to produce a functional virus. Fraenkel-Conrat then went on to recombine protein from one TMV strain and RNA from another strain; the recombinant strains produced viruses with proteins that were typical of the RNA donor strain and never like those found in the protein donor strain. In his 1957 lecture, Crick described this finding and concluded: ‘the viral RNA appears to carry at least part of the information which determines the composition of the viral protein’.10 For the first time, Crick spelled out the precise implications of the use of the term information in genetics: ‘By information I mean the specification of the amino acid sequence of the protein.’11
This was not information as Shannon had described it – Crick was not referring to a mathematical measure of the degree of uncertainty of one particular message. He was talking about something much more tangible, straightforward and immediately understandable: the sequence of amino acids in a protein. A gene somehow carried the code that could produce a particular amino acid sequence, and it was also capable of passing that code to the next generation. The information necessary to include a particular amino acid in a protein was encoded by the sequence of bases in the DNA that made up the gene.
Crick then repeated an assertion that he had recently been touting at various conferences: the sequence of amino acids alone was significant in determining protein function. Although the three-dimensional shapes of proteins were ultimately the explanation of specificity, of the myriad functions of proteins, these complex structures were in fact contained within the one-dimensional message of DNA. The three-dimensional protein structure simply emerged out of the one-dimensional DNA code through the process of protein synthesis, argued Crick. Nothing else mattered except the sequence of amino acids, which was in turn determined by the order of bases in the DNA molecule: the genetic code. As Crick put it: ‘It is of course possible that there is a special mechanism for folding up the chain, but the more likely hypothesis is that the folding is simply a function of the order of the amino acids’.12 Crick called this view the sequence hypothesis:
In its simplest form it assumes that the specificity of a piece of nucleic acid is expressed solely by the sequence of its bases, and that this sequence is a (simple) code for the amino acid sequence of a particular protein. This hypothesis appears to be rather widely held.13
Crick may have thought it was ‘widely held’, but plenty of scientists were deeply suspicious of his view.
In June 1956, Erwin Chargaff, in typically contrarian and acerbic mood, complained that too much attention was being paid to nucleic acids, and he still wondered whether there was something else apart from DNA that was giving proteins their final shape:
It is obvious that sequence cannot be the sole agent of biological information. Even if the arrangement of an entire polynucleotide would be written, a third dimension would be lacking: the operative three-dimensional shape of the molecular aggregate, in which perhaps not only numerous nucleic acid molecules take part, but also proteins and possibly even other polymers.14
Another doubter was Macfarlane Burnet, who had just published a short book called Enzyme, Antigen and Virus: A Study of Macromolecular Pattern in Action. Burnet’s book had made an impression on Crick because it took the opposite stance to his on many points. In his book, Burnet declared that it was ‘quite impossible at the present time’ to envisage how a nucleic acid could specify a linear polypeptide sequence; like Chargaff, instead of relying totally on DNA, Burnet wanted to ‘leave open the possibility of some associated factor, histone possibly, which allows a sufficient complexity to carry the needed coding.’15 Even leading scientific figures were clinging to the idea that proteins must play an essential role in genetics.
Part of the problem was that no one knew exactly how a cell took a nucleic acid sequence and turned it into a blob of protein made up of amino acids. It was known that the main steps of protein synthesis took place in the cell’s cytoplasm, which surrounds the nucleus. On the one hand, DNA was known to stay in the nucleus, so it was clearly not directly involved; on the other hand, RNA was found throughout the cell in a variety of forms, apparently associated with the synthesis of proteins. It looked as though the genetic message passed from DNA to RNA, but how this happened was unclear. It was equally uncertain how the amino acid chain was assembled to make a protein, although small recently discovered RNA-rich structures called microsomal particles seemed to be involved. Informal discussion at a conference in 1958 led to these particles being re-baptised ‘ribosomes’, by which name they are known today.16 A few weeks before Crick’s talk, Mahlon Hoagland and Paul Zamecnik at Harvard had shown that if amino acids were radioactively labelled, proteins throughout the cell were eventually found to be radioactive, indicating that the amino acids had been assembled into a protein.17 On shorter time-scales, however, radioactivity was found only in the ribosomes, strongly suggesting that amino acids had to pass through the ribosome to be combined into a protein.18 It seemed likely that the RNA in the ribosome was the actual site where the protein was made. This raised the question of the nature of the link between DNA and RNA, and how each amino acid found its way to the ribosome.
In his lecture, Crick turned his brilliant mind to both these issues and publicly described the idea he had worked up with Brenner: there must be an unknown class of small molecule, which they called an adaptor, which would gather each of the twenty amino acids and take them to the ribosome, so that the protein could be assembled there. The most likely hypothesis was that there was one adaptor for each type of amino acid, and that it would contain a short stretch of nucleotides – a tiny bit of the genetic code that was able to bind to the RNA template in the ribosome, just like base pairing between the two strands of the DNA double helix. At the same time, on the other side of the Atlantic, Hoagland and Zamecnik were isolating what was later identified as Crick’s adaptor – eventually known as transfer RNA or tRNA – without knowing anything about Crick’s hypothesis.19