What Mad Pursuit
Its final downfall came from two directions. Our work on phase-shift mutants, described in chapter 12, made it unlikely, but a more decisive blow was dealt by Marshall Nirenberg when he showed that poly U (a simple form of RNA) coded for polyphenylalanine, whereas in a comma-free code UUU should have been a nonsense triplet. Finally the correct genetic code, confirmed by so many methods, has proved decisively that the whole idea is quite erroneous. However, it is just conceivable that it may have played a role near the origin of life, when the code first began to evolve, but this is pure speculation.
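The reason UUU had to be a nonsense triplet in any comma-free code can be checked mechanically. The short Python sketch below is an illustration only, not anything from the original work: the function name `is_comma_free` and the example code words are assumptions chosen for the demonstration. It tests the defining property, namely that when two code words sit side by side, neither of the two out-of-frame triplets straddling the join may itself be a code word.

```python
from itertools import product


def is_comma_free(code):
    """Return True if no out-of-frame triplet formed by any two
    adjacent code words is itself a code word.

    `code` is a collection of three-letter words (hypothetical helper,
    written for this illustration).
    """
    words = set(code)
    for first, second in product(words, repeat=2):
        joined = first + second
        # The frames shifted by one and by two letters straddle the join.
        if joined[1:4] in words or joined[2:5] in words:
            return False
    return True


# UUU followed by UUU reads as UUU in every frame, so it can never
# belong to a comma-free code.
print(is_comma_free({"UUU"}))          # False
# A small set of words chosen for illustration that does satisfy the rule.
print(is_comma_free({"ACG", "ACU"}))   # True
```

Any triplet made of one repeated letter fails in the same way, which is why Nirenberg's poly U result was so damaging to the idea.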
The idea of comma-free codes attracted the attention of combinatorialists, in particular Sol Golomb. We had failed to solve the problem of enumerating all possible triplet overlapping codes (with four letters) although we had found more than one solution. This enumeration was worked out by Golomb and Welch, using a very neat argument (which we ought to have seen for ourselves) as a key part of the proof. The problem was also solved by the Dutch mathematician H. Freudenthal at about the same time.
Eventually the code (see appendix B) was solved by experimental methods, not by theory. Major contributors were the groups of Marshall Nirenberg and of Gobind Khorana. The group of an earlier Nobel laureate, Severo Ochoa, also made important contributions. Even as the code was coming out, attempts were made to guess the whole from the part, but these were also largely unsuccessful. In some ways the code embodies the core of molecular biology, just as the periodic table of the elements embodies the core of chemistry, but there is a profound difference. The periodic table is probably true everywhere in the universe, and especially relevant in places that have about the same temperature and pressure as the Earth. If there is life on other worlds and even if that life also uses nucleic acids and proteins, which is far from certain, it seems very probable that the code there would be substantially different. There are even minor variants of it in some of the organisms we have here on the Earth. The genetic code, like life itself, is not one aspect of the eternal nature of things but is, at least in part, the product of accident.
9
Fingerprinting Proteins
IN THE LAST CHAPTER I discussed the various theoretical attempts to solve the coding problem. In this one I describe some experimental approaches. The problem was much the same as before: Do genes (DNA) control the synthesis of protein? And if so, how?
It seems obvious enough now that the amino acid sequence of a protein is determined genetically, and in particular by the base sequence of a stretch of DNA (or RNA), but this was not always so clear. After the double helix was discovered the idea seemed much more attractive, so much so that Jim and I began to take it for granted. The next step was to show that the gene and the protein it coded were co-linear. By this I mean that the sequence of bases in that stretch of nucleic acid was in step with the corresponding sequences of amino acids in the particular protein it coded, just as a stretch of Morse code is co-linear with the corresponding message in English.
In those days there seemed no hope of sequencing either DNA or RNA directly, but in favorable circumstances we thought it might be possible to order a set of mutants within one gene, using standard genetic methods. Since the genetic distances were likely to be rather small, the recombination rates involved were expected to be much less than those geneticists usually measured. This implied that many progeny would have to be examined, suggesting that it would be necessary to use some sort of microorganism, such as a bacterium or a virus.
Once the mutants had been put in order, the next step would be to pin down the amino acid change due to each mutant. Although sequencing a protein chain was then still laborious, Fred Sanger had shown that it could be done, and we expected that for a small protein it would not be impossibly difficult.
Some time in the summer of 1954 I was sitting on the grass at Wood’s Hole, explaining these ideas to the Russian-born geneticist Boris Ephrussi. Boris, by then working in Paris, had been particularly interested in genes in yeast that appeared to be outside the nucleus of the cell. We know now that such cytoplasmic genes are coded in the DNA of the cell’s mitochondria, but at that time all that was known was that they did not behave like nuclear genes. Boris was indignant. “How do you know,” he asked, “that the amino acid sequence is not determined by a cytoplasmic gene and that all the nuclear genes do is to fold up the protein correctly?”
I don’t think Boris necessarily believed this (and certainly I did not), but his question made me realize that we first needed to show that a single mutant in a nuclear gene altered the amino acid sequence of the protein for which it coded, probably changing just a single amino acid. On returning to Cambridge I decided that this was the next most important step to take.
It was not at all clear what organism to use nor what protein to study. A little later Vernon Ingram joined us at the Cavendish. His main task was to add heavy atoms to hemoglobin or myoglobin, to help the X-ray work, but he and I decided to have a go at the genetic problem. We realized that for the first step we need not map the gene in detail. All we needed was enough genetic information to show that a mutant was being inherited in a Mendelian way and was therefore likely to belong to a nuclear gene. Nor did we need to fix the changed amino acid in the sequence. It was only necessary to show that there had been a change in the sequence due to the mutant. We thought that this would make things easier, since we then only needed to study the amino acid composition of the proteins. If the protein were small enough we might, with luck, pick up a change as small as an alteration to just one amino acid.
In order to work with a protein that was easy to obtain, we chose the protein lysozyme. Lysozyme is a small, basic (meaning positively charged) enzyme originally characterized by Alexander Fleming, the discoverer of penicillin. Fleming had shown that it occurred in tears and that egg white was also a rich source. The enzyme lyses (breaks up) a certain class of bacteria, and in both contexts acts to counteract bacterial infection. One particular bacterium is especially sensitive to it, and this can be used as an assay for the enzyme.
Our main target was egg white but we also tried human tears. Each morning when I came into the laboratory the assistant took a small sample of my tears. Not being an actor, I did not find it easy to weep at will, so my assistant would hold a slice of raw onion underneath one eye. I would hold my head to one side, to make it less easy for the tear to escape down the tear duct, and she would catch the tears with a little Pasteur pipette as they dribbled out of the other side of my eye. Even so, it was difficult to produce more than one or two tears, though I found it helped to think sad thoughts. Curiously enough, I never cry spontaneously at sad or tragic events, but a happy ending makes me weep uncontrollably. Let the bride finally walk triumphantly down the aisle, with the organ playing in jubilation. The tears will stream down my face, in spite of my intense annoyance and embarrassment.
The effect of a single tear can be dramatic. A weak suspension of the bacteria we used looks appreciably cloudy, though not as dense as milk. Add a single tear, swirl the fluid in the test tube, and in a moment the suspension becomes completely clear. All the bacteria have been lysed, thus immediately reducing the scattering of light that caused the cloudiness. Of course we used a more quantitative assay, but the phenomenon was basically the same.
Because chick lysozyme has a strong positive charge, unlike all the other proteins in egg white, it is possible to crystallize it in the egg white, without any further purification. To a biochemist it is really surprising to see the crystals sitting in the rather concentrated, gooey egg white. For the same reason lysozyme was relatively easy to separate on the simple ion exchange columns that had just then been developed for fractionating proteins.
It would be nice to report that we found a mutant, but in fact we had no success at all. We tested the lysozyme rather crudely, checking, in effect, its charge and the way it absorbed ultraviolet light, yet we could easily show that chick lysozyme differed from guinea fowl lysozyme, and that they were both quite different from the lysozyme in my tears. Although we studied about a dozen strains of chickens, kindly supplied by the local chicken geneticist, testing about a hundred eggs in all, we never detected any difference. We tried the tears of half a dozen people around the lab, but these all seemed to be similar to each other. I wanted to test the tears of my younger daughter Jacqueline, then only two years old, but Odile would have none of it. What! Use her precious baby for an experiment! I was sternly forbidden to attempt it.
I expect we would have gone on, but at that stage there was a dramatic development. Max Perutz was working on hemoglobins, including human hemoglobin. Some years earlier Harvey Itano and Linus Pauling had shown that the hemoglobin from a person with sickle-cell anemia was electrophoretically different from normal hemoglobin. Pauling rightly dubbed it a molecular disease. A colleague of his at Cal Tech measured its amino acid composition and reported that there was no difference between normal and sickle-cell hemoglobin. This conclusion was badly worded. What he meant was that there was no difference in composition he could reliably detect, but since hemoglobin is a comparatively large protein, a single amino acid change could easily be missed using this rather crude measure.
Sanger had developed a method he called fingerprinting proteins. He digested the protein with an enzyme (trypsin) that cut the polypeptide chain only at special places. The limited number of peptide fragments thus produced were then run on a two-dimensional paper chromatographic system to sort them one from another, spreading the peptides out on the paper. Vernon realized that this was just the method he needed to pick up small alterations in a protein. Fortunately Max had been sent some sickle-cell hemoglobin, and he gave some to Vernon to test. To his delight, the fingerprints of sickle-cell hemoglobin and of normal hemoglobin differed in the position of a single peptide.
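The logic of the fingerprint comparison can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of the laboratory procedure: the function `tryptic_digest` is hypothetical, it uses the usual simplified rule that trypsin cuts after lysine (K) or arginine (R) except before proline (P), and the thirty-residue sequence is the beginning of the human beta-globin chain as commonly quoted, which does not appear in the text. The real method separated the peptides on paper; here the two sets of fragments are simply compared directly.

```python
def tryptic_digest(sequence):
    """Cut a one-letter amino acid sequence after K or R, unless the
    next residue is P (a simplified trypsin rule, assumed here)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing fragment, if any
    return peptides


# Assumed start of the normal beta chain (not taken from the text).
NORMAL = "VHLTPEEKSAVTALWGKVNVDEVGGEALGR"
# Sickle-cell variant: valine (V) replaces glutamic acid (E) at the
# sixth residue.
SICKLE = NORMAL[:5] + "V" + NORMAL[6:]

normal_peptides = set(tryptic_digest(NORMAL))
sickle_peptides = set(tryptic_digest(SICKLE))

# Peptides found in one digest but not the other.
print(normal_peptides - sickle_peptides)  # {'VHLTPEEK'}
print(sickle_peptides - normal_peptides)  # {'VHLTPVEK'}
```

All the other fragments are identical in the two digests, so only one spot would be expected to shift. A comparison of overall amino acid composition spreads the same single change across the whole protein, which is why it was so easily missed.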
Vernon was able to isolate the altered peptide, determine its sequence, and show that indeed the difference was due to the change of a single amino acid. Valine had been substituted for glutamic acid. At one point, I recall, he thought that perhaps two amino acids might be changed. Jim and I were brasher then and refused to believe this. “Try it again, Vernon,” we said, “you’ll find there’s just a single change” and so it turned out to be.
This result was surprising from two points of view. Sickle-cell anemia is a disease in which the altered hemoglobin forms a type of crystal inside the “red” cells of the blood when it gives up its oxygen in the veins. This often breaks the red cell open, so that patients have a chronic lack of hemoglobin in their blood and, in many cases, die in their teens. Yet this lethal effect is produced by a tiny alteration in just one of the organism’s many genes (we know now it is due to a single base change). Essentially just two molecules are defective, one inherited from the father and one from the mother. How can such a minute change possibly kill someone? The reason is the cascade of magnification. Each defective gene is copied many, many times, since each cell in the body has to have its own copy. Then, in the precursors of each red cell, each gene is copied many times onto messenger RNA, and each messenger RNA directs the synthesis of many defective protein molecules. The tiny atomic defect gets magnified and magnified till there is a considerable amount of the defective protein in the patient’s body, quite enough to kill him if the circumstances are unfavorable.
The other surprising aspect was the scientific one. Strange as it may seem, up to that point most geneticists and protein chemists had not seriously considered that their respective fields were related. Of course a few farsighted individuals, such as Hermann Muller and J. B. S. Haldane, were aware of the likely connection, but each field pursued its aims with very little awareness of the other. Ingram’s result produced a dramatic change of attitude. At about this time I ran into Fred Sanger, I think on a train to London. He said that he and his small group thought they ought to learn a little genetics, a subject about which, up to that point, they hardly knew anything at all except that it existed.
I arranged that we should have weekly evening meetings in my sitting room at the Golden Helix. Sydney Brenner and Seymour Benzer agreed to conduct these tutorials. I recall the first one rather vividly. Sydney came over a little while before the others. I asked him what he proposed to say. He said he thought he would start with Mendel and peas. I suggested that this was perhaps by now a little old-fashioned. Why not start with haploid organisms (which have only one copy of the genetic material), such as bacteria, rather than peas or mice or men, which are diploid (that is, with two copies in each cell) and thus more complicated? Sydney agreed. He gave a brilliant lecture, mainly on the difference between genotype and phenotype, illustrated with examples from bacteria and bacterial viruses. It was all the more striking since I knew it was improvised as he went along.
I think that there is a lesson here for those wanting to build a bridge between two distinct but obviously related fields (a possible modern example would be cognitive science and neurobiology). I am not sure that reasoned arguments, however well constructed, do much good. They may produce an awareness of a possible connection, but not much more. Most geneticists could not have been easily persuaded to learn protein chemistry, for example, just because a few clever people thought that was where genetics ought to go. They thought (as functionalists do today) that the logic of their subject did not depend on knowing all the biochemical details. The geneticist R. A. Fisher once told me that what we had to explain was why genes were arranged like beads on a string. I don’t think it ever occurred to him that the genes made up the string!
What makes people really appreciate the connection between two fields is some new and striking result that obviously connects them in a dramatic way. One good example is worth a ton of theoretical arguments. Given that, the bridge between the two fields is soon crowded with research workers eager to join in the new approach.
10
Theory in Molecular Biology
AS WE HAVE JUST SEEN, the genetic code was a problem that would not yield to purely theoretical approaches. This does not mean that some general theoretical framework could not be helpful, if only to guide the directions that experiments might take. It was the nature of the structure of DNA that gave life to such speculations. Otherwise they would have been too vague to be useful. In 1957 I was invited to give a paper to a symposium of the Society for Experimental Biology in London. This gave me the opportunity to sort out and write down my ideas, most of which had been formulated earlier.
What the structure of DNA suggested was that the sequence of bases in the DNA coded for the sequence of amino acids in the corresponding protein. In the paper I called this the sequence hypothesis. Rereading it, I see that I did not express myself very precisely, since I said “… it assumes that the specificity of a piece of nucleic acid is expressed solely by the sequence of its bases, and that this sequence is a (simple) code for the amino acid sequence of a particular protein.” This rather implies that all nucleic acid sequences must code for protein, which is certainly not what I meant. I should have said that the only way for a gene to code for an amino acid sequence of a protein is by means of its base sequence. This leaves open the possibility that parts of the base sequence can be used for other purposes, such as control mechanisms (to determine if that particular gene should be working and at what rate) or for producing RNA for purposes other than coding. However, I don’t believe anyone noticed my slip, so little harm was done.
The other theoretical idea I proposed was of a rather different character. I suggested that “once ‘information’ has passed into protein it cannot get out again,” adding that “Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein” (see appendix A).
I called this idea the central dogma, for two reasons, I suspect. I had already used the obvious word hypothesis in the sequence hypothesis, and in addition I wanted to suggest that this new assumption was more central and more powerful. I did remark that their speculative nature was emphasized by their names.
As it turned out, the use of the word dogma caused almost more trouble than it was worth. Many years later Jacques Monod pointed out to me that I did not appear to understand the correct use of the word dogma, which is a belief that cannot be doubted. I did apprehend this in a vague sort of way but since I thought that all religious beliefs were without any serious foundation, I used the word in the way I myself thought about it, not as most of the rest of the world does, and simply applied it to a grand hypothesis that, however plausible, had little direct experimental support.
What is the use of such general ideas? Obviously they are speculative and so may turn out to be wrong. Nevertheless, they help to organize more positive and explicit hypotheses. If well formulated, they can act as a guide through a tangled jungle of theories. Without such a guide, any theory seems possible. With it, many hypotheses fall away and one sees more clearly which ones to concentrate on. If such an approach still leaves one lost in the jungle, one tries again with a new dogma, to see if that fares any better. Fortunately in molecular biology the one first selected turned out to be correct.
I believe this is one of the most useful functions a theorist can perform in biology. In almost all cases it is virtually impossible for a theorist, by thought alone, to arrive at the correct solution to a set of biological problems. Because they have evolved by natural selection, the mechanisms involved are usually too accidental and too intricate. The best a theorist can hope to do is to point an experimentalist in the right direction, and this is often best done by suggesting what directions to avoid. If one has little hope of arriving, unaided, at the correct theory, then it is more useful to suggest which class of theories are unlikely to be true, using some general argument about what is known of the nature of the system.