DNA: The Secret of Life
GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATA CAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCAG CTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGGA GGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCA GCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAAA
Writing it out a million times would give a sense of the scale of the Alu presence in our DNA. In fact, levels of repetitive sequence are even higher than they would appear: sequences that would once have been instantly identifiable as repeats have, over many generations of mutation, diverged beyond recognition as members of a particular class of repetitive DNA. Imagine a set of three short repeats: ATTG ATTG ATTG. Over time mutation will change them, but if the period is short, we can still see where they came from: ACTG ATGG GTTG. Over a longer period, their original identity is completely lost in the welter of mutation: ACCT CGGG GTCG. Proportions of repetitive DNA are much lower in many other species: 11 percent of the mustard weed genome is repetitive, 7 percent of the nematode worm's, and just 3 percent of the fruit fly's. The large size of our genome is mostly due to its containing more junk than that of many other species (see Plate 44).
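The three-repeat illustration above can be checked by counting, for each copy, how many letters differ from the original ATTG. The following is a minimal sketch only: the function and variable names are illustrative assumptions, and a simple position-by-position count (a Hamming distance) is a stand-in for the far more elaborate methods used to recognize real repeat families.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Sequences taken from the text; the names below are illustrative only.
ORIGINAL = "ATTG"                        # the ancestral repeat unit
SHORT_PERIOD = ["ACTG", "ATGG", "GTTG"]  # after a short period of mutation
LONG_PERIOD = ["ACCT", "CGGG", "GTCG"]   # after a longer period


def divergence(copies):
    """Differences between each copy and the ancestral repeat."""
    return [hamming(ORIGINAL, copy) for copy in copies]


short_diffs = divergence(SHORT_PERIOD)
long_diffs = divergence(LONG_PERIOD)

print("short period:", short_diffs)  # one change per copy
print("long period: ", long_diffs)   # two or three changes out of four letters
```

After the short period each copy still matches the original at three of its four positions; after the longer period at most two positions survive, which for a four-letter sequence is hard to tell apart from chance resemblance.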
These differences in the amounts of junk DNA explain a longstanding evolutionary conundrum. The basic expectation is that more complex organisms should have bigger genomes – they need to encode more information – than simple ones. There is indeed a correlation between genome size and an organism's level of complexity: the yeast genome is bigger than that of E. coli but smaller than ours. It is, however, only a weak correlation.
COMMON NAME      SPECIES NAME                APPROX. GENOME SIZE
                                             (MILLIONS OF BASE PAIRS)
Fruit Fly        Drosophila melanogaster         180
Fugu (puffer)    Fugu rubripes                   400
Snake            Boa constrictor               2,100
Human            Homo sapiens                  3,100
Locust           Schistocerca gregaria         9,300
Onion            Allium cepa                  18,000
Newt             Amphiuma means               84,000
Lungfish         Protopterus aethiopicus     140,000
Fern             Ophioglossum petiolatum     160,000
Amoeba           Amoeba dubia                670,000
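To see how weak the link between genome size and complexity is, it helps to express each figure in the table as a multiple of the human genome. The sketch below does only that arithmetic; the sizes are the approximate values given in the table, and the dictionary and function names are illustrative assumptions.

```python
# Approximate genome sizes in millions of base pairs, as listed in the table.
GENOME_SIZE_MBP = {
    "Fruit fly": 180,
    "Fugu (puffer)": 400,
    "Snake": 2_100,
    "Human": 3_100,
    "Locust": 9_300,
    "Onion": 18_000,
    "Newt": 84_000,
    "Lungfish": 140_000,
    "Fern": 160_000,
    "Amoeba": 670_000,
}


def fold_vs_human(sizes):
    """Express every genome size as a multiple of the human genome size."""
    human = sizes["Human"]
    return {name: size / human for name, size in sizes.items()}


ratios = fold_vs_human(GENOME_SIZE_MBP)
larger_than_human = [name for name, r in ratios.items() if r > 1]

for name, ratio in ratios.items():
    print(f"{name:<14} {ratio:8.2f} x human")
print("Larger than human:", ", ".join(larger_than_human))
```

On these figures the onion genome is nearly six times the size of ours and the amoeba's more than two hundred times, which is the puzzle the following paragraphs take up.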
It is reasonable to suppose that natural selection operates to keep genome size as low as possible. After all, every time a cell divides, it must replicate all its DNA; the more it has to copy, the greater the room for error, and the more energy and time the process requires. It is quite an undertaking for the amoeba (or newt, or lungfish). So what could have caused the amount of DNA in these species to get so out of hand? In cases of unusually large genomes, we can only infer that some other selective forces must have negated the selection-driven impulse to keep the genome slim. It could be, for instance, that large genomes are advantageous to species likely to be exposed to environmental extremes. Lungfish live at the interface of land and water, and they can survive protracted periods of drought by burying themselves in mud; it could be they need more genetic hardware than a species adapted to a single medium.
Two major evolutionary mechanisms account for this DNA excess: genome doubling, and the proliferation of particular sequences within a genome. Many species, particularly in the plant kingdom, are actually the product of a cross between two preexisting ones. The new species often simply combines the DNA complement from each of its parent species, yielding a double genome. Alternatively, through some kind of genetic accident, a genome may get doubled without input from another species. For example, one of the standbys of molecular biology, baker's yeast, has about 6,000 genes. But close inspection reveals that a large proportion of those genes are duplicates – baker's yeast often has two divergent copies of many of its genes. At some early stage in its evolutionary history, the yeast genome apparently got doubled. Initially the gene copies would have been identical, but, over time, they have diverged.
An even richer source of excess DNA has arisen from the multiplication of genetic sequences capable of replicating and inserting themselves at more than one site in a given genome. These so-called mobile elements have been found to come in many varieties. But when their discovery was first announced by Barbara McClintock in 1950, the very idea of "jumping" genes was too far-fetched for most scientists accustomed to the simple logic of Mendel. McClintock, a superb corn geneticist, had already endured something of a bumpy career ride. When it became clear in 1941 that she would not be granted tenure at the University of Missouri, she came to Cold Spring Harbor Laboratory, where she would remain an active member of the staff until her death in 1992, at the age of ninety. McClintock once told a colleague, "Really trust what you see." This was exactly how she did her science: her revolutionary idea that some genetic elements could move around genomes followed simply from observable facts. She had been studying the genetics underlying the development of different-colored kernels in corn, and noticed that sometimes, part way through the development of an individual kernel, the color would switch. A single kernel might then turn out variegated, with both patches of the expected yellow cells and patches of purple ones. How to account for this sudden switch? McClintock inferred that a genetic element – a mobile element – had hopped into or out of the pigment gene.
Only with the advent of recombinant DNA technologies have we come to appreciate just how common mobile elements are; we now recognize them as major components of many, if not most, genomes, including our own (see Plate 44). And some of the most common mobile elements, those that appear again and again in different sites in the same genome, have earned names reflecting their itinerant lifestyles: two fruit fly mobile elements, for example, are called "gypsy" and "hobo." And among those who study a simple plant called Volvox one mobile element is honored for its extraordinary capacity to jump around the genome: it is known as the "(Michael) Jordan element."
Mobile elements contain DNA sequences that code for enzymes that, through their capacity to cut and paste chromosomal DNA, work to ensure that copies of their particular element are inserted into new chromosomal sites. If a jump carries a mobile element into a junk sequence, the functioning of the organism is unaffected, and the only result is more junk DNA. But when the jump lands the mobile element in a vital gene, thereby disabling its function, then selection intervenes: the organism may die or otherwise be prevented from passing on the new jumped-in gene. Very rarely the movements of mobile elements may either create new genes or alter old ones in a way that benefits the host organism. Over the course of evolution, therefore, the effect of mobile elements seems mainly to have been the generation of novelty. And curiously, in recent human history, there is little evidence of active jumping: most of our junk DNA, it appears, was generated long ago. In contrast, the mouse genome contains many actively reinserting mobile elements, making for a much more dynamic genome. But this seems not to trouble the mouse species unduly; the intrinsically high reproductive potential of mice likely helps the species as a whole tolerate the genetic disasters attending frequent jumps into vitally functioning genetic regions (see Plate 45).
Having been used to establish many of the basic facts about how DNA functions, E. coli had an unparalleled track record as a model organism. Not surprisingly, its genome therefore ranked high on the Human Genome Project's early "to do" list. It was Fred Blattner of the University of Wisconsin who was most eager to start sequencing E. coli. But his grant proposals went nowhere until the HGP got funded and he was awarded one of the first substantial sequencing grants. Were it not for his initial reluctance to adopt automated sequencing, his lab would have been the first to sequence a complete bacterial genome. But in 1991 his strategy for scaling up the operation was an old-fashioned one: employ more undergraduates. Another latecomer to automation was Wally Gilbert, whom I had urged two years before to have a go at the smallest known bacterial genomes, those of the parasitic Mycoplasma – tiny bacteria that live within cells. Sadly, when a clever new manual sequencing strategy of his came to naught, his Mycoplasma project died with it. Blattner did, however, accept automation in time to establish in 1997 that the E. coli genome contains some 4,100 genes.
But the broader race to complete the first bacterial genome had been won two years before at The Institute for Genomic Research (TIGR) by a large team led by Hamilton Smith, Craig Venter, and his wife, Claire Fraser. And the bacterium they sequenced was Haemophilus influenzae, from which twenty years earlier Smith – a towering six-foot-six one-time math major who had gone on to medical school – had isolated the first useful DNA cutting (restriction) enzymes, a feat that won him the Nobel Prize in Physiology or Medicine in 1978. With Haemophilus DNA prepared by Smith, Venter and Fraser used a whole genome shotgun approach to sequence its 1.8 million base pairs. Just documenting the first "small" genome was enough to suggest the awesome size of the awaiting larger ones: if all the As, Ts, Gs, and Cs of the Haemophilus genome were printed on paper of this size, the resulting book would run some four thousand pages. Two pages on average would be needed for each of its 1,727 genes. Of these, only 55 percent have readily identifiable functions: for example, energy production involves at least 112 genes, and DNA replication, repair, and recombination requires a minimum of 87. We can tell from their sequences that the remaining 45 percent are functioning genes, but we simply can't at this stage be sure what it is they do.
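The figures quoted for Haemophilus can be worked through directly. This is a rough sketch using only the rounded numbers in the text (1.8 million base pairs, four thousand pages, 1,727 genes, 55 percent with identifiable functions); the variable names are illustrative, and the per-gene figure is an average stretch of genome per gene, not a measured gene length.

```python
# Rounded figures quoted in the text for Haemophilus influenzae.
GENOME_BP = 1_800_000      # base pairs in the genome
PAGES = 4_000              # pages needed to print the sequence
GENES = 1_727              # genes identified
KNOWN_FRACTION = 0.55      # share of genes with readily identifiable functions

bases_per_page = GENOME_BP / PAGES          # letters per printed page implied by the text
pages_per_gene = PAGES / GENES              # the "two pages on average" figure
bases_per_gene = GENOME_BP / GENES          # average stretch of genome per gene
genes_known = round(GENES * KNOWN_FRACTION)
genes_unknown = GENES - genes_known

print(f"bases per page:  {bases_per_page:.0f}")
print(f"pages per gene:  {pages_per_gene:.1f}")
print(f"bases per gene:  {bases_per_gene:.0f}")
print(f"genes with identifiable function: about {genes_known}")
print(f"genes of unknown function:        about {genes_unknown}")
```

On these numbers roughly 950 genes have a known role and about 780 do not, and each gene occupies on average a little over a thousand base pairs.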
By bacterial standards, the Haemophilus genome is pretty small. The size of a bacterial genome is related to the diversity of environments a particular species is likely to encounter. A species that leads a dull life in a single uniform setting – say, the gut of another creature – can well get by with a relatively small genome. One that hopes to see the world, however, and is apt to encounter more varied conditions, must be equipped to respond, and flexibility of response usually depends on having alternative sets of genes, each tailored to particular conditions, and ready at all times to be switched on.
Pseudomonas aeruginosa, a bacterium that can cause infections in humans (and poses a particular danger for cystic fibrosis [CF] patients), lives in many different environments. We saw in chapter 5 how a genetically doctored form of a related species became the first living organism to be patented; in that case, it was adapted to life in an oil slick, an environment notably different from the human lung. The Pseudomonas aeruginosa genome contains 6.4 million base pairs and 5,570 genes. About 7 percent of those genes encode transcription factors, proteins that switch genes on or off; a respectable proportion of its entire genetic complement is thus devoted to regulation. The E. coli "repressor" whose existence was predicted by Jacques Monod and François Jacob in the early sixties (see chapter 3) is just such a transcription factor. A rule of thumb then would go as follows: The greater the range of environments potentially encountered by a bacterial species, the larger its genome, and the greater the proportion of that genome dedicated to gene-switching.
TIGR did not stop at Haemophilus. In 1995, collaborating with Clyde Hutchison at the University of North Carolina, the institute sequenced the genome of Mycoplasma genitalium as part of what has been dubbed the "minimal genome project." M. genitalium (which, despite its ominous name, is a benign inhabitant of human plumbing) has the smallest known nonviral genome, some 580,000 base pairs. (Viruses have smaller genomes but, by co-opting the genomes of their hosts, can get away with not having the genetic wherewithal for many fundamental processes.) And that relatively short sequence was found to comprise 517 genes. So a question naturally arose: Is that the minimal gene complement necessary to sustain life? Subsequent research has set about knocking out M. genitalium's genes to see which are absolutely vital and which are not. Currently it appears that the minimal genome contains no more than 350 genes and possibly as few as 260. Admittedly, this is a somewhat artificially defined "minimum" since the enfeebled bugs are supplied through their growth medium with every substance they could conceivably need. It's a bit like claiming kidneys are not necessary for life because patients can survive on dialysis machines.
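The bacterial figures quoted so far invite two quick comparisons: how much genome each species carries per gene, and what share of M. genitalium's 517 genes the estimated minimal set represents. The sketch below uses only the rounded numbers given in the text; the names are illustrative assumptions, and "base pairs per gene" is simply genome size divided by gene count.

```python
# (genome size in base pairs, gene count), as quoted in the text.
BACTERIA = {
    "Haemophilus influenzae": (1_800_000, 1_727),
    "Pseudomonas aeruginosa": (6_400_000, 5_570),
    "Mycoplasma genitalium": (580_000, 517),
}

# Genome size divided by gene count: a crude measure of gene density.
bp_per_gene = {name: bp / genes for name, (bp, genes) in BACTERIA.items()}

# Estimated minimal gene set for M. genitalium, from the knockout work.
MINIMAL_LOW, MINIMAL_HIGH = 260, 350
total_genes = BACTERIA["Mycoplasma genitalium"][1]
min_fraction_low = MINIMAL_LOW / total_genes
min_fraction_high = MINIMAL_HIGH / total_genes

for name, value in bp_per_gene.items():
    print(f"{name:<24} {value:7.0f} base pairs per gene")
print(f"Minimal set: {min_fraction_low:.0%} to {min_fraction_high:.0%} of the genes")
```

All three species come out at a little over a thousand base pairs per gene, so the larger bacterial genomes here are larger mainly because they hold more genes. The minimal set works out to roughly half to two-thirds of M. genitalium's complement.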
Will we ever be able to construct a functioning minimal cell from scratch, by artificially combining its separate purified components? Considering there are more than a hundred Mycoplasma genitalium proteins whose functions remain a mystery, the achievement of such a goal seems for now a long way off. Even the five hundred proteins of Mycoplasma, some represented in the cell by a huge number of molecules, some by just a handful, constitute an enormously complex living system. I, for one, have enough difficulty following a movie like Gosford Park in which there are more than four or five major characters; the thought of blocking out the complexity of interactions among the vital players inside a living cell is nothing short of mind-blowing. For the living cell is no neat miniature machine; it is rather, as Sydney Brenner put it, "a snake pit of writhing molecules." Still, Craig Venter is confident that the era of the artificial cell is just around the corner, and he has wasted no time in assembling a panel of bioethicists to counsel him on whether to venture forth. They, like me, see no moral dilemma in trying to "create life" in this way. If such a feat were ever achieved, it would merely reaffirm what most of us in molecular biology have long known to be the truth: the essence of life is complicated chemistry and nothing more. Such a finding would have made headlines a century ago; today it's no big deal. Only the opposite conclusion – that there is more to the life of the cell than the sum of its basic components and processes – could generate deep excitement in today's scientific world.
DNA analysis has already changed the face of microbiology. Before DNA techniques were broadly applied, methods of identifying bacterial species were extremely limited in their powers of resolution: you could note the form of colonies growing in a petri dish, view the shape of individual cells through the microscope, or use such relatively crude biochemical assays as the Gram test, by which species can be sorted as either "negative" or "positive" depending on features of their cell wall. With DNA sequencing, microbiologists suddenly had an identification factor that was discernibly, definitively different in every species. Even species, like those inhabiting the ocean depths, that cannot be cultured in a laboratory because of the difficulty of mimicking their natural growing conditions are amenable to DNA analysis, providing a sample can be collected from the deep.
Now led by Claire Fraser, TIGR remains the leader of the bacterial genomics pack. In short order they have polished off the genomes of more than twenty different bacteria, including that of an ulcer-causing Helicobacter, a cholera-causing Vibrio, a meningitis-causing Neisseria, and a respiratory-disease-inducing Chlamydia. Their biggest competitor is a group at the Sanger Centre. The British contingent is led by Bart Barrell, who had the luck not to be in the United States, where his limited academic credentials would have barred him from top-gun status: he has no Ph.D., having come into science straight out of high school to work as Fred Sanger's assistant long before DNA sequencing became a reality. Before moving on to bacteria, Barrell made his name as an automation pioneer, having used several ABI sequencing machines to crank out some 40 percent of the baker's yeast genome of 14 million bases while the largely European yeast-sequencing consortium remained wedded to manual methods. Barrell's group later had the satisfaction of being the first to complete the sequence of Mycobacterium tuberculosis, the agent of the fearsome affliction once known as consumption.
In high school, Claire Fraser "had felt like an outcast because it wasn't cool to be a woman taking so many science courses." After studying at Rensselaer Polytechnic Institute, where she first became interested in microbes, she applied to medical school. Rather than accepting a place at prestigious Yale, she opted for SUNY Buffalo because her boyfriend was moving to Toronto. The director of admissions at Yale was nonplussed: "Well, young lady, I hope you know what you're doing." The Toronto connection would, alas, prove ephemeral; in 1981 Fraser married Venter, then a young assistant professor at SUNY Buffalo. "We went to a [scientific] meeting for our honeymoon," she recalled, "and wrote a grant proposal there."
The power of DNA analysis of microbes has been harnessed with great success in medical diagnostics: to treat an infection effectively, doctors must first identify the microbe causing it. Traditionally the identification has required culturing the bacteria from infected tissue – a process that is maddeningly slow, particularly in cases when time is of the essence. Using a fast, simple, and more accurate DNA test to recognize the microbe, doctors can start appropriate treatment that much sooner. And recently the same technology was pressed into service to deal with a national emergency: the hunt for the perpetrator of the anthrax outrage in the United States in the fall of 2001. By sequencing the anthrax bacteria from the first victim, TIGR investigators obtained a genetic fingerprint of the precise strain used. The hope is that this precise information on the source of the anthrax will lead eventually to the culprit.
As we learn more about microbial genomes, a striking pattern is emerging. As we have seen, vertebrate evolution is a story of progressive genetic economy: through a widening array of mechanisms for gene regulation, it has become possible to do more and more with the same genes. And even when new genes do appear, they tend to be merely variations on an existing genetic theme. Bacterial evolution, by contrast, is proving itself a saga of far more radical transformation, a dizzying process that favors the importation or generation of whole new genes, as opposed to merely tinkering with what already exists.