The Monkey's Voyage
The discreteness of DNA sequence differences—an A in one species, a T in another—coupled with other properties of DNA, also made this kind of data a dream for biologists who wanted to use mathematical models to describe evolution, and this suitability for modeling would turn out to be critical for molecular dating. The modelers liked the discreteness of DNA because it let them treat the evolutionary process as straightforward probabilities—for instance, the probability that an A would switch to a T or a G or a C over some period of time. They were also drawn to DNA because the nucleotide changes could be fairly neatly categorized in ways that reflected differences in how evolution works. For instance, some mutations occur more frequently than others (A to G happens more than A to C or T, for example); some changes in the sequence switch the corresponding amino acids in the resulting protein and others do not; some parts of a gene code for especially critical sections of a protein for which almost any change in the amino acids would be harmful, while other parts code for less critical sections where some amino-acid changes are tolerated. The fact that you can divide up DNA changes into these, and many other, logical categories meant that the modelers had a lot to play with. In essence, DNA sequences gave them something to do that was tractable (a favorite word of modelers), but also satisfyingly complex.
This modeling of DNA changes had actually started before the invention of PCR. What PCR did was provide an enormous number of sequences to which the models, growing more and more complex, could be applied. For molecular dating, the importance of the models was that they provided improved estimates of the actual amount of genetic change separating two species, or, more generally, the amount of change along the branches of the evolutionary tree. This may sound a little counterintuitive; you might think that the amount of evolutionary change separating two species is just the observed number of nucleotide differences between them, like those 92 differences in the cytochrome b gene between black-necked and checkered garter snakes. However, this is not necessarily the case. In particular, if nucleotide substitutions (the preferred term among biologists) are happening frequently enough, there will be some sites in the sequence that have switched more than once along the evolutionary path connecting two lineages. Changes will have piled on top of other changes, so to speak. For any one site, you can never know for sure just how many substitutions have occurred—if one species has an A and the other a T, that could mean that their ancestor had an A that changed to a T in one lineage (one substitution), or that the ancestor had an A that went to G then back to A in one lineage and to C and then to T in the other (four substitutions), or, theoretically, an infinite number of other possibilities. You would know there was at least one change, but you couldn’t rule out the possibility of so-called multiple hits at the site. It’s like running into a friend you haven’t seen for many years, who previously had brown hair but now has green hair, and wondering whether he went straight from brown to green or took some more circuitous hair-coloring route, such as brown to blonde to orange to pink to green.
A critical thing that the models do is make an educated guess—actually a calculation based on probabilities—at how many extra “hits” have occurred, changes that are hidden from direct observation, like your friend’s possible multiple hair-color switches. In many cases, especially when dealing with distantly related species, that number can be substantial; in fact, the inferred number of hidden substitutions for a given sequence can be larger than the observed number of differences between species. The models are almost certainly right about the existence of many hidden hits, which, in turn, means that they are giving much better estimates of the actual amount of change along branches in the evolutionary tree than a simple tally of the differences between species. And, almost certainly, the estimates are getting better and better as the models become more realistic, taking into account more of the actual complexity of evolution. All of this is desirable for molecular dating studies—it’s giving us a much better handle on half of the equation for relating genetic change to time.
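To make the idea of hidden hits concrete, here is a minimal sketch using the Jukes-Cantor correction, the simplest of the substitution models (it assumes every nucleotide change is equally likely, which real models relax). The choice of this model and the sequence length of 1,000 sites are assumptions made for illustration; only the 92 observed differences come from the garter snake example above.

```python
import math


def jc69_distance(p):
    """Estimated substitutions per site under the Jukes-Cantor model.

    p is the observed proportion of sites that differ between two sequences.
    The formula is undefined at p >= 0.75, where sequences look random.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def hidden_substitutions(observed_diffs, seq_length):
    """Estimated substitutions that are not visible as observed differences."""
    p = observed_diffs / seq_length
    estimated_total = jc69_distance(p) * seq_length
    return estimated_total - observed_diffs


# 92 observed differences (from the text); 1,000 sites is a hypothetical length.
close_pair_hidden = hidden_substitutions(92, 1000)

# A hypothetical distantly related pair differing at 650 of 1,000 sites.
distant_pair_hidden = hidden_substitutions(650, 1000)

print(f"closely related pair: about {close_pair_hidden:.1f} hidden substitutions")
print(f"distantly related pair: about {distant_pair_hidden:.1f} hidden substitutions")
```

Under these assumptions the correction is small for the closely related pair, but for the distant pair the estimated hidden substitutions outnumber the observed differences, which is the situation described above.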
Today, the Rhyolite bottle house sits locked and empty. Tourists, most of them on their way into or out of Death Valley, circle it, snapping photos under the baking desert sun. However, the house wasn’t built just as a curiosity; it actually functioned as a dwelling for a few years and, later, as a trinket shop. All those beer and whiskey bottles really were good for something (apart from holding beer and whiskey). The same might be said for the unexpected information that came out of the big pile of DNA sequences, the outcome of Kary Mullis’s insight (and Watson and Crick’s discovery of the structure of DNA, and Fred Sanger’s invention of an efficient DNA sequencing method, and the work of countless others). The pile of sequences and the evolutionary models that turned those sequences into amounts of genetic change generated a flood of molecular dating studies. For biogeography, these studies have been critical, providing evidence of timing, of what happened when. In my view, this evidence has been the key to extricating biogeography from the intellectual cul-de-sac created by the vicariance school.
However, as noted earlier, many scientists still have doubts about the validity of molecular clock analyses, and not all of these people are hard-core vicariance advocates. For instance, a friend of mine, an evolutionary biologist who I think of as both moderate and reasonable, refers to such analyses succinctly as “bullshit.” As the foundation for a new view of biogeography, “bullshit” doesn’t really work. Before we go on, then, we need to confront the criticisms of molecular dating. We need to establish that, in the results of these analyses, we’re dealing with a functional bottle house, not a dangerous pile of broken glass.
22. At one time, the age of the universe estimated from its apparent rate of expansion was considerably younger than the estimated age of the Earth, indicating that something was seriously amiss with at least one of these estimates. Present knowledge suggests that, in fact, both estimates were too young at the time, but the estimate of the age of the universe was far too young.
23. The chronicle tends to be more difficult to reconstruct as one delves into the more distant past. However, problems of an uncertain chronicle also can arise for the very recent past, as in a criminal case in which the whereabouts of a person at a particular time are critical, but hard to establish. That sort of example illustrates that what is often important is the accurate placement of an event in time relative to other events.
24. The early molecular clock studies were based on amino-acid sequences in proteins rather than on base-pair sequences in DNA.
25. A complication is that many organisms, including humans, have nonfunctional sequences derived from mitochondrial DNA that have been incorporated into the nuclear genome. When that is the case, using PCR on DNA samples that include both the mitochondrial and nuclear genome will often generate sequences from both the targeted mitochondrial DNA and the untargeted nuclear copies corresponding to the same segment. If, as is often true, the nuclear copies are quite different from the original mitochondrial DNA, this phenomenon can strongly distort the results.
On Sunday, December 26, 2004, at about 8:00 in the morning, Rizal Shahputra and several other workers were laying the foundation for a mosque in Banda Aceh on the Indonesian island of Sumatra when they felt and heard a strong tremor. Not long after, a boy came running toward them, warning that big waves were coming and they should run for cover. But it was too late. The tsunami that was on its way was made up of waves up to 100 feet tall and would become one of the worst natural disasters in history, killing more than 200,000 people. The waters rushed in and swept Shahputra and the rest out to sea.
Drifting in the ocean, Shahputra and many others spotted a floating tree, swam over to it, and held on. Days passed. One by one, Shahputra’s companions weakened and slipped into the sea, until he was the only one left. Bodies floated past, some of them of people he knew. He drank rainwater and ate coconuts and packets of a powdered chocolate drink that he found on the water. He recited verses from the Koran. Finally, a container ship passed nearby, and an officer spotted him waving frantically. When rescued, he had been floating on his tree-raft for eight days and was more than 100 miles from Banda Aceh.
A woman known by the single name Melawati had a similar experience, drifting on the ocean for five days after the tsunami hit, clinging to a sago palm. She survived by eating the bark and fruit of the plant. As an example of how humans or other primates might have established new populations after improbable ocean journeys, her experience is especially cogent, for she was three months pregnant, and her unborn child survived the ordeal.
Chapter Six
BELIEVE THE FOREST
“PRETTY SLOPPY STUFF”
My first serious encounter with the molecular clock had to do with parasites—specifically, tapeworms living in the guts of humans and, in another part of their life cycle, in the muscle tissues of some domestic animals. It was a fascinating study, but also troubling. It was troubling because we had to rely on the clock.
The usual story with these tapeworms was that humans had picked them up, historically speaking, through the domestication of cattle and pigs. In this view, the wild progenitors of cattle and pigs were natural hosts of the parasites and, as a result, humans became infected when we started routinely eating the domesticated versions of these animals. However, the lead scientist on our study, a parasitologist named Eric Hoberg, had the idea that the real story was just the reverse, that we humans had a long history with the tapeworms and had infected our domestic animals with them. In this alternate scenario, cows and pigs would have become hosts when they became associated with humans, and thus began eating food contaminated with tapeworm eggs from human excrement.
My part in this project was to perform a molecular clock analysis using tapeworm DNA sequences downloaded from GenBank (a database of molecular sequences that is the most obvious manifestation of the DNA explosion made possible by PCR). The specific goal was to figure out when two species of tapeworms that infect humans had separated from each other, because, under the reasonable assumption that the common ancestor of these species also infected humans, that divergence date would indicate the latest possible time when the tapeworms had become associated with humans. That time could tell us if humans had acquired tapeworms before the domestication of cattle and pigs some 10,000 years ago.
This was in the late 1990s, but, even back then, there were many different ways to do a clock analysis. I sorted through the possibilities, figured out which approaches seemed to make the most sense, and did the analysis several different ways. None of them inspired great confidence, particularly because the DNA sequences were short, and I had to assume that the tempo of the tapeworm clock fell within the range of rates that other biologists had calculated for organisms only distantly related to tapeworms, like shrimp and mice and sharks. Nonetheless, the calculated ages told a striking story: even the youngest possible ages for the tapeworm split were older than the likely age of domestication of cattle and pigs. It looked as if Eric Hoberg was right and the traditional story was wrong—we had given tapeworms to cattle and pigs, not vice versa. (In a way, one could view this conclusion as one more bit of support for a major theme that came from Darwin, namely, that humans are not intrinsically lords or victims of nature, but are simply parts of the living world, like all other species. At least, that’s the way I thought of it.)
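The logic of bracketing a split with a range of borrowed rates can be sketched in a few lines. This is only an illustration of the arithmetic, not the analysis actually performed: the genetic distance and the two rates below are hypothetical placeholders, and the only figure taken from the text is the roughly 10,000-year age of domestication. The sketch assumes a strict clock, so the time since two species split is the distance between them divided by twice the rate per lineage.

```python
def divergence_time_range(distance, slow_rate, fast_rate):
    """Return (youngest, oldest) split ages in millions of years.

    distance: estimated substitutions per site separating the two species.
    slow_rate, fast_rate: substitutions per site per million years per lineage.
    The factor of 2 accounts for change accumulating along both lineages.
    """
    if slow_rate <= 0 or fast_rate < slow_rate:
        raise ValueError("rates must satisfy 0 < slow_rate <= fast_rate")
    youngest = distance / (2 * fast_rate)  # fastest clock gives the youngest age
    oldest = distance / (2 * slow_rate)    # slowest clock gives the oldest age
    return youngest, oldest


# Hypothetical inputs, chosen only to show the calculation.
HYPOTHETICAL_DISTANCE = 0.02   # substitutions per site
HYPOTHETICAL_SLOW_RATE = 0.005  # per site per million years per lineage
HYPOTHETICAL_FAST_RATE = 0.012  # per site per million years per lineage

DOMESTICATION_MYR = 0.01  # about 10,000 years ago (from the text)

youngest, oldest = divergence_time_range(
    HYPOTHETICAL_DISTANCE, HYPOTHETICAL_SLOW_RATE, HYPOTHETICAL_FAST_RATE
)
split_predates_domestication = youngest > DOMESTICATION_MYR

print(f"split between {youngest:.2f} and {oldest:.2f} million years ago")
print(f"older than domestication even at the fastest rate: {split_predates_domestication}")
```

The comparison that matters is the last one: if even the fastest plausible clock puts the split before domestication, the conclusion holds across the whole range of rates considered.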
I came away from that tapeworm study with a new appreciation for what molecular clocks could provide, but with my skepticism about such analyses intact. I had been reasonably conservative in my approach, but I still wondered whether some new evidence would come to light showing that my estimates were all wrong. Maybe, for instance, someone would discover that the tapeworm clock ticks much faster even than the fastest clock I had used for calibration (the snapping shrimp clock). Or, perhaps, different genes would give a very different answer. Our conclusions about the origins of tapeworms in humans and domestic animals hinged on the clock results, and those results were debatable.
I’m certainly not alone in having used molecular clock analyses while remaining somewhat leery of them. For instance, Michael Donoghue confessed to me, “I’ve been involved in a bunch of dating analyses and I think they’re mostly pretty suspect. And I look at the literature and I’m sort of appalled by the [analyses] people do.” Although Donoghue believes that new methods for molecular dating are improvements, he imagines that, down the road, people will look back and think, “We just did a decade’s worth of work with BEAST [a commonly used dating program], . . . and now we realize that actually a lot of that’s pretty sloppy stuff.”
And yet, Donoghue also thinks, as do many other biogeographers, that the vicariance people are crazy to ignore the molecular dating evidence. He sees their dismissal of this evidence as “misguided” and a case of “burying their head[s] in the sand.” So there seems to be a bit of a disconnect here: How can a person view molecular dating as “pretty suspect” and “pretty sloppy stuff,” but also think that this information cannot be ignored? The point of this chapter is to make sense of this apparent contradiction by establishing that the molecular dating evidence in general is not likely to be far from the truth, especially in what it’s saying about biogeographic history. It’s a case of believing what the forest is telling us rather than getting too hung up on particular disease-ridden trees. In building to that point, though, it will be instructive to first deal with specific problems—the reasons for the diseased trees—to understand why there is such distrust of molecular dating. It will be apparent that the problems are real and substantial, but that, in the end, they don’t derail the whole enterprise.
TWO PROBLEMS AND HOW TO DEAL WITH THEM
The ideal molecular clock would be one that ticked at the same rate for all branches in the tree of life and could be perfectly calibrated so the rate was known precisely. With a clock like that and the appropriate DNA sequences, we could confidently put an age (with error bars) on any given branching point in the tree. In other words, the ideal clock would produce a reliable historical chronicle—a record of what happened when—with respect to the timing of the separation of evolutionary lineages. We would know, with great precision, when humans separated from chimps, or Australian baobab trees from Malagasy ones, or African from South American lungfishes, and we would be able to relate those evolutionary splits to geologic or climatic events, such as the opening of the Atlantic Ocean or the Pleistocene Ice Ages.
Unfortunately, that ideal clock does not exist. In essence, it doesn’t exist for two reasons, a problem with fossils and a problem with molecules. These problems are big, fundamental ones, and, to the likes of Gary Nelson and Michael Heads, they’re insurmountable. However, other scientists have not been so pessimistic and have tried to solve them.
The first problem has to do with the fact that one usually needs fossils to assign ages to calibration points. These are the particular evolutionary branching points used to calculate the tempo of genetic change, that is, to figure out how fast the molecular clock is ticking. The basic procedure here is to find fossils of known age that can be connected to given branching points. Just figuring out where a fossil should be placed in the tree of life—that is, inferring the branch to which it belongs—can be difficult, especially since fossils are incomplete specimens, usually missing some parts of the skeleton, shell, or other hard parts, and almost always lacking clear indications of soft parts of the anatomy. For instance, certain apes, known only from very incomplete fossils, fall somewhere close to the evolutionary split of chimps and humans, but cannot confidently be placed in the tree beyond that. They might fall on the chimp side, or on the human side, or they might be ancestors of both, and the exact placement of these fossils makes some difference in how they’re used in calibrations. In addition, the age of a fossil often cannot be precisely determined. The vast majority of fossils are assigned ages based on the geological strata in which they were found, and this sometimes means that we “know” a fossil’s age only within a range of several million years. Even if a fossil’s age is obtained from radiometric dating (using the rate of radioactive decay of one isotope into another), the slop in the estimate can be fairly large, depending on the circumstances.
However, the big problem with fossil calibrations is not about where fossils should be placed in the tree of life or how old they are. The real stumbling block comes from the fact that the fossil record is not even close to being a continuous chronicle of everything that ever lived, but is more like a series of snapshots, some of them close together and others widely spaced in time. (Actually, if the fossil record were complete, there would be no need for molecular dating.) The fragmentary, snapshot nature of the record means that, even if all the fossils ever discovered were placed in the correct evolutionary groups, and even if they were assigned perfectly accurate ages, the greatest difficulty with calibrations would still remain. A particular problem is that the time when a group first shows up in the fossil record only gives us the latest possible age for the group’s origin; the actual age will almost always be older, and it will often be much older. For instance, the first fossil hummingbirds are from the Late Eocene, about 35 million years ago, but it is likely that the group has a history, so far invisible to us, that extends much deeper in time. Thus, if one used that 35-million-year date as the age of the split between hummingbirds and their nearest relatives, the swifts, the calibration probably would be way off. As a result, one would end up thinking the molecular clock was running much faster than it really was. Calibration errors like this one are especially likely for groups like hummingbirds, with delicate parts that do not easily fossilize, but the problem potentially applies to any group. This kind of inaccuracy, caused by the incompleteness of the fossil record, is the fundamental weakness of calibrating a molecular clock with fossils.
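The effect of a too-young calibration can be shown with a little arithmetic. The sketch below assumes a strict clock and uses made-up numbers throughout, apart from the 35-million-year age of the first fossil hummingbirds: the "true" split age of 50 million years and both genetic distances are hypothetical, chosen only to show the direction of the error.

```python
def clock_rate(distance, split_age_myr):
    """Substitutions per site per million years per lineage, strict clock."""
    return distance / (2 * split_age_myr)


def node_age(distance, rate):
    """Age in millions of years of a split, given a per-lineage rate."""
    return distance / (2 * rate)


FOSSIL_AGE_MYR = 35.0         # oldest known hummingbird fossils (from the text)
ASSUMED_TRUE_AGE_MYR = 50.0   # hypothetical actual age of the split
CALIBRATION_DISTANCE = 0.40   # hypothetical distance across the calibration split
OTHER_DISTANCE = 0.10         # hypothetical distance across a split to be dated

# Calibrating with the fossil age treats a minimum age as the actual age.
rate_from_fossil = clock_rate(CALIBRATION_DISTANCE, FOSSIL_AGE_MYR)
rate_if_true = clock_rate(CALIBRATION_DISTANCE, ASSUMED_TRUE_AGE_MYR)

# The inflated rate then makes every other split look younger than it is.
age_from_fossil = node_age(OTHER_DISTANCE, rate_from_fossil)
age_if_true = node_age(OTHER_DISTANCE, rate_if_true)

print(f"rate from fossil calibration: {rate_from_fossil:.5f}")
print(f"rate if the split is really older: {rate_if_true:.5f}")
print(f"dated split: {age_from_fossil:.2f} vs {age_if_true:.2f} million years")
```

Under a strict clock the error simply propagates in proportion: every date is scaled by the ratio of the fossil age to the actual age of the calibration split, here 35/50.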