Unlocking the Past
the surprising world of the DNA molecule
As methods of sequencing DNA molecules and reading their genetic code came on stream in the 1970s, the tacit assumption had been that the existence of DNA was subservient to that of the whole organism and its evolutionary battles. One of the surprises was how little of the stuff seemed to be engaged in those battles. It transpired that ‘coding genes’–the sequences of DNA actually used to build proteins and thus engage in the outside world–make up as little as 10 percent of the DNA within the cell’s nucleus. The other 90 percent reproduce without any connection to the trials and tribulations to which the whole organism is exposed. These emerging non-coding regions were ideal for charting evolutionary patterns independently from natural selection. This is not to say that previous evolutionary markers were displaced by this, but there was a progressive distancing that could be achieved from natural selection and the confusions of convergent evolution. This moved first from whole organisms to proteins, and then on to non-coding DNA.
These non-coding regions are the best source we have for independently tracking lineage and generating a molecular clock. We can predict that on average a particular stretch of DNA is likely to accumulate a new change every few hundred generations, or few thousand years. This will not proceed like clockwork. As a random process, it is not entirely predictable, but over the long term this randomness is absorbed into a relatively uniform rate. If we focus on that particular stretch, then two closely related individuals will have almost identical sequences, and increasingly distant relatives will have increasingly disparate sequences. The sequences thus form the basis for the construction of phylogenetic trees, but this is not an entirely self-contained process. At some stage, these molecular projections of evolution had to be anchored on some real dates, bringing them back to the archaeological and geological records.
The way the molecular clock is normally calibrated is either by reference to the fossil record, and some well-dated assumed common ancestor, or by reference to some well-dated separation of land masses, marking when close relatives either side of the barrier began to diverge. One of the important features that such reference points reveal is the small number of mutations involved in measuring archaeological time scales. This is a key feature of the molecular clock. It is not so much wrong, as imprecise in the extreme. Many of the issues about chronology discussed in this book push the clock well beyond the limits of its resolution. It thus stands as a first rough estimate, often leading to a need for much sharper archaeological dates.
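The arithmetic behind such a calibration can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: mutations accumulate independently along both diverging lineages at one constant rate, no position changes twice, and the figures used (40 differences, a 5-million-year calibration point, a 400 base-pair stretch) are invented for the example rather than taken from any study. The function names are hypothetical.

```python
import math

def calibrate_rate(differences, years_since_split, length):
    """Changes per site per year along ONE lineage, from a dated split.

    The factor of 2 reflects that both lineages have been accumulating
    changes since they diverged.
    """
    return differences / (2.0 * years_since_split * length)

def estimate_split(differences, rate, length):
    """Years since two sequences shared an ancestor, given a calibrated rate."""
    return differences / (2.0 * rate * length)

def relative_uncertainty(differences):
    """Rough relative error of a count of random events (Poisson assumption)."""
    return 1.0 / math.sqrt(differences)

# Hypothetical calibration: 40 differences over 400 base-pairs between two
# lineages assumed to have split 5 million years ago.
rate = calibrate_rate(differences=40, years_since_split=5_000_000, length=400)

# Hypothetical test case: two sequences differing at 8 positions.
years = estimate_split(differences=8, rate=rate, length=400)
print(f"estimated split: about {years:,.0f} years ago")
print(f"relative uncertainty: about {relative_uncertainty(8):.0%}")
```

With so few mutations to count, the uncertainty is a large fraction of the estimate itself, which is the sense in which the clock is imprecise rather than wrong.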
By the early 1980s, an important step forward in the endeavour of DNA sequencing would open the way to exploit fully the possibilities of applying a molecular clock to human ancestry. For the first time, one whole section of the human genetic code had been fully sequenced, with every single base-pair along the double helix charted.
mitochondrial DNA
Cracking the genetic code presented a breathtaking task to the genetic sciences. It was as if they had unearthed the keys to the world’s most prolific libraries, all the volumes of which had yet to be read. But where should they start? The DNA within the nucleus of the human cell is enormous, made up of 2-3 billion base-pairs distributed among twenty-three pairs of chromosomes, and coding for between 30,000 and 40,000 genes. Imagine the number of letters in all the words in all the volumes in a small library. That is the order of magnitude of the total sequence. Its decipherment has been the goal of the international Human Genome Project. In the early days of DNA research, there was a less daunting task available for study.
Situated in the cell fluid outside the nucleus are its powerhouses, the mitochondria. The number varies according to how much work the cell does. Not surprisingly, a muscle cell has quite a large number. These small powerhouses also have a small amount of DNA within them which, like the more prolific quantities in the cell’s nucleus, contains the blueprint for certain of the cell’s molecular ‘machine tools’. Unlike the nuclear genome, which is shuffled with every sexual reproduction, so that offspring receive genetic information from both parents, mitochondrial DNA is normally inherited from the mother alone. Its sequence is not reshuffled during reproduction, and changes only as a result of occasional mutations and any selection pressure that may act upon them.
Not only does it have this simpler system of inheritance, without recombination, but it is also much smaller than the nuclear genome–100,000 times smaller in the case of humans. If the nuclear genome is a small library, the mitochondrial genome is a large pamphlet. It thus posed a more achievable target to the first generation of DNA sequencers. By 1981, the entire human mitochondrial genome had indeed been sequenced. Three successive pages of the April edition of Nature for that year carried a densely packed sequence of the letters A, C, G and T, corresponding to the order of 16,569 base-pairs along one of the two strands of DNA. The other strand carried the complementary sequence of bases with which the first strand’s sequence was coupled.
Printed above some parts of the sequence was another series of letters, this time corresponding to the proteins built from this blueprint. Yet further parts of the sequence were boxed in, and related to the genetic functions of the sequence. It was an extraordinary achievement and generated a genetic map that would serve as the basis for a whole series of research initiatives, including the analysis of ancient DNA. Subsequently, the mitochondria of mouse, cow and rat were also sequenced, and they proved to be very similar. In other words, this new genetic map was of value, not just in the case of humans, but also in exploring other animal species.
The shape of the human mitochondrial genome is quite different from that of the nuclear chromosome. While the latter is a thread-like object, made up of linear DNA molecules tightly and intricately bundled together with proteins, the mitochondrial DNA is a rather simpler, circular molecule, the two interconnected strands continuing round without a break. Comparisons with mouse, cow and rat also show it to be evolving ten times faster than the nuclear genome, a feature that lends itself well to exploring small evolutionary differences. The numbering system that Anderson and his colleagues used to locate any partial sequence on the circle therefore had an arbitrary origin, but situated sequences in relation to each other. Take the example, considered in the previous chapter, of Scott Woodward’s search for dinosaur DNA. He had targeted the region between base positions 15,603 and 15,777 on Anderson’s map, which lies in the middle of a gene coding for a protein called cytochrome b, one of the cell’s energy-processing molecules. Unlike the nuclear DNA, in which coding genes are rather dwarfed by the amount of non-coding DNA, human mitochondrial DNA is tightly packed with genes and the non-coding regions are in the minority. A number of these coding sequences build proteins like the one above which are involved in the cell’s energy management, a suitable role for the cell’s own powerhouses. Other regions of this circular sequence coded for various kinds of RNA structures, the molecular tools used in protein building. Different parts of these RNA structures can mutate and vary without damaging their function, and have been examined to explore certain relationships between taxa. However, the part of the genome that has attracted most attention by ancient DNA analysts is a sequence of just over 1,000 base-pairs that appears to have no coding function at all.
Reading around Anderson’s linear map of the circular sequence, this non-coding region falls between positions 16,026 and 00,577 (the sequence continues from position 16,569 to 00,001). Although this region has no coding function, it has a number of interesting features. First, a number of the trigger points that control transcription, that is the reading off of the blueprint to make proteins, occur along this sequence. For this reason, it is sometimes referred to as the control region. Second, there is a rather unusual structure called a displacement loop, or D-loop, within it. This loop is a sequence of about 680 base-pairs where the double strand opens up to form a kind of eye along the length of the circle. One strand has been displaced from the main circle as a result of the other strand pairing up with its own partner strand of RNA, in this way displacing the loop of single-strand DNA.
The control region and the D-loop within it, in all making up almost 7 percent of the entire mitochondrial sequence, have been a key target for those exploring close evolutionary relationships by modern and ancient DNA alike. Two or three sections within the control region contain the most variable sequences in the entire genome. In some parts of the genome, the greatest mutation tolerated is an occasional base replacement in a position that is not going to impair function. In these highly variable or ‘hypervariable’ segments of the mitochondrial genome, not only do the bases change, but even the length of the sequence changes. This is shown dramatically in what is known as the first hypervariable segment of the mitochondrial control region. This stretch lies at one side of the D-loop. So fast is it evolving that not only its sequence of bases, but also its actual length varies considerably between species. In humans, it is a little below 400 base-pairs in length (positions 16,023 to 16,400), in rats just below 300, and in cows below 200 base-pairs. This is a quite remarkable level of diversity for what are quite closely related species. Even within a single, relatively young species like our own, this segment displays marked variation. Human sequences will differ among themselves by an average of eight substitutions along the segment. This first hypervariable segment presented itself as an ideal place to look for relationships between closely related organisms, such as within a single genus or species. In time it would crop up again and again in ancient DNA analyses, and was soon to have an impact on the human story in relation to an ancient woman who came to be dubbed ‘the mother of us all’.
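Because the map is a linear numbering laid over a circle, the length of a region that crosses the arbitrary origin cannot be found by simple subtraction. The short Python sketch below checks the figures quoted above, using only the positions given in the text; the function name `circular_span` is a hypothetical helper, and positions are assumed to be counted inclusively.

```python
GENOME_LENGTH = 16_569  # base-pairs in the human mitochondrial genome, as quoted above

def circular_span(start, end, genome_length=GENOME_LENGTH):
    """Count positions from start to end inclusive, reading forward round the circle."""
    if end >= start:
        return end - start + 1
    # The region runs past the last position and wraps round to position 1.
    return (genome_length - start + 1) + end

control_region = circular_span(16_026, 577)
first_hypervariable = circular_span(16_023, 16_400)

print(control_region)                                    # 1121
print(round(100 * control_region / GENOME_LENGTH, 1))    # 6.8
print(first_hypervariable)                               # 378
```

The results agree with the descriptions in the text: a control region of just over 1,000 base-pairs, almost 7 percent of the whole sequence, and a first hypervariable segment a little below 400 base-pairs long.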
mitochondrial eve
With Anderson’s mitochondrial sequence to hand, Allan Wilson was now able to apply the same logic to DNA that he had earlier applied to albumin proteins in the 1960s, but with a sharper taxonomic precision. Variation in the human sequence would reveal something about human origins. Two of his graduate students, Rebecca Cann and Mark Stoneking, set to the task. Typing DNA in the early 1980s still required large quantities of tissue. The potential of PCR had not yet been realized and so there was no question of typing DNA from a hair or a small drop of blood as can be done today. Cann and Stoneking worked instead with whole human placentas. About two-thirds of these came from American hospitals, and others came from Aboriginal populations in Australia and New Guinea. The total sample of 147 specimens included 2 Africans and 18 African Americans, 34 Asians, 46 Caucasians, 21 Australians and 26 New Guineans. Their mitochondrial DNA was purified and analysed and the different types compared. These comparisons were made in relation to the different regions of the mitochondrial genome, within the D-loop, the protein-coding regions, the RNA-coding regions and elsewhere. Little more than a decade after Wilson’s lab had changed the human story with one set of genetic patterns, his students would now change it with another. Once again, one of the key features in the new story was a matter of time scale, and a pace of evolution that would further distance us from our curious cousins.
The central findings were that there was very little variation in comparison with other primates, and that most of what variation there was could be seen within the African Americans. Taken together, the results implied that all living humans could trace their mitochondrial lines back to a common female ancestor around 200,000 years ago. Most of the existing variation within the sequence was found among Africans. That suggested that the single human lineage also found its root in Africa. The mother at the source of that lineage captured the popular imagination as ‘Mitochondrial Eve’.
Taking these two points separately, the small amount of variation resonated with the Wilson lab’s earlier finding. Just as the albumins had shown that the hominids were more recent than the traditional story had held, so the mitochondrial DNA indicated a recent time scale for our own species within the hominids. There had not been time for a lot of variation to accrue. We saw earlier how these abbreviated time scales were difficult for those who wanted to model all the Asian fossils within a single gradual story of inclusive and progressive evolution. The second feature of Cann’s and Stoneking’s results accentuated that difficulty. The DNA results placed that recent ancestral ‘mother of us all’ in Africa, marginalizing a series of fossil records scattered around Europe and Asia which others remained keen to retain within the central human lineage. Around the world hominids could be found who significantly pre-dated the short time scale proposed by Wilson’s group. Moreover, the dates were safe. Accepting the out-of-Africa story certainly meant the end of the great collective drive towards human progress. It would mean hominid species more often than not went into extinction. This was not so surprising given that the fossil record indicated the same was true of most species, but was nevertheless a profound challenge to the still-persisting view that we humans are different.
The multi-regionalists, as those arguing for the inclusive approach are known, responded to the Mitochondrial Eve hypothesis in a predictable manner. The criticisms came from a new generation of physical anthropologists. Foremost among these were Alan Thorne and Milford Wolpoff. Both authors had been struck by physical variations between the fossil skulls in the eastern part of the Old World range. It seemed to them that a group of the earliest east Asian hominid skulls displayed features that could still be seen in human populations alive in that part of the world today. Likewise, they linked certain attributes of south-east Asian fossil skulls with recent and modern Australian forms. Their multi-regional theory was not a direct replay of the earlier stories of Coon and Weidenreich. They had by now accepted both the shorter 5-million-year time scale for the hominid line, and the original source in Africa. But they were still looking for a model that drew all the Old World fossils into a single collective story of forward evolution. Attaining that required a time period five times what Rebecca Cann’s results permitted. They vigorously sought out the weak points in the arguments.
The clearest weakness was in the sampling, particularly of ‘Africans’. With two exceptions, the placentas were taken from African Americans. Let us take just one hypothetical mother and conjecture that seven out of eight of her great-grandparents were of African descent, but that her mother’s mother’s mother was European. Because of the way that nuclear chromosomes work, her physical features would be unlikely to display an undue influence from that 12 percent of her DNA. The 88 percent of African descent is what we would discern. However, her mitochondrial DNA will be 100 percent European, and that would obviously skew the results of Cann and Stoneking’s analysis. In this particular instance, the analysis was probably on reasonably safe ground. This is because in America, so far as we know, the great majority of marriages between those of European and African descent have tended to be between European men and African American women. None the less, the principle holds. We could repeat this logic for a whole range of American ancestries, just to make the point that mtDNA lineages in a multi-ethnic society may not be directly transposable to the rest of the world.
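The contrast between the two modes of inheritance in this hypothetical family can be worked through in a short Python sketch. It assumes, as a simplification, that each great-grandparent contributes an equal expected one-eighth of the nuclear genome, and that mitochondrial DNA passes strictly down the maternal line. The path labels ("M" for mother, "F" for father) and the function name are illustrative inventions.

```python
from fractions import Fraction
from itertools import product

def ancestry_shares(origin_by_path):
    """Return (expected nuclear share per origin, origin of the mitochondrial line).

    Each key is a path of 'M' (mother) and 'F' (father) steps back from the
    individual, so 'MMM' is the mother's mother's mother.
    """
    generations = len(next(iter(origin_by_path)))
    share = Fraction(1, 2) ** generations  # expected nuclear share per ancestor
    nuclear = {}
    for origin in origin_by_path.values():
        nuclear[origin] = nuclear.get(origin, 0) + share
    # Mitochondrial DNA follows only the unbroken maternal path.
    mitochondrial = origin_by_path["M" * generations]
    return nuclear, mitochondrial

# Eight great-grandparents: seven of African descent, one European.
great_grandparents = {"".join(p): "African" for p in product("MF", repeat=3)}
great_grandparents["MMM"] = "European"

nuclear, mitochondrial = ancestry_shares(great_grandparents)
print(nuclear["African"], nuclear["European"])  # 7/8 1/8
print(mitochondrial)                            # European
```

One-eighth is 12.5 percent, which the text rounds to 12 percent; the mitochondrial result ignores the other seven great-grandparents altogether.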
This in turn related to a much wider perceived weakness in the argument surrounding mitochondrial lineages. Such lineages are rather like the female equivalent of surnames. Each is passed between generations undiluted and from only one parent, thus reflecting a decreasing fraction of the genetic contribution of an increasingly distant ancestor. Let us take the analogy further. Thumbing through page after page of my own extremely common Welsh surname in an English telephone directory, one might have cause to develop an ‘Out-of-Wales’ hypothesis for the peopling of England. I know the surnames of eleven of my sixteen great-great-grandparents. They are all different. Few are Welsh, and just as many are of Italian origin, but Londoners outnumber the two put together. The phylogenies, first of the surname Jones, second of my mitochondrial haplotype, and third of myself are three distinct phylogenies, and have no need to match up. Thorne and Wolpoff were of the view that mitochondrial evidence served an interesting function of generating hypotheses, but those hypotheses could only be ratified or refuted by reference to the fossil and archaeological record itself. In their view, those records refuted it. They saw no evidence of a rapid replacement of one hominid group by our species, and no evidence of the uniformity one might expect to result from such a replacement. The hypothesis, however interesting, did not fit with the data and there was therefore no problem in rejecting it.
The second weakness arose from the use of the molecular clock. The sceptics were still uncertain about Wilson’s earlier use of sequence variation to derive a figure of 5 million years. Now the molecular clock was brought into play for an even shorter time span. Opinion still varies as to how far it can be pushed to perform in the very recent evolutionary time scales of archaeology. Thorne and Wolpoff doubted it could be brought forward more than half a million years.
If that was not enough, the use of the computed tree was also an interpretative minefield. Computers are powerful things; it’s not so difficult to get a computer to take a batch of data and build a tree out of it, based on some variable related to similarity. It is not even so difficult to get it to build a range of different trees, by fine-tuning what is meant by ‘similarity’, and adjusting other variables and assumptions. The multi-regionalists quite predictably went for that Achilles’ heel with vigour, and repeatedly found a target to hit. It was a baptism of fire for Rebecca Cann, who experienced the furore that could arise from challenging established views about the human past. The comment ranged widely between enthusiasm and ridicule, but grant applications became difficult. But for the encouragement of her supervisor, Allan Wilson, Cann was ready to quit. To him it was a familiar scenario. His compressed time scale for primate evolution had caused a similar rumpus twenty years earlier, but had stood the test of time. The Mitochondrial Eve argument, and all its implications about the wider pattern of human beginnings, had some way to go.