Imagine a group of anything that exhibits variation – the different colours of stones in a streambed, snail-shell size, fruit-fly wing length, or human blood groups. At first glance these variations seem random and disconnected. With multiple sets of such objects, the picture seems more complex still – even chaotic. What does the pattern of variation reveal about the mechanism by which the diversity was generated?
The knee-jerk reaction of most biologists in the 1950s to any pattern of diversity in nature was that selection was the root cause. Human diversity was no exception, as the eugenicists made quite clear. In part this stemmed from the widespread belief in ‘wild types’ and ‘mutants’. The wild type could encompass any trait – size, colour, nose shape, or any other ‘normal’ characteristic of the organism. This was reinforced by the fact that genetic diseases (which were clearly ‘abnormal’) were some of the first variants recognized in humans, setting the stage for a worldview in which people were categorized as fit or unfit according to a Darwinian evolutionary struggle. However, in the 1950s Motoo Kimura, a Japanese scientist working in the United States, began to do some genetic calculations using methods originally derived for analysing the diffusion of gases, formalizing work carried out by Cavalli-Sforza and others. This work would eventually lead the field out of the ‘mutant’ morass.
Kimura noticed that genetic polymorphisms in populations can vary in frequency owing to random sampling errors – the ‘drift’ mentioned above. What was exciting in his results was that drift seemed to change gene frequencies at a predictable rate. The difficulty with studying selection was that the speed with which it produced evolutionary change depended entirely on the strength of selection – if the genetic variant was extremely fit, then it increased in frequency rapidly. However, it was virtually impossible to measure the strength of selection experimentally, so no one could make predictions about the rate of change. In our coin-flipping example, if heads is one variant of a gene and tails is another, then the increase in frequency from 50 per cent to 70 per cent in a single ‘generation’ would imply very strong selection favouring heads. Clearly, though, this isn’t the case – heads increased to 70 per cent for reasons that had nothing to do with how well adapted it was.
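The coin-flipping analogy is easy to simulate. The sketch below is a minimal Wright–Fisher-style model in Python; the population size, generation count, and random seed are arbitrary choices for illustration, not values from the text:

```python
import random

def drift(freq, pop_size, generations, seed=42):
    """Neutral genetic drift: each generation, the new allele frequency
    is set by randomly sampling pop_size gene copies from the previous
    generation -- no selection anywhere in the model."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Each of the pop_size copies is drawn independently with
        # probability equal to the current frequency (coin-flipping).
        copies = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = copies / pop_size
        if freq in (0.0, 1.0):  # the variant has been lost or fixed
            break
    return freq

# In a small population the frequency can wander far from 50 per cent
# in a handful of generations, purely by sampling error.
print(drift(0.5, pop_size=20, generations=10))
```

Because the sampling is random, smaller populations drift faster – exactly the predictable, selection-free behaviour Kimura described.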
Kimura’s insight was that most polymorphisms appear to behave like this – that is they are effectively free from selection, and thus they can be treated as evolutionarily ‘neutral’, free to drift around in frequency due entirely to sampling error. There has been great debate among biologists about the fraction of polymorphisms that are neutral – Kimura and his scientific followers thought that almost all genetic variation was free from selection, while many scientists continue to favour a significant role for natural selection. Most of the polymorphisms studied by human geneticists, though, had probably arrived at their current frequencies because of drift. This opened the door to a new way of analysing the rapidly accumulating data on blood group polymorphisms. But before that could happen, the field needed to make a quick detour through the Middle Ages.
‘Ock the Knife’
William of Ockham (c.1285–1349) was a medieval scholar who must have been a nightmare to be around. Ockham believed literally in Aristotle’s statement that ‘God and nature never operate superfluously, but always with the least effort’, and took every opportunity to invoke his interpretation of this view in arguments with his colleagues. Ockham’s razor, as it became known, was stated quite simply in Latin: Pluralitas non est ponenda sine necessitate (plurality is not to be posited without necessity). In its most basic form, Ockham’s statement is a philosophical commitment to a particular view of the universe – a view that has become known as parsimony. In the real world, if each event occurs with a particular probability, then multiple events occur with multiplied probabilities and, overall, the complex events are less likely than the simple ones. It is a way of breaking down the complexity of the world into understandable parts, favouring the simple over the absurd. I may actually fly from Miami to New York via Shanghai – but it is not terribly likely.
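The arithmetic behind this intuition is simple to make concrete. In the sketch below, the probability value is an arbitrary assumption chosen purely for illustration:

```python
# Independent events multiply: if each rare event occurs with
# probability p, an explanation requiring k such events has
# probability p**k. The value of p here is an assumed figure.
p = 0.01

direct_route = p        # one unlikely event
via_shanghai = p ** 3   # three unlikely events in a row

# The simpler explanation is about ten thousand times more probable.
print(direct_route / via_shanghai)
```

This is all parsimony amounts to in probabilistic terms: explanations requiring fewer independent events are, other things being equal, more likely.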
This may seem trivial when applied to my travel schedule, but it is not so obvious when we start to apply it to the murky world of science. How do we really know that nature always takes the most parsimonious path? In particular, is it self-evident that ‘simplify’ is nature’s buzzword? This book is not the forum for a detailed discussion of the history of parsimony (there are several references in the bibliography where the subject is discussed in great detail), but it seems that nature usually does favour simplicity over complexity. This is particularly true when things change – like when a stone drops from a cliff to the valley below. Gravity clearly exerts itself in such a way that the stone moves directly – and rather quickly – from the high to the low point, without stopping for tea in China.
So, if we accept that when nature changes, it tends to do so via the shortest path from point A to point B, then we have a theory for inferring things about the past. This is quite a leap, since it implies that by looking at the present we can say something about what happened before. In effect, it provides us with a philosophical time machine with which to travel back and dig around in a vanished age. Pretty impressive stuff. Even Darwin was an early adherent – Huxley actually scolded him on one occasion for being such a stick-in-the-mud about his belief that natura non facit saltum (nature doesn’t make leaps).
The first application of parsimony to human classification was published by Luca Cavalli-Sforza and Anthony Edwards in 1964.* In this study they made two landmark assumptions which would be adopted in each subsequent study of human genetic diversity. The first was that the genetic polymorphisms were behaving as Kimura had predicted – in other words, they were all neutral, and thus any differences in frequency were due to genetic drift. The second assumption was that the correct relationship among the populations must adhere to Ockham’s rule, minimizing the amount of change required to explain the data. With these key insights, they derived the first family tree of human groups based on what they called the ‘minimum evolution’ method. In effect, this means that the populations are linked in a diagram such that the ones with the most similar gene frequencies are closest together, and that overall the relationship among the groups minimizes the total magnitude of gene frequency differences.
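A toy version of this idea can be sketched in a few lines. The frequencies below are invented for illustration – not Cavalli-Sforza and Edwards's data – and the greedy join of the closest pair is only a caricature of their 'minimum evolution' method:

```python
from itertools import combinations

# Hypothetical blood-group allele frequencies for four populations
# (invented numbers for illustration only).
freqs = {
    "Pop A": [0.30, 0.10, 0.60],
    "Pop B": [0.32, 0.12, 0.56],
    "Pop C": [0.60, 0.25, 0.15],
    "Pop D": [0.58, 0.27, 0.15],
}

def distance(p, q):
    """Euclidean distance between two sets of gene frequencies,
    standing in for a proper genetic distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# In the spirit of 'minimum evolution': the pair of populations with
# the most similar frequencies is joined first when building the tree.
pairs = sorted(combinations(freqs, 2),
               key=lambda ab: distance(freqs[ab[0]], freqs[ab[1]]))
print(pairs[0])
```

Repeating the join on merged groups would grow the full tree; the principle – similar frequencies sit close together, and total frequency change is minimized – is the same at every step.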
Cavalli-Sforza and Edwards looked at blood group frequencies from fifteen populations living around the world. The result of this analysis, laboriously calculated by an early Olivetti computer, was that Africans were the most distant of the populations examined, and that European and Asian populations clustered together. It was a startlingly clear insight into our species’ evolutionary history. As Cavalli-Sforza says modestly, the analysis ‘made some kind of sense’, based on their concept of how human populations should be related – European populations were closer to each other than they were to Africans, New Guineans and Australians grouped together, and so on. This was a reflection of similarities in gene frequencies, and since these frequencies changed in a regular way over time (remember genetic drift), it meant that the time elapsed since Europeans started diverging from each other was less than the time separating Europeans from Africans. The old monk had proven useful after 700 years – and anthropology had a way forward.*
With this new approach to human classification, it was even possible to calculate the dates of population splits, making several assumptions about the way humans had behaved in the past, and the sizes of the groups they lived in. This was first done by Cavalli-Sforza and his colleague Walter Bodmer in 1971, yielding an estimate of 41,000 years for the divergence between Africans and East Asians, 33,000 for Africans and Europeans and 21,000 for Europeans and East Asians. The problem was, it was uncertain how reasonable their assumptions about population structure really were. And crucially, it still failed to provide a clear answer to the question of where humans had originated. What the field needed now was a new kind of data.
Alphabet soup
Emile Zuckerkandl was a German-Jewish émigré working at the California Institute of Technology in Pasadena. He spent much of his scientific career tenaciously focused on one problem: the structure of proteins. Working with the Nobel Prize-winning biochemist Linus Pauling in the 1950s and 60s, Zuckerkandl studied the basic structure of the oxygen-carrying molecule haemoglobin – chosen because it was plentiful and easy to purify. Haemoglobin had another important characteristic: it was found in the blood of every living mammal.
Proteins are composed of a linear sequence of amino acids, small molecular building blocks that combine in a unique way to form a particular protein. The amazing thing about proteins is that, although they do their work twisted into baroque shapes, often with several other proteins sticking to them in a complex way, the ultimate form and function of the active protein is determined by a simple linear combination of amino acids. There are twenty amino acids used to make proteins, with names like lysine and tryptophan. These are abbreviated by chemists to single-letter codes – K and W in this case.
Zuckerkandl noticed an interesting pattern in these amino acid sequences. As he started to decipher haemoglobins from different animals, he found that they were similar. Often they had identical sequences for ten, twenty, or even thirty amino acids in a row, and then there would be a difference between them. What was fascinating was that the more closely related the animals were, the more similar their sequences were. Humans and gorillas had virtually identical haemoglobin sequences, differing only in two places, while humans and horses differed by fifteen amino acids. What this suggested to Zuckerkandl and Pauling was that molecules could serve as a sort of molecular clock, documenting the time that has elapsed since a common ancestor through the number of amino acid changes. In a paper published in 1965, they actually referred to molecules as ‘documents of evolutionary history’. In effect, we all have a history book written in our genes. According to Zuckerkandl and Pauling, the pattern written in a molecular structure can even provide us with a glimpse of the ancestor itself, making use of Ockham’s razor to minimize the number of inferred amino acid changes and working back to the likely starting point (see Figure 1). Molecules are, in effect, time capsules left in our genomes by our ancestors. All we have to do is learn to read them.
Figure 1 The evolutionary ‘genealogy’ of two related molecules, showing sequence changes accumulating on each lineage.
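The counting step at the heart of the molecular clock can be sketched as follows; the sequences are invented for illustration, not real haemoglobin data:

```python
# Two short aligned amino-acid sequences in single-letter code
# (invented for illustration -- not real haemoglobin data).
human = "VHLTPEEKSAVTALWGKV"
other = "VHLSPEEKSAVTALWGKV"

def differences(a, b):
    """Count the positions at which two aligned sequences differ --
    the raw data of a molecular clock."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Under a constant clock, the time elapsed since the common ancestor is
# proportional to this count: t = d / (2 * rate), since changes
# accumulate along both descendant lineages.
print(differences(human, other))  # 1
```

With a calibrated rate of change, the difference count converts directly into an estimated divergence time – which is exactly how Zuckerkandl and Pauling read their 'documents of evolutionary history'.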
Of course, Zuckerkandl and Pauling realized that proteins were not the ultimate source of genetic variation. This honour lay with DNA, the molecule that actually forms our genes. If DNA encodes proteins (which it does), then the best molecule to study would be the DNA itself. The problem was that DNA was extremely difficult to work with, and getting a sequence took a long time. In the mid-1970s, however, Walter Gilbert and Fred Sanger independently developed methods for rapidly obtaining DNA sequences, published in 1977, for which they shared a Nobel Prize in 1980. The ability to sequence DNA set off a revolution in biology that has continued to this day, culminating in 2000 with the completion of a working draft of the entire human genome sequence. DNA research has revolutionized the way we think about biology, so it isn’t surprising that it has had a significant effect on anthropology as well.
The crowded garden
So we find ourselves in the 1980s with the newly developed tools of molecular biology at our disposal, a theory for how polymorphisms behave in populations, a way to estimate dates from molecular sequence data and the burning question of how genetics can answer a few age-old questions about human origins. What the field needed now was a lucky insight and a bit of chutzpah. Both of these were to be found in the early 1980s in the San Francisco Bay area of northern California.
Allan Wilson was a New Zealand-born biochemist working at the University of California, Berkeley, on methods of evolutionary analysis using molecular biology – the new branch of biology that focused on DNA and proteins. Using the methods of Zuckerkandl and Pauling, he and his students had used molecular techniques to estimate the date of the split between humans and apes, and they had also deciphered some of the intricate details of how natural selection can tailor proteins to their environments. Wilson was an innovative thinker, and he embraced the techniques of molecular biology with a passion.
One of the problems that molecular biologists encountered in studying DNA sequences was that of the duplicate nature of the information. Inside each of our cells, what we think of as our genome – the complete DNA sequence that encodes all of the proteins made in our bodies, in addition to a lot of other DNA that has no known function – is really present in two copies. The DNA is packaged into neat, linear components known as chromosomes – we have twenty-three pairs of them. Chromosomes are found inside a cellular structure known as the nucleus. One of the main features of our genome is the astounding compartmentalization – like computer folders within folders within folders. In all there are 3,000,000,000 (3 billion) building blocks, known as nucleotides (which come in four flavours: A, C, G and T), in the human genome, and we need some way to get at all of the information it contains in a straightforward way. This is why we have chromosomes, and why they are kept squirrelled away from the rest of the cell inside the nucleus.
The reason we have two copies of each chromosome is more complicated, but it comes down to sex. When a sperm fertilizes an egg, one of the main things that happens is that part of the father’s genome and part of the mother’s genome combine in a 50 : 50 ratio to form the new genome of the baby. Biologically speaking, one of the reasons for sex is that it generates new genomes every generation. The new combinations arise, not only at the moment of conception with the 50 : 50 mixing of the maternal and paternal genomes, but also prior to that, when the sperm and egg themselves are being formed. This pre-sexual mixing, known as genetic recombination, is possible because of the linear nature of the chromosomes – it is relatively easy to break both chromosomes in the middle and reattach them to their partners, forming new, chimeric chromosomes in the process. The reason why this occurs, as with the mixing of Mum’s and Dad’s DNA, is that it is probably a good thing, evolutionarily speaking, to generate diversity in each generation. If the environment changes, you’ll be ready to react.
But wait, you might say, why are these broken and reattached chromosomes any different from the ones that existed before? They were supposed to be duplicates! The reason, quite simply, is that they aren’t exact copies of each other – they differ from each other at many locations along their length. They are like duplicates of duplicates of duplicates of duplicates, made with a dodgy copying machine that introduces a small number of random errors every time the chromosomes are copied. These errors are the mutations mentioned above, and the differences between each chromosome in a pair are the polymorphisms. Polymorphisms are found roughly every 1,000 nucleotides along the chromosome, and serve to distinguish the chromosomes from each other. So, when recombination occurs, the new chromosomes are different from the parental types.
The evolutionary effect of recombination is to break up sets of polymorphisms that are linked together on the same piece of DNA. Again, this diversity-generating mechanism is a good thing evolutionarily speaking, but it makes life very difficult for molecular biologists who want to read the history book in the human genome. Recombination allows each polymorphism on a chromosome to behave independently of the others. Over time the polymorphisms are recombined many, many times, and after hundreds or thousands of generations, the pattern of polymorphisms that existed in the common ancestor of the chromosomes has been entirely lost. The descendant chromosomes have been completely shuffled, and no trace of the original deck remains. The reason this is bad for evolutionary studies is that, without being able to say something about the ancestor, we cannot apply Ockham’s razor to the pattern of polymorphisms, and we therefore have no idea how many changes really distinguish the shuffled chromosomes. At the moment, all of our estimates of molecular clocks are based on the rate at which new polymorphisms appear through mutation. Recombination makes it look like there have been mutations when there haven’t, and because of this it causes us to overestimate the time that has elapsed since the common ancestor.
One of the insights that Wilson and several other geneticists had in the early 1980s was that if we looked outside of the genome, at a small structure found elsewhere in the cell known as the mitochondrion, we might have a way of cheating the shuffle. Interestingly, the mitochondrion has its own genome – it is the only cellular structure other than the nucleus that does. This is because it is actually an evolutionary remnant from the days of the first complex cells, billions of years ago – the mitochondrion is what remains of an ancient bacterium which was swallowed by one of our single-celled ancestors. It later proved useful for generating energy inside the cell, and now serves as a streamlined sub-cellular power plant, albeit one that started life as a parasite. Fortunately, the mitochondrial genome is present in only one copy (like a bacterial genome), which means that it can’t recombine. Bingo. It also turns out that, instead of having one polymorphism roughly every 1,000 nucleotides, it has one every 100 or so. To make evolutionary comparisons we want to have as many polymorphisms as possible, since each polymorphism increases our ability to distinguish between individuals. Think of it this way: if we were to look at only one polymorphism, with two different forms A and B, we would sort everyone into two groups, defined only by variant A or variant B. On the other hand, if we looked at ten polymorphisms with two variants each, we would have much better resolution, since the likelihood of multiple individuals having exactly the same set of variants is much lower. In other words, the more polymorphisms we have, the better our chances of inferring a useful pattern of relationships among the people in the study. Since polymorphisms in mitochondrial DNA (mtDNA) are ten times more common than in the rest of our genome, it was a good place to look.
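The gain in resolution from extra polymorphisms is just exponential counting, as the short sketch below shows:

```python
def n_signatures(n_polymorphisms, variants=2):
    """Number of distinct genetic 'signatures' that a set of
    polymorphisms, each with a fixed number of variants, can define."""
    return variants ** n_polymorphisms

print(n_signatures(1))   # 2: everyone falls into just two groups
print(n_signatures(10))  # 1024: two individuals rarely share a signature
```

Ten polymorphisms define over a thousand possible combinations, so mtDNA's tenfold-higher polymorphism density translates directly into finer-grained distinctions between individuals.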
The Journey of Man: A Genetic Odyssey Page 4