Of course, as the name suggests, Indo-European languages are spoken not only in Europe. Modern Iran, Afghanistan and the Indian subcontinent all have a majority of Indo-European speakers. How did they come to speak languages related to Irish Gaelic, thousands of miles away? Again, there are competing hypotheses. The first, advanced by Childe, Gimbutas and others, is that the early steppe horsemen carried their language from central Asia into India when they invaded around 1500 BC. The Rig Veda, an early Indian religious text, records the conquest of India by mounted warriors from the north. This received corroboration in the 1920s when Sir John Marshall and his colleagues excavated Mohenjo Daro and Harappa in the Indus Valley. These great cities date from around 3500 BC, and by the second millennium BC they were massive settlements with thousands of houses, extensive agriculture and enormous populations. Then, around 1500 BC they entered a period of decline, and by 1000 BC the Harappan culture had disappeared, its cities abandoned. What caused this sudden cultural collapse? To the archaeologists, it seemed to correlate perfectly with an invading force of Aryans from the steppes. Archaeology seemed to be reinforcing Childe’s argument, and corroborating the Rig Veda.
More recent research has suggested that there were probably indigenous causes for the collapse of the Harappan civilization. Perhaps a river changed course, or social decay had set in (think of the Romans, 2,000 years later). Whatever the cause(s), the invading Aryans were not necessarily the all-powerful conquerors that early archaeologists thought they were. In the wake of this reinterpretation, Renfrew suggested two models for how the Indo-European languages could have come to India.
Renfrew’s first model is that of an early Neolithic migration from the Middle East, with the settlers carrying their PIE language with them. In this model, the Harappans would already have been Indo-European, and thus there is no reason to infer an Aryan invasion in order to account for the languages of India. The second model, giving more credence to the Rig Veda, is that there was an invasion of the Indus region by Indo-European speaking nomads from central Asia, but it was carried out by relatively few individuals. Thus it had little impact on the population of the subcontinent, aside from the imposition of a language and culture. In both cases, we would expect the Indian genetic data to show only a minor contribution from the northern steppes.
The test of the Childe–Gimbutas and Renfrew hypotheses awaited the development of markers that were capable of distinguishing between populations from the steppe and the indigenous Indian gene pool. As we saw in Chapter 6, M20 defines the first major wave of migration into India from the Middle East, around 30,000 years ago. It is found at highest frequency in the populations of the south, who speak Dravidian languages – a language family completely unrelated to Indo-European. In some southern populations, M20 reaches a frequency of over 50 per cent, while it is found only sporadically outside India. Thus, for our purposes, it is an indigenous Indian marker. What was needed to complete the analysis was a steppe marker, in order to see what contribution it may have made to the genetic diversity present in India.
This came with the discovery of a marker known as M17, which is present at high frequency (40 per cent plus) from the Czech Republic across to the Altai Mountains in Siberia and south throughout central Asia. Absolute dating methods suggest that this marker is 10,000–15,000 years old, and the microsatellite diversity is greatest in southern Russia and Ukraine, suggesting that it arose there. M17 is a descendant of M173, which is consistent with a European origin. The origin, distribution and age of M17 strongly suggest that it was spread by the Kurgan people in their expansion across the Eurasian steppe. The key to solving our language puzzle is to see what it looks like in India and the Middle East.
The answer is that M17 in India is found at high frequency in those groups speaking Indo-European languages. In the Hindi-speaking population of Delhi, for example, around 35 per cent of men have this marker. Indo-European-speaking groups from the south also show similarly high frequencies, while the neighbouring Dravidian speakers show much lower frequencies – 10 per cent or less. This strongly suggests that M17 is an Indo-European marker, and shows that there was a massive genetic influx into India from the steppes within the past 10,000 years. Taken with the archaeological data, we can say that the old hypothesis of an invasion of people – not merely their language – from the steppe appears to be true.
And what of the Middle East? Interestingly, M17 is not found at high frequency there – it is present in only 5–10 per cent of Middle Eastern men. This is true even for the population of Iran, speaking Farsi, a major Indo-European language. Those living in the western part of the country have low frequencies of M17, while those living further east have frequencies more like those seen in India. What lies between the two regions is, as we learned in Chapter 6, an inhospitable tract of desert. The results suggest that the great Iranian deserts were barriers to the movement of Indo-Europeans in much the same way that they had been to late Upper Palaeolithic migration.
The Y-chromosome results from Iran and the Middle East also suggest that early Middle Eastern agriculturalists did not spread Indo-European languages eastward as they moved into the Indus Valley. The marker M172, associated with the spread of agriculture, is found throughout India – consistent with an early introduction from the Middle East, most likely during the Neolithic. But the frequency is comparable in Indo-European and Dravidian speakers, suggesting that the introduction of agriculture pre-dated that of the Indo-European languages. Thinking in terms of actual behaviour, many Indian descendants of Neolithic farmers have learned to speak Indo-European languages, while fewer M17-carrying Indo-European speakers – up to this point – have given up their language in favour of Dravidian.
The low frequency of M17 in western Iran suggests that, in this case, exactly the sort of scenario envisaged by Renfrew in his second model has occurred. It is likely that a few invading Indo-European speakers were able to impose their language on an indigenous Iranian population by a process Renfrew calls elite dominance. In this model, something – be it military power, economic might, or perhaps organizational ability – allowed the Indo-Europeans of the steppes to achieve cultural hegemony over the ancient, settled civilizations of western Iran. One candidate for this ‘something’ was their use of horses in warfare, either to pull chariots or as mounts. Cavalry and chariots, both steppe inventions, would have given the early nomadic Indo-Europeans a distinct advantage over their adversaries’ infantry. The use of horses would provide a major technological advantage to armies over the next three millennia. It is not difficult to imagine that it gave an early advantage to the people of the Eurasian steppe.
Thus, while we see substantial genetic and archaeological evidence for an Indo-European migration originating in the southern Russian steppes, there is little evidence for a similarly massive Indo-European migration from the Middle East to Europe. One possibility is that, as a much earlier migration (8,000 years old, as opposed to 4,000), the genetic signals carried by Indo-European-speaking farmers may simply have dispersed over the years. There is clearly some genetic evidence for migration from the Middle East, as Cavalli-Sforza and his colleagues showed, but the signal is not strong enough for us to trace the distribution of Neolithic lineages throughout the entirety of Indo-European-speaking Europe. Cavalli-Sforza has suggested that an initial migration of Neolithic pre-PIE speakers from the Middle East could have introduced a language to Europe, including our Kurgan people, which later became PIE. There is nothing to contradict this model, although the genetic patterns do not provide clear support either.
There is another possibility, which comes from the distribution and relationships among extinct languages in the Middle East and Europe. What if the language of the first farmers was not Indo-European, but another language entirely? The Basques, who live in north-eastern Spain, speak a language unrelated to any other in the world. Jared Diamond, in his book The Rise and Fall of the Third Chimpanzee, suggested that it might be a remnant of the agricultural Wave of Advance from the Middle East. Interestingly, some linguists have suggested that Basque is related to languages spoken in the Caucasus, while others find similarities to Burushaski, a language isolate spoken in a remote part of Pakistan. Similarly, there were other now-extinct languages spoken throughout the Mediterranean world, in south-eastern Spain (Tartessian and Iberian), Italy (Etruscan and Lemnian) and Sardinia (there is a non-Indo-European source for many place names). Place names in southern France similarly suggest that Basque was much more widely spoken in the past than it is today, and Greek place names indicate the presence of a pre-Indo-European element there as well. Overall, there is reasonable evidence for a ‘Mediterranean’ collection of pre-Indo-European languages that were later replaced by the expansion of Greek and Latin.
Taken at face value, then, we have a set of languages that were once widespread around the Mediterranean and Middle East, extending eastward into Pakistan. This is precisely the territory colonized by early Neolithic farmers during the period between 10,000 and 7,000 years ago. One possibility is that these early farmers spread ‘Mediterranean’ languages as they expanded their populations. The Palaeolithic populations of Europe took on the language of farming, and its culture, even if (as in the case of the Basques) there was hardly any genetic influx. These languages also spread to the east, introducing farming throughout the river valleys of central Asia and Pakistan. Later migrations, of Dravidian and Indo-European speakers in the case of Pakistan, and Indo-Europeans in the case of Europe, would have reduced the current speakers of the Mediterranean languages to the isolated pockets we see today.
Of course, this scenario is purely speculative, but it may be a plausible alternative to Renfrew’s Indo-European farmers and Cavalli-Sforza’s pre-PIE farmers. Furthermore, the genetic data shows some correlations: most of the regions mentioned, from the Mediterranean to the Caucasus to Pakistan, have substantial frequencies of M172, our canonical Neolithic marker. This is particularly true for populations from the Caucasus, some of which have frequencies of M172 in excess of 90 per cent. The generally close genetic similarity between Caucasian populations and those from the Middle East suggests that there was a substantial influx of people during the Neolithic, who may have introduced languages related to Sumerian to the region. Of course, this scenario assumes a relationship among all of the Mediterranean languages, which is tenuous at best. However, some linguists have found evidence for such a language ‘superfamily’, revealing deep structures common to seemingly unrelated languages. The search for these superfamilies is where we are headed next.
The big picture
Charles Darwin, writing in the time before modern methods of language classification had been fully worked out, noted the similarity between classifications based on genealogy and those based on linguistics. In the Origin of Species, he noted that ‘if we possessed a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the various languages now spoken throughout the world’. Cavalli-Sforza has said that he was unaware of Darwin’s hypothesis when he began his 1988 comparison between genetic and linguistic relationships, his attention having been drawn to it later by a colleague who studied the history of science. It is not, perhaps, such a great leap of faith to suggest that languages tend to track population relationships. After all, we do ‘inherit’ our language from our parents, so at least on the time scale of the recent past, languages should be a good proxy for genes. What happens, though, when we look further? Is there a deeper relationship among languages that unites them into larger groups? And, perhaps most importantly, is there any evidence of a linguistic equivalent of our genetic Adam or Eve?
Joseph Greenberg, whom we encountered in Chapter 7, was convinced that such deeper relationships did exist. He made his name in the field of linguistic classification by uniting the hundreds of languages of Africa into four distinct families, described in his 1963 book The Languages of Africa. These early attempts at higher-order classification were generally well received by the linguistics community, and their success encouraged Greenberg to begin to look tentatively at deeper relationships among languages, particularly those of Eurasia.
Greenberg found that many of the languages, including those belonging to the Indo-European family, seemed to share certain structural elements that he found too striking to be due to chance. The details seem trivial to non-specialists (one example is how nouns are made into plural forms, by the addition of either a -k or -t suffix), but are significant to many linguists. Merritt Ruhlen, in his book The Origin of Language, traces many of the similarities among Greenberg’s so-called Eurasiatic family, called Nostratic by some specialists.
One of the first questions we might have about this group of languages is whether, like Indo-European, there is any archaeological or genetic evidence for it. Unfortunately, this does not appear to be the case. One problem is that its members are so widespread across much of Eurasia that it encompasses a huge number of distinct populations. This may be due to the estimated age of the family – perhaps more than 20,000 years old. Any correlation with such an ancient and widespread group of languages is tenuous at best, and the only obvious Y-chromosome marker would be M9. M9, however, is also found in the other Eurasian superfamily of languages, known as Dene-Caucasian.
The first group in this family is that of the American Na-Dene languages (such as Navajo) and Sino-Tibetan, the languages of China and Tibet. Many linguists now accept the relationship between these two language families, but the more distant relationships are much more controversial. This is because Dene-Caucasian also includes, as its name suggests, languages from the Caucasus, as well as Basque and Burushaski. To put this into perspective, the languages belonging to Dene-Caucasian are spoken from the Pyrenees to the Rockies, with isolated patches scattered across Eurasia – a rather disparate group to say the least. In part because of this, American linguist John Bengtson has identified a subgroup within Dene-Caucasian that includes Basque, Caucasian, Burushaski and the extinct Sumerian language. The overlap with our hypothetical ‘Mediterranean’ family is striking, and (as we have already seen) there is some genetic evidence to support the dispersal of this group of languages during the past 10,000 years, perhaps in association with agriculture. The inclusion of Sumerian is especially telling, since this language – spoken by one of the earliest Mesopotamian civilizations – has geographic and cultural links back to the earliest days of agriculture in the Fertile Crescent.
While the genetic data supports the notion of a population connection among some of the western members of the Dene-Caucasian family, there is no clear link between them and the eastern languages. These languages, the Sino-Tibetan and Na-Dene families, do have their own genetic connection, however. It comes in the form of the M130 marker, which we first encountered in tracing the coastal migration to Australia. As we saw in the last chapter, M130 is also found in the population of eastern Asia, including China, marking a northward expansion of the marker from south-east Asia. Interestingly, this marker is also found in Na-Dene-speaking populations in North America. As with the Na-Dene languages themselves, it is not found in South America. This suggests a unique genetic link between east Asians and some Native American tribes, which arose from a second migration into the Americas between 5,000 and 10,000 years ago. In this case, genetics reinforces the linguistic relationship and provides a rough date for the divergence.
Their success in identifying common features in languages separated by tens of thousands of years has led some linguists to delve even further into the recesses of linguistic history, searching for the deepest relationship of all – a common origin for all languages. Merritt Ruhlen, one of the staunchest supporters of this view, believes that the Dene-Caucasian family marks the earliest spread of modern humans out of Africa, while the Eurasiatic family marks a later expansion emanating from the Middle East. As we have seen, there is no clear genetic data to support this model. One alternative is that these families spread, at least in part, via cultural dissemination, without leaving well-defined genetic trails. This has happened with some branches of Indo-European, for instance. The other possibility is that Eurasiatic and Dene-Caucasian do not really exist – perhaps they are simply collections of unrelated languages that show random similarities. Or perhaps subgroups do exist, particularly those supported by genetic data (such as Sino-Tibetan and Na-Dene), while many of the languages are unrelated. Ruhlen clearly has his work cut out for him.
It is likely that the evolution of language does follow the same paths as the migration of modern humans, with an origin in Africa and subsequent dispersal to the far corners of the globe. However, this statement is based on circumstantial evidence – the universality of language in all human populations, extrapolation from short-term linguistic change in recognized families such as Indo-European, and the presumed importance of language for the development of modern human culture. Almost all of the signals of the original human language – if it existed – have been lost, leaving us with today’s dispersed Tower of Babel. In the same way that English fragmented into a large number of dialects that became more dissimilar over the past 500 years, so too do all languages become more dissimilar over time. Eventually, they lose all evidence of their common origin. The period of time required for this is unclear. Some linguists think that 6,000 years is long enough, while Ruhlen and others claim to have found similarities that trace back more than 20,000 years. The search for the language of Adam and Eve promises to be a contentious and exciting field in the next few years, and genetics should be able to offer some input.
The Journey of Man: A Genetic Odyssey Page 20