This Is the Voice
Far from suggesting a gene that controls grammar, the family’s heritable linguistic difficulties pointed to a gene that controls speech—and precisely that element of speech (motor control of the articulators) that Lieberman theorized had undergone a critical change during our evolution, a genetic mutation that endowed us, and us alone, with the power to tune our voices from purely emotional cries and calls, and to shape them into articulate, spoken language through exquisitely well-planned, precisely timed movements of tongue, lips, larynx, and velum.
Strong support for that aspect of Lieberman’s theory emerged in 2001 when the defective gene in the London family was finally isolated by a lab at Oxford University. Located on the seventh chromosome, it is a “master regulating gene” whose on- and off-switching, when an embryo is developing in the mother’s womb, affects a number of “downstream” genes responsible for building parts of the lungs, heart, motor cortex, and (crucially) the basal ganglia. The Oxford team called the gene FOXP2. It turns out that we all possess two copies of FOXP2—one we get from our dad, the other from our mom. In the afflicted West London family, a mutation had occurred in just one of those parental copies: a single-letter change in the DNA that altered one of the more than seven hundred amino acids in the protein the gene encodes. That tiny error had reduced the mobility and agility of the afflicted family members’ tongues and lips enough to render speech, in the most severely affected members, almost unintelligible. (That it also affected their thinking was clear: all of the family members with shriveled basal ganglia had dramatically lower verbal and nonverbal IQs than the unaffected members.)40
Further research revealed that the FOXP2 gene is actually found in all mammalian species, including mice and dogs, cats and whales, chimps and orangutans, where it controls the fine motor movements involved in actions like walking and running, chewing and swallowing. As such, it has a remarkable evolutionary story to tell—a story anticipated thirty years earlier by Lieberman, and one that offers astonishing insight into the emergence of our uniquely human voice, its adaptation for speech, and our subsequent rise to the top of the food chain.
Shortly after the Oxford lab isolated FOXP2, it shared the discovery with Svante Pääbo, the head of Molecular Genetics at the Max Planck Institute for Evolutionary Anthropology in Leipzig, and a scientist famous for his research into the genes that make us human. (It was Pääbo’s team that had sequenced the Neanderthal genome and revealed our familial relationship with this extinct species.) Pääbo enlisted one of his top researchers, Wolfgang Enard, to investigate FOXP2’s evolutionary backstory. When Enard compared FOXP2 in mice, orangutans, gorillas, chimps, and us, he discovered that the gene had changed very little over the roughly 130 million years that separated the evolution of mice and the appearance of apes: just one amino acid change in over 100 million years. But sometime after our line branched off from the common ancestor we share with chimps (around six million years ago), the FOXP2 gene underwent two more mutations: two amino acid substitutions in just six million years, a major acceleration in change. The widespread appearance of this doubly mutated FOXP2 in human populations around the globe “strongly suggest[s],” Enard wrote, “that this gene has been the target of selection during recent human evolution.”41
By “target of selection,” he meant that the bodily and behavioral changes conferred by those two amino acid substitutions gave a significant survival advantage to the hominin line—our line. The West London family with the sluggish tongue and lips makes clear what that advantage was: turbocharged basal ganglia that allow for high-speed, exquisitely coordinated, carefully sequenced movements of the vocal organs that only we are capable of, and that make speech possible. Enard determined that this human form of FOXP2 became “fixed” in our genome about 200,000 years ago, which is when modern Homo sapiens first appeared on the scene. This did not, however, rule out the possibility that our close relative, the Neanderthal, might also have borne the mutated FOXP2. Pääbo’s team analyzed Neanderthal DNA and discovered that their FOXP2 had indeed undergone the same two mutations as ours—the strongest evidence yet that Neanderthals could talk (even if their vowels were blurry). Tellingly, when the human/Neanderthal FOXP2 was “knocked into” fetal mice by Pääbo’s team, the neural pathways to the basal ganglia were enhanced and the mice, at birth, produced ultrasonic cries for their mothers that differed in pitch and “syllable” duration from those of untreated mice.42
But perhaps the most extraordinary finding to emerge about FOXP2 is that birds, too, possess the gene and that it expresses itself in a part of the avian brain that is analogous to our basal ganglia and that controls the high-speed movements of the syrinx, tongue, and beak involved in mastering a bird’s species-specific mating songs.43 This genetic similarity between bird and human brains makes perfect sense, given that (as Darwin noted) we are among the only animal species that learn our vocalizations by exposure to adults’ voices. But many scientists nevertheless greeted news of avian FOXP2 with jaw-dropped amazement because birds evolved well before mammals. Birds are actually flying reptiles (indeed, dinosaurs) and were presumed not to possess a gene that otherwise appears in the animal genome only much later in evolutionary history, with the emergence of mammals. This suggests one of two possibilities. Either birds evolved their FOXP2 gene independently, by “convergent” evolution, and it arose, like our FOXP2, as part of the selection pressure for vocal learning and communication crucial to birds’ survival and reproduction; or, in a scenario advanced by Lieberman, bird and human FOXP2 originated far earlier in evolution than previously suspected, as far back as the Cambrian age, 500 million years ago, with the emergence of amphibians who possess basal ganglia.
Regardless, FOXP2 is emphatically not a “gene for language.” It is, more accurately, the first gene ever found for the unique specializations of our human voice. In the twice-mutated form found in our species, it is among the best genetic evidence we possess to explain how a marginal species of small, physically weak, slow-running, hairless primates made their improbable climb, over just a few hundred thousand years, to the top of the food chain.
For Lieberman, the findings about FOXP2 and the basal ganglia are only the latest evidence to support his theory that language, far from being a purely mental phenomenon, is a physical act whose first stirrings can be traced back, hundreds of millions of years, to the oldest, air-breathing vertebrate (the lungfish), as voice (regardless of how fartlike). The exigencies of survival and reproduction gave rise to speech, some 200,000 years ago, when a series of random, but advantageous, genetic mutations led, in our early hominin line, to increased control over respiration, to the descent of the larynx, and to the powering of the basal ganglia for articulation—all anatomical accidents selected by nature for the advantages in survival and reproduction that they conferred, and which bred a bigger, better, language-capable brain. The voice, in Lieberman’s conception, thus played the major role in creating language.
As he once put it: we “talked ourselves” into becoming human.44
* * *
Noam Chomsky, as we’ve noted, has long professed indifference to the question of where language came from—dismissing Darwinian natural selection, but declining to advance any plausible alternative. However, in the late 1990s, when the subject of language origins began to dominate linguistics, and other branches of science, Chomsky was no longer content to sit on the sidelines. In a 1999 interview, he let fall that he might see a role for Darwinian natural selection in language, after all.45 In 2002, he published in Science a major paper on the subject. Coauthored by evolutionary biologists Marc Hauser and Tecumseh Fitch, the paper triggered an earthquake in linguistics for the startling minimalism (not to say bizarre reductionism) of Chomsky’s idea of how language came about.46
Sticking to his view that our linguistic ability emerged, not for communication, but for thinking, he ignored the recent revelations about FOXP2, and continued to relegate Lieberman’s research about the descended larynx and improved respiration to “secondary” issues concerned with the “peripheral” faculty of mere speech. Instead, Chomsky and his coauthors focused on the purely cognitive changes that endowed us, alone among animals, with the ability to think linguistically. They concluded that this ability emerged thanks to one mental operation (and one only): recursion.
Recursion refers to our ability to put one idea inside another. It is embodied in everything from simple adjectival phrases (“The red boat” puts the idea of redness inside the idea of a boat) right on up to complicated embedded clauses, as when discrete thoughts (“the man is walking down the street” and “the man is wearing a top hat”) are combined in a single sentence (“The man who is wearing a top hat is walking down the street”). Thanks to recursion, you can just keep embedding ideas (“The man who is wearing a red top hat, which is slightly crumpled at the brim, is walking down the street and eating a slightly bruised but still delicious banana, while humming a tune made famous by Engelbert Humperdinck but which, according to musical historians, was actually written by…”). Words themselves use the recursive process by putting one speech sound (like the vowel “o”) inside others (“d_g”) to make “dog.” Thus did Chomsky and his coauthors call recursion the “only uniquely human component” behind language, the mental engine that makes it possible to generate infinite meanings (or infinitely long sentences like the one about the man eating the banana) from a finite set of sounds (the forty or so vowel and consonant sounds of English) and thus the single linguistic universal that distinguishes our speech from the grunts, hisses, moans, shrieks, and chirps of all other animals.
This was a far cry from the incredible complexity that Chomsky had always argued for in language, and which he had enshrined in his concept of Universal Grammar. His longtime acolytes were not only unconvinced—they were enraged. Told that they could now, seemingly, toss on the scrap heap the half century of work they had devoted to dissecting language for the common “deep structures” of Universal Grammar (a process that had so far brought to light only a tiny handful of supposed universals), they openly rebelled. Steven Pinker led the charge, coauthoring a response in Cognition with Ray Jackendoff, a leading Chomskyan. They argued that recursion, while important, was by no means the only “special” thing about languages, which are, they wrote, “full of devices like… quantifiers, tense and aspect markers, complementizers, and auxiliaries, which express temporal and logical relations.”47
Meanwhile, tucked away in a paragraph halfway through the thirty-six-page cri de coeur was the single most salient piece of evidence to suggest that recursion could not be the whole story of language—since recursion was not even universal. According to Pinker and Jackendoff, there existed, in the deep Brazilian jungle, an isolated tribe of fewer than four hundred people, the Pirahã, who spoke a highly unusual language of which the most striking feature was its total lack of recursive grammar.
Pinker and Jackendoff’s source was a remarkable 2005 paper written by Daniel Everett, a missionary-turned-linguist who had lived with, and studied, the Pirahã tribe for thirty years.48 According to Everett, Pirahã speakers do not recursively embed ideas. Instead of saying, “I saw the dog that was at the beach get bitten by a snake,” they would have to say, “I saw the dog. The dog was at the beach. A snake bit the dog”49—a little like the Motherese that Catherine E. Snow’s experimental subjects used when speaking to their infants. Chomskyans immediately insisted that this aspect of Pirahã speech must reflect mental deficits (effectively retardation) from inbreeding—an explanation Everett instantly shot down by pointing out that the tribe, although living in the most remote reaches of the rain forest, regularly refreshes its genome by sleeping with outsiders (mostly traders who ply the Amazon for Brazil nuts and wood), and that its members are, on all other evidence, no less intelligent than any other humans. Moreover, Everett wrote, the tribe’s inability to use recursion reflected not a cognitive constraint but a cultural one, since the tribe lived according to an extreme “immediacy-of-experience” principle so powerful that it affected every aspect of their lives. “When someone walks around a bend in the river, the Pirahã say that the person has not simply gone away but xibipío—‘gone out of experience,’ ” Everett wrote. “They use the same phrase when a candle flame flickers. The light ‘goes in and out of experience.’ ”50
This immediacy-of-experience principle, Everett said, explained the failure of missionaries like himself to convert the tribe. Told that Christ died two thousand years ago, the Pirahã lost all interest in Christianity; eventually, so did Everett, who in the late 1990s became an atheist and ceased trying to convert the tribe, focusing, instead, on studying their highly unusual language. The immediacy-of-experience principle explained why the Pirahã rejected outsiders’ efforts to teach them forward-planning skills like farming or food storage, and instead still lived as hunter-gatherers unchanged since they first arrived in the Brazilian jungle some ten to forty thousand years ago.
The immediacy-of-experience principle also explained their lack of creation myths, numbers, and art—and it profoundly influenced their speech, “extending its tentacles deep into their core grammar,” as Everett put it, to affect the feature that Chomsky claimed was universal to all language: recursion. Because the Pirahã accept as real only that which they can observe, in the here-and-now, their speech consists solely of direct assertions (“The dog was at the beach. It bit the man”). Recursively embedded clauses (“The dog that was at the beach bit the man”) are not assertions but supporting, quantifying, or qualifying information—in short, abstractions, and thus impossible in the tribe’s tongue.
To students of voice, the absence of recursion was by no means the only astounding thing about Pirahã. Unrelated to any other extant tongue, Pirahã is based on just eight consonant and three vowel sounds—an eleven-letter “alphabet” in comparison to our twenty-six—one of the simplest sound systems known. Pirahã makes up for its paucity of individual phonemes by being a tonal language: it uses the pitch of the voice to, in effect, multiply its small number of individual articulated sounds and so create a sound alphabet extensive enough to make complex language possible. Mandarin Chinese is also tonal. It uses four distinct tones plus a fifth, “neutral” tone. Thus, the Mandarin ma can mean five different things: spoken on a high, level pitch, it means “mother”; in a midrange pitch that then rises, “hemp”; in a low pitch with a slight dipping fall then rise, “horse”; with an abrupt fall from high to low, “scold.” In the neutral tone, spoken on a weakly stressed syllable that takes its pitch from the preceding one, it marks a question. Tonal languages don’t use precise pitches (you don’t sing a syllable on an E-flat to mean one thing and the same syllable on an A-sharp to mean another, which would seriously inconvenience people with amusia, a disorder that makes them tone deaf and unable to carry a tune). Instead, tonal languages use relative pitch, that is, the contrast in pitch between syllables, or within a syllable. Thai, Vietnamese, most African languages, and several South American and Amerindian languages are tonal. Indeed, most languages are. English and most other European languages are among the exceptions.
Pirahã operates similarly to Mandarin in that it uses a basic pattern of high and low tones, and combinations of those (high dipping to low; and low rising into high). But it couples these with an extraordinarily complex array of stresses on syllables (increasing the volume on a part of a word) and syllable lengths (drawing a vowel out, or clipping it short), so that its speakers can, by combining all these elements, dispense with the individual phonemes altogether and sing, hum, or whistle conversations. All of this makes Pirahã so confounding that no outsider (trader or missionary) had mastered it for two hundred years—until Everett, an exceptionally gifted field linguist, and his similarly talented wife, Keren, arrived among the tribe as missionaries in the 1970s and, over the course of many years, achieved fluency.
Everett had been publishing papers on Pirahã for decades, but not until his 2005 article on recursion—and its challenge to Chomsky—had the wider world taken notice. Suddenly, CNN, the BBC, Der Spiegel magazine, and a slew of international newspapers were clamoring for an introduction to the tribe. Anthropologists, linguists, and evolutionary biologists were no less fascinated because, given the Pirahã’s rejection of change, the tribe seemed to offer a snapshot of humans at an earlier time in our collective history—perhaps back to when their speech first emerged, tens of thousands of years ago. “That’s what Dan’s work suggests,” Brent Berlin, a cognitive anthropologist at the University of Georgia, said. “The plausible scenarios that we can imagine are ones that would suggest that early language looks something like the kind of thing that Pirahã looks like now.”51
To skeptics, Everett extended an invitation to visit the Pirahã, and test his assertions. First to accept was Tecumseh Fitch, a coauthor with Chomsky of the controversial 2002 article on recursion. With the exception of the seventy-seven-year-old Chomsky himself, Fitch was the ideal representative of the Chomsky “side.” I became the sole journalistic eyewitness to this historic, linguistic showdown-in-the-jungle when, for an article in The New Yorker, Everett invited me to join him and Fitch in the Pirahã village. What played out in the six days and nights that we spent in the Amazon would not definitively settle the debate over whether the Pirahã are capable of recursive speech, but it offered—at least, for me—an unanticipated insight into the human voice, one that raised still deeper questions about the most fundamental principle upon which Chomsky built his theory of language—and about where language came from in the first place.