The rate of the vocal cords’ vibration is called the fundamental frequency, an important component of speech. Perhaps the most significant aspects of the sound we make are the formant frequencies, the set of frequencies created by the entire shape of the vocal tract. When you whisper, your vocal cords don’t vibrate and there is no fundamental frequency, but people can still understand you because of the formant frequencies in the sound.
Overall, the variations in loudness, pitch, and length in speech that we think of as the intonation of an utterance help structure the speech signal while also contributing to its meaning. Prosody, the rise and fall of pitch and loudness, can be emotional, can signal contrast, and can help distinguish objects in time and space (“I meant this one, not that one”). Prosodic meaning can be holistic, like gesture. It can signal to the listener what a speaker thinks about what he is saying—how sure he is of something, whether it makes him sad or happy. When people make errors in what they are saying, they can use intonation to guide listeners to the right interpretation. Prosody can also mark structural boundaries in speech. At the end of a clause or phrase, speakers will typically lengthen the final stressed syllable, insert a pause, or produce a particular pitch movement.
Even though we hear one discrete word after another when listening to a speaker, there’s no real silence between the words in any given utterance; the listener’s brain must carve the continuous stream into words, and do so quickly. What silence does fall within an utterance occurs mostly as a matter of coincidence—typically when stop consonants like k and p are made (as at the beginning and end of “cup”). These consonants are produced by completely, if briefly, blocking the air flowing from your lungs. (Make a k sound, but don’t release it, and then try to breathe.) So while a sentence like “Do you want a cup of decaffeinated coffee?” may be written with lots of white space to signify word breaks, the small silences within the sound stream don’t necessarily correspond to the points between words.
The beginning of speech is found in the babbling of babies. At about five months children start to make their first speech sounds. Researchers say that when babies babble, they produce all the possible sounds of all human languages, randomly generating phonemes from Japanese to English to Swahili. As children learn the language of their parents, they narrow their sound repertoire to fit the model to which they are exposed. They begin to produce not just the sounds of their native language but also its classic intonation patterns. Children lose their polymath talents so effectively that they ultimately become unable to produce some language sounds. (Think about the difficulty Japanese speakers have pronouncing English l and r.)
While very few studies have been conducted on babbling in species other than humans, SETI (Search for Extraterrestrial Intelligence) Institute researcher Laurance Doyle, biologist Brenda McCowan, and colleagues discovered that dolphin infants also pass through a babbling phase. (In 2006 German researchers announced that baby bats babble as well.) In the dolphin investigation Doyle and McCowan used two mathematical tools: Zipf’s law and entropy. Zipf’s law was first developed by the linguist George Zipf in the 1940s. Zipf got his graduate students to count how often particular letters appeared in different texts, like Ulysses, and plotted the frequency of each letter in descending order on a log scale. He found that the slope he had plotted had a gradient of –1. He went on to discover that most human languages, whether written or spoken, had approximately the same slope of –1. Zipf also established that completely disordered sets of symbols produce a slope of 0, meaning there is no complexity in such a text because all elements occur more or less equally often. When the tool was applied to babies’ babbling, the resulting slope was closer to the horizontal, as it should be if infants run randomly through a large set of sounds in which there is little, if any, structure.
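To make the procedure concrete, here is a minimal Python sketch of a Zipf analysis (an illustration only, not Doyle and McCowan’s actual code): count how often each letter occurs, rank the counts in descending order, and fit the slope of log frequency against log rank. The input file name is hypothetical.

```python
# A minimal sketch of a Zipf analysis: count letter frequencies,
# rank them in descending order, and fit the slope of
# log(frequency) against log(rank).
from collections import Counter
import math

def zipf_slope(text: str) -> float:
    counts = sorted(Counter(ch for ch in text.lower() if ch.isalpha()).values(),
                    reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(freq) for freq in counts]
    # Ordinary least-squares slope of log(frequency) on log(rank).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# "ulysses.txt" is a hypothetical local file standing in for any text sample.
print(zipf_slope(open("ulysses.txt", encoding="utf-8").read()))
```

On ordinary English text the fitted slope comes out near –1; on symbols drawn uniformly at random it flattens toward 0, matching the contrast Zipf described.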
When Doyle and McCowan applied Zipf’s law to dolphin communication, they discovered that, like human language, it had a slope of –1. A dolphin’s signal was not a random collection of different sounds, but instead had structure and complexity. (Doyle and his colleagues also applied Zipf’s law to the signals produced by squirrel monkeys, whose slope (–0.6) was not as steep as the one for humans and dolphins, suggesting they have a less complex form of vocalization.)2 Moreover, the slope of baby dolphins’ vocalizations looked exactly like that of babbling infants, suggesting that the dolphins were practicing the sounds of their species, much as humans do, before they began to structure them in ordered ways.
The scientists also measured the entropy of dolphin communication. The application of entropy to information was developed by Claude Shannon, who used it to determine the effectiveness of phone signals by calculating how much information was actually passing through a given phone wire. Entropy can be measured regardless of what is being communicated because instead of gauging meaning, it computes the information content of a signal. The more complex a signal is, the more information it can carry. Entropy can indicate the complexity of a signal like speech or whistling even if the person measuring it doesn’t know what it means. In fact, SETI plans to use entropy to evaluate signals from outer space: if we ever receive an intergalactic message we can’t decode, entropy can still give us an idea of the intelligence of the beings that transmitted it.
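Shannon’s measure is simple to state (the formula below is the standard textbook definition, not something spelled out in this chapter): for a signal whose symbols occur with probabilities $p_1, \dots, p_n$, the entropy, in bits per symbol, is

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i
```

The value is highest when every symbol is equally likely and drops as some symbols become more predictable than others.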
The entropy level indicates the complexity of a signal, or how much information it might hold, based on the frequency of elements within the signal and the extent to which what comes next can be predicted from what has come before. Human languages are approximately ninth-order entropy, which means that a sequence of up to nine words from, say, English gives you a real chance of guessing what might come next; words further back than that add essentially no predictive power. The simplest forms of communication have first-order entropy.3 Squirrel monkeys have second- or third-order, and dolphins measure higher, around fourth-order. They may be even higher, but to establish that, we would need more data. Doyle plans to record a number of additional species, including various birds and humpback whales.
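As a rough illustration of what “order” means here, the sketch below (an assumption about the intended definition, not the researchers’ actual method) estimates conditional entropy from n-gram counts: first-order entropy uses symbol frequencies alone, while higher orders condition each symbol on the preceding n − 1 symbols.

```python
# A minimal sketch of order-n (conditional) entropy estimation from
# n-gram counts; interpreting "order" as conditional entropy is an
# assumption, not the researchers' stated procedure.
from collections import Counter
import math

def conditional_entropy(seq, order=1):
    """Estimate, in bits, the uncertainty of each symbol given the
    preceding (order - 1) symbols."""
    ngrams = Counter(tuple(seq[i:i + order]) for i in range(len(seq) - order + 1))
    contexts = Counter()
    for ng, count in ngrams.items():
        contexts[ng[:-1]] += count
    total = sum(ngrams.values())
    h = 0.0
    for ng, count in ngrams.items():
        p_joint = count / total              # P(context and next symbol)
        p_cond = count / contexts[ng[:-1]]   # P(next symbol | context)
        h -= p_joint * math.log2(p_cond)
    return h

words = "the cat sat on the mat and the cat sat on the hat".split()
print(conditional_entropy(words, order=1))  # symbol frequencies alone
print(conditional_entropy(words, order=2))  # conditioned on one preceding word
```

In this framing, a ninth-order language is one in which conditioning on up to nine preceding words keeps reducing the uncertainty of the next one.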
Many of the researchers interviewed for this book would stop in the middle of a conversation to illustrate a point, whether it concerned the music of protolanguage or the way that whales have a kind of syntax, by imitating the precise sound they were discussing. Tecumseh Fitch sat at a restaurant table making singsong da-da da-da da-DA sounds. Katy Payne, the elephant researcher, whined, keened, and grunted like a humpback whale in a small office at Cornell. Michael Arbib, the neuroscientist, stopped to purse his lips and make sucking sounds. In a memorable radio interview, listeners heard the diminutive Jane Goodall hoot like a chimpanzee.
As well as demonstrating the point at hand, the researchers’ performances illustrated on another level one of the fundamental platforms of language—vocal imitation. Imitation is as crucial to the acquisition of speech as it is to learning gesture (another way in which these systems look like flip sides of the same coin). Humans are among the best vocal imitators in the animal world, and this is one area in which we are unique in our genetic neck of the woods. Even though chimpanzees do a great job of passing on gestural traditions and tool use in their various groups, they don’t appear to engage in a lot of imitation of one another’s cries and screeches. Orangutans must have some degree of imitation in the vocal domain; otherwise they couldn’t have developed the “goodnight kiss” tradition. But humans have taken the rudiments of this ability and become virtuosos.
It would appear that this skill has become fully developed in our species over the last six million years, since we split from our common ancestor with chimpanzees and bonobos. From the babbling stage on we start to repeat simple vowels and consonants, like “mamamamamama,” advancing to whole words, sentences, and longer stretches, all the while using rhythm, pitch, and loudness. Still, like many of our other abilities, this one is built on a platform that stretches back a long way in evolutionary time.
Vocal learning is one of the reasons that Fitch believes the field of language evolution is worth pursuing. “Where you get any kind of open-ended learning, you have the ability to pair signals with meaning. And we didn’t have to evolve that, because our common ancestor with other primates already evolved it. What we don’t have in a chimp or any other ape is vocal learning—the ability to generate new signals. Dogs, for example, aren’t able to invent new barks.”
Some other animals are also exceptional at vocal imitation, whether it involves imitating a human or a member of their own species. Songbirds are not born with genetic programs from which their songs arise. Instead, in the same way that we are born with a predisposition to produce the sounds of language, the specifics of which we still must learn, they need to be exposed to the songs of their species in order to acquire them.4
African gray parrots, Alex’s species, as well as other types of parrots, are well known for their excellence in imitating human words. Some animals seem to entertain themselves by imitating the sounds of inanimate objects. Mockingbirds have been heard imitating sounds like car alarms and mobile phones, and elephants in Kenya have been recorded making almost perfect reproductions of the sound of trucks from a road nearby. Whales are very good at vocal learning. Each mating season, the males come together to sing, riffing on the songs of the previous season and producing something new from them. Dolphins are as talented at vocal imitation as they are at gestural imitation. As Lori Marino explained, “They seem to be able to imitate a number of different dimensions of a behavior. They can imitate the physical dimension, but also the temporal dimension. They can imitate rhythms. For instance, you can give them a series of tones, and they’ll be able to imitate the rhythm of that series of tones. So if you give them ENH-ENH, ENH-ENH-ENH, ENH-ENH, they’ll give you ENH-ENH, ENH-ENH-ENH, ENH-ENH.”
There have been odd, one-off cases of individual animals showing exceptional imitative talents. Fitch is fascinated by the story of Hoover, a harbor seal at the New England Aquarium that was raised by a Maine fisherman. Hoover surprised visitors by saying, “Hey, hey, you, get outta there!” Hoover didn’t “talk” until he reached sexual maturity, but once he started, he improved over the years. He spoke only at certain times of the year (not as much in the mating season) and would reputedly adopt a strange position in order to do so. He didn’t move his mouth. In The Symbolic Species, Terrence Deacon recounts stumbling across Hoover while walking near the aquarium one evening. He thought a guard was yelling at him (“Hey! Hey! Get outta there!”). Deacon reports that Hoover died unexpectedly of an infection and his body was disposed of before his brain could be examined.
“We don’t know if Hoover was a mutant or if other seals can do this,” said Fitch. “It’s not hard to train a seal to bark on command. There’s a sea lion named Guthrie at the New England Aquarium. He gets rewarded when he does something different. His barks are not very special, but they are bona fide novel vocalizations.” Fitch relates Hoover’s ability to the Celtic selkie myths, which may have originated in earlier Hoover-like accounts. “It’s not uncommon for humans to take seals into their homes,” he said. “Maybe we just need to expose male seals to human speech and the right social context,” and they’ll be able to learn some speech.
What makes Hoover so interesting, according to Fitch, is that all the other animals that are excellent at vocal learning, with the possible exception of bats, use a completely different process from the ancestral vertebrate mechanism for making sound. What we use for vocal production is the same thing that a frog uses—a larynx and tongue, equipment that has been around since early vertebrates dragged themselves onto land. Birds, on the other hand, have evolved a completely novel organ—the syrinx. The toothed whales, like dolphins and killer whales, have evolved a unique organ in their nose, and we still don’t really know how other whales make sound. “It’s hard to peer down the nostril of a humpback or get them in an X-ray setup while they are singing,” observes Fitch.
Early speech researchers like Philip Lieberman proposed that one of the adaptations that humans made to produce language and speech was a descended larynx. The human larynx is a complicated assemblage of four different kinds of cartilage and the small, bent hyoid bone that sits upon them. The area above the larynx is called the upper respiratory tract. Below the larynx there are two tracts: the windpipe, which leads to the lungs, and the digestive tract, leading to the stomach. When humans swallow, the larynx essentially closes, ensuring that food or liquid doesn’t fall into our lungs. The larynx also contains the vibrating vocal cords we use in speech.
In many animals, such as other apes, the larynx sits high in the throat. In fact, for most animals the larynx is positioned so high that it’s effectively in the nasal passages, meaning that these creatures can breathe and drink at the same time. Human babies, who are born with high larynxes, can do the same, but by the time they turn three, the larynx has descended and this is no longer possible. For boys, the larynx descends a bit more in adolescence, giving their voices a more baritone timbre. Somewhere in our evolutionary history—between the present and the last common ancestor we had with chimpanzees and bonobos six million years ago—our larynx dropped, making the upper and lower respiratory tracts roughly equal in size. It is these two tubes that allow humans to make such a wide range of different vowel and consonant sounds.
For a long time researchers thought that the descended human larynx was the smoking gun of speech evolution, but the picture turns out to be more complicated than that. Most previous findings about the larynx of other animals were based on the anatomy of dead specimens, but Fitch investigated the behavior of living, vocalizing animals and discovered that the larynx is a far more mobile structure than previously thought. He found that other animals that don’t have a permanently descended larynx pull it into a lower position when they vocalize. Dogs do so, as do goats, pigs, and monkeys. In addition, Fitch discovered that some animals have a permanently descended larynx, including species as diverse as the lion and the koala. What this means, said Fitch, is that you can’t assume that the reason the larynx descended in humans was for speech; you have to be able to explain the function of the descended larynx in these other animals as well.
In his Ph.D. work Fitch demonstrated a basic correlation between body size and depth of voice. In the animal kingdom this correlation provides extremely useful information. If you hear a competitor wooing the female you are interested in, and you can tell from his voice alone that he is much bigger than you, slinking away without direct confrontation makes the most evolutionary sense. Fitch argues that this is how we initially came by our descended larynx, meaning that one of the fundamental elements of our ability to create speech came about not because of language but as a primitive mechanism to signal an exaggerated body size.
Other critics maintain that the descended larynx is most likely an example of evolutionary adaptation in the human lineage. Steven Pinker explained:
I think it’s premature to say that there has been no evolutionary change in speech perception and speech production mechanisms. In fact, certainly for speech production mechanisms I think the argument that there’s been no adaptation or evolutionary change is very weak. It’s based on the idea of the descent of the larynx seen in some other mammals, which did not evolve it for language, but rather for bellowing in a more macho way. So yes, it’s marginally possible that the larynx descended in humans for some reason other than language, but that theory doesn’t work for humans, because we have a descended larynx in both sexes, where exaggerating body size by bellowing more loudly is not a factor.
Fitch adds that just because the descended larynx may have come about for reasons other than speech doesn’t mean it wasn’t then co-opted—or in Darwinian terms, exapted—for speech evolution. He emphasizes the possibility of gradual evolution. “The fact remains,” he writes, “that the human larynx is unusual (though not unique) among mammals.” It’s possible, he says, that early hominids had a mobile larynx, like those of dogs and pigs. But as they began to develop the extensive sound range of speech, it became more efficient to leave the larynx in the descended position instead of pulling it back to vocalize, as other animals do.5
The notion of a graded evolutionary descent is supported by recent findings on the larynx of chimpanzee infants, which also undergoes a process of descent. This process results from a somewhat different mechanism, accomplished by the descent of the skeleton around the chimpanzee hyoid bone rather than the descent of the hyoid bone itself. Nevertheless, it suggests that descent of the larynx in humans is unlikely to have occurred in one big, speech-related transition.6
Other features of vocal production in humans that appear to be especially attuned for language include a particular kind of muscle fiber in the vocal folds. According to Ira Sanders at the Mount Sinai School of Medicine, slow tonic muscle fibers have unique features. They don’t twitch like most muscle fibers but contract in a precise, graded fashion. Sanders examined a series of adult human vocal folds and found that the slow tonic muscle fibers occur there in high numbers. Other mammals do not have this kind of muscle in their vocal folds.
Attempts to find fossil evidence for the key anatomical changes required for modern human speech have been mostly unsuccessful. Fitch attributes this to the fact that “the vocal tract is a mobile structure that essentially floats in the throat, suspended from the skull by elastic ligaments and muscles.” Some researchers have compared the part of the spine that affects voluntary breathing—a crucial part of speech production—in Homo sapiens, Homo ergaster, and earlier hominids. It appears that this region is significantly enlarged in modern humans as compared with earlier ancestors.7
Regardless of their other theoretical differences, most language evolution researchers agree that human speech appears to have evolved in the last six million years to meet some of our species’ unique communication needs. The most basic and obvious evidence for this is that despite concerted efforts to teach spoken language to other primates, no attempt has been successful. At most, chimpanzees have been trained to utter a few words.8 But the perception of speech is another matter.