The First Word: The Search for the Origins of Language

by Christine Kenneally


  The human facility for perceiving speech begins very young: small babies have been shown to prefer the sounds of speech to nonspeech sounds. There is a fascinating paradox here: humans can distinguish at most about fifteen nonspeech sounds per second, and beyond that rate they hear only unremitting noise. Yet when they decode speech, they hear twenty to thirty distinct sounds per second. Somehow human speakers can pack, and listeners can in turn unpack, up to twice as many sounds per second, provided those sounds are the consonants and vowels of the language they speak.

  Humans also have a remarkable ability to calibrate for the fact that speakers’ voices occupy very different parts of the possible pitch range. Children’s voices are typically the highest, women’s sit in the middle of the range, and men’s can be very deep.9 This means that even when speakers use the same language, the formant frequencies of any given vowel can differ considerably from voice to voice. Nevertheless, we understand the speakers of our language to be making the same sounds.

  Some researchers believe that the movements of our throats, tongues, mouths, and faces in speech are as important as the sound of speech. They hold that at some level, speech is also gesture. Indeed, our ability to perceive the speech of others is based in part on our knowledge of the motor movements we make when we produce it. It’s been demonstrated that subjects who are shown a video of someone saying “ga” that is accompanied by a recording of the sound “ba” perceive something entirely different. They will “hear” “da,” which in terms of speech production is in between the “ga” and “ba” sounds (“ba” is made with the lips, “da” is made with the tongue touching the roof of the mouth behind the teeth, and “ga” is made with the back of the tongue hitting the roof at the back of the mouth). This phenomenon is called the McGurk effect, and it demonstrates that as far as the perception of such simple sounds goes, people can be as influenced by the motor acts they see as by the sound they hear.

  One of the most important strategies that human brains use to understand speech is called categorical perception. Even though we think of sounds like p and b as wholly distinct, there is in fact a continuum between them: the two differ only in the timing of the vocal cords’ vibration, an interval known as voice onset time.

  Scientists who first discovered categorical perception in the 1950s found that timing is critical to the perception of such sounds. Listeners’ perception of the b-p continuum flips at the twenty-five-millisecond mark. If the vocal cords begin to vibrate 10 or 20 ms after the lips release, listeners hear a b; if the vibration begins at 25 ms or later, even though everything else about the sound is the same, they hear a p instead. It is as if a switch is thrown at the 25 ms mark. People hear only one sound or the other, never a sound that is a little like both. In the 1970s the experiment was repeated using infants as subjects, and researchers found that infants make the same categorical distinction between the sounds. The finding was hailed as evidence of an innate and uniquely human language trait.
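
  The step-function character of this boundary can be made concrete with a small sketch in Python. Only the 25 ms boundary comes from the experiments described above; the function name and sample values are invented for illustration.

    # Sketch of categorical perception along the voice onset time (VOT)
    # continuum. Only the 25 ms boundary is taken from the experiments
    # described in the text; names and sample values are invented.

    def perceived_consonant(vot_ms):
        """Map a voice onset time in milliseconds to the reported phoneme."""
        # Perception is categorical: a step function, never a blend.
        return "b" if vot_ms < 25 else "p"

    for vot in (10, 20, 24, 25, 30, 40):
        print(f"VOT {vot:>2} ms -> hears '{perceived_consonant(vot)}'")
    # Output: 'b' for 10, 20, and 24 ms; 'p' from 25 ms on. A smooth
    # physical continuum produces an abrupt perceptual switch.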

  That claim was made without any relevant data from animal studies, and within a few years it was invalidated. In 1975 two researchers repeated the infant study with chinchillas, which also proved to have categorical perception. So even though this trait fundamentally underlies the human ability to perceive speech, it is a much more general feature of animal auditory systems. Later experiments showed that categorical perception also applies to nonspeech sounds.

  Other important properties of human speech perception are shared by other animals. In a study conducted by Marc Hauser and colleagues, researchers found that humans aren’t the only species able to identify different languages by their characteristic rhythms. Tamarins, tiny primates that roam the forests of the Amazon basin, can also distinguish between languages on the basis of rhythmic cues.10 This suggests that we probably didn’t evolve our sensitivity to linguistic rhythm for the specific purpose of understanding or producing speech, even though that is now its primary function. Instead we use a general perceptual mechanism that is shared among animals. In a later study Hauser and colleagues extended these findings, showing that other properties of this perceptual mechanism are common to humans and tamarins. For example, neither human babies nor tamarins distinguish between languages that come from the same rhythmic class, such as English and German, or that are rhythmically similar, like English and Dutch; both, however, can tell the difference between rhythmically distinct languages like Japanese and Polish. Another property of speech perception is the ability to hear the formant frequencies that characterize different vowels. Here, too, Hauser and colleagues have pointed out that some animals are able to use formant frequencies to make distinctions between speech sounds and that other species perceive formants in their own species’ vocalizations.11

  Many questions remain about the animal perception of speech. There is no evidence that animals either have or could be trained to develop the ability to parse out the vast number of words in the semicontinuous speech stream of human conversation. Still, we have yet to explain the very basic fact that animals like the Border collie Rico, the African gray parrot Alex, and the bonobo Kanzi clearly have some capacity for perceiving and understanding words within a semicontinuous speech stream. These animals appear to take the speech-noise, identify distinct sounds within it, break the whole thing up into smaller meaningful units (if not as many as humans, then at least some), and derive a meaning from that. Kanzi, for example, has learned that the buzz coming out of someone’s mouth can be broken up into recognizable units (“throw,” “ball,” “water”) that can be combined to create larger meaningful units (“Throw the ball in the water”).

  In order to determine accurately how much of speech perception is shared by humans and animals, researchers must eventually explain how these creatures adjust to different speakers the way humans do, making sense of a word no matter who says it, even though one person’s p is different from another’s.

  Of course, humans do a lot more perceptually than simply pulling a few words out of a larger set of vocalizations. We parse the speech stream exhaustively, and we do it in real time, picking out sounds that are jammed many to a second. We identify the words those sounds create and, at the same time, the sentences the words create. “Speech flows together like this” actually sounds more like “Speechflowstogetherlikethis,” and yet we effortlessly work out where one word ends and the next begins as we listen.
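
  The difficulty of that segmentation problem shows up even in a toy setting where the vocabulary is known in advance. The following sketch recovers word boundaries from an unbroken stream by backtracking search; the lexicon and function names here are invented, and real listeners plainly do something far more flexible.

    # Toy word segmentation: recover boundaries from an unbroken stream,
    # given a known lexicon. Illustrative only.

    from functools import lru_cache

    LEXICON = {"speech", "flows", "flow", "together", "like", "this"}

    def segment(stream):
        """Return one lexicon-consistent segmentation of stream, or None."""
        @lru_cache(maxsize=None)
        def walk(i):
            if i == len(stream):
                return ()
            for j in range(i + 1, len(stream) + 1):
                word = stream[i:j]
                if word in LEXICON:
                    rest = walk(j)
                    if rest is not None:
                        return (word,) + rest
            return None  # dead end: back up and try a longer word
        result = walk(0)
        return list(result) if result is not None else None

    print(segment("speechflowstogetherlikethis"))
    # ['speech', 'flows', 'together', 'like', 'this']
    # Note the backtracking: "flow" matches first but strands the
    # leftover "s", so the search falls back to "flows".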

  Researchers like Marc Hauser and Tecumseh Fitch believe that the claims for human uniqueness have been proven wrong so often in the perceptual domain that people should no longer make default assumptions about any special human ability. In their view, it is reasonable to believe that the hearing part of language is completely shared with many other animals. But others are more skeptical.

  Speech perception is such a complicated task, Steven Pinker pointed out, that even the speech-recognition systems on today’s computers require you to talk to them with exaggerated breaks between words unless they have been trained on a specific person’s voice. “Understanding connected speech from a variety of speakers is a remarkable ability,” he said, “one that artificial intelligence researchers have had enormous difficulty duplicating in computers. It certainly has not been shown that other animals are capable of processing continuous speech. It would be very hard to test, because they don’t have the language that continuous speech is converted into. The fact is that we don’t know that they can do it, and I’d be very skeptical if they can.”

  9. You have structure

  Although many components of language have some kind of analog in animal communication, our close relatives typically lack highly structured signals. Of course birdsong can be complexly patterned, but ape and monkey communication seems to consist mostly of unanalyzable cries. Human language involves two types of structure. In the first, elements from a finite set of meaningless sounds are combined into meaningful words and parts of words, known as morphemes. Linguists call this phonology. The rules of phonology cover intonation and rhythm as well as the way specific sounds can be combined. The rules of sound apply at the smallest scale, between two single sounds that occur side by side, and over vast tracts of speech, from single sentences that rise or fall depending on whether they are questions, to lengthier statements that end on a falling intonation. All these rules change depending on the language that is spoken.

  In the second type of structure, words and morphemes are combined into phrases. This is what linguists call syntax. In 1960 the linguist Charles Hockett said that the relationship between the two types of combinatory rules was one of the major design features of human language; he called it “duality of patterning.”

  Inevitably, both kinds of structure have turned out not to be restricted to humans. Elements of phonology operate not just in birdsong but in the songs of whales, where phrases recur and are reused. In one early experiment Marc Hauser and a colleague demonstrated that vervet monkeys use a fall in pitch to mark the end of an utterance and that other vervets seem to interpret this as a signal to take a turn in vocalizing, much as humans do. Tecumseh Fitch suggests there may be other elements of sound rules that animals share. Rhythm is an important element of human language, and Fitch points to the rhythmic dominance displays of chimpanzees and gorillas as possible precursors of this ability in humans. Gorillas put on impressive performances of vocalizing and rhythmic chest beating, and while this behavior has been little studied, it might provide a clue to the origins of rhythm in humans. Still, chimpanzees do not speak, and neither do they dance. If important analogs for this aspect of language exist in other animals, there are also important distinctions. Other animal vocal communication not only lacks the range of distinct sounds of human language, it doesn’t appear to employ anything like the number and range of rules that we have for combining speech sounds.

  Interestingly, it’s been pointed out that the rules of phonology contradict Chomsky’s notion of the poverty of the stimulus: the idea that the language a child hears does not contain enough information for the child to learn the language from it. Philip Carr, a phonologist at the University of Montpellier in France, says there is abundant evidence of the rules of phonology in the speech that children hear. The “data are more than complete,” he wrote. Neonates, according to Carr, have access to more information than they need to work out the sound system of their language.1

  Of the two types of structure, syntax has been the more hotly contested in the language evolution debate. At its most basic, syntax is a set of rules for combining words in a meaningful way. All the words in the following string make perfect sense by themselves, but because the way they are lined up defies the syntax of English, there is no larger meaning: the the are up way they meaning lined there no syntax English is defies larger of. Until very recently it was believed that only we could understand or deploy any of the structural devices found in human syntax, but Kanzi showed that this is not entirely the case. He is able to learn and apply some rules to structure the symbols with which he communicates. Klaus Zuberbühler has also established that rudimentary syntax occurs in the natural cries of monkeys in the wild.
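
  The point that order alone carries or destroys meaning can be made concrete with a deliberately tiny sketch, assuming a toy lexicon and a single invented phrase-structure rule; it is nothing like the real syntax of English.

    # Word order, not just word choice, determines whether a string is
    # grammatical. One toy rule: (Det) Noun Verb (Det) Noun.
    import re

    POS = {"the": "D", "dog": "N", "ball": "N", "chases": "V"}

    def is_grammatical(sentence):
        tags = "".join(POS.get(word, "?") for word in sentence.split())
        return re.fullmatch(r"D?NVD?N", tags) is not None

    print(is_grammatical("the dog chases the ball"))  # True
    print(is_grammatical("ball the chases dog the"))  # False: the same
    # words, lined up in an order no rule licenses, yield no meaning.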

  Different types of syntax have been observed in the communication of a number of primate species. The black-and-white colobus, the titi monkey, the male gibbon, the chimpanzee, and the wedge-capped capuchin monkey have combinations of calls in their repertoire of cries. The black-and-white colobus uses a snort as an alarm call, but also places it before a roar, a combination that is used to help groups of these monkeys keep their distance from one another. The titi monkey combines several different calls into various combinations, and the response of its listeners shows that they distinguish between the different ordering of the sounds. Gibbons arrange a series of sounds into structured vocalizations, and the same is true of capuchins. In the case of gibbons, when the animal’s song is arranged in a normal order, the listening gibbons squeak in response.

  Zuberbühler wanted to know whether an obvious change of meaning resulted from the way that elements of the calls were ordered. He started with the Campbell’s monkey in the Taï Forest of Côte d’Ivoire. Like vervets, these animals employ different kinds of alarm calls, with one distinctive cry to warn of crowned hawk-eagles and another for leopards. They also use an interesting combination cry, in which one of the alarm calls is preceded by a boom sound. Boom-plus-alarm combinations appear to indicate a lesser threat; they are used in response to the alarm cries of a distant group, the detection of a far-off predator, or less direct dangers like falling trees or breaking branches.2

  Zuberbühler had shown in earlier experiments that Diana monkeys respond to the cries of other species. Even though the calls of the Diana monkey are very different from those of the Campbell’s monkey, Diana monkeys, who live closely alongside Campbell’s monkeys, appear to both understand and use their alarm cries to protect themselves. For example, if it hears a Campbell’s monkey make an alarm call for an eagle, a Diana monkey will make its own distinct eagle alarm cry. In the syntax experiment, Zuberbühler played a series of Campbell’s monkey calls to a group of Diana monkeys. The recordings consisted of either plain Campbell’s monkey alarm calls or the Campbell’s monkey phrase, boom-plus-alarm. (In order to run the experiment, Zuberbühler had to approach the monkeys with a great deal of stealth; otherwise he would simply have provoked a series of human-induced alarm calls.) Zuberbühler confirmed that the Diana monkeys responded to Campbell’s monkey alarm cries with alarm cries of their own. If he played an eagle alarm call, they’d respond with their own eagle alarm call; if he played the leopard alarm call, they would start making leopard alarm calls themselves. But if he played a boom followed by one of the Campbell’s monkey alarm cries, the Diana monkeys wouldn’t respond with their own alarms, indicating that they understood the indirect nature of the threat.

  Zuberbühler likens the boom to qualifiers in our own language, such as “maybe” and “kind of.” His study, he says, suggests that primates have some naturally occurring syntactic abilities, and that projects in which humans train animals to use syntax are tapping into abilities that already occur naturally in those species.

  In a more recent experiment, Zuberbühler and Kate Arnold showed that male putty-nosed monkeys combine two basic calls to add meaning to a message. These monkeys typically produce a pyow sound in various situations, most often as an alarm in response to the sighting of a leopard, and a hack sound when an eagle has been seen. Zuberbühler and Arnold discovered that males also produce a pyow-hack sequence, a combination call given when either a leopard or an eagle has been seen. The difference in response is that shortly after a pyow-hack, the whole troop moves off, suggesting that the combination carries the additional message “Move!”
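
  One way to see the structural point running through these studies is to treat each call sequence as a unit whose meaning is not simply the sum of its parts. The sketch below does just that; the call names follow the text, but the glosses are loose paraphrases of the reported behavior, not the monkeys’ actual meanings.

    # Call combinations as sequence -> gloss lookups. The combined forms
    # mean something the individual calls do not: the boom works like a
    # qualifier ("maybe"), and pyow-hack adds the instruction to move.

    CAMPBELLS = {
        ("eagle-alarm",):           "eagle nearby",
        ("leopard-alarm",):         "leopard nearby",
        ("boom", "eagle-alarm"):    "eagle, but distant or indirect threat",
        ("boom", "leopard-alarm"):  "leopard, but distant or indirect threat",
    }

    PUTTY_NOSED = {
        ("pyow",):        "leopard sighted",
        ("hack",):        "eagle sighted",
        ("pyow", "hack"): "predator sighted; move!",
    }

    def gloss(repertoire, *calls):
        return repertoire.get(calls, "unrecognized sequence")

    print(gloss(CAMPBELLS, "boom", "leopard-alarm"))
    print(gloss(PUTTY_NOSED, "pyow", "hack"))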

  Gibbons structure units of sound to create meaning, but their vocalizations are quite different from those of most other primates: they produce complex songs that carry over distances of up to a kilometer. Gibbons typically form monogamous pairs, and every morning mated pairs sing a duet that announces their bond to neighboring apes.

  Zuberbühler and colleagues recorded white-handed gibbons at Khao Yai National Park, Thailand, and found that the gibbons use their songs to repel predators as well as to perform duets. The duets and the predator songs used the same notes (“wa,” “hoo,” “leaning wa,” “oo,” “sharp wow,” “waoo,” and “other”), but they differed systematically in how the notes were arranged. At the beginning of a song, there were fewer “leaning wa” notes and significantly more “hoo” notes if a predator had been sighted. Predator songs also contained more “sharp wows” and were longer overall than duets. Male- and female-specific parts of the songs differed as well, depending on the referent: female-specific parts came later in the predator songs, and the males replied to the females earlier in these songs than in the duets. As in the other Zuberbühler experiments, the structured utterances proved meaningful to neighboring animals: nearby gibbons responded differently to the two kinds of songs.

  The scientists don’t view the gibbon songs as sentences created with syntactic rules about word order. There is no way to determine whether the individual notes have smaller discrete meanings, like words, that build a larger meaning when combined in different ways. What matters about the gibbon utterances is that they use combinatorial rules to refer, functionally, to different things: the same set of sounds has two different meanings when ordered in different ways.

  The simple structural rules that these primates use in the wild contradict the idea that creating meaning with structure is a special human ability. Though there remains a wide gulf between what we do with structure and what other animals do, at least some elements of our ability seem to be graded. Robert Seyfarth and Dorothy Cheney, the researchers who pioneered the vervet monkey work, suggest that more evidence for an evolutionary precursor to human syntax may be found somewhere other than in the vocal domain.

  After their vervet work, Seyfarth and Cheney began to study a baboon group in the Okavango Delta of Botswana. Baboons, which are Old World monkeys, typically live in stable groups of 50 to 150 animals. They have a small and limited set of calls, which are largely innate, and they have no call combinations.3 There are 80 to 90 baboons in the Seyfarth-Cheney group, and every day since 1992 someone has observed the animals. By now Seyfarth, Cheney, and their colleagues recognize all the animals individually. The rules of baboon society, said Seyfarth, are like those of a Jane Austen novel: be nice to your relatives, and get in with the high-ranking family. For the researchers this extended period of observation has been like watching a long-running soap opera.

 
