This Is the Voice
Kids seem to stall out in the two-word stage for a long time—often up to a year. They might add a word or two to the speech stream (“Lucy hit doll” or “Go out play!”), but their speech remains telegraphic, choppy. Then, around their third birthday, they magically start speaking in fluent multi-word sentences. Again, Chomsky saw this as proof that language must be inborn, and Steven Pinker, in The Language Instinct (a popularizing explanation of Chomsky’s theory), reinforced this idea in his description of a boy named Adam who, at age two, is limited to utterances like “Play checkers” and “Big drum,” but who, suddenly, at age three, starts saying things like, “So it can’t be cleaned?” and “Can I put my head in the mailbox so the mailman can know where I are and put me in the mailbox?”57 Pinker calls Adam’s “explosive” linguistic near-mastery evidence that grammar must be preinstalled before birth.
But as we’ve noted, Adam has, over the previous three years, been enrolled in as intensive a language immersion course as can be imagined: all day, every day, with no time off on weekends and no breaks except for naps. You could earn an undergraduate degree in astrophysics in that time, and you, as an adult, don’t learn with anything like the speed of an infant. Adam’s linguistic aptitude appears sudden only because the bulk of his learning took place before he could do much more than coo or babble.58
* * *
For Chomsky and his acolytes, the definitive evidence that our brains come preinstalled with syntax and grammar is the sheer complexity of three-year-old Adam’s speech—a complexity, Pinker says, that “could not have been taught.”59 He furnishes a famous example from Chomsky, who analyzed how small children form a question from the statement, “A unicorn is in the garden.” Even a three-year-old will know that to make a question you shift the verb (“is”) to the front (“Is a unicorn in the garden?”). But Chomsky warned against assuming that children do this thanks to a rule learned from hearing adults speak—a rule such as: To form questions, scan the sentence for the verb and slide it to the front. Because if you apply that rule to a more complex sentence like “A unicorn that is eating a flower is in the garden,” you produce nonsense: “Is a unicorn that eating a flower is in the garden?”
Chomsky correctly pointed out that children never make this error. With the more complex sentence, they somehow know to skip over the first “is” and shift the second “is” to the front. In Chomsky’s eyes, this was proof that very small children innately understand the “deep structures” of Universal Grammar: language not as a string of words, but as a set of interlocking chunks made up of verb phrases like “is eating a flower” and noun phrases like “a unicorn that is eating a flower.” Matt Ridley, an award-winning science writer, also used Chomsky’s “unicorn” example in his bestseller Genome to argue (like Pinker) that syntax must be inborn, because moving the second “is” to the front of the sentence indicates a knowledge of noun and verb phrases that “could not be inferred from the examples of everyday speech without great difficulty.”60
I beg to differ. It is precisely in “everyday speech” that children hear and learn the distinct noun and verb phrases in sentences like “A unicorn that is eating a flower is in the garden.” Those phrases are distinguished by dramatic changes in pitch and rhythm across the utterance, the melodic changes we use to help people follow what we’re saying. Not only do we lower our pitch after “unicorn,” to sonically tuck one phrase (“that is eating a flower”) into the other, we also slightly increase the speed of articulation for the embedded chunk, so we don’t put undue demands on the listener’s short-term memory. We want her to remember that we’re talking about the unicorn mentioned in the earlier part of the sentence. If you doubt the rich melodic and rhythmic elasticity of the utterance, try saying it on a single note and at a steady, unchanging pace: “A. Unicorn. That. Is. Eating. A. Flower.…” You will sound ridiculous, as if trying to imitate a robot.
For three-year-olds, who have been making exceptionally subtle discriminations in the voice’s fundamental frequency since the womb (and for several years after birth), detecting the significant drop in pitch across the embedded phrase is achieved not, as Ridley states, with “great difficulty”; hearing that change is (pun intended) child’s play. That is why toddlers would never say the discordant, melodically impossible, and rhythmically ungainly utterance “Is a unicorn that eating a flower is in the garden?” It sounds wrong. Why? Because the human speaking voice, the music of speech, teaches infants how sentences are put together, where sentences properly begin and end, how noun and verb phrases are embedded, and how they can be shifted around to maintain the satisfying, songlike, melodic, and rhythmic resolutions that characterize all human speech.
In the book This Is Your Brain on Music, musician and neuroscientist Daniel Levitin describes what happens when melodies resolve; that is, when a tune satisfyingly arrives back at the note that established the melody’s key. This actually causes the release of dopamine in the brain, giving the listener a pleasurable sensation of reward.61 Babies have been gleaning those pleasurable rewards from spoken language since they were in the womb, and they use those rewards to learn the underlying grammar (what Chomsky calls syntactic structures) of the language they hear spoken all around them.
This was demonstrated in elegant experiments using four-month-old babies. Temple University psychologist Kathy Hirsh-Pasek played speech samples through speakers placed on either side of a baby. Infants show their interest in, and attraction to, particular sounds by turning toward the speaker broadcasting the sample. Hirsh-Pasek first played an audiotape of a woman reading a storybook passage:
Once upon a time, a lady and a witch lived in a big house. The house was very old and messy. It had a big garden and six windows in the front.
As four-month-olds, the babies couldn’t understand a word, but hearing speech that conformed to the prosodic patterns they had become accustomed to since the womb, they turned to face the speaker broadcasting the story, clearly delighting in the dopamine rush of all those nicely resolving vocal beats and melodies. But Hirsh-Pasek also created a doctored version of the same tape, in which she inserted one-second pauses in random places:
Once upon a time, a lady [one-second pause] and a witch lived in a big house. The house was [one-second pause] very old and messy. It had a big garden and [one-second pause] six windows in the front.
The babies showed their confusion and distaste by turning away from the speaker.62 Interrupting a melody mid-arc, and then resuming at the unnatural pitch of a voice in midsentence, created discordances of timing and tune that the babies could not tolerate—strong proof that long before they could ever dream of assembling a complex sentence filled with properly interlocking noun and verb phrases, a baby has learned, through close attention to the voice, how those “syntactic structures” work: because of how language sounds.
Speech scientists call these grammar-defining vocal melodies linguistic prosody (in contrast to emotional prosody), and it is controlled, processed, and perceived by brain areas quite distinct from the left-brain “language areas.” Indeed, those areas lie in the opposite hemisphere, in the parts of the right side of the brain associated with the making of, and appreciation for, music: pitch, rhythm, pace. We know this because stroke patients who experience damage to the right side of the brain often speak in a monotone and are incapable of hearing the linguistic prosody in others’ speech. The drastic limits this places on such patients’ ability to understand, and be understood, leave no doubt of the crucial role played by the music to which we set the lyrics of everything we say. (Even, as we saw earlier, as mundane a two-syllable utterance as “Hello.”)
* * *
While recent science has provided startling insights into how we, as individuals, acquire language in infancy, debate rages over how that skill was acquired by our species as a whole. To believers in the Great Leap Forward, our speech is a behavior discontinuous with our animal lineage and thus represents the most abrupt, and dramatic, of behavioral breaks from our nonhuman past. To others, like Darwin, our speech is continuous with that past: it grew out of the vocal noises of our evolutionary ancestors, as a refinement of the emotional vocalizations that every mammal and bird uses to drive off enemies and woo mates. That view has the widest possible implications for our species, since it establishes the voice as the most conspicuous borderline between our nonhuman and human identities, the one biological endowment where both aspects of our nature are fused in a single act: an acoustic signal that at once embodies our most extraordinary and defining attribute (speech), yet is delivered by a mechanism we share with every bird and mammal, no matter how “bestial” or primitive. This has had (to put it mildly) big repercussions, not only for how our voice affects our private, personal relations with friends and family, but for how the voices of public figures guide the fates of societies and civilizations.
My allegiance is to the latter, Darwinian, view. To understand why, we must go back in time. Back to when the voice began.
TWO ORIGINS
The voice of every animal (including birds, dogs, lions, sheep, seals, frogs, cats, chimpanzees, mice, us) shares at least two traits in common: they are sounds powered by the lungs and emitted through the mouth; and every voice (barks, whinnies, whines, chirps, squeals, meows, ribbits, roars, the State of the Union address) derives from a common ancestor, an animal we don’t ordinarily associate with voice: fish.
To understand how this could possibly be so, we must travel to a time around 530 million years ago, when the first fish evolved. Like their living descendants, these ancient fish sustained life by extracting oxygen from the water and expelling CO2 with a specialized membrane lining the inside of the throat: gills. Some of these primordial fish, however, lived in shallow lakes or swamps and, during droughts, would become stranded on land. Many suffocated, but at least one was lucky enough to undergo one of those random mutations that drive natural selection. In this case, it may have been a copying error in one of the genes responsible for building gills, one that rendered the subtly altered membrane capable of pulling a little oxygen from the air. That tiny sip kept the landlocked fish alive long enough not only to survive the dry spell, but to mate and pass along to its offspring the mutated gill gene and the tiny survival advantage it conferred. Over hundreds of thousands of years, and many other random mutations that improved the animal’s ability to survive on land, a new species evolved in these swampy, shallow-water areas: a transitional, hybrid animal that possessed both water-breathing gills and rudimentary air-breathing lungs, which had formed from the hollow swim bladders it used for flotation. These creatures are known as lungfish, and they are our oldest air-breathing, land-dwelling relatives.
They can still be found in the swamps of South America, Africa, and Australia: remarkable animals so little changed from their ancient ancestors that they are known as “living fossils.” Darwin, in the Origin of Species, used the lungfish to illustrate a central concept of evolution by natural selection: namely, that “an organ originally constructed for one purpose” (the swim bladder, for flotation) “may be converted into one for a wholly different purpose” (the lung, for respiration).1 Thus did Darwin document a crucial milestone in the origin of what would become our human voice: the emergence of the air-propelling bellows that powers our speech and song. It was, however, up to another scientist, writing seven decades later, to reveal how another key adaptation in the lungfish gave rise to the voice.
Victor Negus was a thirty-four-year-old First World War veteran, in 1921, when he began a residency in throat surgery at King’s College Hospital in London. There, he undertook a research project on “the production of voice in animals and man.”2 The planned two-year thesis stretched to nine years, as Negus dissected an ever-growing menagerie that included fish, lizards, frogs, birds, and various mammals. From seeking to learn how the voice is produced, Negus found himself on a quest for where it had come from. The resulting five-hundred-page treatise, The Mechanism of the Larynx (1929), became the most important reference work on the subject for the next half century and the basis for his eventual knighthood. As Negus showed, our voice starts with the lungfish.
He introduces the animal on page three, where he describes the dissections he performed on a South American species called Lepidosiren. He notes how a hole had developed in the gills that separated the throat from the digestive tract, creating an opening in the bottom of the mouth that led into the swim bladder, whose lining had thinned to the point where oxygen could diffuse through the membrane into the blood vessels beneath: the primitive air-breathing lung. No more suffocating when landlocked. But the hole in the animal’s throat also left the creature vulnerable to drowning when it returned to its aquatic existence. “Therefore,” Negus wrote, “it became imperative that only air, and not water or other harmful substances, should enter [the lung]. With this object in view, a valve was evolved to guard the entrance to the pulmonary outgrowth.”3
As the image of my throat projected on Dr. Woo’s computer screen makes clear, our vocal cords are an inheritance from these ancient fish: a valve over the opening to our windpipe that we hold open to let air pass to and from our lungs as we breathe, but that we snap shut when “water or other harmful substances” threaten to enter our lungs and choke us to death, or when we wish to make voice sounds. Air pushed up from the lungs encounters the barrier of the closed vocal valve, which makes the membranes flutter and flap against each other in the same way that your lightly sealed lips flap noisily against each other when you blow a Bronx cheer.
The term vocal “cords” is thus a misnomer. It dates from the mid-eighteenth century when the French anatomist Antoine Ferrein, studying the larynges of animal and human cadavers, compared the membranes to “violin strings” (cordes) that vibrate under the “bowing action” of the breath.4 Our vocal cords don’t produce sound that way. As a valve set in motion by air from the lungs, the vocal cords actually chop the airstream into rapid pulses that produce a sound wave. This distinction helps explain the vocal injury that Adele, Julie Andrews, and I sustained (singing a high, loud note, we can clash the vocal cords together up to a thousand times a second). It also has relevance for the fate of our species—since the raw sound source of the air-chopping vocal cords creates a particularly rich and buzzy sound wave with lots of overtones—those higher frequencies that we filter with movements of the tongue and lips to make the vowel formants that distinguish head, hid, hood, had, and so on, and that endow us with the ability to produce clear and intelligible speech.
For the lungfish, vocal sounds are about as far from articulated speech as can be imagined: squeezing air through the simple sphincter of its throat valve, the animal can produce only an array of grunts, squeaks, hisses, and belches (I am trying to avoid the word “farts,” but I’m afraid those were the first vocal sounds heard on Earth). Over hundreds of millions of years, however, the larynx underwent drastic physical changes as the lungfish’s land-dwelling descendants (first amphibians, then reptiles, and finally mammals) refined the vocal valve’s efficiency for breathing, and for producing the voice sounds so useful for survival and reproduction. These changes included the addition of a system of movable cartilages to which the ends of the vocal membranes attach, and a complex musculature that can stretch or slacken the vocal cords to vary vocal pitch (pull them taut and they chop the air faster, raising the pitch; loosen them and they chop the air more slowly, giving a lower note) and can stiffen the membranes to create sounds like a growl.
Changes in the voice’s volume, from loud to soft, are controlled by the speed and force with which we drive the air up through our vocal cords. That control depends on a refinement of the respiratory system that appeared in mammals around 220 million years ago: the diaphragm, a thick muscle that attaches to the bottom of the rib cage and divides the trunk into an upper chamber, housing the lungs and heart, and a lower chamber, housing the stomach and intestines. Movements of the diaphragm up and down control the action of inhaling and exhaling, and fine motor control of the speed and force with which the diaphragm propels air from the lungs is how we and other mammals vary the loudness and softness of vocalizations. But it was yet another evolutionary enhancement unique to mammals that would have the most decisive impact on our human voice. Mammals are defined as a class by their habit of feeding their babies with milk from the mother’s mammary glands (the term “mammal” is from the Latin mammalis, “of the breast”). Our baby mammal ancestors, by affixing their lips to a teat and performing a suite of complex sucking and swallowing maneuvers, developed the throat, mouth, tongue, and facial muscles that our species would learn to coordinate for articulated speech.
Most mammals, then, possess all the vocal apparatus necessary for talking. Indeed, a chimp’s lips, tongue, velum, lungs, and larynx are virtually indistinguishable in structure and function from ours. Much of the rest of its anatomy, from its frontally placed eyes and opposable thumbs to its two symmetrical nipples and shortened snout, closely resembles ours as well. So anatomically similar to us are apes that the eighteenth-century Swedish naturalist Carl Linnaeus classified humans and all simian species in the same mammalian order, which he called “primate” (from the Latin primus for “first rank”). Working a century before Darwin, Linnaeus drew no evolutionary connection between apes and us. He was going purely on anatomical similarities. It was only because of concerns expressed by the church (which said that man was created in the image of God, and apes, by logical inference, weren’t) that Linnaeus eventually invented a separate primate classification that lifted us above the rest of the animal kingdom in a genus he called Homo (Latin for “Man”) and a species he called sapiens (“wise”). However, Linnaeus continued (privately) to insist, in letters to biologist friends: “I know scarcely one feature by which man can be distinguished from apes.”5 Well, there was one clear distinction, although it was, he said, a behavioral rather than an anatomical difference.