
This Is the Voice


by John Colapinto


  * * *

  Of course, to learn a language, it is not enough simply to recognize the difference between pa and ba, or la and ra. To understand speech—and to produce it one day—babies must accomplish another exceedingly difficult feat of voice perception. Though it might seem, to us, as if we insert tiny gaps of silence between words when we speak (like the spaces between words on a printed page), that’s a perceptual illusion. All voiced language is actually an unbroken ribbon of sounds all slurring together. To learn our native tongue, we had to first cut that continuous ribbon into individual words—not easy when you’re a newborn and have no idea what any words mean. You can get an idea of what you were up against by listening to a YouTube clip of someone speaking a language you don’t know: Croatian, or Swahili, or Tagalog. Try listing ten words. You can’t do it because you can’t tell where one word ends and another begins. This is the problem you faced at birth—and, by around eight months, had solved.

  Here’s how. Despite appearances, babies, reclining in their strollers or lying in their cribs, are anything but passive receptors of the speech that resounds all around them. Indeed, even before birth—from the seventh month of gestation onward—the fetus runs a complex statistical analysis on the voices it perceives, and registers patterns. The sucking test shows that one pattern newborns detect is word stress.12 English, on average, emphasizes the first syllable of words: contact, football, hero, sentence, mommy, purple, pigeon; words that emphasize the second syllable (like surprise) are far less common. In French, it’s the reverse—a weak-strong pattern: “bonjour,” “merci,” “vitale,” “heureux.” Babies zero in on these patterns and use them to locate word boundaries. Take a mystifying sequence of speech sounds like:

  staytleeplumpbukmulaginkaymfrumtheestarehed

  An American baby will apply English’s strong-weak probability to identify the first sound cluster (staytlee) as a possible stand-alone word (STAYT-lee, or “Stately”). The next two syllables, however (plumpbuk), don’t make an English word, no matter what stress pattern you apply (PLUMP-buk; plump-BUK). To deal with that, the baby uses another type of statistical analysis. In all languages, the likelihood that one speech sound will follow another is higher within words than across word boundaries. Patricia Kuhl supplies a good example from Polish, where the zb combination is common, as in the name Zbigniew.13 But in English zb occurs only across word boundaries, as in “leaveZ Blow” or “windowZ Break,” and thus crops up less frequently. Sophisticated listening tests show that eight-month-olds use these “transition probabilities” to segment the sound stream, and that babies can do this after just two minutes’ exposure to a stream of unfamiliar speech sounds.

  This staggering speed of learning speaks to Darwin’s assertion, in The Descent of Man, that speech acquisition in children reveals not an instinct for language, but an instinct to learn—as in an English baby’s lightning-fast realization that the pb in plumpbuk is illegal and that it makes sense to split the speech stream there, to create the separate chunks plump and buk. Eventually, the child will use both statistical strategies to help segment the entire sequence and arrive at the first words of James Joyce’s Ulysses:

  Stately, plump Buck Mulligan came from the stairhead…

  She will accomplish this stunning feat before her first birthday, well before she has the least clue about what any of the words actually mean. But in snipping the sound ribbon into its separate parts, the baby stands a chance of figuring out how to assign meaning to each small cluster of sounds—clusters we call “words.”
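  The second strategy, cutting the stream wherever one sound rarely follows another, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the listening tests: the four made-up words, their syllables, the fixed word order, and the 0.9 cutoff are all hypothetical choices made so that the example is small and repeatable.

```python
from collections import Counter

def transition_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # how often each syllable starts a pair
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, probs, threshold=0.9):
    """Start a new word wherever the transition probability falls below threshold."""
    if not syllables:
        return []
    words, current = [], [syllables[0]]
    for prev, cur in zip(syllables, syllables[1:]):
        if probs.get((prev, cur), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(cur)
    words.append("".join(current))
    return words

# Hypothetical nonsense words; every syllable belongs to exactly one word.
LEXICON = {
    "A": ["tu", "pi", "ro"],
    "B": ["go", "la", "bu"],
    "C": ["bi", "da", "ku"],
    "D": ["pa", "do", "ti"],
}

# Fixed word order (an assumption) in which each word is followed by
# more than one other word, so cross-word transitions are unpredictable.
ORDER = "ABCDACBDADCB" * 10

# The "unbroken ribbon": syllables with no gaps marking word boundaries.
stream = [syllable for key in ORDER for syllable in LEXICON[key]]

probs = transition_probabilities(stream)
found = segment(stream, probs)
print(sorted(set(found)))
```

  Inside a word each syllable is always followed by the same next syllable, so the estimated probability is 1.0; across a boundary it is lower, because several different words can come next. The cutoff simply separates those two cases, and nothing in the sketch knows what any word means.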

  * * *

  Babies do not do all this work on their own. They receive significant help from adults, who unconsciously adopt a highly artificial vocal style when addressing them.

  Remarkably, no language expert took any formal notice of the unusual way we talk to infants until 1964, when Charles A. Ferguson, a linguist at Stanford University, published the paper “Baby Talk in Six Languages.” It catalogued the identical way parents speak to babies in a slew of widely different tongues, including Syrian Arabic, Marathi (a language of western India), and Gilyak (spoken in Outer Manchuria), as well as English and Spanish. In each instance, caregivers prune consonants (as when English parents use “tummy” rather than “stomach”) and use onomatopoeia (in English, “choo choo” for “train,” and “bow wow” for “dog”).14 Ferguson was not, however, investigating how babies learn to speak—you could even say he was doing the exact opposite. He was searching for evidence to support the theory, first advanced by linguist Noam Chomsky, that language is not learned at all, but is instead inborn, preinstalled in the brain before birth.

  Chomsky made a strong argument for this when he pointed out, in a series of now famous books and papers of the late 1950s and early ’60s,15 that parents don’t sit around systematically teaching newborns how to talk. Instead babies acquire speech from hearing only the half-mumbled, sporadic, often ungrammatical talk all around them (like the murky, overlapping conversations in a Robert Altman movie). Despite this “poverty of the stimulus,” as Chomsky called it, children by age four speak in complex multi-word sentences: forming questions from statements, embedding clauses, speaking in past and future tenses. They can accomplish this, Chomsky said, only because language already exists in the brain. “In fact,” he once said, “language development really ought to be called language growth because the language organ grows like any other body organ.”16 This figurative “language organ” cannot be dissected like a liver or heart, Chomsky said, but it can be described through analysis of the syntax common to all languages—the “deep structures” that make up what Chomsky called “Universal Grammar,” the innate rules that govern all languages (no matter how different they sound on the surface).

  Ferguson, in studying the baby talk of such vastly different cultures, was in search of Chomsky-inspired “language universals.” The first study to address how adult “baby talk” helps infants acquire language did not appear until 1971, when Catherine E. Snow, a twenty-six-year-old graduate student at McGill University, stumbled onto the topic by chance.

  Like Ferguson (and most social scientists of the day), Snow accepted Chomsky’s claim that language is innate, so when she was invited to lead a graduate seminar on language acquisition, she planned to do so from the Chomskyan perspective. In the interest of thoroughness, however, she decided to look up the evidence upon which Chomsky based his claim that infants hear mostly garbled, incomplete, stuttered, overlapping, highly degraded speech—his primary proof that language must be inborn. Snow discovered that no papers existed supporting his “poverty of the stimulus” argument. Chomsky had apparently relied on his subjective impression of what babies must hear. Snow was amazed, and aghast, later saying: “I felt somehow offended that linguists made, accepted, and uncritically propagated claims about such matters with no sense of obligation to make the relevant observations.”17

  For her PhD dissertation, Snow designed lab studies to learn what babies actually hear from caregivers. She recorded thirty women (some mothers, some not) talking to children, of various ages, on set topics. The recordings revealed that the mothers spoke entirely differently to children of one month to two years old than to children who are ten years old, and that the childless women also adopted these unique features of infant-directed speech. All the women used the simplifications Ferguson had documented (pared-back sound clusters, onomatopoeic words) and also heavy repetition of new words. “Put the red truck in the box now,” one mother told her two-year-old. “The red truck. No, the red truck. In the box. The red truck in the box.” Such systematic redundancies helped infants segment the speech stream (the sounds “red” and “truck” and “box” jump out), while the short, simple utterances, each containing just one idea, helped babies detect how sentences are constructed. “That’s a lion,” one mother told a toddler. “And the lion’s name is Leo. Leo lives in a big house. Leo goes for a walk every morning. And he always takes his cane along.” Snow concluded that, contrary to Chomsky’s claim, infants “do not learn language on the basis of a confusing corpus full of mistakes, garbles, and complexities. They hear, in fact, a relatively consistent, organized, simplified, and redundant set of utterances which in many ways seem quite well designed as a set of ‘language lessons.’ ”18

  Snow’s findings didn’t mean that our language capability is not to some degree inborn; clearly, we possess a biological, genetically determined capacity for speech—otherwise, we wouldn’t be able to do it. But Snow’s work provided a good and necessary corrective to the pendulum swing that had made language, under Chomsky’s model, seem entirely a result of “nature,” with no role played by “nurture”—no role played by the human voice.

  * * *

  Snow’s findings, published in 1972, sparked an explosion of follow-up studies challenging Chomsky’s view. In 1977, Olga Garnica, an assistant professor at Ohio State University, published a groundbreaking paper focused on the artificial, exaggerated prosody that caregivers automatically adopt when speaking to infants and toddlers—the high-pitched, slowed-down, singsong speech familiar to anyone who has ever heard someone talk to a baby (“Nowww… aren’t you… Keeeyyy-OOOOT?”).19 Pitch peaks on specific words, extended pauses between words, and long-drawn vowels are, Garnica said, all part of a system to help babies segment the speech stream, hear how grammar works, and detect the specific tongue and lip positions that distinguish an ee from an oo, or an ah from an uh. Stanford University linguist Anne Fernald showed that these prosodic exaggerations are adopted by parents in all cultures and languages and that every adult uses them when talking to babies (whether they’re aware of it or not).20 Furthermore, children as young as four adopt the infant-directed singsong when talking to two-year-olds—or to their dolls. We do it when speaking to our pets and when talking to foreign-accented strangers who ask us for directions in the street, strong evidence that high-pitched, singsong, slowed-down speech is an adaptive vocal mechanism that evolved in our species for teaching language.

  Physiologically, this speaking style makes sense: babies are best at detecting high-pitched sounds (not until ten years old do they acquire the low-frequency hearing typical of adults21). The artificially high pitch grabs and holds the infant’s attention and, when used along with the simplifications and repetitions described by Ferguson and Snow, it becomes part of an elaborate voice-based language-tutoring system that linguists call Motherese. Remarkably, Motherese works, Fernald showed, in a feedback loop between parent and baby: as the infant starts to speak, forming first syllables and words, the adult’s Motherese automatically adjusts itself, the raised pitch progressively lowering, the word simplifications and repetitions gradually diminishing, in inverse proportion to the child’s mastery of its native tongue.

  * * *

  Most babies are a year old before they use all the linguistic information they’ve been hoarding since the womb—and utter their first word. But they don’t, to put it mildly, spend those first twelve months in vocal silence.

  The first act that every healthy baby commits upon emerging from the womb is a cry—an expertly coordinated spasm of the diaphragm, with an exquisitely timed closure of the vocal cords across the windpipe (so that they vibrate and produce sound, an act linguists call “phonation”) and a synchronized opening of the mouth and lowering of the tongue: Waaaahhh!

  That newborns can, without a single rehearsal, perform this act of complex physical coordination upon first exposure to the air (before birth, all humans are aquatic animals) suggests that the infant cry is pure instinct, like the reflex kick of your foot when the doctor taps your knee with his hammer. And it has a clear, biological survival purpose: it ejects from the windpipe any mucous or amniotic fluid on which the baby could choke. But it also has a vital function as communication. It notifies everyone within earshot that the screamer is alive.

  In the first days and weeks of life, the baby’s cry grows more robust as the abdominal muscles and diaphragm strengthen with use, and as the baby gains greater control of its tongue and lips, instinctively shaping the resonance chambers of throat, mouth, and lips to boost the signal to a window-rattling volume (people who study to be opera singers have to relearn how to do what a baby does naturally, as we’ll see later). This sonic blast gives the otherwise helpless creature the ability to summon, from a great distance, its mother. Consequently, the infant cry has been called an “acoustical umbilical cord.”22

  It has also been called a “biological siren,” and like any siren, it was engineered (by nature) to be intensely annoying.23 A typical baby’s cry has a fundamental frequency (or pitch) around 500 cycles per second (five times that of an adult male voice) with overtones (that is, the additional audible pitches that are part of every complex vocal sound) around 1,400 and 5,700 cycles per second—very high frequencies that overload the human auditory cortex. Like nails scraping a blackboard, or the rattle of a jackhammer, the cry causes great psychological distress in those who hear it, so they must spring into action and tend to the baby’s needs, if only to alleviate the assault to their own nervous system. Thus, the paradox in the baby’s cry, as described by Debra Zeifman, a psychologist at Vassar College who specializes in mother-infant bonding: “part of [the cry’s] power to activate caregiving lies in its noxiousness, and… this very noxiousness can also evoke abusive or avoidant responses by caregivers.”24 (Parents convicted of injuring, or even killing, infants with shaken baby syndrome frequently offer as defense that, “She wouldn’t stop crying.”)

  In the late 1950s, psychiatrist Peter Ostwald, of the University of California School of Medicine, became fascinated by the baby’s cry and its uncanny similarity to the vocal acoustics of severely ill psychiatric patients. In a 1961 paper in the Archives of General Psychiatry, Ostwald isolated the universal “stress tone” in patients suffering from acute depression, schizophrenia, and psychoneurotic hypochondria.25 The voices of all these patients showed anomalies in the higher overtones centered around 500 cycles per second—the average pitch of the baby’s cry. In hysterical patients, the overtones spiked, giving the voice a “sharp” tone of complaint; in obsessional depressives, the overtones were level, giving the voice a “flat” and “irritating” quality; in brain-damaged patients, the overtones dropped in pitch across utterances, giving the voice a “hollow” and “emotionally drained” sound; in grandiose patients, the overtones rose and stayed level, resulting in a voice “characteristic of persons who spoke loudly, emphatically, and needed to be heard, to impress, and to influence others.” Ostwald saw this as acoustic confirmation of Freud’s theory that emotional disorders reflect an ur-injury from earliest childhood, a psychic wound that regresses the adult sufferer to a state of infantile need and complaint that can actually be heard in the pitch and timbre of the sufferer’s voice.

  The connection between the baby’s cry and the sound of clinical neurosis would, less than ten years after Ostwald’s study, inform a treatment pioneered by a California-based psychotherapist named Arthur Janov. During a group therapy session at Janov’s San Francisco clinic in 1967, an overwrought twenty-two-year-old male patient fell writhing to the floor and began to emit what Janov later described as “an eerie scream welling up from the depths,” a sound that “one might hear from a person about to be murdered.”26 The screaming fit, reportedly, alleviated the patient’s neurosis. Janov began encouraging his other patients to scream. They, too, felt better afterward. Janov called the noise the “Primal Scream” and said that it was a more effective treatment for neurosis than psychotherapy or drugs. The scream, he said, regresses patients to a period before certain emotional injuries were inflicted, injuries that create a permanent “muscle tension” throughout the body (an idea he borrowed from the influential psychologist, and student of Freud, Wilhelm Reich). This stored-up “psychic pain,” Janov said, leads to a “clamping” of the respiratory and vocal muscles that is heard in the “squeezed” voice of the neurotic.27 Janov claimed that the violent muscular spasms involved in screaming unlock this muscle tension and relieve the psychic pain, permanently.

  When my neighbor Andrea teaches Kristin Linklater’s “Freeing the Natural Voice” technique, she sometimes encourages loud, uninhibited vocal noises that release muscle tension, not for psychotherapeutic purposes but to free up the voice’s acoustic range and power. Nevertheless, a friend of mine who actually traveled to Orkney, Scotland, to take a weeklong workshop with Linklater herself (who at eighty-three years old was still teaching her vocal technique to those willing to make the pilgrimage) reported that the participants, when encouraged to roll around on the floor and adopt various unusual postures while screaming, manifested a striking psychological side-effect. Every person in the workshop, my friend told me, at one point or another, “cried about their mother.”28

  While anxiety and other emotional disorders do affect the voice—tensing the respiratory muscles, which dampens volume; tightening laryngeal muscles and causing the voice to tremble; freezing muscles of the face and tongue, blurring articulation—there’s no evidence that severe neurosis can be permanently alleviated by screaming, primal or otherwise. Today, any reported effectiveness in Janov’s screaming treatment is understood to be a placebo effect, or short-term emotional catharsis.29 Nevertheless, Janov’s book The Primal Scream, published in 1970, sold millions and attracted celebrity adherents, including the actor James Earl Jones and Apple founder Steve Jobs. But the most famous proselytizer for Primal Scream Therapy was John Lennon. He underwent the treatment after Janov sent an unsolicited prepublication copy to Lennon’s estate in England. The singer’s openness to trying new therapies, religions, and drugs was well known,30 as was his psychic turmoil: abandoned by both parents at age four and raised by an aunt, he was only just growing close to his mother, in his teens, when she was killed by a student driver; to this trauma was added, shortly before he read The Primal Scream, his split from the Beatles, divorce from his first wife, and a descent into heroin addiction.31

 
