I Can Hear You Whisper
Like children with a new camera, the Bell researchers used their new device to make wave pictures of a panoply of letters and words. A host of speakers, known to history as “M.A.—Male, Low-Pitched,” and “F.D.—Female, High-Pitched,” and the like, put their lips about three inches from the transmitter and intoned lists of vowel sounds—the “u” of “put,” for example—and words such as “seems” and “poor.” The results established some general characteristics of speech sounds. That the pitch of the voice varies with individuals, for instance. A “deep-voiced man” spoke vowels at about ninety cycles per second—or ninety hertz—and a “high shrill-voiced woman” (F.D. perhaps?) at about three hundred cycles per second. They also noticed that when that same man and woman spoke the “ah” sound in “father,” the wave pictures looked quite different, “yet the ear will identify them as the vowel ‘a’ more than 99 percent of the time.” Whereas two low-pitched male voices pronouncing “i” as in “tip” and “o” as in “ton” create much more similar pictures, “yet they are never confused by the ear.” The Bell team had tumbled to the fact that speech sounds carried some other important characteristic that didn’t show up in the waveform. Later, that characteristic was given a name: timbre.
Even I, untrained in reading waveforms and spectrograms, could see that the separate sounds in a simple word like “farmers,” casually uttered in an instant, carried detailed and identifiable information that distinguished it from “alters” almost like the fingerprints that distinguish my right hand from my husband’s. The very high frequencies in the “f” and the “s” sounds at the beginning and end of “farmers” are so rapid they look like a nearly straight line. The “a,” “r,” and “m” sounds in the middle show up as peaks and valleys of varying sharpness and depth—the “r” spiking then rolling, spiking then rolling, and the “m” flatter but still undulating like a line of mesas in the desert. All three sounds hovered at the same frequency of 120 cycles per second. The “er” toward the end of the word brought a slight rise in pitch to 130 cycles. “Farmers,” I said out loud. Sure enough, I raised my voice in the second syllable, a fact I had never noticed before. From this work, I could draw a direct line to the audiogram chart Jessica O’Gara had given me, so I could see where the main frequencies of various phonemes in the English alphabet fell.
Fletcher and his team spent particular time on vowels. Vowels are distinguished from consonants in the way they are formed in our vocal tracts. Critically, they are also at the heart of each syllable. Syllables, I was going to learn, are an essential ingredient in the recipe that allows us to hear and process spoken language. No wonder all languages require vowels. Expanding on Helmholtz’s and Bell’s investigations into the complexities of vowel sounds, the men of Bell Labs identified not just the fundamental frequencies of “ah” and “oo,” for instance, but also the accompanying harmonic frequencies that readily distinguish one sound from the other. From that, they generated tables showing two primary frequencies—one lower, one higher—for each vowel sound. For telephone engineers, such information “makes it possible to see quickly which frequencies must be transmitted by the systems to completely carry all the characteristics of speech.” After World War II, two more Bell researchers were able to use another pioneering device, the spectrograph, to create definitive specifications of vowel frequencies, known as formants. What none of those early researchers could possibly guess was that decades later, a different generation of engineers would use the formant information compiled at Bell Labs to figure out how to transmit the necessary frequencies through a cochlear implant to make speech intelligible to the deaf.
They did see the potential to help in other ways immediately. With his new arsenal of oscillators, amplifiers, and attenuators, Harvey Fletcher could for the first time accurately measure hearing, because he could now produce a known frequency and intensity of tone. He patented the audiometer that was the forerunner of the machines used in Alex’s hearing test. His group also created the decibel to measure the intensity of sound as perceived by humans, and they established the range of normal hearing as 20 to 20,000 Hz. The range of speech from a whisper to a yell proved to span about sixty decibels. The new audiometer could measure noise as well, which allowed the editors of the August 1926 issue of Popular Science Monthly to note that a Bell Labs device had identified the corner of Thirty-Fourth Street and Sixth Avenue as the noisiest place in New York City. A decade later, Bell scientists capitalized on demonstrations at the 1939 World’s Fair and measured the hearing of enough curious fairgoers that they were able to show just how much hearing degrades from the teenage years into late middle age.
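To get a feel for the scale of those numbers, here is a minimal Python sketch. It uses the standard textbook definition of the decibel (ten times the base-10 logarithm of an intensity ratio) and is not drawn from Fletcher's own papers. The function names are hypothetical, chosen only for this illustration.

```python
import math


def intensity_ratio(db_span: float) -> float:
    """Ratio of sound intensities (power) separated by db_span decibels."""
    return 10 ** (db_span / 10)


def pressure_ratio(db_span: float) -> float:
    """Ratio of sound pressures (amplitude) separated by db_span decibels."""
    return 10 ** (db_span / 20)


def octaves_between(low_hz: float, high_hz: float) -> float:
    """Number of frequency doublings needed to get from low_hz to high_hz."""
    return math.log2(high_hz / low_hz)


if __name__ == "__main__":
    speech_span_db = 60  # whisper to yell, per the text
    print(f"Intensity ratio: {intensity_ratio(speech_span_db):,.0f}")
    print(f"Pressure ratio:  {pressure_ratio(speech_span_db):,.0f}")
    print(f"Octaves from 20 Hz to 20,000 Hz: {octaves_between(20, 20_000):.2f}")
```

By this arithmetic, the sixty-decibel span between a whisper and a yell is roughly a millionfold difference in sound intensity, and the range of normal hearing covers just under ten octaves.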
Fletcher’s newfound abilities and equipment brought him some interesting visitors. When American industrialist and philanthropist Alfred I. duPont couldn’t hear what was being said at his own board meetings, he turned to Bell Labs for help. After a childhood swimming accident in the Brandywine River, duPont’s hearing had gotten progressively worse, and he was almost entirely deaf as an adult. DuPont told Fletcher that his ability to hear fluctuated. It improved after X-ray beam treatment from a doctor he was seeing, then worsened again. Skeptical, Fletcher asked to accompany duPont on his next visit to the doctor. Beforehand, Fletcher measured duPont’s hearing himself and created a picture of his considerable hearing loss using what he called an “embryo audiometer.” According to Fletcher, the doctor treating duPont had a very different technique.
There was a path along the floor … about 20 feet long. Mr. Dupont was asked to stand at one end of this. The doctor stood at the other end and said in a very weak voice, “Can you hear now?” Mr. Dupont shook his head. [The doctor] kept coming closer and asking the same question in the same weak voice until he came to about two feet from his ear, where [Mr. Dupont] said he could hear. His hearing level was found to be two feet.
Mr. Dupont then was asked to stand four or five feet in front of an X-ray tube with his ear facing the tube. The X-ray was turned on two or three times. He then turned his other ear toward the tube and had a similar treatment. He then stood in the 20 foot path and another hearing test was made. But this time as he started to walk toward Mr. Dupont [the doctor] shouted in a very loud voice: “Do you hear me now?” As the doctor reached the 10 or 15 foot mark, Mr. Dupont’s eyes twinkled and he said he could hear.
I could hardly keep from laughing… .
When they returned to the laboratory, Fletcher measured duPont’s hearing again and found it unchanged. “After that,” noted Fletcher, “Mr. Dupont never paid a visit to this doctor.”
Fletcher and duPont then turned to the problem of the board meetings. The invention of the telephone had led to the first electronic hearing aids by making it possible to manipulate attributes of sound like loudness and frequency as well as to measure distortion. (Likewise, the invention of the transistor at Bell Labs in 1947 would revolutionize hearing aid technology in the following decade by making the devices smaller and more powerful.) One early electronic hearing aid apparently consisted of a battery attached to a telephone receiver. For duPont to hear all the participants in a meeting, Fletcher set up a system with two microphones in the center of the boardroom table and two telephone receivers (one for each ear) attached to a headband for duPont to wear. Hidden under the table was a desk-size set of amplifiers, transformers, and condensers. By using two receivers instead of one, duPont was able to tell where the speaker was. “And that,” said Fletcher, “was the first hearing aid Bell Labs ever made.” Later, Fletcher made hearing aids for Thomas Edison as well, though Edison later complained that his hearing aids had revealed to him that speakers at the public events he attended said little of interest.
• • •
Fletcher wasn’t the only one whose work was inspired by the telephone. In the 1920s, just as the Bell Labs team was investigating the properties of speech and hearing, Hungarian scientist Georg von Békésy began zeroing in on just one component of that chain: the inner ear. After completing his PhD in physics in 1923 at the University of Budapest, Békésy took a job at the Telephone System Laboratory at the Hungarian Post Office, which maintained the country’s telephone, telegraph, and radio lines. “After World War I, [it was] the only place in Hungary that had some scientific instruments left and was willing to let me use them,” he said later. His job was to determine whether making changes in the telephones themselves or in the cables led to greater improvements in telephone quality. His engineering colleagues wanted to know “which improvements the ear would appreciate.” At first, Békésy turned to library books for answers. But he soon realized that while a lot was known about the anatomy of the ear, very little was understood about its physiology, how it actually worked. He began studying the function of the inner ear, and the subject became his life’s work.
Békésy wanted to see the cochlea in action, and I do mean “see.” He collected an assembly line of temporal bones from cadavers at a nearby hospital and kept them in rotation on his workbench. First, he made models of the cochlea based on his samples, then he began to do experiments with the human cochlea. Using a microscope that he designed himself to send strobes of multicolored light onto the inner ear, he watched the basilar membrane, the cellophane-like ribbon that runs the length of the cochlea, as it responded to sound. The setup he rigged allowed him to see a sound wave ripple from one end of the basilar membrane to the other. He also identified critical properties of the membrane, that it was stiff at one end and more flexible at the other. Although the idea that different places on the basilar membrane responded to different frequencies had already been posited, Békésy was the first to see that response with his own eyes: The displacement of one part of the membrane was greater than the rest, depending on the frequency of the tone. His discovery was called Békésy’s traveling wave.
After World War II, not wanting to live in what had become Communist Hungary, Békésy continued his work first at the Karolinska Institute in Sweden and then at Harvard. A loner by nature, he never taught a student or collaborated with anyone. Nevertheless, he was awarded the Nobel Prize in Physiology or Medicine in 1961 for his work on “the physical mechanism of stimulation within the cochlea.” Nobel Prize or no, we know today that there are at least two fundamental problems with Békésy’s work. One is that his subjects were dead. The auditory system is a living thing and responds more subtly when alive than dead. Secondly, in order to get any response from the cochlea of a cadaver, he had to generate noise that was loud enough (134 dB) to wake the dead, so to speak. As a result, the broad response Békésy saw didn’t accurately reflect the finesse of the basilar membrane. Despite its limitations, Békésy’s traveling wave represented an important advance. In a recent appreciation of his work, Peter Dallos and Barbara Canlon wrote, “This space-time pattern of vibration of the cochlea’s basilar membrane forms the basis of … our ability to appreciate the auditory world around us: to process signals, to communicate orally, to listen to music.”
Békésy had narrow shoulders, but in the best scientific tradition, many who came later stood upon them. Back at Bell Labs, in the 1950s, later generations of researchers used Békésy’s work to build an artificial basilar membrane and then, in the 1970s, to devise computer models of its function, all of which would prove critical in the digital speech processing that lay ahead.
Jean Marc Gaspard Itard had been dead wrong about the possibilities of science.
7
WORD BY WORD
The day after Alex got his hearing aids, he and I went on an errand. As I unstrapped him from his car seat, I discovered he had yanked out both earmolds. At least they were still connected to the safety clip—his had a plastic whale to attach to his shirt and long braided cords like those for sunglasses. But when I tried to reinsert the earmolds, everything looked wrong. Alex had twisted them out of position. I stood there with the tangle of nubby plastic and blue cord in my hand and realized that in spite of the lesson Jessica had given me, I had absolutely no idea how to restore order—which was left, which was right, whether they were backward or forward, or how to begin to put them into his ears.
“Well, shit,” I said out loud. “Shit, shit, shit.”
Alex smiled. At least I was free to curse in front of him. The word “shit” had high-frequency “sh” and “t” sounds that he would never hear. After the frustration of weeks of uncertainty and disequilibrium … here I was, on the sidewalk, lost.
Fortunately, we were about to visit the preschool program at the Auditory/Oral School in Brooklyn, and I’d been given a fresh reminder of the benefits of an environment where people were familiar with what it is to be deaf or hard of hearing. It was just dawning on me that one of my new roles was going to be serving as Alex’s IT Help Desk, and I was sorely unprepared.
Piling the tangled equipment onto the top of his stroller, I made my way to the front door.
“Um, we need a little help putting these back in,” I confessed when I got inside. “We’re new at this.”
The director of the school smiled and picked up the hearing aids. “See how the mold curves?” she said, indicating the way the plastic followed the line of Alex’s ear canal. Gently, she pulled his earlobe back and popped one aid into position. Then she did the same on the other side. Ten seconds and it was done.
In making choices about education and communication in the deaf and hard-of-hearing world, people talk about “outcomes.” Since Alex had usable hearing and our desired outcome was talking and listening, we had decided to look at oral programs. These were the “option” schools my friend Karen had been talking about a few weeks earlier. All would provide explicit language instruction to get Alex beyond “mama,” “dada,” “hello,” and “up.”
I had thought Karen was out of her mind to suggest that a two-year-old child might travel an hour from home for school. The additional complication of having to get two other children, then seven and four, to and from school near our house stymied me. But Mark is good at making the impossible seem possible, and far less worried than I am about spending money to solve problems. He immediately threw out solutions: babysitters, car services, an Ecuadorean taxi driver we knew who might be willing to help, and so on. Soon we had a plan. I wasn’t willing to put Alex on the school bus yet, so we split the week between me and a pair of babysitters and ultimately chose Clarke, a Manhattan satellite of the school founded by Mabel Hubbard’s father, in part because it was reachable by subway. There, Alex would spend every morning bathed in words and language.
In the same way that an aspiring athlete has to train and strengthen muscles, Alex had to practice learning to talk. Speech production is a motor skill like kicking a ball or picking up a raisin. We don’t think of it that way because it doesn’t usually make us sweat or even require much effort once we’ve mastered it, but a babbling baby is training her vocal system to produce the sounds she’s been hearing through the first months of life. The attempts are tentative at first. She knows she’s getting close when the adults in her world get excited about the noises she makes. The sounds get more and more confident until they come out as words. Alex had missed all of that.
Sound is produced by vibrations and columns of air. Our bodies provide both. In all spoken languages, the fundamental speech sounds are similar because the range of possibilities has physical limits. Words begin in the lungs, which serve both as a store of air and as a source of energy. That air is pushed out of the lungs and, on its way to being transformed into the sounds of speech, it passes along the conveyor belt of our vocal systems.
Lodged in the top and front of the trachea, the larynx is made mostly of cartilage, including the thyroid at the front that forms the Adam’s apple. Inside the larynx are the vocal cords. They’re sometimes called the vocal folds, and that’s a more accurate term, as there is nothing cordlike about the vocal cords. They are pieces of folded ligament that meet to make a V-shaped slit (the glottis) that closes to stop air or opens to let it pass through. To produce the “d” in “idiot,” for example, air is stopped entirely. For soft sounds like the “f” of “farm” and the “s” in “sunny,” the vocal cords are completely open, and the feathery or hissing sounds can go on indefinitely, which is why those consonants are described as “continuant.” And why they had so much less going on in the picture of “farmers” created by the Bell Labs oscillograph. When the vocal cords rapidly open and close, they create a vibration that allows us to make vowels and the sounds of “voiced consonants” such as “v,” “z,” “b,” “d,” and “g.” If you watch the changing shape of your lips when you make a “p” sound, an “o,” or an “f,” you can get an idea of the movement of the vocal cords inside your throat, and you can feel the difference between voiced and unvoiced sounds if you make a “zzz” and then a “sss” with a finger resting on your Adam’s apple.
To whisper, we keep our vocal cords in the same middle position as for the “h” of “hill.” The louder the whisper we want to make, the closer together we bring our vocal cords, so that the word “hill” spoken in a loud whisper results in more air leaving your lips than saying it in a normal voice.
Leaving the lungs and vocal cords as somewhat amorphous buzzes and whooshes, the flow of air is further refined—stopped and restarted, pushed and pulled, narrowed or flattened—by the tuning we do in our mouths when we vary the shape and relative position of the palate, tongue, teeth, and lips. When speech pathologists talk of plosives or fricatives, for instance, they are describing what we have to do in this last stage of the conveyor belt. The plosives (“p,” “b,” “t,” etc.) require us to block the flow of air somewhere along the way, usually in the mouth. The fricatives (“s,” “f,” “sh,” etc.) are made by narrowing the air flow to form turbulence. To form the liquids (“r” and “l”) we raise the tip of the tongue and keep the mouth a bit constricted. “M,” “n,” and “ng” are nasals; “w” and “y” are semivowels. Speech sounds are further identified by place of articulation—labial (lips) or dental (teeth), for example. So a “p” is an unvoiced labial plosive, and a “th” is a voiced dental fricative. Like the Linnaean system of biological classification into kingdom, phylum, class, and so on, this way of organizing elementary features was a breakthrough when it was invented in the 1930s because it captures all of the speech sounds of the world’s languages.
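The three-part labeling described above (voicing, place of articulation, manner) can be sketched as a small lookup table. This is a minimal Python illustration, not the classification scheme as linguists formally define it. The class and function names are hypothetical, and the "alveolar" place given for "s" and "z" is standard phonetics terminology rather than something stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    symbol: str
    voiced: bool   # do the vocal cords vibrate?
    place: str     # where the air flow is shaped, e.g. lips or teeth
    manner: str    # how it is shaped, e.g. blocked or narrowed

    def describe(self) -> str:
        voicing = "voiced" if self.voiced else "unvoiced"
        return f"{voicing} {self.place} {self.manner}"


# A handful of English consonants; far from a complete inventory.
INVENTORY = {
    c.symbol: c
    for c in [
        Consonant("p", False, "labial", "plosive"),
        Consonant("b", True, "labial", "plosive"),
        Consonant("m", True, "labial", "nasal"),
        Consonant("th", True, "dental", "fricative"),  # as in "this"
        Consonant("s", False, "alveolar", "fricative"),  # place: assumption
        Consonant("z", True, "alveolar", "fricative"),   # place: assumption
    ]
}


def voicing_pairs() -> list[tuple[str, str]]:
    """Pairs of sounds that match in place and manner and differ only in voicing."""
    pairs = []
    for a in INVENTORY.values():
        if a.voiced:
            continue
        for b in INVENTORY.values():
            if b.voiced and (b.place, b.manner) == (a.place, a.manner):
                pairs.append((a.symbol, b.symbol))
    return pairs


if __name__ == "__main__":
    for consonant in INVENTORY.values():
        print(f"{consonant.symbol:>2}: {consonant.describe()}")
    print("Differ only in voicing:", voicing_pairs())
```

Laid out this way, the "zzz" and "sss" experiment has a simple reading: the two sounds share a place and a manner and differ only in whether the vocal cords vibrate.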