Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man

by Mark Changizi


  Speech Events

  Grasshopper

  In M. Night Shyamalan’s movie The Village, a young woman, Ivy, sets off on a journey into an unknown forest. She has persuaded the elders of her tribe to let her find other people on the far side of the forest, get medicine, and return to save the life of her sick lover. She has no knowledge of anything beyond the several acres of her village, except that beyond their meadow and inside the forest are chilling, otherworldly beasts that occasionally invade the village and carve up one of the pets.

  As if this quest were not harrowing enough, there’s an important fact I left out: she is blind. Now, the village leaders know the truth about what’s beyond their meadow—no beasts (but the costumed elders themselves), just woods, and then modern civilization, from which they’ve sheltered their children. That’s why they allow her to go into the forest. But no one of Ivy’s generation knows this. And neither do we, the moviegoers. We’re terrified for her. As it turns out, terrifying things do happen to her in that forest, because a monster (really a man from the village in a monster costume) secretly follows her, and eventually attacks her.

  The movie would be considerably less dramatic if our female heroine were deaf, rather than blind. Instead of a woman waving her arms and tramping about through the thorny tangles, we’d be watching a woman walking normally through the forest, keeping to deer trails. In fact, many of us regularly do just this, wearing headphones and blasting music as we deafly, yet deftly, jog through our local park. This would not quite elicit the thrill Shyamalan had in mind. A deaf person on a forest quest does not make a good movie. Being deaf just doesn’t seem like much of a big deal compared to blindness. If not for the inability to hear speech, we might hardly miss our auditory systems if they fell out through our ears.

  Then again, there’s another twist to the story that may change one’s feeling about audition: our young blind heroine defeats her attacker. She kills him, in fact. She may look out of sorts crashing into trees, but her hearing makes it impossible for her attacker to sneak up on her. Especially in the forest. Had she been deaf, not blind, her attacker could have whistled “Dixie” with an accordion accompaniment while following her through the woods and still taken her completely by surprise.

  If deaf-maiden-alone-in-the-forest is not spine-tingling to movie audiences, it is only because we tend not to appreciate all that our ears do for us beyond language. Providing a sneakproof alert system is just one of the many powers of audition.

  The greatest respect for our ears is found among blind kung fu masters. Every “Grasshopper” learns from his old blind master that by attending to and dissecting the ambient sounds around oneself, it is possible to sense how many attackers surround one, their locations, stances, weapons, intent, confidence level, and which one is the enemy mastermind. I once saw, in an old movie, one of these scrawny geezers defeat six men using only a baseball bat wielded upside down. But you don’t have to be a fictional blind kung fu master to have a mastery of audition and know how to sense the world with it. We all do; we just don’t get all “Grasshopper” about it. Our brains have a mastery of it even if we’ve never thought about it.

  In fact, when I first began pondering whether speech might sound like natural events, I had great difficulty thinking of any important natural-event sounds. I was initially dumbfounded: what is so useful about having ears that nearly all vertebrates have them? It seemed to me that I primarily use my ears for listening to speech, and that obviously cannot explain why all those other vertebrates have ears as well. Sure, it is difficult to sneak up on me, but one hardly needs such a fine-tuned ear and auditory system for a simple alarm.

  After some months of contemplation, however, I came to consciously appreciate my ability to use sound to recognize the world and what’s happening around me. I began to notice every tap, clink, rub, burble, and skid. And I noticed how difficult it was for me to do anything without making a sound that gave away what I was doing, like eating from my daughter’s Halloween stash. When you’re next at home and your family is active around you, close your eyes and listen. You will hear sounds such as the plink of a spoon in a coffee mug, the scrape of a drawer opening, or the scratch of crayons on drywall. It will typically take some time before you hear an event that you cannot recognize. In the late 1980s, the psychologist William Gaver played environmental sounds to listeners, and asked them to identify what they heard. He found that people are impressive at this: most are capable, for example, of distinguishing running upstairs from running downstairs. Research following in the tradition of work done by the psychologist William H. Warren in the mid-1980s has shown that people are even able to use sound to sense the shapes and textures of some objects.

  Our ears and auditory systems are, then, highly designed for and competent at sensing and recognizing what is happening around us. Our auditory systems are priceless pieces of machinery, just the kind of hardware that cultural evolution shouldn’t let go to waste, perfect for harnessing. In this chapter, I sift through the sounds of nature and distill a host of regularities found there, regularities that apply nearly anywhere—in the jungle, on the tundra, or in a modern city. The idea is that our auditory system, having evolved in the presence of these regularities for hundreds of millions of years, will have evolutionarily “internalized” them; our auditory system will therefore work best when incoming sounds conform to these regularities. I will then ask whether the sounds of speech across human languages tend to respect these regularities. That’s what we expect if language harnesses us.

  Over Hear

  It can be difficult for students to attract my attention when I am lecturing. My occasional glances in their direction aren’t likely to notice a static arm raised in the standing-room-only lecture hall, and so they are reduced to jumping and gesturing wildly in the hope of catching my eye. And that’s why, whenever possible, I keep the house lights turned off. There are, then, three reasons why my students have trouble visually signaling me: (i) they tend to be behind my head as I write on the chalkboard, (ii) many are occluded by other people, are listening from behind pillars, or are craning their necks out in the hallway, and (iii) they’re literally in the dark.

  These three reasons are also the first ones that come to mind for why languages everywhere employ audition (with the secondary exceptions of writing and signed languages for the deaf) rather than vision. We cannot see behind us, through occlusions, or in the dark; but we can hear behind us, through occlusions, and in the dark. In situations where one or more of these—(i), (ii), and (iii) above—apply, vision fails, but audition is ideal. Between me and the students in my course lectures, all three of these conditions apply, and so vision is all but useless as a route to my attention. In such a scenario a student could develop a firsthand appreciation of the value of speech for orienting a listener. And if it weren’t for the fact that I wear headphones blasting Beethoven when I lecture, my students might actually learn this lesson.

  The three reasons for vision’s failure mentioned above are good reasons why audition might be favored for language communication, but there is a much more fundamental reason, one that would apply to us even if we had eyes in the backs of our heads and lived on wide-open prairies in a magical realm of sunlit nights. To understand this reason, we must investigate what vision and audition are each good at.

  Vision excels at answering the questions “What is it?” and “Where is it?” but not “What happened?” Each glance cannot help but inform you about what objects are around you, and where. But nearly everything you see isn’t doing anything. Mostly you just see nature’s set pieces, currently not participating in any event, and yet each one is visually screaming, “I’m here! I’m here!” There’s a simple reason for this: light is reflecting off all parts of the scene, whether or not the parts have anything interesting to say. Not only are all parts of a scene sending light toward you even when they are not involved in any event, but the visual stimulus often changes in dramatic ways even when the objects out there are not moving. In particular, this happens whenever we move. As we change position, objects in our visual field dynamically shift: their shapes distort, nearer objects move more quickly, and objects shift from visible to occluded and vice versa. Visual movement and change are not, therefore, surefire signals that an event has occurred. In sum, vision is not ideal for sensing events because events have trouble visually outshouting all the showy nonevents.

  If visual nature is the loquacious coworker you avoid eye contact with, auditory nature is (ironically) the silent fellow who speaks up only to say, “Piano falling.” Audition excels at the “What’s happening?” question, sensing a signal only when there’s an event. Audition not only captures events we cannot see, like my (fictional) gesticulating students, but serves to alert us to events occurring even within our view. Nonevents may be screaming visually, but they are not actually making any noise, and so audition has unobstructed access to events, for the simple reason that sound waves are cast only when there is an event.

  That’s why audition, but not vision, is intrinsically about “what’s happening.” Audition excels at event perception. And this is crucial to why audition, but not vision, is best suited for everyday language communication. Communication is a kind of event, and thus is a natural for audition. That is, everyday person-to-person language interactions are acute events intended to be comprehended at that moment. Writing is not like this; it is a longer-term record of our thoughts. And when writing does try to be an acute person-to-person means of communication, it tends to take measures to ensure that the receiver gets the message now—and often this is done via an auditory signal, such as when one’s e-mail or text messaging beeps an alert that there is a new message.

  That language is auditory and not visual is, in the broadest sense, a case of harnessing, or being like nature for the purpose of best utilizing our hardware. Language was culturally selected to utilize the auditory modality because sound is nature’s modality of event communication.

  That’s nice as far as it goes, but it does not take us very far. The Morse code for electric telegraphy utilizes sound (dots and dashes), and even the world-record Morse code reader, Ted McElroy, could only handle reading 75.2 Morse code words per minute (a record set in 1939), whereas we can all comprehend speech comfortably at around 150 words per minute—and with effort, at rates approaching 750 words per minute. Fax machines and modems also communicate by sound, but no human language asks us to squeal and bleep like that. Clearly, not just any auditory communication will do. And that brings us to the main aim of this chapter: to say what auditory communication should sound like in order to best harness our auditory system. We move next to the first step in this project: searching for the atoms of natural sounds, akin to the contours in natural scenes on the visual side.

  Nature’s Phonemes

  By understanding the different evolutionary roles for vision and audition, we just saw that audition is the appropriate modality to harness for language: sound is nature’s standard event stream, and language therefore wants to utilize sound to make sure language utterances get received. But what kinds of sounds, more specifically, should language use to best harness our brains? The sounds of nature, of course. But the natural world has a large portfolio of sounds it can make, and people are good at mimicking a fair share of these sounds, mostly with their mouths, but sometimes with the help of their hands and underarms. Saying that a well-designed language will use sounds from nature is like saying one had “a sandwich” in a deli. Which sounds from nature? Wind blowing, water splashing, trees falling (when someone is around), leaves rustling, thunder, animal vocalizations, knuckle cracks, eggs breaking? Where is language to begin?

  Although nature’s sounds are all over the map, there’s order to the cacophony. Most events we hear are built out of just three fundamental building blocks: hits, slides, and rings.

  Hits happen whenever a solid object bumps into another object. When you walk, your feet hit the ground. When you knock, your knuckles hit the door. A tennis match is a game of hits—ball hits racket, ball hits net, ball hits ground. Hits make a distinctive sound. They happen suddenly, and the auditory signal consists of an almost instantaneous explosive burst of energy emanating from the impact.

  Slides are the other common kind of physical interaction between solid objects. Slides occur whenever there is a long duration of friction contact between surfaces. If you drag your finger down the page of this book, you’re making a slide. If you push a box along the floor, that’s a slide. The auditory structure of slides differs from that of hits: Rather than a nearly instantaneous release of energy, slides have a non-sudden start and a white-noise-like sound that can last for a more extended period of time. Slides are less common than hits. First, they require a special circumstance, the extended interaction of two surfaces; hits, on the other hand, are what perception scientists call “generic,” because no special coincidences are needed to carry off a hit. Second, when slides do happen their friction tends to significantly lower the energy in the event, and therefore they commonly occur at the tail ends of events. Third, whereas a long sequence of hits is possible (with intervening rings, as discussed in a moment)—as when a ping pong ball bounces lower and lower, for instance—a long sequence of distinct slides is not typically possible; something would have to stop one slide to allow another one to start, but any such interference with a slide is likely to involve a hit.
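  The acoustic contrast between hits and slides described above can be made concrete with a small synthetic sketch. This is an illustration only, not an analysis from the book: the sample rate, durations, decay rate, and helper names (make_hit, make_slide, rise_time, active_duration) are all assumptions chosen for the example. It models a hit as noise that starts at full strength and dies away, and a slide as noise that fades in and persists, then measures how quickly each reaches its loudest point and how long it stays audible.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch
WINDOW = 0.01       # envelope resolution in seconds


def make_hit(duration=0.2, decay=40.0, seed=1):
    """Noise burst that starts at full strength and decays exponentially."""
    rng = random.Random(seed)
    n = round(duration * SAMPLE_RATE)
    return [rng.uniform(-1.0, 1.0) * math.exp(-decay * i / SAMPLE_RATE)
            for i in range(n)]


def make_slide(duration=0.8, ramp=0.2, seed=2):
    """Noise that fades in over `ramp` seconds and then holds steady."""
    rng = random.Random(seed)
    n = round(duration * SAMPLE_RATE)
    ramp_n = round(ramp * SAMPLE_RATE)
    return [rng.uniform(-1.0, 1.0) * min(1.0, i / ramp_n) for i in range(n)]


def energy_envelope(signal):
    """Mean squared amplitude in consecutive windows of WINDOW seconds."""
    w = round(WINDOW * SAMPLE_RATE)
    return [sum(x * x for x in signal[i:i + w]) / w
            for i in range(0, len(signal) - w + 1, w)]


def rise_time(signal):
    """Seconds from the start of the sound to its loudest window."""
    env = energy_envelope(signal)
    return env.index(max(env)) * WINDOW


def active_duration(signal, fraction=0.1):
    """Total seconds during which energy is at least `fraction` of the peak."""
    env = energy_envelope(signal)
    peak = max(env)
    return sum(1 for e in env if e >= fraction * peak) * WINDOW


hit = make_hit()
slide = make_slide()
print("hit:   rise", rise_time(hit), "s, active", active_duration(hit), "s")
print("slide: rise", rise_time(slide), "s, active", active_duration(slide), "s")
```

  Under these assumptions the hit peaks in its very first window and fades within a few hundredths of a second, while the slide takes far longer to reach full strength and stays loud for most of its length. Real hits and slides are messier than this, but the two measurements capture the "sudden versus non-sudden start" and "brief versus extended" distinctions drawn in the text.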

  Hits and slides are the only physical interactions among solid objects that we regularly experience, and they are certainly the primary ones our ancestors would have experienced. We are land mammals. Splashes, involving a solid and a liquid, are neither hits nor slides, and although they could shape the auditory system of otters, seals, and whales, they’re unlikely to be of central significance to our auditory system.

  With the two kinds of solid-object physical interaction out of the way, we are left with the final fundamental constituent of these natural events: rings. A ring is what happens to a solid object after a physical interaction, that is, after a hit or a slide. When a solid object is physically impinged upon, it vibrates and wobbles, and although one can almost never see these vibrations, one can hear them. You can tell from the sound whether your pen is tapping your desk, your computer, or your coffee mug, because the same pen hit leads to different rings; you may also be able to tell that it is the same pen hitting the three different objects.

  Different objects ring in distinct “timbres,” a word (pronounced “TAM-ber”) that refers to the overall perceptual nature of the sound. For example, a piano C and a violin C have the same pitch, or frequency, but they differ in the quality or texture of their sound, and timbre refers to this. Most objects have very short-lived rings—unlike the long-drawn-out ring of a gong—but they do ring, and once you set your mind to noticing, you’ll be amazed to hear these rings everywhere. And it is not just hits that ring, but slides as well. The vibrations that occur when any two objects hit each other will have many similarities to the vibrations resulting from the same two objects sliding together, so that we can tell that a coffee mug is being dragged along the desk because the ring possesses certain features also found in the ring of a pinged coffee mug.
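  The point that two sounds can share a pitch yet differ in timbre can be sketched numerically. The following is a toy illustration under stated assumptions, not a measurement of any real instrument: the fundamental frequency, the two sets of harmonic weights, and the helper names (tone, amplitude_at, harmonic_profile) are all invented for the example. Both tones are built on the same fundamental; only the strengths of their overtones differ.

```python
import math

SAMPLE_RATE = 8000   # samples per second; an arbitrary choice for this sketch
FUNDAMENTAL = 250.0  # Hz; chosen so whole cycles fit the analysis window


def tone(harmonic_weights, duration=0.1):
    """Sum of sine waves at 1x, 2x, 3x, ... the fundamental frequency."""
    n = round(duration * SAMPLE_RATE)
    return [sum(w * math.sin(2 * math.pi * FUNDAMENTAL * (k + 1) * i / SAMPLE_RATE)
                for k, w in enumerate(harmonic_weights))
            for i in range(n)]


def amplitude_at(signal, freq):
    """Strength of one frequency, found by correlating with a sine and cosine."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / SAMPLE_RATE)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
             for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def harmonic_profile(signal, count=4):
    """Amplitudes of the first `count` harmonics, rounded for readability."""
    return [round(amplitude_at(signal, FUNDAMENTAL * (k + 1)), 3)
            for k in range(count)]


# Hypothetical overtone recipes: same pitch, different "texture".
mellow = tone([1.0, 0.2, 0.05, 0.0])
bright = tone([1.0, 0.8, 0.6, 0.4])

print("mellow:", harmonic_profile(mellow))
print("bright:", harmonic_profile(bright))
```

  Both profiles show the same strength at the fundamental, which is what makes the two tones the same note, while the overtone strengths that follow differ, which is one ingredient of what the ear hears as timbre. Real rings also differ in how quickly each overtone dies away, which this sketch leaves out.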

  Hits, slides, and rings are, therefore, nature’s primary phonemes (see Figure 3). They are a consequence of how solid physical objects interact and vibrate. Although these three kinds of sound are special in the lexicon of nature, there is nothing requiring language to carve sounds at these joints. Dog woofs, cat calls, horse neighs, whale song, and bird song do not carve at these joints. Neither does the auditory communication of a fax machine. But if a language is to be designed to harness the human auditory system, then it will be built out of the sounds of hits, slides, and rings.

  Figure 3. The three principal constituents of physical events: (a) hits, (b) slides, and (c) rings. They sound suspiciously similar to plosives, fricatives, and sonorant phonemes in human languages.

  Are human languages built out of these constituents? Yes. In fact, the most fundamental universal of human speech is that phonemes, the “atoms” of speech, come in three primary types, and these types match nature’s phonemes! Language’s hits, slides, and rings are, respectively, plosives, fricatives, and sonorants.

  Plosives—like b, p, d, t, g, and k—are found in every language, and consist of sudden, explosive, high-energy inceptions. Plosives sound like hits (even embedding their explosive hitlike starts in the name). Figure 4a shows the time-varying frequency distribution for the sound made when I hit my desk with a small plastic cup, and one can see that the hit begins with a sharp vertical line indicating the presence of a wide range of frequencies at the instant of the collision. That same figure shows, on the right, the same kind of plot when I made a “k” sound. Again one can see the sharp edge at the beginning of the sound, characteristic of a hit. (Also note that, in English, at least, one finds many plosive-filled words with meanings related to hits: bam, bang, bash, blam, bop, bonk, bump, clack, clang, clink, clap, clatter, click, crack, crush, hit, klunk, knock, pat, plunk, pop, pound, pow, punch, push, rap, rattle, tap, and thump.)

  Languages have a second principal kind of consonant called the fricative, such as s, sh, th, f, v, and z. They are extended and noisy, and sound like slides. (In fact, the very word “fricative” captures the friction nature of a slide.) And just as slides are rarer than hits, fricatives are less common than plosives. All languages have plosives, whereas many languages (especially in Australia) do not have fricatives. Figure 4b, on the left, shows the frequencies of sound emanating from a small cup that I slid on my desk, and one can see that there is no longer a crisp start to the sound as there was for hits. There is also a longer duration of sound, all of it with a wide range of frequencies. On the right of Figure 4b is the same kind of plot, this one generated when I made a “sh” sound. One sees the signature features of a slide in fricatives. (Also note that in English, at least, one finds many fricative-filled words with meanings related to slides: fizzle, hiss, rustle, scratch, scrunch, shuffle, sizzle, slash, slice, slip, swoosh, whiff, whiffle, and zip.)

 
