Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man


by Mark Changizi

The third principal phoneme type used across human languages is the sonorant, including vowels like a, e, i, o, u, but also sonorant consonants like l, r, y, w, m, and n. Each of these phonemes has strongly periodic vibrations, and has a complex spectral shape. Sonorants sound like rings. Figure 4c, left, shows the ringing after tapping my coffee mug. Only certain frequencies occur during the quickly decaying ring, and these frequency bands are characteristic of the shape and material properties of my mug. To the right of that in Figure 4c is the signal of me saying “ka.” (The plosive “k” sound corresponds to the tap.) As with the coffee mug, there are certain frequency bands that are more active, and these patterns are what characterize the sound as an “a.”

Lo and behold! The principal three classes of phonemes in human speech sound just like nature’s three classes of phonemes. We speak in hits, slides, and rings!

Before getting overly excited by the realization that language’s phonemes are like nature’s phonemes, we must, however, address a worry: How else could we speak? What if human vocalization can’t help but sound like hits, slides, and rings? If that were the case, then the observations made in this section would have little significance for harnessing; culture would not need to design language to sound like hits, slides, and rings, because our mouths would make these sounds by default. We take this up next.

Figure 4. Illustration that plosives, fricatives, and sonorants sound like hits, slides, and rings, respectively. These plots show the frequencies on the y-axis, and time on the x-axis. Comparison of (a) hits and plosives, (b) slides and fricatives, and (c) rings and sonorants.
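The claim about the mug's ring, that only certain frequency bands are active while the sound quickly decays, can be illustrated with a small sketch. This is not how Figure 4 was produced. It is a toy model that treats a ring as a sum of exponentially decaying sinusoids. The mode frequencies, amplitudes, and decay rates in MUG_MODES are made-up values chosen for illustration, as are the function names synth_ring and band_energy.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def synth_ring(modes, duration=0.25, sample_rate=SAMPLE_RATE):
    """Synthesize a ring as a sum of exponentially decaying sinusoids.

    modes is a list of (frequency_hz, amplitude, decay_per_second) tuples.
    """
    n = int(duration * sample_rate)
    return [
        sum(
            amp * math.exp(-decay * i / sample_rate)
            * math.sin(2 * math.pi * freq * i / sample_rate)
            for freq, amp, decay in modes
        )
        for i in range(n)
    ]


def band_energy(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Strength of the signal at one probe frequency (a single DFT-style bin)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    return math.hypot(re, im) / len(signal)


# Hypothetical "mug": two ringing modes, both fading within a fraction of a second.
MUG_MODES = [(900.0, 1.0, 20.0), (2100.0, 0.6, 30.0)]
ring = synth_ring(MUG_MODES)

# Probe the two mode frequencies and one frequency in between.
for probe in (900.0, 1500.0, 2100.0):
    print(f"{probe:6.0f} Hz: {band_energy(ring, probe):.4f}")
```

In this toy model the probes at the two mode frequencies come out far stronger than the probe between them, which is the sense in which a ring's active bands are characteristic of the object that made it.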

Tongue Wagging

When the Mars Rover landed on Mars, it bounced several times on balloon-like cushions; the cushions then deflated, allowing the rover to roll gently onto the iron-red dirt. If you had been there watching the bouncy landing, you would have heard (as you writhed in pain from decompression in the low-pressure atmosphere) a sequence of hits, with rings in between. And once the rover found a place to take a sample of Martian soil, it would have scraped debris into a container for analysis, and that scrape would have sounded like a slide, followed by a ring characteristic of the Rover’s scraping arm. Hits, slides, and rings on Mars! It is not so much that hits, slides, and rings are Earthly nature’s phonemes as that they are physics’ phonemes. These sounds are the principal building blocks of event sounds anywhere there are solid objects interacting, even in our mouths.

Our mouths have moving parts, including a powerful and acrobatic tongue; fleshy, maneuverable lips; and a jaw rigged with rock-hard teeth. When we speak, these parts physically interact in complex ways, creating speech events. But speech events are events, and if hits, slides, and rings are the fundamental constituents of physical events, then speech events must also be built from hits, slides, and rings in the mouth. It is no wonder, then, that human speech sounds like hits, slides, and rings. Speech is built from the fundamental constituents of physical events because speech is a physical event. Harnessing would appear to have nothing to do with it.

However, when we speak, our mouth is not simply a container with a tongue, lips, and teeth rattling around. We are not, for example, making hit sounds by tapping our teeth together, or slide sounds by grinding our teeth. When our mouth (in collaboration with our nose, throat, and lungs) makes sounds, it is using mechanisms for sound production that go well beyond the solid-object event atoms of hits, slides, and rings. Although hits, slides, and rings are the most fundamental kinds of physical events (because solid-object events are the most fundamental kind of physical event), they are not the only kinds. There are hosts of others. In particular, there are many physical events that involve the flow of fluid or air. The events in our mouths that make the sounds of speech are events involving airflow, not hits, slides, or rings at all. Airflow events in our mouths mimic hits, slides, and rings, the constituents of solid-object physical events. Our mouths make a plosive by a sudden release of air, not by an actual collision in the mouth. Fricatives are made by the noninstantaneous movement of air through a tight passage; no surfaces in the mouth are actually rubbed against one another. And sonorants are not due to an object vibrating because of a hit or slide; instead, sonorants come from the vocal cords vibrating as air passes by.

Hit, slide, and ring sounds without hits, slides, or rings! What a coincidence! Human speech employs three principal sounds via airflow mechanisms, and yet they happen to sound just like the three principal sounds that happen in events with physical interactions between solid objects. Utterly different mechanisms, but the same resultant sound. That’s too coincidental to be a coincidence. That’s just what harnessing expects: airflow sound-producing mouths settling on just a few sounds for language—the sounds of physical interactions among solid objects.

We must be careful, though. What if airflow mechanisms cannot help but make hit, slide, and ring sounds? Or, more to the point, could it be that the particular airflow mechanisms our mouths are capable of can lead only to sounds like hits, slides, and rings? No. Human mouths are capable of sounds much more varied than the sounds of interacting solid objects. For example, people can mimic many animal sounds—quacks, moos, barks, ribbits, meows, and even human sounds like slurps, burps, sneezes, and yawns—that are constructed out of constituents beyond simple hit, slide, and ring sounds. People can mimic water-related sounds—like splashes, flushes, and drips—none of which are built from hit, slide, and ring sounds. And our airflow sound-mimicking mouths can, of course, mimic airflow sounds—like a soda pop being opened, howling wind, or even breaking wind—also unrelated to the sounds of hits, slides, and rings. People can mimic “hot” sounds, like sizzling bacon and roaring fires. They can even mimic the sounds of revving motorcycles, fax machines, digital alarm clocks, shrilling phones, and alien spaceships, none of which are sounds built from hits, slides, and rings. We see, then, that our airflow sound-producing mouths have a very wide repertoire, and yet speech has employed only the barest of our talents for mimicry, preferring exactly the sounds that occur among interacting macroscopic solid objects. We’re not, therefore, speaking in hits, slides, and rings by default. That we find these in all languages is a sign that we have been harnessed.

In upcoming sections, I will also concentrate on some other kinds of sounds our mouths can produce, but that language tends to avoid; these cases deserve special attention because of their prima facie similarity to sounds we do find in speech. Thus, they can help to answer the question of why speech utilizes some sounds we can make, but not others we can make just as easily. For example, we will see in the upcoming section that although we can make the sounds of wiggly hits and slides, we do not have them as phonemes—and this is consistent with their absence in physics. In the section following that we will see that although we can make slide-hit sounds and hit-slide sounds, only the latter is given the honor of phoneme status in languages (see the section titled “Nature’s Other Phoneme”), consistent with hit-slides being a fundamental sound in physics, while slide-hits are not. And we’ll see in the “Two-Hit Wonder” section that a simple kind of sound (a “beep”) that could exist as a phoneme does not occur in human languages, consistent with its nonprimitive status in physics. More generally, for the next five sections I will brandish a magnifying glass and closely examine the internal structures of hits, slides, and rings, asking whether those same fine structures are found in plosives, fricatives, and sonorants, respectively.

Wiggly Rings

Harmonicas don’t get no respect. They’re cheap (I just found one online for $5), tiny hunks of metal that tend to be played by guys who didn’t finish finishing school. I’ve had a couple of harmonicas for years, and have never understood them: they don’t have all the notes and can only play three chords. Blowing on a harmonica can’t help but sound fairly good, but I have always been frustrated by my inability to get it to do much more. A serious blues harmonica player can create sounds far richer than seems possible from what would appear to be little more than a toy.

A harmonica is deceptive because it is, in a sense, not an entire instrument at all. It is perhaps half an instrument—maybe that’s why they’re so inexpensive. The other half of the instrument is the human hand. That explains why the best harmonica players have hands, and, in addition, tend to move them all about the instrument when playing. This is described as “bending” the notes, and by doing so, the performer can provide a musical dynamism not possible with just the twenty or so notes in the harmonica’s range. The sounds reaching the listener’s ears are not only those coming directly from the harmonica, but also the harmonica sounds that first bounce off objects in the environment before reflecting toward the listener’s ears. For the note-bending blues performer, the hands are the objects the sounds bounce off. Each time a sound bounces off something, some sound frequencies are absorbed more than others, and so the timbre of the sound coming from that reflection is changed. The total timbre depends on the totality of harmonica sounds that reach the ear directly and indirectly from all points in the environment. And we’re able to hear these sound shapes, which is why harmonica benders go to all the trouble of wiggling their hands—and why there are acoustics engineers who worry about the physical layout of auditoriums.

Bending and acoustic reflections don’t just matter in the blues and in concert halls where instruments (including half instruments) are crooning out musical tones. Objects involved in events also croon, or ring. A ring has a complex timbre that informs us of the object’s size, shape, and material. But just like harmonica sounds, rings can get bent by the environmental surroundings. And our brains can decode the bends, and can give us a sense of our surroundings purely on the basis of the shapes of the sounds reaching our ears. The psychologist James J. Jenkins demonstrated in 1985 that blindfolded students, after a little practice, can navigate very well amongst obstacles by utilizing such auditory cues.

These acoustical observations about how the surroundings affect sound have an important consequence for the internal structure of rings: rings can be wiggly. There are several converging reasons for this. First, an event that causes a ring often also sets the ringing object in motion: something has been hit, or something is sliding. Because the shape of a ring reaching one’s ears depends on the object’s surroundings, ringing objects that are moving produce rings that vary over time. Second, when an event occurs, we are often on the move. Because the shape of the ring we receive depends, in part, upon our position in the world, the shape of the ring reaching our ears may be varying over time. In each case, whether we are moving or the object is, the timbre of a ringing object can change, and these are wiggles we notice, at least subconsciously. In addition to such dynamic changes in the subtleties of a ring’s timbre, there is another dimension in which rings can often vary: pitch, the musical-note-like “higher” or “lower” quality of sound. When motion is involved—either our own motion or that of the objects involved in events—we get Doppler shifts, a phenomenon we are all familiar with, as when a car approaching you sounds higher-pitched than when it is moving away. (See also the later section of this chapter titled “Unresolved Questions” for more about the Doppler effect and its stamp upon speech. And see the following chapters on music, where the Doppler effect will be discussed in detail.)
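The Doppler shift mentioned above can be made concrete with a short sketch. It uses the standard textbook formula for a source moving straight toward or away from a stationary listener, f_heard = f_source × c / (c − v), where c is the speed of sound and v is the source's speed toward the listener. The function name doppler_frequency, the 440 Hz ring, and the 30 m/s car speed are assumptions chosen for illustration, not values from the text.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees Celsius


def doppler_frequency(source_hz, speed_toward_listener_m_s,
                      speed_of_sound_m_s=SPEED_OF_SOUND_M_S):
    """Pitch heard by a stationary listener from a moving, ringing source.

    A positive speed means the source is approaching the listener;
    a negative speed means it is moving away.
    """
    if abs(speed_toward_listener_m_s) >= speed_of_sound_m_s:
        raise ValueError("the source must move slower than sound")
    return (source_hz * speed_of_sound_m_s
            / (speed_of_sound_m_s - speed_toward_listener_m_s))


# Hypothetical example: a 440 Hz ring carried by a car moving at 30 m/s.
approaching = doppler_frequency(440.0, 30.0)
receding = doppler_frequency(440.0, -30.0)
print(f"approaching: {approaching:.1f} Hz, receding: {receding:.1f} Hz")
```

The approaching pitch comes out above 440 Hz and the receding pitch below it, so a ringing object that passes a listener sweeps from the higher value to the lower one, which is one way a single ring's pitch can wiggle.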

Rings can therefore change over time, both in timbre and in pitch. That is, a single ring can often be intrinsically dynamic. What about hits and slides?

Hits are nearly instantaneous, and for this simple reason they cannot change over time, at least not in the sense of continuously varying from one kind of hit to another. Hits can, of course, happen in quick succession, such as when you drop a pen and one end hits an instant before the other. But such a pen event would be two physical interactions, not one. Unlike a single ring, which can wiggle, a single hit has no wiggle room.

How about slides? Slides can occur for a lot longer than an instant, and so they can, in principle, dynamically vary over their occurrence. Although slides can be long—for example, a single snowy hill run on a sled may be one continuous slide—they are much more commonly short (though not instantaneous) in duration, because they quickly dissipate the energy of an event, sometimes ending it. Do the sounds of slides ever, in fact, dynamically vary over time? Before answering this, let’s be clear on what we mean by the sound of a slide. A slide can cause a ring, as we have discussed, but that is not what we’re interested in at the moment. We are, instead, interested in the sound made by the physical interaction of the two sliding surfaces—the noisy friction sound itself, caused by the coarseness of the objects involved. Therefore, to produce a wiggly slide, the coarseness of the surface being slid upon would have to vary, so that one friction sound would change gradually to another friction sound. Although coarseness varies randomly on lots of materials, few objects vary in a systematic, graded fashion, and thus slides will tend to have a rather nonvarying sound.

Rings, then, can be wiggly. But not hits, and not slides. If language has culturally evolved to sound like nature, then we would expect that sonorant phonemes (language’s rings) would sometimes be dynamically varying, but not plosives (language’s hits) or fricatives (language’s slides).

Languages do, indeed, often have sonorants that vary during their utterance. Although vowels like those in “sit” and in “set” are nonvarying, some vowels do vary, like those in “skate” and “dive.” When one says “skate,” for example, notice how the vowel sound requires your mouth to vary its shape, thereby dynamically modulating its timbre (in particular, modulating something called the formant structure, where formants are the bands of frequencies emanating from a sonorant). Vowel sounds like these are called diphthongs. Furthermore, sonorant consonants like l, r, y, w, and m demand ring changes. For example, when you say “yet,” notice how during the “y” your mouth dynamically varies its shape. These sonorants incorporate timbre changes. Recall that rings in nature also can change in pitch due to the Doppler effect. Do we find something like the Doppler shift in sonorant phonemes? Yes, in fact, in the many tonal languages of the world (such as Chinese), where vowels may be distinguished from one another only by virtue of how they dynamically vary their pitch during their utterance.

Whereas sonorants are commonly wiggly, effectively making more than one ringing sound during their utterance, no language possesses phonemes having in them more than one hit sound. It is possible in principle to have a single phoneme that sounds like two hits in very quick succession—for example, the “ct” in “ectoplasm”—but while we can make such sounds, and they even occur in language, they are never given building-block, or phoneme, status.

Are language’s slides like nature’s slides in being non-wiggly? First, let’s be clear on what it would even mean to have a fricative that varies dynamically as it is spoken. Try saying the sound “fs.” That is, begin with an “f” sound, and then slowly morph it to become “s” at the end. You make this sound when, for example, you say “puffs.” Languages could, in principle, have fricative phonemes that sound like “fs.” That is, languages could possess a single phoneme that has this complex dynamic fricative sound, just as languages possess single sonorant phonemes that are dynamic. One does not, however, find phonemes like this among human languages.

Nature’s rings are wiggly but hits and slides are not, and culture has given us language with the same wiggles: language commonly has sonorant phonemes that dynamically vary, but does not have plosive or fricative phonemes that dynamically vary. Our auditory systems are happy with dynamic rings, but not with dynamic hits or slides, and culture has given us speech that conforms to these tastes.

In addition to looking at dynamic changes within phonemes, we can make similar observations at the level of how phonemes combine into words: languages commonly have words with multiple sonorants in a row, but more rarely have multiple plosives or multiple fricatives in a row. For example, consider the following English words, which I found by perusing the second paragraph of this chapter: “harrowing” possesses six sonorants in a row (a, rr, o, w, i, and ng, the latter of which is a nasal sonorant), “village” has three in a row, “generation” has five in a row, and “eventually” has four in a row. One can find adjacent plosives, like in “packed” (“kt”) and “grabbed” (“bd”), and one can find adjacent fricatives like in “puffs” (“fs”), “gives” (“vz”), and “isthmus” (“sth”), but finding more than two in a row is difficult, and five or six in a row is practically impossible.

We now know how, and how much, each of the three kinds of “event atoms” can vary in sound while they are occurring. We have not, however, considered whether an event of one of these three kinds can ever dynamically change into another kind of event. Could some simple event pairs be so common that we are likely to possess special auditory mechanisms for their recognition, mechanisms language harnesses? We turn to this question next, and uncover a kind of event sufficiently fundamental in physics that it is also found as a fourth kind of phoneme in language.

Nature’s Other Phoneme

I have been treating hits and slides as two different kinds of physical interaction. But slides are more complex than hits. This is because slides consist of very large numbers of very low-energy hits. For example, if you rub your fingernail on this piece of paper, it will be making countless tiny collisions at the microscopic level. Or, if you close this book and run your fingernail over the edges of the pages of the book, the result will be a slide with one little hit for each page of the book. But it would not be sensible to conclude, on this basis, that there are just two fundamental natural building blocks for events—hits and rings—because describing a slide in terms of hits could require a million hits! We still want to recognize slides as one of nature’s phonemes, because slides are a kind of supersequence of little hits that is qualitatively unlike the hits produced when objects simply collide.
