
Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man


by Mark Changizi


  Figure 6. (a) A rigid hit (i.e., involving rigid objects) rebounds—and rings—with little delay after the initial collision. (b) A nonrigid hit takes some time before rebounding and ringing. These physical distinctions are similar to the voiced and unvoiced plosives.

  The key difference between the high-pressure ball and the flat ball—and the difference between the book falling on a solid desk versus crumpled paper—is that the former is more rigid than the latter. The more rigid the objects in a collision, the shorter the compression period, and the shorter the gap between the initial hit and the ring. The high-pressure ball is not only more rigid than the flat ball, but also more elastic. More elastic objects regain their original shape and kinetic energy after decompression, lose less energy to heat during compression, and tend to have shorter gaps. Also, if an object breaks, cracks, or fractures as it hits—a kind of nonrigidity and inelasticity—the gap is longer.

  Therefore, although some hits ring with effectively no delay, other kinds of hits take their time before ringing. Hits can be hesitant, and the delay between hit and ring is highly informative because it tells us about the rigidities of the objects involved. Our auditory systems understand this information very well: they have been designed by evolution to possess mechanisms for sensing this gap and thus for perceiving the rigidity of the objects involved in events.

  Because our auditory systems are evolutionarily primed to notice these hit-to-ring delays, we expect that languages should have come to harness this capability, so that plosives may be distinguished on the basis of such hit-to-ring delays. That is, we would expect that plosive phonemes will have as part of their identity a characteristic gap between the initial explosive sound and the subsequent sonorant. Language does, indeed, pay homage to the hit-ring gaps in nature, in the form of voiced and unvoiced plosives. Voiced plosives are like “b,” “g,” and “d,” and in these cases the sonorant sound following them occurs with negligible delay (Figure 7b, left). They even sound bouncy—“boing,” “bob,” and “bounce”—like a properly inflated basketball. Unvoiced plosives are like “p,” “k,” and “t,” and in these cases there is a significant delay after the plosive and before the sonorant sound begins, a delay called the voice onset time (Figure 7b, right). (Try saying “pa,” and listen for when your voice kicks in.) In English we have short voice onset times and long ones, corresponding to voiced and unvoiced plosives, respectively. Some other languages have plosives with voice onset times in between those found in English.

  Figure 7. Illustration that voiced plosives are like rigid, elastic hits, and unvoiced plosives like nonrigid, inelastic hits. These plots show the amplitude of the sound on the y-axis and time on the x-axis. (a) Left: the sound made by a stiff hardcover book landing on my wooden desk. Right: the sound of that same book landing on my desk when a wrinkly piece of paper cushioned the landing (making it less rigid and less elastic). (b) Me saying “bee” (left) and “pee” (right). Notice that in the inelastic book-drop and the unvoiced plosive cases, i.e., the right-hand plots in (a) and (b), there is a delay after the initial collision before the ringing begins.

  Not only do languages utilize a wide variety of voice onset times—hit-to-ring gaps—for plosive phonemes, but one does not find plosive phonemes that don’t care about the length of the gap. One could imagine that, just as the intensity of a spoken plosive doesn’t change the identity of the plosive, the voice onset time after a plosive might not matter to the identity of a plosive. But what we find is that it always does matter. And that’s because the intensity of a hit in nature is not informative about the objects involved, but the gap from hit to ring is informative (as is the timbre). That’s why the gap from hit to ring is harnessed in language. And that’s why, as we saw earlier, the distinct plosive sounds at the start and end of words are treated as the same, despite being acoustically more different than are voiced and unvoiced plosives (like “b” and “p”).

  In light of the ecological meaning of voiced versus unvoiced plosives, consider the following two letters from a mystery language: ◆ and ✴. Each stands for a plosive, but one is voiced and the other unvoiced. Which is which? Most people guess that ◆ is voiced, and that ✴ is unvoiced. Why? My speculation is that it is because ◆ looks rigid, and would tend to be involved in hits that are voiced (i.e., a short gap from hit to ring), whereas ✴ looks more kinked, and thus would be likely to have a more complex collision, one that is unvoiced (i.e., a long gap between hit and ring). My “mystery language” is fictional, but could it be that more rigid-looking letters across real human writing systems have a tendency to be voiced, and more kinked-looking letters have a tendency to be unvoiced? It is typically assumed that the shapes of letters are completely arbitrary, and have no connection to the sounds of speech they stand for, but could it be that there are connections because objects with certain shapes tend to make certain sounds? This is the question Kyle McDonald—a graduate student at Rensselaer Polytechnic Institute (RPI) working with me—raised and set out to investigate. He found that letters having junctions with more contours emanating from them—i.e., the more kinked letters—have a greater probability of being unvoiced. For example, in English the three voiced plosives are “b,” “d,” and “g,” and their unvoiced counterparts are “p,” “t,” and “k.” Notice how the unvoiced letters—the “t” and “k,” in particular—have more complex structures than the voiced ones. Kyle McDonald’s data—currently unpublished—show that this is a weak but significant tendency across writing systems generally.

  Rigid Muffler

  As I walk along my upstairs hallway, I accidentally bump the hammer I’m carrying into the antique gong we have, for some inexplicable reason, hung outside the bedroom of our sleeping infant. I need to muffle it, quickly! I have one hand bare and the other wielding the guilty hammer; what do I do? It’s obvious. I should use my bare hand, not the hammer, to muffle the gong. Whereas my hand will dampen out the gong ring quickly, the hammer couldn’t be worse as a dampener. My hand serves as a good gong-muffler because it is fleshy and nonrigid. My hand muffles the gong faster than the rigid hammer, yet recall from the previous section that nonrigid objects cause explosive hits with long hit-to-ring gaps. Nonrigid hits create rings with a delay, and yet diminish rings without delay. And, similarly, rigid hits create rings without delay, but are slow dampeners of rings.

  These gong observations are crucial for understanding what happens to voiced and unvoiced plosives when they are not released (i.e., when the air in the mouth and lungs is not allowed to burst out, creating the explosive hit sound), which often occurs at word endings (as discussed in the section titled “Two-Hit Wonder”). When a plosive is not released, there clearly cannot be a hit-to-ring gap—because it never rings. So how do voiced and unvoiced plosives retain their voiced-versus-unvoiced distinction at word endings? For example, consider the word “bad.” How do we know it is a “d” and not a “t” at the end, given that it is unreleased, and thus there is no hit-to-ring delay characterizing it as a “d” and not a “t”?

  My gong story makes a prediction in this regard. If voiced plosives really have their foundation in rigid objects (mimicking rigidity’s imperceptibly tiny hit-to-ring gap at a word’s beginning), then, because rigid objects are poor mufflers, the sonorant preceding an unreleased voiced plosive at a word ending should last longer than the sonorant preceding an unreleased unvoiced plosive at a word ending. For example, the vowel sound in “bad” should last longer than in the word “bat.” The nonrigid “t” at the end of the latter should muffle it quickly. Are words like “bad” spoken with vowels that ring longer than in words like “bat”?

  Yes. Say “bad” and “bat.” The main difference is not whether the final plosive is voiced: neither is, because neither is ever released, and thus neither ever gets to ring. Notice how, when you say “bad,” the “a” is more drawn out, lasting longer than the “a” sound in “bat.” Most nonlinguist readers may never have noticed that the principal distinguishing feature of voiced and unvoiced plosives at word endings is not whether they are voiced at all. It is a seemingly unrelated feature: how long the preceding vowel lasts. But, as we see from the physics of events, a longer-lasting ring before a dampening hit is the signature of a rigid object’s bouncy hit, and so there is a fundamental ecological order to the seemingly arbitrary linguistic phonological regularity. (See Figure 8.)

  Figure 8. Matrix illustrating the tight match between the qualities of hits (not in parentheses) and plosives (within parentheses). For hits, the columns distinguish between rigid and nonrigid hits, and the rows distinguish between hits that initiate rings and hits that muffle rings. Inside the matrix are short descriptions of the auditory signature of the four kinds of hits. For plosives, the columns distinguish the analogs of rigid and nonrigid hits, which are, respectively, voiced and unvoiced plosives; the rows distinguish the analogs of ring-initiating and ring-muffling hits, which are, respectively, released and unreleased plosives. Together, this means four kinds of hits, and four expected kinds of plosives, matching the signature features of the respective hits. If the meaning of voiced versus unvoiced concerns rigid versus nonrigid objects, then we expect that plosives at word starts should have little versus a lot of voice-onset time, respectively, for voiced and unvoiced. And we expect that for plosives at word endings the voiced ones should reveal themselves via a longer preceding sonorant (slow to damp) whereas unvoiced should reveal themselves via a shorter preceding sonorant (fast to damp). Plosives do, in fact, modulate across this matrix as predicted from the ecological regularities of rigid and nonrigid hits at ring-inceptions and ring-dampenings.

  Over the last half dozen sections of this chapter we have analyzed the constituents—the hits, slides, and rings—of events and language. Hits, slides, and rings may be the fundamental building blocks for human speech, but that alone doesn’t make speech sound natural. Just as natural contours can be combined in unnatural ways for vision, natural sound atoms can be combined unnaturally for audition. Language will not effectively harness our auditory system if speech combines plosives, fricatives, and sonorants in unnatural ways, like “yowoweelor” or “ptskf.” To find out whether speech sounds like nature, we need to understand how nature’s phonemes combine, and then see if language combines in the same way. For the rest of this chapter, we will look at successively larger combinations of sounds. But we turn first to the simplest combination.

  Nature’s Syllables

  My friend’s boy made a video of himself solving a Rubik’s Cube blindfolded, and then posted it on the Web. As I watched him put the blindfold on, pick up the cube, and begin twisting, I noticed something strange about the sound, but I couldn’t put my finger on what was unusual. Later, when I commented to my friend how his bright boy must owe it to inheritance, he replied, “Indeed, the apple doesn’t fall far from the tree. He faked it. The movie was in reverse.”

  The world does not sound the same when run backward. What had raised my antennae when watching the Rubik’s Cube video were the unusual sounds that occur when one hears events in reverse. One of the first strange sounds occurred when he picked up the cube at the start of the video. Knowing now that it was shown in reverse, what appeared in the video to be him picking up the cube to begin unscrambling it was actually him setting the cube down after having scrambled it. Setting the cube down caused a hit and a ring, but in reverse what one hears is a ring coming out of nowhere, and ending with a sudden ring-stopping hit (the second voice of a hit, as discussed earlier in the section titled “Two-Hit Wonder”). That just doesn’t happen much in nature. When nature comes to the door, it knocks before ringing, not the other way around. Rings don’t start events. Rings are due to the periodic vibrations of objects, and objects do not typically ring without first being in physical contact with another object. Rings therefore do not typically occur without a hit or slide occurring first.

  Hits, slides, and rings may be the principal fundamental building blocks for events, but rings are a different animal than hits and slides. Hits and slides involve objects in motion, physically interacting with other objects. Hits and slides are the backbone of the causal chain in an event. Rings, on the other hand, occur as a result of hits or slides, but don’t themselves cause more events. Rings are free riders, contributing nothing to the causality. Events do not have a ring followed by another ring. That’s impossible (although a single complex, or wiggly, ring is possible, as we discussed in an earlier section). And events never have an interaction (i.e., a hit or a slide) followed directly by another interaction without an intervening ring. Sometimes a ring will be inaudible, and so there will appear to be two interactions without an intervening ring, but physically there’s always an intervening ring, because objects that are involved in a physical interaction always vibrate to some extent. Events also always end with a ring, although whether it is audible is another matter.

  The most basic way in which hits, slides, and rings combine is, then, this:

  Interaction—Ring

  where the interaction can be either a hit or a slide. If we let c stand for a hit or a slide (because “c” can be pronounced either as a plosive, “k,” or as a fricative, “s”), and a stand for a ring (which, recall, can sometimes be wiggly), then the fundamental structure of solid-object physical events is exemplified by caca. Not acac. Not cccaccca. Not accacc. And so on. Letting b stand for hits and s for slides, events take forms such as ba, sa, baba, saba, basaba, and so on. Not ab or sba or a or bbb or ssb or assb or the like. This interaction-ring combination is perhaps the most fundamental event regularity in nature, and is perhaps the most perceptually salient. Objects percussively interact via either a hit or slide, and give off a ring. Our auditory system—and probably that of most other mammals—is designed to expect nature’s phonemes to come in this interaction-ring form.
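  Stated this way, the interaction-ring regularity amounts to a tiny grammar over event letters, and it can be made concrete in a few lines of code. The following is only an illustrative sketch, assuming the letter coding used above (b for a hit, s for a slide, a for a ring, with a wiggly ring still counted as a single a); the pattern name and function name are hypothetical. It accepts a sequence only if it consists of one or more interaction-ring pairs.

```python
import re

# Hypothetical encoding following the letters used in the text:
#   "b" = hit, "s" = slide (the two kinds of interaction, written c),
#   "a" = ring (a wiggly ring still counts as one "a").
# A natural event is one or more interaction-ring pairs: (ca)+.
NATURAL_EVENT = re.compile(r"(?:[bs]a)+")


def is_natural_event(seq: str) -> bool:
    """Return True if seq has the interaction-ring form described in the text."""
    return NATURAL_EVENT.fullmatch(seq) is not None


if __name__ == "__main__":
    natural = ["ba", "sa", "baba", "saba", "basaba"]
    unnatural = ["ab", "sba", "a", "bbb", "ssb", "assb"]
    for seq in natural + unnatural:
        print(f"{seq:7} {is_natural_event(seq)}")
```

  Run on the examples from the text, the forms such as ba, saba, and basaba are accepted, while ab, sba, a, bbb, ssb, and assb are rejected. A regular expression suffices because the rule only constrains which letter may follow which.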

  Given the fundamental status of interaction-ring combinations, if language harnesses the innate powers of our auditory system, then we expect language to be built out of vocalizations that sound like interaction-ring. Do languages have this feature? That is, do plosives and fricatives tend to be followed by sonorants? Yes. A plosive or fricative followed by a sonorant is, in fact, the most basic and most common phoneme combination across languages. It is the quintessential example of a syllable. Words across humankind tend to look approximately like ca, or caca, or cacaca, where c stands for a plosive or fricative, and a for one or more consecutive sonorants. All languages have syllables of this ca form. And many languages—such as Japanese—only allow syllables of this form.

  Whereas interaction-ring is the most fundamental natural combination of event atoms, ring-interaction is a combination that is not possible. A ring followed by an interaction sounds out of this world, as in my friend’s son’s Rubik’s Cube video. We therefore expect that languages tend to avoid combinations like ac and acac. This is, in fact, the case. The rarest syllable type is of this ac form, and words starting with a sonorant and followed by a plosive or fricative are rare. In data I collected at RPI in 2008 with the help of undergraduate student Elizabeth Counterman and graduate student Kyle McDonald, about 80 percent of our sampled words (with three or fewer non-sonorants) across 18 widely varying languages begin with a plosive or a fricative. (See the legend of Figure 9 for a list of the sampled languages.) And a large proportion of the words starting with a sonorant start with a nasal, like “m” and “n,” the least sonorant-like of the sonorant consonants (nasals at word starts can have a fairly sudden start, and are more plosive-like than other sonorant consonants).

  Note that a word starting with a vowel does not start with a sonorant, because when one speaks such a word, the utterance actually begins with something called a glottal plosive, produced via the sudden hitlike release of air at one’s voice box. To illustrate the glottal plosive, slowly say “packet,” and then slowly say “pack it.” When you say the latter, there can often be a sharp beginning to the “it,” something that will never occur before the “et” sound in “packet.” That sharp beginning is the glottal plosive. Words starting with sonorants are, thus, less common than one might at first suspect. Even words like “ear,” “I,” “owe,” and “owl,” then, are cases of plosives followed by sonorants, and agree with the common hit-ring (the most common kind of interaction-ring) structure of nature.

  Words truly beginning with a sonorant sound begin not with a vowel, but with a sonorant consonant like w, y, l, r, and m. When one says, “what,” “yup,” “lid,” “rip,” and “map,” the start of the word is nonsudden (or less sudden than a plosive), ramping up more gradually to the sonorant sound instead. And notice that words such as these—with a sonorant at the start and a plosive at the end—do sound like backwards sounds. Try saying the following meaningless sentence: “Rout yab rallod.” Now say this one: “Cort kabe pullod.” Although they are similar, the first of these meaningless sentences sounds more like events in reverse. This is because it has words of the ring-hit form, the signature sound of a world in reverse. The second sentence, while equally meaningless, sounds like typical speech (and event) sounds, because it starts with plosives.

  Language’s most universal structure above the level of phonemes—the syllable—has its foundation, then, in physics. The interaction-rings of physical events got instilled into our auditory systems over hundreds of millions of years of vertebrate and mammalian evolution, and culture shaped language to sound like physics in order to best harness our hardware.

  Before we move next to the shape of words, there is another place where syllables play a central role: in rhyme. Two words rhyme if their final syllables have the same sonorant sound, and the same plosive or fricative following the sonorant, for example, “snug as a bug in a rug.” The sonorant sound is the more important of the two: “bug” rhymes better with “bud” than with “bag.” Our ecological understanding of syllables may help to make sense of the perceptual salience of rhyme. When two events share the same ring sound, it means the same kind of object is involved in both events. For example, “tell” and “sell” rhyme, and in terms of nature’s physics, they sound like two distinct events involving the same object. “Tell” might suggest that some object has been hit, and “sell” that that same object is now sliding. The “ell” in each case signals that it is the same object undergoing different events. This is just the kind of gestalt perceptual mechanism humans are well known to possess: we attempt to group stimuli into meaningful units. In vision this can lead to contours at distant corners of an image being perceptually treated as parts of one and the same object, and in audition it can lead to sounds separated in time nevertheless being grouped into the same object. That’s what happens in rhyme: the second word of a rhyming pair may occur several lines later, but our brain hears the similar ringing sound and groups it with the earlier one, because it would be likely in nature that such sounds were made by one and the same object.
