Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man

by Mark Changizi


  This is a natural lead-in to the rest of the book, which deals with the origins of music, where loudness and pitch are even more crucial. We will see that “unresolved” pitch even tends to get resolved in melody.

  Summary Table

  In our modern lives we hear hits, slides, and rings all around us, and we also hear the sounds of speech. They mean fundamentally different things to us, and so our brains quickly learn to treat them differently. Our brains can treat them differently because, despite the many similarities between solid-object physical event sounds and speech sounds that I have pointed to throughout this chapter, there are ample auditory cues distinguishing them (e.g., the timbre of a voice is fundamentally different from the timbre of most solid objects). And once our brains treat these sounds as fundamentally different in their ecological meaning, it can be next to impossible to hear that there are deep similarities in how they sound. A fish struggling up onto land for the first time, however, and listening to human speech intermingled with the solid-object event sounds in the terrestrial environment, might find the similarity overwhelming. “What is wrong with these apes,” it might wonder, “that they spend so much of their day mimicking the sounds of solid-object physical events?”

  In this chapter, I have tried to bring out the fish in all of us, pointing out the solid-object event sounds we make when we’re speaking, but fail to notice because of our overfamiliarity with them (and because of the similarities not holding “all the way up,” as discussed in the previous chapter). The table below summarizes the many ways in which speech sounds like solid-object physical events, with references to the earlier sections where we discussed each of them.

  Section of chapter: 1. Mother Nature’s Voice
  Solid-object physical events: Physical events are best sensed by audition.
  Language: Language uses audition.

  Section of chapter: 2. Nature’s Phonemes
  Solid-object physical events: The main three event constituents are hits, slides, and rings.
  Language: The main three kinds of phoneme are plosives, fricatives, and sonorants.

  Section of chapter: 3. Nature’s Phonemes
  Solid-object physical events: Hits are more common than slides.
  Language: Plosives are more common than fricatives.

  Section of chapter: 4. Wiggly Rings
  Solid-object physical events: Rings can change in timbre and tone during their occurrence.
  Language: Sonorants can change in formants (diphthongs and sonorant consonants) and tone during utterance.

  Section of chapter: 5. Wiggly Rings
  Solid-object physical events: Hits and slides do not tend to change their sound during their occurrence.
  Language: Plosives and fricatives do not tend to change during their utterance.

  Section of chapter: 6. Nature’s Other Phoneme
  Solid-object physical events: A fourth main constituent of events is the hit-slide. But not slide-hit.
  Language: A fourth main phoneme type is the affricate. But there is no “fricative-plosive” phoneme type.

  Section of chapter: 7. Two-Hit Wonder
  Solid-object physical events: A hit between two objects can have two distinct auditory consequences. Usually it is an instantaneous explosive burst, but sometimes it is a sudden dampening.
  Language: Plosives of any kind have two forms, explosive and dampened (usually word-final).

  Section of chapter: 8. Slides That Sing
  Solid-object physical events: Slides usually occur on nonregular surfaces, but sometimes occur on surfaces with periodic regularities, leading to sound with periodicity (tonality).
  Language: Fricatives are more commonly voiceless, but are still often voiced. Whether or not a fricative is voiced is usually part of the identity of the fricative (as the surface periodicity is part of the identity of a surface).

  Section of chapter: 9. Hesitant Hits
  Solid-object physical events: Hits can vary widely in the rigidity of the objects involved, and thus vary widely in the time from first explosion to the ring. This can help to identify the objects involved.
  Language: Plosives vary in the duration of time to the following sonorant sound. This is called the voice onset time (VOT), and is part of a phoneme’s identity.

  Section of chapter: 10. Rigid Muffler
  Solid-object physical events: Rigid hits (which cause short hit-to-ring delays when initiating a ring) are poor dampeners of rings.
  Language: Voiced plosives (which have short voice onset times when released) are, when unreleased at word-endings, preceded by longer sonorant sounds.

  Section of chapter: 11. Nature’s Syllables
  Solid-object physical events: Hits and slides cause (usually audible) rings.
  Language: Plosives and fricatives tend to be followed by a sonorant. This is the basic syllable form, consonant-vowel (CV).

  Section of chapter: 12. In the Beginning
  Solid-object physical events: Hits tend to start events disproportionately more often than slides do.
  Language: Plosives tend to start words disproportionately more often than fricatives do.

  Section of chapter: 13. The First Was a Doozy
  Solid-object physical events: Rings are more audible early in an event.
  Language: Sonorants are more likely to follow a plosive or fricative near the starts of words.

  Section of chapter: 14. Nature’s Words
  Solid-object physical events: The number of interactions in an event tends to be from one to several, and the time scale of natural solid-object physical events tends to be on the order of several hundred milliseconds (with a lot of variability).
  Language: The number of plosives or fricatives in a word tends to be one to several, and the time scale of its utterance tends to be on the order of several hundred milliseconds (with a lot of variability).

  Section of chapter: 15. Nature’s Words
  Solid-object physical events: The combinations of hits and slides that occur in natural solid-object physical events have a characteristic, theoretically comprehensible pattern.
  Language: The combinations of plosives and fricatives in words of languages have the signature pattern of solid-object physical events.

  Section of chapter: 16. Unresolved Questions
  Solid-object physical events: Events with rising pitch are often due to the Doppler effect, wherein an object is veering more toward the observer; i.e., it is the signature auditory pattern of an event “headed your way.” Falling pitch means the object is directing itself less and less toward you.
  Language: Phrases with rising intonation tend to connote a question or something that is unresolved, metaphorically akin to an event suddenly being directed toward you. Phrases with falling intonation tend to connote greater resolution, metaphorically akin to an object veering away from you, which you no longer have to deal with.

  This chapter, together with the fourth chapter in The Vision Revolution, argues that our linguistic ability, for both speech and writing, may well be due to nature-harnessing, rather than to a built-in “language instinct” or to general learning. Although language is central to our modern human identity, so is art, and it is natural to wonder whether some of humankind’s artistic wonders also have their origins in nature-harnessing. The remainder of the book takes up music, arguably the pinnacle of humankind’s artistic achievement.

  [1] Handbook of phonological data from a sample of the world’s languages: A report of the Stanford Phonology Archive (1979). Stanford University, Department of Linguistics.

  Chapter 3

  Soylent Music

  Blind Joggers

  Joggers love their headphones. If you ask them why, they’ll tell you music keeps them motivated. The right song can transform what is by all rights an arduous half hour of ascetic masochism into an exhilarating whirlwind (or, in my case, into what feels like only 25 minutes of ascetic masochism). Music-driven joggers may be experiencing a pleasurable diversion, but to the other joggers and bikers in their vicinity, they’re Tasmanian Devils. In choosing to jog to the beat of someone else’s drum rather than their own, headphone-wearing joggers have “blinded” themselves to the sounds of the other movers around them. Headphones don’t prevent joggers from deftly navigating the trees, stumps, curbs, and parked cars of the world, because these things can be seen as one approaches them. But when you’re moving in a world with other movers, things not currently in front of you can quickly arrive in front of you. That’s when the headphoned jogger stumbles . . . and crashes into the crossing jogger, passing biker, or first-time tricycler.

  These music-blinded movers may be a menace to our streets, but they can serve to educate us all about one of our underappreciated powers: using sound alone, we know where people are around us, and we know the nature of their movement. I’m sitting in a coffee shop as I write this, and when I close my eyes, I can sense the movement all around me: a clop of boots just passed to my right; a person with jingling keys just walked in front of me from my right to my left, and back again; and the pitter-patter of a child just meandered way out in front of me. I sense where they are, their direction of motion, and their speed. I also sense their gait, such as whether they are walking or running. And I can often tell more than this: I can distinguish a brisk from a shuffling walk, an angry stomp from a happy prance; and I can even recognize a complex behavior like turning and stopping to drop a dirty tray in a bin, slowing to open a door, or reversing direction to fetch a forgotten coffee. My auditory system carries out these mover-detection computations even when I’m not consciously attending to them. That’s why I’m difficult to sneak up on (although they keep trying!), and why I only rarely find myself saying, “How long has that cheerleading squad been doing jumping jacks behind me?!” That almost never happens to me because my auditory system is keeping track of where people are and roughly what they’re doing, even when I’m otherwise occupied.

  We can now see why joggers with ears unencumbered by headphones almost never crash into feral dogs or runaway grandpas in wheelchairs: they may not see the dog or grandpa, but they hear their movement through space, and can dynamically modulate their running to avoid both and be merrily on their way. Without headphones, joggers are highly sensitive to the sounds of cars, and can track their movement: that car is coming around the bend; the one over there is reversing directly toward me; the one above me is falling; and so on. Joggers in headphones, on the other hand, have turned off their movement-detection systems, and should be passed with caution! And although they are a hazard to pedestrians and cyclists, the people they put at greatest risk are themselves. After a collision between a jogger and an automobile, the automobile typically only needs a power wash to the grille.

  How does your auditory system serve as a movement-tracking system? In addition to sensing whether a mover is to your left or right, in front or behind, and above or below—a skill that depends on the shape, position, and number of ears you have—you possess specialized auditory software that interprets the sounds of movers and generates a good guess as to the nature of the mover’s movement through space. Your software has evolved to give you four kinds of information about a mover: (i) his distance from you, (ii) his directedness toward (or away from, or at an angle to) you, (iii) his speed, and (iv) his behavior or gait. How, then, does your auditory system infer these four kinds of information? As we will see in this and the following chapters, (i) distance is gleaned from loudness, (ii) directedness toward you is cued by pitch, (iii) speed is inferred by the number of footsteps per second, and (iv) behavior and gait are read from the pattern and emphasis of footsteps. Four fundamental parameters of human movement, and four kinds of auditory cues: (i) loudness, (ii) sound frequency, (iii) step rate, and (iv) temporal pattern and emphasis. (See Figure 13.) Your auditory system has evolved to track these cues because of the supreme value of knowing what everyone is doing nearby, and where.

  This is where things get interesting. Even though joggers without headphones are not listening to music, their auditory systems are listening to fundamentally music-like constituents. Consider the four auditory movement cues mentioned just above (and shown on the right of Figure 13). Loudness? That’s just pianissimo versus piano versus forte and so on. (This is called “dynamics” in music, a term I will avoid because it brings confusion in the context of a movement theory of music.) Sound frequency? That’s roughly pitch. Step rate? That’s tempo. And the gait pattern? That’s akin to rhythm and beat. The four fundamental auditory cues for movement are, then, mighty similar to (i) loudness, (ii) pitch, (iii) tempo, and (iv) rhythm. (See Figure 14.) These are the most fundamental ingredients of music, and yet, there they are in the sounds of human movers. The most informative sounds of human movers are the fundamental building blocks of music!

  Figure 13. The four properties of human movers (left) are inferred from the four respective auditory stimuli (right).

  Figure 14. Central to music are the four musical properties in the center column, which map directly onto the auditory cues for sensing human movement.

  The importance of loudness, pitch, tempo, and rhythm to both music and movement is, as we will see, more than a coincidence. The similarity runs deep—something speculated on ever since the Greeks[1]. Music is not just built with the building blocks of movement, but is actually organized like movement, thereby harnessing our movement-recognition auditory mechanisms. Headphoned joggers, then, don’t just miss out on the real movement around them—they pipe fictional movement into their ears, making them even more hazardous than a jogger wearing earplugs.

  Much of the rest of this book is about how music came into the lives of us humans, how it gets into our brains, and why it affects us as it does. In short, we will see that music moves us because it literally sounds like moving.

  The Secret Ingredient

  When I was a teenager, my mother began listening to French instructional programs in order to brush up. She was proud of me when I began sitting and listening with her. “Perhaps my son isn’t a square physics kid after all,” she thought. And, in fact, I found the experience utterly enthralling. After many months, however, my mother’s pride turned to worry, because whenever she attempted to banter in even the most elementary French with me, I would stare back, dumbfounded. “Why isn’t this kid learning French?” she fretted.

  What I didn’t tell my mother was that I wasn’t trying to learn French. Why was I bothering to listen to a program I could not comprehend? I will let you in on my secret in a moment, but in the meantime I can tell you what I was not listening to it for: the speech sounds. No one would set aside a half hour each day for months in order to listen to unintelligible speech. Foreign speech sounds can pique our curiosity, but we don’t go out of our way to hear them. If people loved foreign speech sounds, there would be a market for them; we would set our alarm clocks to blare German at 5:30 a.m., listen to Navajo on the way to work in the car, and put on Bushmen clicks as background for our dinner parties. No. I was not listening to the French program for the speech sounds. Speech doesn’t enthrall us—not even in French.

  Whereas foreign speech sounds don’t make it as a form of entertainment, music is quintessentially entertaining. Music does get piped into our alarm clocks, car radios, and dinner parties. Music has its own vibrant industry, whereas no one is foolish enough to see a business opportunity in easy-listening foreign speech sounds. And this motivates the following question. Why is music so evocative? Why doesn’t music feel like listening to speech sounds, or animal calls, or garbage disposal rumbles? Put simply: why is music nice to listen to?

  In an effort to answer, let’s go back to the French instructional program and my proud, and then concerned, mother. Why was I joining my mom each day for a lesson I couldn’t comprehend, and had no intention of comprehending? Truth be told, it wasn’t an audiotape we were listening to, but a television show. And it wasn’t the meaningless-to-me speech sounds that lured me in, but one of the actors. A young French actress, in particular. Her hair, her smile, her mannerisms, her pout . . . but I digress. I wasn’t watching for the French language so much as for the French people, one in particular. Sorry, Mom!

  What was evocative about the show and kept me wanting more was the human element. The most important thing in the lives of our ancestors was the other people around them, and it is on the faces and bodies of other people that we find the most emotionally evocative stimuli. So when one finds a human artifact that is capable of evoking strong feelings, my hunch is that it looks or sounds human in some way. This is, I suggest, an important clue to the nature of music.

  Let’s take a step back from speech and music, and look for a moment at evocative and nonevocative visual stimuli in order to see whether evocativeness springs from people. In particular, consider two kinds of visual stimuli, writing and color, each an area of my research covered in my previous book, The Vision Revolution.

  Writing, I have argued, has culturally evolved over centuries to look like natural objects, and to have the contour structures found in three-dimensional scenes of opaque objects. The nature that underlies writing is, then, “opaque objects in 3-D,” and that is not a specifically human thing. Writing looks like objects, not humans, and thus only has the evocative power expected of opaque objects: little or none. That’s why most writing—like the letters and words on this page—is not emotionally evocative to look at. (See top left of Figure 15.) Colors, on the other hand, are notoriously evocative—people have strong preferences regarding the colors of their clothes, cars, and houses, and we sense strong associations between color and emotions. I have argued in my research and in The Vision Revolution that color vision in us primates—our new-to-primates red-green sensitivity in particular—evolved to detect the blood physiology modulations occurring in the skin, which allow us to see color signals indicating emotional state and mood. Color vision in us primates is primarily about the emotions of others. Color is about humans, and it is this human connection to color that is the source of color’s evocativeness. And although, unlike color, writing is not generally evocative, not all writing is sterile. For example, “V” stimuli have long been recognized as one of the most evocative geometrical shapes for warning symbols. But notice that “V” stimuli are reminiscent of (exaggerations of) “angry eyebrows” on angry faces. Color is “about” human skin and emotion, and “V” stimuli may be about angry eyebrows—so the emotionality in each one springs from a human source. (See top right of Figure 15.) We see, then, that the nonevocative visual signs look like opaque, not-necessarily-human objects, and the evocative visual signs look like human expressions. I have summarized this in the top row of the table in Figure 15.

 
