Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man


by Mark Changizi


  Figure 36. (a) When a foot hits the ground, it is not moving forward or backward, and therefore has no Doppler shift. But as we’ll discuss, there’s more to the story. (b) (i) Top: The footstep leads to sound waves going in all directions, all at the same frequency (indicated by the spacing between the wave fronts). Bottom: These waves hit the body and reflect off it. Because the mover is moving forward, the sound waves reflected forward will be Doppler shifted to a higher pitch; the waves hitting the mover’s rear will reflect at a lower pitch. (ii) When feet land they don’t simply move vertically downward for a thud. The surface of the ground very often has complex material on it, which the landing foot strikes as it is still moving forward. These complex sounds will have Doppler shifts. (iii) If our feet were like a pirate’s peg leg, then the single thud it makes when hitting the ground would have no Doppler shift. But our feet aren’t peg legs. Instead, our foot lands on its heel, and the point of contact tends to move forward along the bottom of the foot.

  Footsteps can, then, Doppler shift, and these shifts are detectable. There is now a third difficulty that can be raised: if Doppler shifts for human movers are fairly meager, then why doesn’t musical melody have meager tessitura width (i.e., meager pitch range for the melody)? The actual tessitura in melody tends to be wider than that achievable by a human mover, corresponding to speeds faster than humans can achieve. Why, if melodic pitch contours are about Doppler pitches, would music exaggerate the speed of the depicted observer? Perhaps for the same reason that exaggeration is commonplace in other art forms. Facial expressions in cartoons, for example, tend to be hyperexaggerations of human facial expressions. Presumably such exaggerations serve as superstimuli, hyperactivating our brain’s mechanisms for detecting the characteristic (e.g., smile or speed), and something about this hyperactivation feels good to us (perhaps a bit like being on a roller coaster).
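
  The claim that human speeds yield only meager Doppler shifts is easy to check with the standard Doppler formula. The sketch below is my own illustration, not from the text; it assumes a stationary listener, sound traveling at roughly 343 m/s in air, and a source moving directly along the line of hearing, and it computes the full pitch swing, in semitones, between approach and recession.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 degrees C (assumed value)

def doppler_swing_semitones(v):
    """Full pitch swing, in semitones, between a source approaching the
    listener at speed v and the same source receding at speed v."""
    ratio = (SPEED_OF_SOUND + v) / (SPEED_OF_SOUND - v)
    return 12 * math.log2(ratio)

for v in (1, 10, 100):  # m/s: strolling, sprint-fast, far beyond human
    print(v, round(doppler_swing_semitones(v), 2))
# roughly 0.1, 1.0, and 10.4 semitones, respectively
```

  Even at ten meters per second, the entire approach-to-recede swing is only about one semitone, while melodic tessituras commonly span several semitones or more; taking those widths literally would require speeds far beyond what a human body can reach, which is the mismatch the exaggeration argument above addresses.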

  One final thought concerning the mismatch between the Doppler pitch range and the tessitura widths found in music: could I have been underestimating the size of the Doppler shifts humans are capable of? Although we may only move in the one- to ten-meters-per-second range, and our limbs may swing forward at a little more than twice our body’s speed, parts of us may be moving at faster speeds. Recall that your feet hit the ground from heel to toe. The sequence of microhits travels forward along the bottoms of your feet, and the entirety of sound the sequence makes will be Doppler shifted. An interesting characteristic of this kind of sound is that it can act like a sound-making object that is moving much faster than the object that actually generates it. As an example, when you close scissors, the actual objects—the two blades—are simply moving toward each other. But the point of contact between the blades moves outward along the blades. The sound of closing scissors is a sound whose principal source location is moving, even though no object is actually moving in that manner. This kind of faux-moving sound maker can go very fast. If two flattish surfaces hit each other with one end just ever so slightly ahead of the other, then the speed of the faux mover can be huge. For example, if you drop a yardstick so as to make it land flat, and one end hits the ground one millisecond before the other end, then the faux mover will have traveled between the yardstick and the ground from one end to the other at about one kilometer per second, or about two thousand miles per hour! The faux mover beneath our stepping feet may, in principle, be moving much faster than we are, and any scissor-like sound it makes will thus acquire a Doppler pitch range much wider than that due to our body’s natural speed.
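
  The yardstick figure can be checked directly. A minimal sketch of the arithmetic (the one-millisecond lag is from the text; taking the yardstick at its standard 0.9144-meter length is my assumption):

```python
# The "faux mover" is the contact point that sweeps along the yardstick
# as it lands flat, one end a millisecond before the other.

YARD_M = 0.9144   # length of a yardstick in meters
LAG_S = 0.001     # one end lands 1 ms before the other

speed_m_per_s = YARD_M / LAG_S            # speed of the sweeping contact point
speed_km_per_s = speed_m_per_s / 1000
speed_mph = speed_m_per_s * 3600 / 1609.344

print(f"{speed_km_per_s:.2f} km/s")  # ~0.91 km/s, about one kilometer per second
print(f"{speed_mph:.0f} mph")        # ~2045 mph, about two thousand miles per hour
```

  Note that nothing material moves at this speed; only the point of contact does, which is exactly why the faux mover can outrun any physical object in the scene.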

  Human movers do make sounds that Doppler shift, and these shifts are detectable by our auditory system. And their exaggeration in music is sensible in light of the common role of exaggeration in artistic forms. Melodic contour, we have seen thus far, has many of the signature properties expected of Doppler shifts, lending credence to the idea that the role of melodic pitch contours is to tell the story of the sequence of directions in which a mover is headed. That’s a fundamental part of the kinematic information music imparts about the fictional mover. But that’s only half the story of “kinemusic.” It doesn’t tell us how far away the mover is, something more explicitly spatial. That is the role of loudness, the topic of the rest of this chapter.

  Loud and in 3-D

  Do you know why I love going to live shows like plays or musicals? Sure, the dialogue can be hilarious or touching, the songs a hoot, the action and suspense thrilling. But I go for another reason: the 3-D stereo experience. Long before movies were shot and viewed in 3-D, people were putting on real live performances, which provide a 3-D experience for all the two-eyeds watching. And theater performances don’t simply approximate the 3-D experience—they are the genuine article.

  “But,” you might respond, “one goes to the theater for the dance, the dialogue, the humans—for the art. No one goes to live performances for the ‘3-D feel!’ What kind of lowbrow rube are you? And, at any rate, most audiences sit too far away to get much of a stereo 3-D effect.”

  “Ah,” I respond, “but that’s why I sit right up front, or go to very small theater houses. I just love that 3-D popping-out feeling, I tell ya!”

  At this point you’d walk away, muttering something about the gene pool. And you’d be right. That would be a dopey thing for me to say. We see people doing their thing in 3-D all the time. I just saw the waitress here at the coffee shop walk by. Wow, she was in 3-D! Now I’m looking at my coffee, and my mug’s handle appears directed toward me. Whoa, it’s 3-D!

  No. We don’t go to the live theater for the 3-D experience. We get plenty of 3-D thrown at us every waking moment. But this leaves us with a mystery. Why do people like 3-D movies? If people are all 3-D’ed out in their regular lives, why do we jump at the chance to wear funny glasses at the movie house? Part of the attraction surely is that movies can show you places you have never been, whether real or imaginary, and so with 3-D you can more fully experience what it is like to have a Tyrannosaurus rex make a snout-reaching grab for you.

  But there is more to it. Even when the movie is showing everyday things, there is considerable extra excitement when it is in 3-D. Watching a live performance in a tiny theater is still not the same as watching a 3-D movie version of that same performance. But what is the difference?

  Have you ever been to one of those shows where actors come out into the audience? Specific audience members are sometimes targeted, or maybe even pulled up onstage. In such circumstances, if you’re not the person the actors target, you might find yourself thinking, “Oh, that person is having a blast!” If you’re the shy type, however, you might be thinking, “Thank God they didn’t target me because I’d have been terrified!” If you are the target, then, whether you liked it or not, your experience of the evening’s performance will be very different from that of everyone else in the audience. The show reached out into your space and grabbed you. While everyone else merely watched the show, you were part of it.

  The key to understanding the “3-D movie” experience can be found in this targeting. 3-D movies differ from their real-life versions because everyone in the audience is a target, all at the same time. This is simply because the 3-D technology (projecting left- and right-eye images onto the screen, with glasses designed to let each eye see only the image intended for it) gives everyone in the audience the same 3-D effect. If the dragon’s flames appear to me to nearly singe my hair but spare everyone else’s, your experience at the other side of the theater is that the dragon’s flames nearly singe your hair and spare everyone else’s, including mine. If I experience a golf ball shooting over the audience to my left, then the audience to my left also experiences the golf ball going over their left. 3-D movies put on a show that is inextricably tied to each listener, and invades each listener’s space equally. Everyone’s experience is identical in the sense that they’re all treated to the same visual and auditory vantage point. But everyone’s experience is unique because each experiences himself as the target—each believes he has a specially targeted vantage point.


  The difference, then, between a live show seen up close and a 3-D movie of the same show is that the former pulls just one or several audience members into the thick of the story, whereas 3-D movies have this effect on everyone. So the fun of 3-D movies is not that they are 3-D at all. We can have the same fun when we happen to be the target in a real live show. The fun is in being targeted. When the show doesn’t merely leap off the screen, but leaps at you, it fundamentally alters the emotional experience. It no longer feels like a story about others, but becomes a story that invades your space, perhaps threateningly, perhaps provocatively, perhaps joyously. You are immersed in the story, not an audience member at all.

  What does all this have to do with music and the auditory sense? Imagine yourself again at a live show. You hear the performers’ rhythmic banging ganglies as they carry out behaviors onstage. And as they move onstage and vary their direction, the sounds they make will change pitch due to the Doppler effect. Sitting there in the audience, watching from a vantage point outside of the story, you get the rhythm and pitch modulations of human movers. You get the attitude (rhythm) and action (pitch). But you are not immersed in the story. You can more easily remain detached.

  Now imagine that the performers suddenly begin to target you. Several just jumped off the stage, headed directly toward you. A minute later, there you are, grinning and red-faced, with tousled hair and the bright red lipstick mark of a mistress’s kiss on your forehead . . . and, for good measure, a pirate is in your face calling you “salty.” During all this targeting you hear the gait sounds and pitch modulations of the performers, but you also heard these sounds when you were still in detached, untargeted audience-member mode. The big auditory consequence of being targeted by the actors is not in the rhythm or pitch, but in the loudness. When the performers were onstage, most of the time they were more or less equidistant, and fairly far away—and so there was little loudness modulation as they carried on. But when the performers broke through the “screen,” they ramped up the volume. It is these high-loudness parts of music—the fortissimos, or ffs—that are often highly evocative and thrilling, as when the dinosaur reaches out of the 3-D theater’s screen to get you.

  And that’s the final topic of this chapter: loudness, and its musical meaning. I will try to convince you that loudness modulations are used in music in the 3-D, invade-the-listener’s-space fashion I just described. In particular, this means that the loudness modulations in music tend to mimic loudness modulations due to changes in the proximity of a mover. Before getting into the evidence for this, let’s discuss why I don’t think loudness mimics something else.

  Nearness versus Stompiness

  I will be suggesting that loudness in music is primarily driven by spatial proximity. Rather than musical pitch being a spatial indicator, as is commonly suggested (see the earlier section “Why Pitch Seems Spatial”), it is loudness in music that has the spatial meaning. As was the case with pitch, here, too, there are several stumbling blocks preventing us from seeing the spatial meaning of loudness. The first is the bias for pitch: if one mistakenly believes that pitch codes for space, then loudness must code for something else. A second stumbling block to interpreting loudness as spatial concerns musical notation, which codes loudness primarily via letters (pp, p, mf, f, ff, and so on), rather than as a spatial code (which is, confusingly, how it codes pitch, as we’ve seen). Musical instruments throw a third smokescreen over the spatial meaning of loudness, because most instruments modulate loudness not by spatial modulations of one’s body, but by hitting, bowing, plucking, or blowing harder.

  Therefore, several factors are conspiring to obfuscate the spatial meaning of loudness. But, in addition, the third source of confusion I just mentioned suggests an alternative interpretation: that loudness concerns the energy level of the sound maker. A musician must use more energy to play more loudly, and this can’t help but suggest that louder music might be “trying” to sound like a more energetic mover. The energy with which a behavior is carried out is an obvious real-world source of loudness modulations. These energy modulations are, in addition, highly informative about the behavior and expressions of the mover. A stomper walking nearby means something different than a tiptoer walking nearby. So energy or “stompiness” is a potential candidate for what loudness might mean in music.

  Loudness in the real world can, then, come both from the energy of a mover and from the spatial proximity of the mover. And each seems to be the right sort of thing to potentially explain why the loudest parts of music are often so thrilling and evocative: stompiness, because the mover is energized (maybe angry); proximity, because the mover is very close by. Which of these ecological meanings is more likely to drive musical loudness, supposing that music mimics movement? Although I suspect music uses high loudness for both purposes—sometimes to describe a stompy mover, and sometimes to describe a nearby mover—I’m putting my theoretical money on spatial proximity.

  One reason to go with the spatial-proximity interpretation of loudness, at the expense of the stompiness interpretation, is pragmatic: the theory is easier! Spatial proximity is simply distance from the listener, and so changes in loudness are due to changes in distance. That’s something I can wrap my theoretical head around. But I don’t know how to make predictions about how walkers vary in their stompiness. Stompers vary their stompiness when they want to, not in the way physics wants to. That is, if musical loudness is stompiness, then what exactly does this predict? It depends on the psychological dynamics of stompiness, and I don’t know that. So, as with any good theorist, spatial proximity becomes my friend, and I ignore stompiness.

  But there is a second reason, this one substantive, for latching onto spatial proximity as the meaning of musical loudness. Between proximity and stompiness, proximity can better explain the large range of loudness that is possible in music. Loudness varies as the inverse square of the distance to the listener, and so it rises dramatically as a mover nears. Spatial proximity can therefore bring huge swings in loudness, far greater than the loudness changes that can be obtained by stomping softly and then loudly at a constant distance from a listener. That’s why I suspect proximity is the larger driver of loudness modulations in music. And as we will see, the totality of loudness phenomena in music is consistent with proximity, and less plausible for stompiness (including the phenomenon discussed in Encore 5, that note density rises with greater loudness).
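
  The inverse-square point can be made concrete. The sketch below assumes an idealized point source with inverse-square intensity falloff; the particular distances (sixteen meters down to one meter) are illustrative choices of mine, not from the text.

```python
import math

def intensity_change_db(r_far, r_near):
    """Change in sound intensity level, in decibels, when a point source
    heard at distance r_far moves to distance r_near, assuming
    inverse-square falloff of intensity with distance."""
    return 10 * math.log10((r_far / r_near) ** 2)

# A mover approaching from 16 m away to 1 m away:
print(round(intensity_change_db(16.0, 1.0), 1))  # about a 24 dB rise
```

  A swing on the order of 24 decibels from distance alone dwarfs what a walker can add by stomping harder at a fixed distance, which is the substantive reason for betting on proximity.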

  Thus, to the question “Is it nearness or stompiness that drives musical loudness modulations?” the answer, for both pragmatic and substantive reasons, is nearness, or proximity. Nearness can modulate loudness much more than stompiness can, and nearness is theoretically tractable in a way that stompiness is not. Let’s see if proximity can make sense of the behavior of loudness in music.

  Slow Loudness, Fast Pitch

  Have you ever wondered why our musical notation system is as it is? In particular, why does our Western music notation system indicate pitch by shifting the notes up and down on the staff, while it indicates loudness symbolically by letters (e.g., pp, f) along the bottom? Figure 37 shows a typical piece of music. Even if you don’t read music—and thus don’t know exactly which pitch each note is on—you can instantly interpret how the pitch varies in the melody. In this piece of music, pitch rises, wiggles, falls, falls, falls yet again, only to rise and tumble down. You can see what pitch does because the notation system creates what is roughly a plot of pitch versus time. Loudness, on the other hand, must be read off the letters along the bottom, and their meaning unpacked from your mental dictionary: p for “quiet,” f for “loud,” and so on. Why does pitch get a nice mapping onto spatial position, whereas loudness only gets a lookup table, or glossary?

  Figure 37. The usual notation, where vertical position indicates pitch, and intensities are shown along the bottom. The music is a simplification of the seventh through twelfth measures from Johann Christoph Friedrich Bach’s Menuet and Alternativo. It utilizes standard music notation. Standard notation is sensible because pitches vary much more quickly than loudness, so it tends not to be a problem to have to read the loudness levels along the bottom.

  Music notation didn’t have to be like this. It could do the reverse: give loudness the spatial metaphor, and relegate pitch to being read along the bottom in symbols. Figure 38 shows the same excerpt we just saw in Figure 37, but now in this alternative musical notation system. Going from the lowest horizontal line upward, the lines now mean pianissimo (pp), piano (p), mezzo forte (mf), forte (f), and fortissimo (ff). The pitches for each note of the song are now shown along the bottom. Once one sees this alternative notation system in Figure 38, it becomes obvious why it is a terrible idea. When vertical height represents loudness, vertical height tends to just stay constant for long periods of time. The first eight notes are all at one loudness level (piano), and the remaining 12 are all at a second loudness level (forte). Visually there are just two plateaus, severely underutilizing your visual talents for seeing spatial wiggles. In standard notation, where pitch is spatially represented, on the other hand, the notes vary vertically much more on the page. Not only does our hypothetical alternative notation underutilize the capabilities of the visuospatial code, it overutilizes the letter codes. We end up with “word salad” along the bottom. In this case, there are 15 instances where the pitch had to be written down, nearly as many as there are notes in the excerpt. In standard notation, where loudness is coded via letters, there were just two letters along the bottom (see Figure 37 again).

 
