
Um…: Slips, Stumbles, and Verbal Blunders, and What They Mean


by Michael Erard


  11

  The Future of Verbal Blunders

  In the future, you can avoid making verbal blunders just as your ancestors did: by not speaking at all. Which is to say, until someone invents a pill that makes people say only what they intend to, the average, normal speaker of English will continue to make as many as seven to twenty-two slips of the tongue a day and will have about two to four moments each day where finding the right word or name takes embarrassingly too much time.

  Such pesky intransigence will be true for disfluencies, too. Unless someone comes up with a machine that freezes time so you can stop to plan what to say, yet make your listener hear you as uninterrupted and smooth, about 5 to 8 percent of the words that normal speakers say every day—from about 325 to 1,800 of them—will involve an “uh,” an “um,” or some other pause filler; a repeated sound, syllable, or word; a restarted sentence; or a repair, all of which is normal for the everyday speaking that underpins our lives and our society, talk that isn’t scripted or rehearsed. These blunder stats are rough estimates, of course; your own performances will vary.

  It is very hard to change your natural tendency to slip and be disfluent. A person born a repeater or a deleter will likely leave this life as such. Yet you can rise above—or fall below—your blundering baseline quite readily, depending on your speaking situation and any tasks that accompany your talking. You will blunder more if you juggle eggs while talking. You will also slip more if you worry about the state of your cowlick—though your slips will not necessarily reveal your specific anxiety about your hair. However, you can probably blunder less if you reduce the number of things you have to be aware of or think about while speaking. For some people faced with formal speaking, this means rehearsing. For others, it means relaxing. Speaking in short sentences also helps. In other experiments, people speak more fluently when they’re threatened with electric shocks. That’s not so appealing for everyday use. Neither is shouting, another punishment with limited effectiveness. A more palatable therapy is drinking beer, which may, in large enough doses, eliminate “uhs” and “ums” entirely.

  Yet consider more evidence about the meanings of disfluency. In 1995, Nicholas Christenfeld asked college students to rate speakers on a tape who filled their pauses, left the pauses silent, or did not pause at all. The students preferred speakers with no pauses. But guess what? Ummers were rated no less eloquent than the silent pausers. Moreover, the pause fillers had less of an actual impact on listeners’ judgments than their stated preferences suggested. Why? Because, as Christenfeld explained, “Speakers’ ‘ums’ were noticed only when the audience was attending to the style, and not to the content of the utterance.” Left to listen as they would, some listeners notice style, others content. What seems to distinguish good speakers from bad ones is that the good ones hide their hesitation, making listeners focus on substance and content. Want people to not notice your “ums”? Be interesting.

  What this research really tells us is that disfluency is utterly normal, that our rules for what counts as “good speaking” are resistant to the biological facts about it, that the rules evolve while the disfluencies remain stable, and that trying to communicate without disfluencies may be more distracting (and hence more damaging to fluency) than it’s worth. It shows that everyone’s baseline is uniquely their own. And it means that verbal blunders do not mean any more, in themselves, than what we attribute to them. The reality is that we talk in more situations every day where our verbal blunders pass by as unnoticed as shadows. And in the world of speaking, everyone has a shadow.

  While our verbal blunders are not endangered—their numbers won’t shrink in any absolute sense—it’s quite possible that people’s attitudes about them, the meanings they ascribe to them, and the assumptions they make about verbal blunderers themselves may change subtly. In the last several years, prescripted disfluencies have begun to appear in advertisements, obviously in order to manipulate the listener’s trust. Telemarketers used to leave prerecorded messages on my answering machine like “Hi! This is Bob Jones, and I’m here to tell you about a great new home equity loan!” Now the messages sound more like, “Hey, I uh, I’m sorry I missed you, but uh…” They still try to sell me a mortgage I don’t need, but I keep listening longer than I used to. I admit, the blundering used to fake me out. Such deliberate ineloquence is nothing new: the ancient Greeks had a vast repertoire of ways to deform language in order to shock, confuse, or amuse. But doing it well is harder than it looks.

  Listeners also encounter verbal blunders when a speaker doesn’t rehearse a presentation to a glib polish or edit a production into a smooth-sounding stream. This can be deliberate, as when speakers who want to appear spontaneous, creative, or authentic don’t monitor their speech as assiduously as they might. Other speakers want to make themselves more personable. It’s not just punks; Pauline Webber has shown that doctors add words like “well,” “anyway,” and “now” when giving previously published remarks at medical conferences. Apparently they do this so their listeners will feel comfortable approaching them. Slips and disfluencies of everyday speaking can also be disseminated inadvertently. Media speech and fluent speech used to be synonymous, but the increased presence of video and audio on the Internet means that we’ll see and hear more talk as it happens naturally.

  Forms of new media have also claimed a rough-hewn, real-life style—which often involves messy speaking—in order to stand apart from mainstream commercial media. Whether this shift to the vérité is something enduring or just a passing fashion is hard to say. But it’s integral to the fascination with reality television shows; the newfound popularity of the documentary film as entertainment, not simply as a teaching tool or political instrument; the blog phenomenon, a galaxy of amateur writers that threatens to surpass, in readership, the mainstream media; and the increasing popularity of do-it-yourself media culture, from YouTube to public-domain audiobooks. Because podcasters have made the homespun (or laptop-spun) aesthetic a basic tool in their toolboxes, podcasts often feature slips, disfluencies, mispronunciations, background noises, and other intrusions.

  Not everyone likes the aesthetic shift. But norms about speaking may be evolving for deeper reasons that are as immovable as mountains. Perhaps American audiences want more variety than slick, scripted broadcaster voices; perhaps they no longer take glib fluency for honesty, substance, or charisma. Perhaps it’s another instance of informality in American social life. It could also be part of the same impulse revealed in exposed conduit and pipes in buildings or open restaurant kitchens—a postmodern desire to see the backstages of life where the action really happens.

  In the future, regardless of how our attitudes change about them, verbal blunders will become raw material. Instead of being ignored or thrown out, these hallmarks of natural, spontaneous speaking will inform technological enhancements and lubricate interactions between humans and machines. In Shanghai I visited a start-up software company, Saybot, which offers an online robot (or a “bot”) to help the Chinese learn spoken English. People who use Saybot online won’t be conscious that their errors have become useful. They will speak into their PC’s microphone in response to questions or directions from the system. If they speak correctly, the system will let them continue to the next lesson or level. Saybot does this by using speech recognition software to match the learner’s voice against a prerecorded acoustic waveform from a native speaker. Then it measures the learner’s variation from the native norm. If the waveforms match, the student can proceed; if they don’t, the system stores the mismatches in a short-term memory and allows the learner to play back each one. Human teachers correct students in the same way, but they can’t do it as precisely as the software can. Saybot also aggregates the error data from all of the learners, allowing it to be analyzed by a trained linguist who can tell whether Saybot users speak Mandarin, Cantonese, or some other Chinese language, as well as assess the success of Saybot lesson plans, which can be tweaked to focus on pronunciation patterns that students need to practice. Verbal blunders have been used for decades as data, but they’ve probably never been transformed into commodities in this way.
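
  Erard describes Saybot’s matching only at a high level: the learner’s speech is compared against a native speaker’s recording and scored by how far it deviates. The sketch below is a minimal, hypothetical illustration of that kind of comparison, aligning per-frame acoustic features with dynamic time warping and passing or failing the attempt against a threshold. The function names, feature shapes, and threshold are assumptions made for the example, not Saybot’s actual design.

```python
# Illustrative sketch only: Saybot's real algorithm isn't described in detail.
# It shows one common way to "match a learner against a native reference":
# compare per-frame acoustic feature vectors (e.g., MFCCs) with dynamic time
# warping (DTW), then pass or fail the attempt based on the alignment cost.
import numpy as np

def dtw_distance(ref: np.ndarray, learner: np.ndarray) -> float:
    """Average DTW alignment cost between two (frames x features) arrays."""
    n, m = len(ref), len(learner)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref[i - 1] - learner[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def score_attempt(ref_feats, learner_feats, threshold=1.5):
    """Return (passed, distance); failed attempts would be stored for the
    learner to play back, as the chapter describes. Threshold is invented."""
    dist = dtw_distance(ref_feats, learner_feats)
    return dist <= threshold, dist

# Toy demo with random "features" standing in for real MFCC frames.
rng = np.random.default_rng(0)
native = rng.normal(size=(40, 13))                          # native reference
close = native + rng.normal(scale=0.1, size=native.shape)   # good attempt
far = rng.normal(size=(35, 13))                              # poor attempt
print(score_attempt(native, close))   # low cost: likely passes
print(score_attempt(native, far))     # high cost: likely fails
```

  In a real system the feature frames would come from the learner’s microphone audio rather than random numbers, and the threshold would be tuned per lesson and per sound.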

  Psychologists have recently begun to understand how verbal blunders have helped guide social interactions, involving something called the “feeling of another’s knowing,” or how much one person intuits about another person’s thoughts or feelings. Disfluencies are among the important clues that listeners use to adjust their attention as well as their responses to other people.

  At the New School for Social Research, Michael Schober, a student of Herb Clark, is applying insights about verbal blunders to the survey and the poll. If you’ve ever filled out a government form, you’ve probably encountered terms such as “family” or “house” or “ethnic background” that seemed vague. And polls feature words that can be ambiguous. Pollsters get around this by simply asking the same question of everyone the same way—and they don’t offer clarifications, lest they bias their results. But Schober says they can listen to patterns in a person’s speaking that might reveal a subtle request for clarification. A person who needs the meaning of a question clarified is more likely to pause longer, say “uh” or “um,” and repair their sentences, Schober has found. So an interviewer trained to hear disfluencies would be attuned to an interviewee who needs extra clarification.
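
  Schober’s finding lends itself to a simple illustration. The sketch below is a rough, hypothetical heuristic rather than his actual method: it flags a survey answer for possible clarification when several disfluency cues co-occur, namely a long pause before answering, fillers like “uh” and “um,” and restarted sentences. The thresholds and input format are invented for the example.

```python
# Minimal sketch, not Schober's actual procedure: flag survey answers whose
# disfluency pattern suggests the respondent may want the question clarified.
# All thresholds are invented for illustration.
FILLERS = {"uh", "um", "er"}

def needs_clarification(answer_words, pause_before_answer_s, restarts=0,
                        pause_threshold=1.0, filler_threshold=2):
    """answer_words: list of tokens; pause_before_answer_s: silence before the
    respondent began speaking; restarts: count of abandoned sentence starts
    (which would come from a human coder or an automatic repair detector)."""
    fillers = sum(1 for w in answer_words if w.lower().strip(",.") in FILLERS)
    cues = 0
    cues += pause_before_answer_s > pause_threshold
    cues += fillers >= filler_threshold
    cues += restarts > 0
    return cues >= 2   # two or more cues -> offer to clarify the question

print(needs_clarification("um well uh it depends what you mean".split(),
                          pause_before_answer_s=1.8, restarts=1))   # True
print(needs_clarification("two people live here".split(),
                          pause_before_answer_s=0.3))               # False
```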

  The principle of “feeling of another’s knowing” can also be applied to interactions between humans and machines. One of the researchers at the forefront of this work is Liz Shriberg of SRI International, where Flakey the robot once roamed. A slim, elegant woman with a quiet voice, she knows what spontaneous speaking looks like on the page, so she was reluctant to let me turn on my tape recorder—she was afraid of how her sentences would look.

  Working as a technician in a speech recognition lab in the 1990s, Shriberg witnessed human speech confusing a computer. In an experiment she was managing, the human subjects spoke over the phone to an automatic system in order to make airplane reservations, giving simple commands like “Show me flights from Boston to Dallas.” At first, the subjects actually talked to a “wizard,” or a human posing as a machine. The wizard had no problem comprehending an “uh” or a repeated word. But when a machine replaced the wizard, speakers consistently broke it when they paused, repeated a word, or said “um.”

  Also, when the humans realized the machine couldn’t process their normal speaking, they switched to a monotone, spoke in simple sentences, and had no pauses. “They started speaking to it like a robot,” Shriberg said. Her conclusion: “If you want machines to be as smart as people, you want machines to understand speech that’s natural, and natural speech has lots of disfluencies in it.”

  But pause fillers, restarts, repeated words, and even silences plague machine ears. Automatic systems are trained on grammatically perfect written sentences, not on real human speaking, with its fragments, hesitations, and pause fillers. So the two (or more) parts of a restarted sentence confuse them; they also mistake “uh” for the English article “a.” In general, the human brain’s ability to comprehend speech is too complex to replicate. Some of Shriberg’s research has focused on how to use a speaker’s intonation to help the machine recognize a disfluency and what the speaker’s full utterance might be. Shriberg also helped build a computer system that recognized features of a person’s speech to identify him or her uniquely. The four most valuable features were what some people might find undesirable in speaking: the duration of words, any silent pauses longer than 150 milliseconds, the number of beginnings and ends of sentences (some of which may have been broken sentences and repairs), and the frequency of “uh” and “um.”
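
  Those four features are concrete enough to sketch. The snippet below is a hypothetical illustration rather than the SRI system itself: it computes average word duration, silent pauses longer than 150 milliseconds, sentence boundaries, and the rate of “uh” and “um” from a word-aligned transcript. The transcript format is an assumption.

```python
# Sketch of the four speaker-characterizing measurements the chapter lists,
# computed from a hypothetical word-aligned transcript. The input format and
# any use in an actual speaker-identification system are assumptions.
def speaker_features(words):
    """words: list of dicts like {"w": "uh", "start": 0.0, "end": 0.4,
    "sentence_boundary": False}, with times in seconds."""
    durations = [w["end"] - w["start"] for w in words]
    pauses = [words[i + 1]["start"] - words[i]["end"] for i in range(len(words) - 1)]
    return {
        "mean_word_duration": sum(durations) / len(durations),
        "long_silent_pauses": sum(1 for p in pauses if p > 0.150),
        "sentence_boundaries": sum(1 for w in words if w["sentence_boundary"]),
        "filler_rate": sum(1 for w in words if w["w"].lower() in {"uh", "um"}) / len(words),
    }

demo = [
    {"w": "so",    "start": 0.00, "end": 0.20, "sentence_boundary": False},
    {"w": "um",    "start": 0.45, "end": 0.80, "sentence_boundary": False},
    {"w": "I",     "start": 0.85, "end": 0.95, "sentence_boundary": False},
    {"w": "think", "start": 0.95, "end": 1.30, "sentence_boundary": True},
]
print(speaker_features(demo))
```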

  If people are less disfluent with machines (as Sharon Oviatt, another researcher, showed), it’s because the tasks or interactions are constrained to activities such as asking about the weather or buying tickets. But in the future, what we know about disfluencies—that they’re a natural part of human speaking, that they’re meaningful, that they often reliably indicate a person’s mental or emotional state—will be incorporated into technologies that are broader in scope. These will be able to do more than take orders; they’ll be able to transcribe conversations or, like Hal, the computer in the movie 2001: A Space Odyssey, have wide-ranging conversations with people. The intelligence community is apparently interested in technology that will automatically transcribe spoken English as well as other languages. So is the court system, which spends millions of dollars on court reporters. Such technology will give the elderly and the disabled more robust voice-controlled appliances, vehicles, and smart homes; health care organizations may one day direct patients to robot counselors for health advice. Making machines perceive human speech isn’t the only problem in designing these systems—indeed, some people may never accept them, no matter what the machines can listen to. A robot therapist may be able to respond to you sensibly despite your long pauses, sobs, and stammerings, but some people may not be able to get over the fact that their HMO has dictated that robot therapy is all it will pay for.

  Shriberg has been contacted by an educational company that wants to use student disfluencies to measure whether material is too hard for a student, and a toy company that wants to see if disfluencies indicate that a user is confused. Using this feedback, the machine could adjust how quickly it presents information or shift to a different level of difficulty. Such systems may also—like Flakey the robot—learn to say “uh” and “um” like people do. Right now, Shriberg says, the machines tend to barge in, interrupting people, and sound impolite. What could engineers do? They can soften the machine’s start with a pause filler. Or the machine’s own pause filler could cue the human to stop talking.
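
  As a toy illustration of that last idea, and not a description of any real system, the sketch below shows a dialogue agent that waits briefly for the user to finish and, if it must interrupt, leads with a filler so the turn-taking feels less abrupt. The speak and listening functions are placeholders.

```python
# Toy illustration (no specific system): soften the moment when a spoken
# dialogue agent has to take the turn by prefacing its reply with a filler
# instead of barging in cold. speak() and user_still_talking() stand in for
# real text-to-speech and voice-activity detection.
import time

def speak(text):
    print(f"[agent says] {text}")

def take_turn(reply, user_still_talking, patience_s=1.5):
    """Wait briefly for the user to finish; if the agent must interrupt,
    lead with "um," both to sound less abrupt and to cue the user to yield."""
    waited = 0.0
    while user_still_talking() and waited < patience_s:
        time.sleep(0.1)
        waited += 0.1
    prefix = "Um, " if user_still_talking() else ""
    speak(prefix + reply)

take_turn("sorry to interrupt, could you repeat the date?",
          user_still_talking=lambda: True)
```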

  Machines that say “um.” Pollsters who listen for disfluencies. More verbatim transcripts. Normal, unedited speaking in broadcasts. What impact, if any, might these and other developments have in the long term on attitudes about verbal blunders and the people who make them?

  Jean Fox Tree predicts that neutral ideas about “uh” and “um” will someday become as mainstream as other scientific facts. “In our lifetime,” she told me, “we will accept that these [pause fillers] are meaningful. Forty or fifty years from now,” she added, “if not sooner, you can talk about this stuff in a high school and people are going to think it’s obvious.”

  Once you’re attuned to the great diversity of speech errors, you can find entertainment anywhere. Elizabeth Zwicky, the daughter of linguists, says that her experiences listening for slips of the tongue brightened her life, because no language environment, however boring, is without its verbal amusement. “It’s like being a birdwatcher,” she says. “Birdwatchers have a richer experience of birds than anybody else.” And may the blunderologists we someday become not only forgive our blunders but enjoy them.

  Appendix A: Recommended Reading

  For a full bibliography, endnotes, and errata, go to www.umthebook.com.

  The classic, comprehensive view of human error is Human Error (1984), by James Reason. Reading Sigmund Freud’s The Psychopathology of Everyday Life (1904) is basic, as is Sebastiano Timpanaro’s rebuttal in The Freudian Slip (1976). Carlo Ginzburg’s essay on Giovanni Morelli and Sigmund Freud, “Clues: Roots of an Evidential Paradigm,” appears in a collection of his essays, Clues, Myths, and the Historical Method (1989). Jack Spector’s essay, “The Method of Morelli and Its Relation to Freudian Psychoanalysis,” published in 1969 in Diogenes, helped distinguish Morelli’s method from Freud’s adaptation. Though Meringer and Mayer’s Misspeaking and Misreading (Versprechen und Verlesen) has never been translated into English, an essay in English by Anne Cutler and David Fay in its 1978 edition is a very good introduction to the life and times of Rudolf Meringer.

  Victoria Fromkin edited a collection of early studies on slips of the tongue in Speech Errors as Linguistic Evidence (1973) and collected classics of the new slip science in Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand (1980). Another key collection of essays appears in Slips of the Tongue and Language Production, edited by Anne Cutler and published in 1982. The academic literature on slips of the tongue is vast. A very good overview of research to date is Slips of the Tongue: Speech Errors in First and Second Language Production, by Nanda Poulisse (1999). Jeri Jaeger’s Kids’ Slips is not yet a classic, but it will be. Arnold Zwicky’s Mistakes (available at http://www.stanford.edu/~zwicky/mistakes.pdf) is appropriate for use in writing and linguistics classes.

  For the discussion of Reverend Spooner, I drew from William Hayter’s biography, Spooner, a Biography (1977), as well as a chapter from Julian Huxley’s memoir, On Living in a Revolution (1944). R. H. Robbins’ essay “The Warden’s Wordplay” appeared in The Dalhousie Review in 1966. I also relied on John M. Potter’s essay, “What Was the Matter with Doctor Spooner?” which appeared in the collection edited by Fromkin, Errors in Linguistic Performance.

  A technical but exhaustive elaboration of a speech production model that considers both slips of the tongue and speech disfluencies is Speaking: From Intention to Articulation, by Willem Levelt (1989). Also technical and exhaustive is Herb Clark’s Using Language (1996). Those interested in the early days of disfluency research should look at “Hesitation Phenomena in Spontaneous English Speech,” by Howard Maclay and Charles Osgood. Frieda Goldman-Eisler’s papers are collected in Psycholinguistics: Experiments in Spontaneous Speech (1968), and George Mahl’s work is collected in Explorations in Nonverbal and Vocal Behavior (1987). Heather Bortfeld et al.’s paper “Disfluency Rates in Conversation: Effects of Age, Relationship, Topic, Role and Gender” appeared in Language and Speech in 2001; Herb Clark and Jean Fox Tree’s major work on “uh” and “um,” “Using ‘uh’ and ‘um’ in Spontaneous Speaking,” appeared in Cognition in 2002. Fox Tree’s article “Listeners’ Uses of Um and Uh in Speech Comprehension” predates this by a year in the journal Memory and Cognition. Roger Brown and David McNeill’s 1966 paper “The ‘Tip of the Tongue’ Phenomenon” in the Journal of Verbal Learning and Verbal Behavior is the classic there, though Alan Brown helpfully reviewed all the tip-of-the-tongue research in the Psychological Bulletin in 1991, and Bennett Schwartz’s Tip-of-the-Tongue States (2002) is a lengthy and often entertaining review of tip of the tongue work. Dissertations by Elizabeth Shriberg, Robin Lickley, and Robert Eklund also provided invaluable background.

 
