What accounts for these regional differences? As we have seen with language style matching, people quickly adjust their speaking styles to others around them. The more time spent conversing, the more individuals begin seeing their worlds in similar ways. As a general rule, my neighbors have the same weather, eat similar foods, share the same community events, and deal with the same schools, tax collectors, stores, and bureaucracies as I do. The people in my community share many of the same issues with those in the next town and, to a certain degree, with those in the neighboring state. But as I travel farther and farther away from home, the weather, food, cultures, and concerns begin to change. As the social and physical environments change, so do the ways people approach their worlds and talk with others.
ALTHOUGH LANGUAGE DIFFERENCES should become more pronounced over greater distances, some variations in language can spring up in neighborhoods separated by only a few blocks where weather, terrain, ethnicity, social class, and every other factor is similar. I grew up in an oil-boom town in West Texas, where most families would move in for about four years before being transferred elsewhere. Even with the constant migration, new neighborhood children would quickly adopt accents and slang consistent with that neighborhood.
Even within schools, researchers have been able to isolate different language patterns among different subgroups. In an important analysis of a Detroit high school in the early 1980s, Penelope Eckert demonstrated that the language of the school’s jocks was as distinctive as that of its burnouts. Not unlike most secondary schools in the world, the different tightly knit groups adopted their own language styles in a way that reflected their group’s identity.
It’s not much of a stretch to imagine that different schools in the same geographical region could develop their own language styles. We have found evidence for this by analyzing over fifty thousand college admissions essays submitted by students who were accepted at the University of Texas at Austin over several years. Working with the school’s admissions office, linguist David Beaver and I looked at about two thousand essays from students from nine different high schools surrounding a single large metropolitan area in Texas. The students from the various high schools did equally well in high school and their first year in college and varied only modestly in their social class and ethnic makeup. Nevertheless, the ways they used pronouns, articles, prepositions, and other function words in their essays differed from school to school. In other words, each school had its own linguistic fingerprint.
College admissions essays, like This I Believe stories, are unique forms of writing. Most people will write something like them only a few times in their lives—if ever. Usually, the authors sit alone in their rooms (or coffee shops) and reflect on some of the bigger issues in their lives. Their stories reflect their families, their friends and community, and their society. The inner voices that guide their word choices are driven by the ways they think, what they attend to, their emotional states at the time, and their language history.
It is little wonder that self-reflective essays mirror people’s sense of place and the groups with which they spend most of their time. Even with relatively crude computer models, we can do much better than chance at estimating which part of the country, what city, and possibly what part of a city a person is from by the ways they use function words in their essays. If we are analyzing transcripts of people in conversations, these same function words provide hints to what people are doing, the situations they are in, and the nature of their connections to the people around them.
If you have paranoid tendencies, know that it is unlikely that a function-word-based predator drone will ever be developed. The words we use have always reflected who we are, where we are, and what we are doing. The who, where, and what have historically been obvious. Prior to the written word, if I were speaking with you, we would both know that I was talking (who), our location (where), and our current actions (what). Only through the fluke of technological advancements have we had a period where the who, where, and what of communicating became opaque. The delicious irony is that with additional advances in technology, we may eventually be able to determine the who, where, and what of communication at levels comparable to our ancestors more than five thousand years ago.
THIS CHAPTER HAS explored how the words people use in groups can reveal something about the groups themselves. Use of we-words by group members often suggests that the members identify with their group. Over time, as people become more comfortable with their group, everyone tends to use we-words more. When groups succeed or are threatened from the outside, group identity increases, with a corresponding increase in the use of we-words.
We-words reflect group identity but not the degree to which a group works well together. It may be possible to increase a team’s sense of identity, but that doesn’t mean the team will actually perform any better. There may be no I in team but there is no we in team either. Language analyses suggest that for groups to work best, they must think alike and pay close attention to the other team members. In all likelihood, language style matching reflects mutual interest and respect among different people in a group.
The definition of a group has been used rather loosely in this chapter. This is OK: I’m a social psychologist. What is particularly intriguing is that similar processes for the use of we-words and language style matching are apparent among dating couples, working laboratory groups, real-world work groups, online communities, and entire schools, communities, and societies. The unifying theme is that all of these groups use language to communicate. Words are the common currency of interaction whether written or spoken.
Finally, just as words of group members reveal information about group processes, they also tell us something about what groups are doing and where they are. In an odd way, function word usage is highly contagious. Whether in couples, small groups, neighborhoods, or communities, people tend to adopt the language styles of the people around them. Our words, especially our function words, inadvertently reveal what we are doing and where we are. Just as our accents, body language, and clothes reveal our social and psychological selves, so do our words.
If you are a private investigator, put away your spyglass. Instead, boot up your computer and start counting words.
CHAPTER 10
Word Sleuthing
IN STUDYING WORDS, I have frequently been asked to analyze language to answer questions that I would have never considered. Lawyers, historians, music lovers, political consultants, educators, intelligence agents, and others have occasionally contacted me to see if our language approach could give them a different perspective on a problem they have been thinking about.
This chapter brings together some of the more interesting projects my students and I have been playing with over the years. The topics vary quite a bit. Nevertheless, they showcase different ways words can be analyzed to answer novel questions.
USING WORDS TO IDENTIFY AUTHORS
The phone call I received from the senior partner in a law firm caught me off guard. He was curious if I could analyze an e-mail that had been sent to a member of his firm; let’s call her Ms. Livingston. It was quite sensitive, he confided, and it was important that he talk directly with the person who had sent the e-mail. The only problem was that the e-mail had been sent anonymously from an untraceable e-mail address. After I agreed to look at it, he sent me the following e-mail:
Ms. Livingston:
I think you should know that David Simpson has perpetuated the idea that you have no credibility among your colleagues. He says you altered depositions and falsified expense reports at your last job in New York. He says this is the reason you left so abruptly.
He has spread these stories to people in various departments, including Billing, Personnel, Public Relations and to those at the executive level. It is uncertain how and when our senior partners will deal with this. But if you start getting the cold shoulder, you will know why.
When I first heard of this, I was surprised, but took what he said at face value. Of course, this was before I learned of his voracious appetite for propagating half-truths, gossip, and outright lies, all in the name of somehow making himself look knowledgeable and “better.”
Such a pity. He obviously has talent, but it is all negated by his vile, malicious tongue. All I can think of is a tremendous sense of insecurity. But I digress. I just thought you would like to know.
A friend
After receiving the e-mail, Ms. Livingston turned it over to the law firm. She dismissed the rumor as provably false but was concerned that if David Simpson really was spreading false rumors, it could damage her reputation along with that of the firm. I had spent several years developing methods to analyze language and personality but had never been paid to be a word detective.
What kind of person may have written the note? Is “A friend” a male or female and what is his or her approximate age? What is the person’s link to Ms. Livingston, to David Simpson, and to the firm? Any hints as to the person’s personality traits?
In the years since I worked on the case, several new ways of looking at words have been developed. One involves comparing the words “A friend” used with those of tens of thousands of regular bloggers. For example, by looking at just the function and emotion words, we can guess that there is a 71 percent chance that the author is female and a 75 percent chance that she is between the ages of thirty-five and forty-five. It is much harder to get a good read on her personality. One analysis suggests that there is a fairly good chance that the author of the e-mail is high in the trait of narcissism—meaning she may be somewhat conceited and manipulative.
Look more closely at the e-mail and other hints emerge. The person is psychologically connected to the firm (“our senior partners”) and has knowledge of rumors from across several departments within the firm. The person also is working to impress Ms. Livingston by using a large vocabulary. Particularly interesting is the use of words and phrases such as “voracious appetite,” “vile,” and “malicious tongue.” These are Old Testament words that, in other analyses, were primarily used by people between forty-two and forty-four years of age at the time of the project.
One other important clue was the layout and punctuation. The e-mail was professionally typed with paragraphs of equivalent size. There was only one space between the period and the beginning of the next sentence, which suggests the person learned to type after about 1985—when desktop computers became popular—or the person had some background in journalism or publishing before 1985, where the single space after a period was the norm. (My wife, who was in publishing before 1985, explained this to me.)
What happened? When I submitted my report to the senior partner, he was relieved because it precisely matched the person he had suspected: a conscientious woman in her early forties with a background in newspapers who had been with the firm for several years. I never learned the final disposition of the case, but I see that Ms. Livingston is now a senior partner with the firm.
WHO WROTE IT? THE ART OF AUTHOR IDENTIFICATION
Deciphering linguistic clues to solve crimes has a rich tradition in criminology. The FBI, various national security agencies, and local police departments around the world occasionally seek the expertise of linguists to help decode ransom notes or written threats, or to assess who might have written legal or other documents.
One of the best-known early forensic linguists is Donald Foster, a professor of English at Vassar College. Using a mixture of computer and deductive skills, along with a broad knowledge of history and literature, Foster has worked with law enforcement agencies on high-profile cases such as the Unabomber, the 2001 anthrax attacks, and the 1996 JonBenét Ramsey murder case. He has also applied his methods to determine the authenticity of some works by Shakespeare and others. Perhaps his most successful venture was in identifying Joe Klein as the author of an anonymously published satirical novel on the Clinton presidency, Primary Colors.
Foster has been a controversial figure because several of his high-profile claims about authorship have not panned out. He has also been less than forthcoming about the details of his methods of author identification, something that reflects his training in English rather than statistics and science. Nevertheless, Foster’s approach has alerted the literary and forensic worlds to the promise of computer-based methods to identify authors and their work.
FINDING THE TELLS
World-class poker players closely watch and listen to their opponents in attempts to predict the cards they may be holding. Often players will pretend they have a poor set of cards when they have a good set; other times they will bluff by giving the impression they have a winning hand when they don’t. Experts look for telling signs of deception—or “tells.” Some players avoid looking around the table, others tap their feet, yet others talk more loudly. The ability to decipher tells can give card players a large advantage in high-stakes poker games.
There are various types of tells in people’s use of written language as well. Two are particularly good clues in identifying authors: function words and punctuation. This can be seen in looking back at the blogs we collected in 2001 as part of the September 11 project discussed in the last chapter. Recall that we saved about seventy blog entries from each of a thousand people in the two months before and after the 9/11 attacks. Every few years, my students and I revisit LiveJournal.com to see if the same people are still posting. Ten years later, 25 to 30 percent are still active. About 25 percent have erased their accounts. The remainder stopped posting, on average, five years after the attacks, in 2006. Many of the former posters migrated to other systems such as Facebook or Twitter.
Simply reading the last ten years of people’s posts provides an intimate picture of their lives. Not unlike Michael Apted’s Seven Up! documentary series, we have been able to track the unfolding experiences of the bloggers as they grow older. Many of the same issues still drive the authors. Even though some have now married, had children, and started careers, recurring insecurities, motives, and goals keep returning. Those who were happy and upbeat in 2001 tend to be the same optimistic people nine years later. For example, a young father writes in a random blog in 2001 about his favorite hockey team:
lucky lucky chicken bone. i shall do the happy-cup-dance. we shall win. we shall triumph. and there will be much rejoicing! i just need to get cable first. ok. i wasn’t just gonna post about hockey, but yvonne’s ready to go. yeah. shut up. you try resisting that sweet, sweet candeh.
And nine years later, you see the same person:
My first attempt at making salsa was, in my humble opinion, not too shabby. protip: don’t use Roma tomatoes. I’m not sure why the hell I thought they’d work out fine, but I was terribly wrong. Ok, not terribly just mildly. ah, salsa humor. I’m heading back to the mexi-mart today to pick up the goods to try another batch. Maybe i’ll have it done in time for the bbq. Who knows? Since my catharsis, I’ve been in an amazing headspace.
Obviously, these two writing samples are from the same person. I mean, anyone could spot it immediately.
Really?
Actually, we can see the similarities once we know that they were written by the same person. But what if we read blogs all day and came across the second one several hours after reading the first? In all likelihood, most people wouldn’t jump up yelling, “Aha! I have read that writing style earlier … yes, from the guy who wrote about hockey.” Could language experts or computers make a definitive match? Are language fingerprints as reliable as DNA or real fingerprints? The short answer is no. However, computerized language analyses do a reasonably good job at matching which writing goes with which person.
Imagine we had a large number of blog entries from twenty bloggers. Several years later, we retrieve a handful of new postings from each of the same twenty bloggers. Now imagine sitting on your living room floor with hundreds of pages of posts trying to match each current blog entry with the original posts of the twenty bloggers. All things being equal, anyone should be able to match 5 percent of the blog posts correctly just due to chance alone. Most people would do terribly on this task. It is unlikely that you would match at rates any better than 10–12 percent. The writing style differences are too subtle and there is just too much information.
Computers are more patient and systematic. If we just analyze function words, the computer correctly matches the recent blog posts with the original authors about 29 percent of the time. This is actually impressive given the time lag between the writing of the posts.
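For readers curious what such a computer comparison might look like, here is a minimal sketch in Python. It is not the procedure used in the research described here. The short word list, the cosine similarity measure, and the names `FUNCTION_WORDS`, `profile`, `best_match`, and `chance_rate` are all assumptions made for illustration. The idea is simply to turn each text into a set of function-word rates and then pick the known author whose earlier writing has the most similar rates.

```python
from collections import Counter
import math
import re

# Hypothetical, very short list for illustration only;
# a real analysis would track far more function words.
FUNCTION_WORDS = ["i", "we", "you", "the", "a", "an", "of",
                  "in", "to", "but", "and", "not", "it", "that"]


def profile(text):
    """Return the rate of each function word as a share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length rate vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(new_post, known_authors):
    """Pick the author whose earlier text is most similar to new_post.

    known_authors maps an author label to that author's earlier writing.
    """
    target = profile(new_post)
    return max(known_authors,
               key=lambda name: cosine(target, profile(known_authors[name])))


def chance_rate(n_authors):
    """Expected accuracy from guessing at random among n_authors."""
    return 1 / n_authors
```

With twenty bloggers, `chance_rate(20)` gives 0.05, the 5 percent baseline mentioned above. A toy call such as `best_match("i am sure i know", {"a": "i i i think i am", "b": "the the of the dog in the"})` picks author "a", because only that author's earlier text leans on the word "i".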
But there is more to author identification than function words. Look at the consistency of punctuation. The following woman, for example, continues to use asterisks in the same way nine years apart. This was part of an early 2001 entry:
Oh.. I have also discovered a shy streak I didn’t know I had. I guess you would call it shyness. Somebody made me *blush*. Repeatedly. That is *weird*. I don’t blush.
And in 2010:
We *are* in post-post-punk now, aren’t we? The guys in the band made a joke about how they just wrote that song yesterday, and maybe a quarter of the people in the room didn’t get why the rest of us were chuckling. weird. *shrug*
Others use punctuation in equally unique but more subtle ways. From a twenty-seven-year-old male in 2001:
I mailed memorial gift checks to Immanuel [endowment donation in honor of Joan’s mother]; and St Anne’s - for my favorite accounting professor the Smythe scholarship. Frank & Rebecca brought over “Midnight in the Garden of Good & Evil” and a couple homebrews. My eyelids want to close so I better …
The Secret Life of Pronouns: What Our Words Say About Us Page 26