Nabokov's Favorite Word Is Mauve
Looking at how writers describe characters of the opposite sex provides us with a new way of understanding the choices, and unwitting assumptions, that factor into how we see other people—and who we see on the page.
Let’s start by looking at the word scream.
In the top 100 classic literature books, a form of scream appears after the word he or she a total of 158 times. For example, at the end of The Grapes of Wrath, as Rose of Sharon goes into labor while the floodwaters rise, Steinbeck writes, “And Rose of Sharon had lost her restraint. She screamed fiercely under the fierce pains.” If we look at all the instances where male writers used the word scream, it appears twice as often after she as after he. In other words, male writers make their female characters scream more often than their male characters.
But that alone isn’t enough to conclude that male authors are the outliers. If you look at the use of scream by female authors, the result holds at an almost identical rate. In other words, female writers also make their female characters scream more often than their male characters.
The graphic below shows the usage rate of screamed (or screams) when it follows either he or she. “She screamed” was used at a rate of 6.0 for every 10,000 appearances of the word she in texts by male authors and 7.0 per 10,000 by female authors. Meanwhile “he screamed” was used at a rate of 3.8 for every 10,000 appearances of he in texts by female authors and 2.9 per 10,000 by male authors.
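The exact code behind these counts isn’t shown in this book, but the statistic is simple to reproduce. Here’s a minimal sketch in Python of how such a rate can be computed for one text; the tokenizer and the word forms counted are simplifying assumptions, not the full preprocessing.

```python
import re

def scream_rate(text, pronoun):
    """Occurrences of '<pronoun> screamed/screams' per 10,000
    appearances of the pronoun, the rate described above."""
    words = re.findall(r"[a-z']+", text.lower())   # naive tokenizer (assumption)
    total = words.count(pronoun)                   # all appearances of he/she
    pairs = sum(1 for a, b in zip(words, words[1:])
                if a == pronoun and b in ("screamed", "screams"))
    return 10000 * pairs / total if total else 0.0
```

Averaging scream_rate(text, "she") and scream_rate(text, "he") across, say, all the classics by male authors would produce numbers comparable to those in the chart.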
The number of instances of scream is not huge, but it’s still large enough to draw a conclusion: Both male and female authors are more apt to describe their female characters as screaming than their male characters. I also looked at the sample groups of recent New York Times bestsellers and modern literary fiction to see if there was any change over time or genre, and the pattern stayed the same. In these groups, screamed is used 50–100% more often to describe female characters than male ones, and this holds for authors of both genders.
Just as there are words that tend to be paired with female characters, there are those that latch on to male characters. And while men might not be screaming as often, they sure do like to grin. Below is a chart in the same style for all the appearances of grinned following he or she in classic literature.
And in modern popular and literary fiction, authors of different eras and styles are consistent—though there’s a lot more grinning going on in popular fiction.
The patterns in the charts show that these word choices hold across time and genre. It was not my instinct going in that grinned was a gendered verb in any form, but the data reveals an undeniable trend. Men grin.
Below are the top five words, like screamed, that are used most often in classic literature to describe women over men.
Words Most Likely to Be Found as “She _________” as Opposed to “He _________”
1. Shivered
2. Wept
3. Murmured
4. Screamed
5. Married
And here are the top five words, like grinned, that are used most often in classic literature to describe men over women.
Words Most Likely to Be Found as “He _________” as Opposed to “She _________”
1. Muttered
2. Grinned
3. Shouted
4. Chuckled
5. Killed
Women murmur while men mutter. Men shout; women scream. Women neither grin nor chuckle, but smile is more likely to follow she than he. Each of these trends holds across recent popular and literary fiction, with one interesting reversal: In modern literary fiction, married is more common after he than after she.
Both men and women describe men as killing more often than women. And this is the only word in any of these lists that we can check against real-world data. Statistics show that men commit 90% of all murders. Government agencies, however, do not keep stats on how many grins or chuckles are committed by each gender. Regardless, it seems these words have taken on gendered connotations, and both male and female authors have picked up on them.
But what if we took this a step further? What if there are words that men use to describe women, but which a woman would never use to describe herself or another woman? These are the words that could highlight the biggest differences in how we view the world. If you are an author writing about a character of the opposite gender, what makes that character believable or real? You want to make sure that you’re describing their thoughts and actions in a way that reflects how they see the world, using the language they would use. Otherwise, the illusion can shatter.
One word that fits this description is interrupted. It’s not the most common word in any writer’s works, but especially in classic literature it is used much more commonly in reference to female characters when the author is male.
There are also a handful of words, in the sample of books examined, that authors rarely invoke when describing the opposite sex. While male authors described characters of both genders as feeling fear, female authors assigned fear to their male characters significantly less often. See the chart below:
Or consider sobbed. It may not be the most common verb in the sample of 300 books, but it is revealing. Women use it to describe both men and women, but male authors do not use it to describe their own sex. If “real men” don’t cry, fictional men don’t sob.
Then, perhaps most interesting of all, there are the words that both sexes give to the opposite gender. Male authors describe their female characters as kissing at a higher rate than their male characters. Female authors do the opposite, describing their male characters kissing more often.
In all three of our samples, out of the 150 most common words, kissed was the word most disproportionately used to describe characters of the opposite gender. The top five are below:
Words Used Most Often to Describe Characters of the Opposite Gender
1. Kissed
2. Exclaimed
3. Answered
4. Loved
5. Smiled
While kissed and loved and smiled all go to the opposite sex, consider the use of hated. The h-word is used most often in classic literature to describe characters of the author’s own gender.
Here are the top five words used to describe characters of the author’s own gender:
Words Used Most Often to Describe Characters of the Same Gender
1. Heard
2. Wondered
3. Lay
4. Hated
5. Ran
Trying to draw too much meaning out of these findings is a bit like reading tea leaves. But I don’t think it’s unreasonable to speculate that some of the words writers used to describe the opposite sex, such as loved and kissed, serve as a kind of wish fulfillment. It may be a leap to base a theory of love on 300 novels, so I decided to test it on a bigger data pool as well. I downloaded more than 40,000 Literotica.com stories in the “Erotic Couplings” section and found a similar pattern with the word kiss.
To avoid missing first-person stories, I combined I kissed and He kissed for all male authors, and I kissed and She kissed for all female authors. The chart below shows the rate for each combination. There’s a huge asymmetry, but not one that’s identical to the other samples. Female erotica authors almost always attribute the kissing to a man, while male authors split closer to 50-50. I’ll leave you to speculate about why, but one thing that’s clear is that in the realm of sexual fantasy, people’s imaginations are not fully aligned.
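For the curious, here’s a simplified sketch of that combination for a single story. The phrase matching is deliberately crude, and crediting I kissed to the author’s own gender encodes the assumption that a first-person narrator shares the author’s gender.

```python
import re

def kiss_counts(story, author_gender):
    """Kisses attributed to each gender, crediting 'I kissed' to the
    author's own gender (assumes first-person narrators match the author)."""
    text = story.lower()

    def n(phrase):
        return len(re.findall(r"\b" + phrase + r"\b", text))

    he, she, i = n("he kissed"), n("she kissed"), n("i kissed")
    if author_gender == "male":
        return {"male kisser": he + i, "female kisser": she}
    return {"female kisser": she + i, "male kisser": he}
```

Summing these counts over all stories by each author gender gives the rates plotted in the chart.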
All the examples in this section show how male and female writers describe the world, and their male and female characters, in different ways. It’s intriguing from a psychological angle, but these findings are also worth keeping in mind for authors who are trying to capture characters of all genders and trying to appeal to a wide range of readers.
After all, you don’t want all your characters to be clones of common stereotypes or of your own persona. Writing what you know is important, but ignoring the perspective of others can sink a work. As science fiction author Joe Haldeman has quipped, if there’s one thing the “write what you know” default has resulted in, it’s a glut of “mediocre novels about English professors contemplating adultery.” Let’s keep expanding the literary imagination.
Your style is an emanation from your own being.
—KATHERINE ANNE PORTER
The whodunit is not limited to the world of crime; it’s also a staple of literary scholarship. A book lands with a thud on an editor’s doorstep one morning, with no clues to its origins. It’s anonymous, pseudonymous, unattributable—yet unignorable.
Who wrote it? Interested critics might have their favorite suspects. Opportunistic writers may even quarrel over credit. But the answer, as with any mystery, lies in the cold, hard facts. Which is to say, aspiring literary detectives will need to turn to the numbers.
Let’s return to The Federalist Papers controversy from the introduction, one of the most famous literary mysteries solved in the past century. In order to urge ratification of the Constitution in the late 1780s, James Madison, Alexander Hamilton, and John Jay each wrote essays that appeared in New York newspapers under the pseudonym “Publius.” Between them, the three men wrote a total of 85 essays, but no one took credit for any individual essays until decades later. When Madison and Hamilton outlined who wrote each essay, there was a contradiction. Twelve of the essays were claimed by both Madison and Hamilton.
In 1963 two statistics professors, Frederick Mosteller and David Wallace, put forward evidence in “Inference in an Authorship Problem” that would end the nearly two-century-long debate. Their probabilistic case was objective and detailed. It quantified writing styles. It succeeded where qualitative arguments had fallen short.
Their biggest step forward was treating words like random variables. Instead of viewing the words as sacred, they studied them the same way they would study the roll of a die or the flip of a coin. The two looked at the frequency of hundreds of words, which was not easy to do in 1963. They took copies of each essay and dissected them, cutting the words apart and arranging them (by hand) in alphabetical order. At one point Mosteller and Wallace wrote, “during this operation a deep breath created a storm of confetti and a permanent enemy.”
In particular, they started looking at a handful of words that were used by one author but not the other. In his known papers Alexander Hamilton used the word while but never the word whilst. Madison used the word whilst but not while. The professors listed the rate of enough, while, whilst, and upon per 1,000 words in the Hamilton, Madison, and disputed papers.
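What took scissors, alphabetizing, and confetti in 1963 takes a few lines of code today. Here’s a simplified sketch of the per-1,000 rate (the tokenizer is an assumption; Mosteller and Wallace’s hand counts were more careful):

```python
import re

MARKER_WORDS = ["enough", "while", "whilst", "upon"]

def rate_per_1000(text, word):
    """Appearances of a word per 1,000 words of text, the statistic
    Mosteller and Wallace tabulated for their marker words."""
    words = re.findall(r"[a-z']+", text.lower())  # naive tokenizer (assumption)
    return 1000 * words.count(word) / len(words)

# e.g., {w: rate_per_1000(essay_text, w) for w in MARKER_WORDS}
```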
The graph of Mosteller and Wallace’s figures above lends itself to an easy conclusion. Hamilton used enough and while but Madison and the disputed papers never did. Hamilton used upon to a high degree, but Madison and the disputed papers used it at a much lower rate. Whilst is absent from Hamilton’s writing but present in the disputed papers. It looks like it can’t be Hamilton, right?
But this was not enough for Mosteller and Wallace. It was just four words. If that’s all you saw, you might think there was no need for more data or more analysis. However, if Mosteller and Wallace had looked at according, whatever, when, and during, they would have found the opposite:
The graph above makes the disputed papers line up with Hamilton’s patterns. I had to search through hundreds of words to find numbers this contradictory, but the point remains that not every word’s frequencies are constant in every text. Most words don’t line up perfectly for either Madison or Hamilton—the eight you see here are rare. And if you look through enough words, you’ll be able to find a handful that can support any conclusion: Hamilton, Madison, even you or me.
That’s why Mosteller and Wallace created a system to weigh the importance of a large number of factors. The exact details rely on some equations that we don’t need to get into here, but the thought process is straightforward. Each word allowed them to make a small calculation about who the likely author was. When the differences in word frequencies were all combined, the outliers canceled out. All of those small probability calculations, multiplied together, added up to a rock-solid prediction: A text with this rate of the, that rate of during, that rate of whatever, and so on, would never in a thousand years have been written by Hamilton. It would have taken an outright miracle, a sudden change to every marker of his writing style, for Hamilton to pen those 12 essays. On the other hand, they sat neatly within the realm of Madison’s own style.
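Their actual analysis was far more careful (they modeled word counts with negative binomial distributions and weighed dozens of words), but the core idea of multiplying many small pieces of evidence can be sketched as a naive Bayes style log-odds score. The word list, smoothing, and tokenizer below are simplifications, not their exact procedure.

```python
import math
import re
from collections import Counter

FUNCTION_WORDS = ["the", "and", "upon", "while", "whilst",
                  "enough", "during", "whatever", "when", "according"]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def smoothed_probs(texts, vocab, alpha=1.0):
    """Laplace-smoothed probability of each vocab word, relative to all
    vocab-word occurrences in an author's known texts."""
    counts = Counter(w for t in texts for w in tokenize(t))
    total = sum(counts[v] for v in vocab)
    return {v: (counts[v] + alpha) / (total + alpha * len(vocab)) for v in vocab}

def log_odds(disputed, texts_a, texts_b, vocab=FUNCTION_WORDS):
    """Sum one small log-likelihood comparison per word occurrence
    (multiplying probabilities in log space). Positive favors author A."""
    pa = smoothed_probs(texts_a, vocab)
    pb = smoothed_probs(texts_b, vocab)
    return sum(math.log(pa[w] / pb[w]) for w in tokenize(disputed) if w in pa)
```

Applied to a disputed essay, with Hamilton’s known essays as texts_a and Madison’s as texts_b, a strongly negative score points toward Madison, which is what Mosteller and Wallace found.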
It’s worth noting that Mosteller and Wallace make a huge assumption by treating words like dice. The two assume that writers use roughly the same word frequencies throughout their works, and this assumption is critical to their equations’ success. If writers changed their style to match different subjects, characters, and plots, then Mosteller and Wallace’s method would fail frequently. At the very least, the variation within a single writer’s works needs to be small compared to the variation between authors for the method to work. That assumption ultimately held up for Hamilton and Madison: The method’s success suggests there was an underlying consistency to the two Founding Fathers’ writing styles.
But I’ve long wanted to see just how far the theory can go—to test whether something like a literary fingerprint exists for famous writers.
The rest of this chapter will look in depth at Mosteller and Wallace’s assumption that word choice is constant. If it is correct, and style does not change from book to book, then their method should work just shy of 100% of the time, regardless of genre. Forensic scientists can use fingerprints to identify people because the ridges on people’s fingers do not change. But are the stylistic fingerprints that each author leaves in their writing unique enough, and permanent enough, for Mosteller and Wallace’s method to pick them up without fail?
Testing Mosteller and Wallace on Fiction
The uniqueness of fingerprints has been recognized for thousands of years. No two fingerprints are the same, and civilizations as early as ancient Babylon and China used them to authenticate contracts.
Fingerprints don’t tell you anything about the suspect by themselves. The identification process only works if you have a set of the suspect’s fingerprints on file or a database to compare against an unknown print. What if the same could be done for books? Mosteller and Wallace’s method suggests that writers have a hidden fingerprint, too: Authors leave a pattern of words wherever they write. And in the last two chapters we’ve assembled quite a few samples.
To start experimenting with this idea, I gathered a mixed collection of great and popular books, almost 600 books by 50 different authors. This would serve as my full database. (The full list is included in the Notes section on page 264.) Then I chose one book, Animal Farm by George Orwell, and removed it from the sample.
Mosteller and Wallace didn’t build their method specifically for novels. And though people will sometimes attempt to identify one particular book, no one has ever gone through and replicated the professors’ original methods on a large set of novels by known authors. To find out if it could work, I started with a small test.
First I set Animal Farm as the unknown fingerprint. I then treated Hemingway’s ten novels and Orwell’s five other books as my known sample. With two possible options, Mosteller and Wallace’s method pinpointed Orwell as the author of Animal Farm. It was a good start, but after just one test, a coin flip would have had the same 50-50 chance of being right.
Then I expanded the list of candidates. One by one, I set my computer to test Animal Farm against each of the other 48 authors in the sample. This includes authors considered among the greats, such as Faulkner and Wharton. It features many writers who have found huge popularity, such as Stephen King and J. K. Rowling. And it includes a handful of other writers who have achieved recent literary success, such as Jonathan Franzen and Zadie Smith. For each author I included their complete bibliography of novels. In each of the 48 test cases the result was the same: Mosteller and Wallace’s method correctly identified Orwell as the author of Animal Farm.
I wanted to see if this was a fluke. Perhaps Animal Farm was an outlier with a weird style that had unusual results. I compared each of Orwell’s other five books (Burmese Days, A Clergyman’s Daughter, Keep the Aspidistra Flying, Coming Up for Air, and 1984) to the other 49 writers in the sample. Each time, I removed the book in question from Orwell’s sample and treated it as an unknown text. Out of 245 comparisons using Mosteller and Wallace’s system, it was right 245 times. In every case it listed Orwell as the more probable author.
I then expanded further, testing every single book in the sample, pitting each one head-to-head against its actual author and each of the 49 other authors. This totaled 28,861 tests. I figured it would be the best way to confirm whether Mosteller and Wallace’s method holds up for long fiction.
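The harness for that experiment is conceptually simple. Here’s a simplified sketch of the head-to-head loop; the corpus dictionary and scoring function are stand-ins (a scorer like the log_odds sketch above would do), not the exact pipeline behind these numbers.

```python
def leave_one_out_accuracy(corpus, score):
    """corpus: dict mapping author name -> list of that author's book texts.
    score(unknown, texts_a, texts_b) > 0 means author A is the better fit.
    Holds each book out, then pits its true author against every rival."""
    wins = trials = 0
    for author, books in corpus.items():
        for i, held_out in enumerate(books):
            known = books[:i] + books[i + 1:]   # the author's remaining books
            for rival, rival_books in corpus.items():
                if rival == author:
                    continue
                trials += 1                     # one head-to-head test
                if score(held_out, known, rival_books) > 0:
                    wins += 1
    return wins / trials                        # fraction of tests won
```

With 50 authors and almost 600 books, each book is tested against its author’s 49 rivals, which is how the total comes to 28,861 tests.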
Every time, the method was looking at the same basic set of 250 common words. Of the almost 29,000 tests, Mosteller and Wallace’s system worked all but 176 times: a 99.4% success rate.
How is it possible that a system so simple works so well?
The reason it works is that authors do end up writing in a way that is both unique and consistent, just like an actual fingerprint is distinct and unchanging.
Consider Khaled Hosseini, Zadie Smith, and Neil Gaiman. They do not write about the same subjects or with the same tone, but they are all modern-day popular authors with overlapping international audiences. Mosteller and Wallace’s method can distinguish their work with 100% accuracy (28 out of 28) by looking only at 250 simple words. In fact, even just looking at the and and, the two most common words in the sample, you can see distinctions among the three writers. Take a look at the graph below.
The “fingerprint” of the and and is illuminating. If one data point’s label were removed from this chart, we’d have little trouble predicting the author based on where it falls. With the simple eyeball test you could guess right the majority of the time using only the two most common words.
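That eyeball test can even be mechanized as a nearest-centroid guess on just two numbers. A minimal sketch, assuming each author’s average the and and rates have already been computed from their known books:

```python
import re

def the_and_rates(text):
    """The two-feature 'fingerprint': per-1,000-word rates of 'the' and 'and'."""
    words = re.findall(r"[a-z']+", text.lower())
    return (1000 * words.count("the") / len(words),
            1000 * words.count("and") / len(words))

def nearest_author(unknown_text, centroids):
    """centroids: dict of author -> average (the, and) rates over their books.
    Returns the author whose centroid is closest to the unknown text's point."""
    x, y = the_and_rates(unknown_text)
    return min(centroids,
               key=lambda a: (centroids[a][0] - x) ** 2 + (centroids[a][1] - y) ** 2)
```

Two words won’t match the 99.4% accuracy of the full 250-word method, but as the chart shows, even this crude version would guess right most of the time for these three authors.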