by Ben Blatt
I downloaded more than 9,000 novel-length fan-fiction stories (of 60,000-plus words) from fanfiction.net. This would be my “amateur” group, consisting of all stories written between 2010 and 2014 in the 25 most popular book universes (ranging from Harry Potter to Twilight to Phantom of the Opera to Janet Evanovich’s books). People writing stories this long are committed to their work, and many of them are strong writers. But on average, they’re not at the level of the bestsellers or the award winners of the literary world. So I compared the fan-fiction sample to all of the books that have ranked number one on the New York Times bestseller list since 2000, and also to the 100 most recent winners of major literary awards.IV
When set side by side, the difference is clear. The median fan-fiction author used 154 -ly adverbs per 10,000 words, which is much higher than either of the professional samples. The 300-plus megahits in the bestseller category averaged just 115 -ly adverbs per 10,000 words. And the 100 award winners have a median of 114 -ly adverbs. It’s not an apples-to-apples comparison, but the novels that sell well in bookstores come in with 25% fewer adverbs than the average novel that amateur writers post online. Fewer than 12% of all number one bestsellers had more than 154 adverbs, even though half of all fan fiction does.
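The rate statistic itself is simple arithmetic. A minimal sketch of how such a per-10,000-word rate can be computed (this is my own illustration, not Blatt’s actual methodology; naively counting words that end in -ly overcounts non-adverbs like only or family, which a careful analysis would filter out):

```python
import re

def ly_adverb_rate(text):
    # Count of -ly words per 10,000 words; a crude proxy for adverb rate.
    words = re.findall(r"[a-z']+", text.lower())
    ly_count = sum(1 for w in words if w.endswith("ly") and len(w) > 3)
    return 10000 * ly_count / len(words)

sample = ("He walked slowly to the door. She spoke softly. "
          "The rain fell on the quiet street.")
print(ly_adverb_rate(sample))  # 1250.0 (2 -ly words in 16 words)
```

On real novels the denominator is large enough that the rate becomes a stable per-author signature, which is what makes the comparisons above possible.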
* * *
The results of this chapter are one half common sense and one half mind-blowing.
Most writers and teachers will tell you that adverbs are bad. This is not a controversial stance to take. In many ways, the statistics presented above are just a confirmation of what we already knew.
But the fact that their use is somehow correlated with quality on a measurable level—even when just the best writers are being examined—is still shocking. It might not be a surprise that some beginner writers use adverbs as a crutch more often than professional writers, and that these traits may sometimes be noticeable. But even when looking at the life’s work of the best writers, the effect is present.
A statistical correlation, of course, does not imply causation. Fitzgerald’s The Great Gatsby used 128 adverbs per 10,000 words while his lesser-known The Beautiful and Damned used 176. If you picked up The Great Gatsby and stuck in 200 more adverbs, a bit less than one a page, it would have a higher rate than The Beautiful and Damned. Would this version of the book still be celebrated? What if you trimmed down the adverbs from The Beautiful and Damned? Would Leonardo DiCaprio be ready to suit up for the role of Anthony Patch?
The answer of course is that it’s not so simple. Adverb rate alone could not have such a direct impact on the success of a book. There are thousands and thousands of other aspects of writing in play. The Hemingway adverb stereotype may be true, but there are notable counterexamples—authors who have written successful books when increasing their adverb usage. Nabokov’s Lolita, for instance, has more adverbs than any of his other eight English novels.
One possible explanation for the overall trend we’re seeing is that adverbs are an indicator of a writer’s focus. An author writing with the clarity needed to describe vivid scenes and actions without adverbs, taking the time to whittle away the unnecessary words, might also be spending more time and effort making the rest of the text as perfect as possible. Or if one has a good editor, these words may be weeded out.
The “focus” hypothesis finds some support from the true master of writing without adverbs. And it’s not Hemingway.
The numbers revealed an overlooked champion. Combing through a large number of authors, I found just one writer on the list of “greats” who outdid Hemingway: Toni Morrison. She may be a Nobel and Pulitzer Prize winner just like Hemingway, but her place at the height of concise writing isn’t often cited in English classrooms. Her adverb rate of 76 edges out Hemingway’s 80 and puts her well ahead of others like Steinbeck, Rushdie, Salinger, and Wharton.
Morrison has said in multiple interviews that she doesn’t use adverbs. Why? Because when she’s writing at her best, she can do without: “I never say ‘She says softly,’ ” Morrison tells us. “If it’s not already soft, you know, I have to leave a lot of space around it so a reader can hear that it’s soft.”
* * *
There you have it. And while I have no hard evidence that the logic of adverb usage carries over to wacky statistics-based prose, I went through the text of this chapter to search for -ly adverbs after writing 5,000 words on how awful they were. I found that in most cases they were unneeded. They often blunted the impact of my sentences. I deleted all -ly adverbs that were not used when quoting or citing others.
As a result, if you excuse the ones in quotes, you will find no -ly adverbs in this chapter. This makes for a usage rate of 0 per 10,000, which would rank this text ahead of (or tied with) all other texts ever written. Does that make this chapter, regardless of content, a step above average? Here we’ve found the limits of our statistics. But when trying to write standout prose, it can’t hurt to deliberately avoid the troublesome part of speech.
* * *
I. Some of Sinclair Lewis’s novels could not be found in digital form and were excluded.
II. It’s more wonky than we can get into here, but for a detailed explanation of why average rating falls short, head to the Notes section on p. 251.
III. Unlike the previous section’s aggregate graph, where books from different authors were combined unadjusted, the normalization here allows us to compare authors of different levels of popularity and adverb rate without any outliers skewing the combined chart. The authors are treated as if they have equal rates and popularities, so that we can concentrate on the trends within each author’s work.
IV. The selection of these award-winning books is described in detail in Chapter 2.
It is fatal for anyone who writes to think of their sex.
—VIRGINIA WOOLF, A ROOM OF ONE’S OWN
Let’s say we have two Facebook statuses. One is written by a woman, one by a man. You’ve been offered five dollars if you can guess which post is which, but you’ll be given only a short selection of words from each post. Given the samples below, would you be able to win that five dollars?
Selection 1: shit, league, shave
Selection 2: shopping, boyfriend, <3
You’d feel pretty confident in your ability to make a guess, right?
Now, what if you were asked to perform the same task, but shown these triplets instead?
Selection 3: actually, everything, their
Selection 4: above, something, the
There are fewer clues. But what if I told you that there’s a clear best guess?
* * *
For generations researchers have been studying the ways in which men and women differ in their writing—often with little concrete evidence to show for it. In recent years, however, computer scientists have been able to comb through enormous amounts of social media data and pinpoint small differences. It’s not just an academic exercise, either. The prize in this scenario is not five dollars, but billions in targeted ads. Some of the results of this research have been all too clichéd and stereotypical (shopping skews female; league skews male). But some seem altogether perplexing: The innocuous words in Selections 3 and 4 do indeed tend to vary between genders, and researchers have harnessed them to make shockingly accurate predictions.
I wanted to use the same methods to look at literature rather than tweets and Facebook posts. But before we make that leap, let’s explore what the two examples above are looking at when they classify writing as “male” or “female.”
The words in the first two selections—shit, shave, league, shopping, boyfriend, <3—all come from a paper published by researchers at the University of Pennsylvania that churned through millions of Facebook statuses in search of the select few words that are most indicative of gender. (As you probably guessed, Selection 1 contains the most “male” words, and Selection 2 contains the most “female” ones.)
That doesn’t mean that all men talk shit in their status updates or that all women talk about shopping. In fact, shopping is not a word that’s used frequently by either gender. These “most indicative” stats measure the words that one group uses often compared to how rarely other groups use them. Shopping’s presence on the list, in a sense, says more about the fact that men don’t often talk about shopping than that women do. This method of discerning gender attempts to find the starkest contrasts between what men are writing about and what women are writing about. And for that reason, the findings above and in the chart below skew toward the extreme ends of gender norms.
Below, we see the top five Facebook status words for each gender, as well as similar findings gleaned from a range of different social media.
Facebook Status
Lopsided male usage: Fuck, League, Shit, Fucking, Shave
Lopsided female usage: Shopping, Excited, <3, Boyfriend, Cute

Chatroom Emoticons
Lopsided male usage: ;)
Lopsided female usage: :D

Twitter Assent or Negation Terms
Lopsided male usage: Yessir, Nah, Nobody, Ain’t
Lopsided female usage: Okay, Yes, Yess, Yesss, Yessss, Nooo, Noooo, Cannot

Blogs
Lopsided male usage: Linux, Microsoft, Gaming, Server, Software
Lopsided female usage: Hubby, Husband, Adorable, Skirt, Boyfriend
The studies from which these findings originate are listed in the Notes section on page 262.
The words revealed by the second two selections—actually, everything, their, above, something, the—are very different at first glance. They don’t fall into one gender stereotype or another; rather they’re function words that everyone uses. But in a 2003 paper, computer scientists looked into gender differences in writing by examining samples from the British National Corpus (both fiction and nonfiction), and they came back with some curious results. Their biggest findings dealt with these very small words. For instance, they claimed that across all genres of writing “females use many more pronouns” (I, yourself, their) and males use “many more noun specifiers” (a, this, these).
The notion that such a general conclusion could be drawn using entirely mundane words sounds absurd. However, the paper went on to show that by using the frequencies of just a few dozen tiny words, the authors were able to create an algorithm that accurately predicted an author’s gender 80% of the time when examining randomized documents. That’s a huge percentage—all based on tiny words like the six found in Selections 3 and 4. (And, for the record, you’d want to bet that Selection 3 was the one penned by a female author and that Selection 4 was written by a male.)
Both of these statistical methods rely on broad generalizations, but after reading about them in depth, I was curious. How do these ways of measuring and predicting gender hold up if we apply them to even bigger, tougher samples? And can they reveal anything interesting about the state of literature?
I decided to find out what they’d come up with when we compare men and women in classic and popular fiction. What words or books would show up as the “most male” or “most female”? And would the findings of that 2003 paper be able to predict a novelist’s gender with any reliability?
To explore these questions, I’ve identified three samples of books that we’ll come back to throughout the rest of this chapter: classics, recent bestsellers, and recent literary fiction. I will refer to them in shorthand for the rest of the text, but the exact rules I’ve used to derive each sample are below (and full lists can be found in the Notes section at the end of the book).
Classic Literature
I started with Stanford librarian Brian Kunde’s composite list of “The Best English Language Fiction of the Twentieth Century.” Kunde combined the results of many literature polls to compile the list. From this aggregate list I took the top fifty novels written by men and the top fifty written by women. These are the type of books you’d see from authors like Ernest Hemingway, Willa Cather, William Faulkner, or Toni Morrison.
Modern Popular Fiction
Starting with the end of 2014 and going backward, I found the last fifty number one New York Times fiction bestsellers written by men and the last fifty by women. I threw out any book that had a listed co-author. The resulting collection consists of books written by blockbuster authors like Nora Roberts, Stephen King, Jodi Picoult, and James Patterson.
Modern Literary Fiction
Starting with awards given at the end of 2014 and working backward, I found the last fifty novels written by men and the last fifty novels written by women that were on any of the following lists: New York Times Top Ten Books of the Year, Pulitzer Prize finalists, Man Booker Prize short list, National Book Award finalists, National Book Critics Circle finalists, and Time magazine’s best books of the year. The resulting sample of 100 books, ranging from 2009 to 2014, includes authors like Jennifer Egan, Jonathan Franzen, Michael Chabon, and Zadie Smith.
* * *
I started with the first method given above, looking at the words used most disproportionately by each gender. This technique tends to find the extremes and stereotypes—the shopping and shave words that one gender uses most often compared to how rarely the other gender uses them.
For example, in the Classic Literature sample, the word dress was used 2,069 total times and at least once in 97 of the 100 books. It was used at an average rate just over once per 10,000 words. While 35 female authors used it above this rate, only seven male authors did. I used this ratio to give it a score of 83% (35:7) female, and all of the top words were determined by this ratio.
I chose this methodology, instead of a pure ratio, to control for the fact that some authors use specific words often to suit their plot. For instance, Tolkien used the word ring over 750 times in the three Lord of the Rings books (which counted as a single entity in the “classic book” sample). That’s more times total than it was used in all fifty classics by women. However, Lord of the Rings excluded, there is no evidence that men use the word ring more often than women (in the rest of the sample, female authors actually use it about twice as often).
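The author-counting score described above is easy to sketch. This is my own reconstruction, not Blatt’s code, and the per-author rates below are invented for illustration; only the dress figures (35 female authors versus 7 male above the corpus-average rate, for an 83% score) come from the text:

```python
def skew_score(rates):
    # rates: list of (gender, uses_per_10k_words) pairs, one per author.
    # An author counts as "using" the word if their rate beats the
    # corpus-average rate, which damps single-author plot-driven outliers.
    avg = sum(r for _, r in rates) / len(rates)
    f = sum(1 for g, r in rates if g == "F" and r > avg)
    m = sum(1 for g, r in rates if g == "M" and r > avg)
    return 100 * f / (f + m)

# Invented mini-corpus: five authors' rates for some hypothetical word.
rates = [("F", 2.0), ("F", 1.5), ("M", 0.2), ("M", 1.8), ("F", 0.1)]
print(round(skew_score(rates)))  # 67

# The "dress" example reduces to the same final ratio step:
print(round(100 * 35 / (35 + 7)))  # 83
```

Counting authors rather than raw occurrences is what keeps a single obsessive user of a word (Tolkien and ring) from dominating the score.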
I also restricted the “most male” and “most female” words to those that were used in at least fifty books in the sample of 100, so that they weren’t outliers. With these rules in mind, here are the words with the most extreme imbalance in the 100 classics I’ve gathered:
Most Gender-Indicative Words in Classic Literature
MALE: Chief, Rear, Civil, Bigger, Absolutely, Enemy, Fellows, King, Public, Contact
FEMALE: Pillows, Lace, Curls, Dress, China, Skirt, Curtains, Cups, Sheets, Shrugged
Many of these words appear to be driven by the plot and scope of the books they’re contained within: The male words tend to skew toward the military or governmental while the female words tend to skew toward the domestic. While fifty books by each gender is a small sample, writing turns out to be consistent enough over time (or at least the subjects that writers of different genders choose to write about are consistent enough) that these words remain indicators even in modern literary and popular fiction. Chief is used more by men than women in popular fiction and modern literary fiction. Pillows is used more by women in popular fiction and modern literary fiction.
Of the 20 words above, 16 of them fall on the same side of the gender-skewed aisle in popular fiction. In literary fiction, 18 of them do. If these words were distributed in a random fashion, perhaps half would fall on the other side of the divide when we look at different samples. Even in a modest-sized sample, these words are consistently used more often by one gender than the other.
These words show the lopsidedness of certain topics, but not necessarily a difference between how the two sexes write. So next, I decided to try out the second, eerier method—that of predicting gender using simple words like it and is—to see if it had any effectiveness in novels. Based on that original 2003 paper, a computer programmer named Neal Krawetz developed a quick system to guess the gender of an author using just 51 words. Drawing from the paper, Krawetz chose very common and ordinary words. Twenty-four were used at a higher rate among male writers: a, above, are, around, as, at, below, ever, good, in, is, it, many, now, said, some, something, the, these, this, to, well, what, and who. Twenty-seven words were used at a higher rate by female writers: actually, am, and, be, because, but, everything, has, her, hers, him, if, like, more, not, out, she, should, since, so, too, was, we, when, where, with, and your. For his simplified method of guessing gender, each word was given a point value based on its relative predictive value (these is +8 for male, while since is +25 for female). Each time a word is used in the text its point value is counted until a final total is tabulated.
If I wrote, The method is simple and crude, the algorithm would give that a male-female ratio of 91% (+24 male for the, +18 male for is, +4 female for and), whereas The method is not too complicated would have a male-female ratio of 39% (+24 male for the, +18 male for is, +27 female for not, +38 female for too).
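The point-count scheme can be sketched in a few lines. This uses only the handful of weights the text actually quotes (the, is, and these on the male side; and, not, too, and since on the female side); the remaining weights from Krawetz’s 51-word list are not reproduced here, so this is a toy version, not the real classifier:

```python
import re

# Only the weights quoted in the text; the full 51-word table is elsewhere.
MALE_POINTS = {"the": 24, "is": 18, "these": 8}
FEMALE_POINTS = {"and": 4, "not": 27, "too": 38, "since": 25}

def male_ratio(text):
    # Percent of total accumulated points that fall on the "male" side.
    words = re.findall(r"[a-z]+", text.lower())
    m = sum(MALE_POINTS.get(w, 0) for w in words)
    f = sum(FEMALE_POINTS.get(w, 0) for w in words)
    if m + f == 0:
        return 50.0  # no scored words at all; call it a toss-up
    return 100 * m / (m + f)

print(round(male_ratio("The method is simple and crude")))    # 91
print(round(male_ratio("The method is not too complicated"))) # 39
```

The outputs reproduce the two worked examples above, which is all the arithmetic the method involves: add up weights, take a ratio.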
It might be useful to reflect for a second on just how basic this system is. There is no context or other inference involved at all. It would guess that the sentence This sentence is written by a woman was written by a man. This is not a nuanced method. For the results below I removed the pronouns her, hers, him, and she to make sure it was not relying on simple gender giveaways. (Later in this chapter we’ll see how indicative gendered pronouns are in identifying the authors of fiction.) Without them, all that’s left is a handful of seemingly gender-neutral pronouns, conjunctions, and identifiers.