Nabokov's Favorite Word Is Mauve
The computer scientists in the 2003 paper managed to predict an author’s gender at a rate of 80 %. I wasn’t able to match that when applying Krawetz’s method to classic literature. However, the system does (perhaps remarkably) succeed, managing to perform significantly better than chance. The simple method identified the correct author gender for the 100 classic books 58 times. Of the 100 modern number one bestsellers, it was right 66 times; and for the modern literary novels, it was right 58 times. Below are the classic books that it ranks as being most likely to have been written by a male or female author. In other words, these are the “most masculine” and “most feminine” novels, according to Krawetz’s method.
MOST MASCULINE CLASSIC NOVELS
A Portrait of the Artist as a Young Man—James Joyce
Charlotte’s Web—E. B. White
Orlando—Virginia Woolf*
Animal Farm—George Orwell
The Shipping News—Annie Proulx*
Winesburg, Ohio—Sherwood Anderson
Lord of the Flies—William Golding
Atlas Shrugged—Ayn Rand*
Death Comes for the Archbishop—Willa Cather*
The Sun Also Rises—Ernest Hemingway

MOST FEMININE CLASSIC NOVELS
Ellen Foster—Kaye Gibbons
Rubyfruit Jungle—Rita Mae Brown
A Clockwork Orange—Anthony Burgess*
Their Eyes Were Watching God—Zora Neale Hurston
The Catcher in the Rye—J. D. Salinger*
Bastard Out of Carolina—Dorothy Allison
The Color Purple—Alice Walker
Wide Sargasso Sea—Jean Rhys
Lady Chatterley’s Lover—D. H. Lawrence*
The Death of the Heart—Elizabeth Bowen

* Indicates the author is of the opposite gender from what the numbers predicted.
And here is the breakdown of recent bestsellers’ most masculine and feminine titles (using the Modern Popular Fiction sample). In this sample, there is just one error in the method’s top twenty categorizations.
MOST MASCULINE #1 BESTSELLERS
Inferno—Dan Brown
The Fallen Angel—Daniel Silva
The English Girl—Daniel Silva
The Heist—Daniel Silva
Act of War—Brad Thor
Flash and Bones—Kathy Reichs*
The First Phone Call from Heaven—Mitch Albom
Kill Alex Cross—James Patterson
Cross My Heart—James Patterson
The Time Keeper—Mitch Albom

MOST FEMININE #1 BESTSELLERS
Kiss the Dead—Laurell K. Hamilton
Power Play—Danielle Steel
Hit List—Laurell K. Hamilton
Until the End of Time—Danielle Steel
Gone Girl—Gillian Flynn
Big Little Lies—Liane Moriarty
Dead Ever After—Charlaine Harris
New York to Dallas—J. D. Robb
Frost Burned—Patricia Briggs
Dead Reckoning—Charlaine Harris

* Indicates the author is of the opposite gender from what the numbers predicted.
In the original paper, the researchers—all men, who were perhaps fearful of being derided for any politically incorrect theories—wrote that they would not speculate on the reasons for gender differences “to avoid baseless speculation with regard to interpretation of the data.” They did, however, go on to point to papers, written in the 1980s and 1990s, which looked at the differences in male and female conversations. The theories proposed in these papers held that men used more “informational” language, about objects, while women used more “involved” language, about relationships. (In their words, “ ‘Involved’ documents contain features which typically show interaction between the speaker/writer and the listener/reader, such as first and second person pronouns.”)
Such generalizations hint at how these words, even if they seem context free, may also be tapping into the same extremes and gender norms that our first example latched on to. For instance, above, around, and below were all male indicative words, and these are clearly “informational words” by the standards of the researchers. However, it’s unclear whether these words are used more by males because, as a general rule, males “are” more informational—or whether male authors, for all sorts of cultural and historical reasons, choose to write more stories on war and physical action that require that type of detail.
You may not be familiar with all the books in the list above, but you probably know that Dan Brown writes thrillers about global conspiracies while Danielle Steel writes romances about the rich and famous. That difference in genre could explain the effectiveness of Krawetz’s seemingly neutral method. Even though the 22 books in the Alex Cross series by James Patterson and 27 books in the Anita Blake series by Laurell K. Hamilton are all thrillers at the core, Hamilton’s series relies much more on Blake’s romantic conflict while Patterson’s series only features Cross’s love life as a much smaller subplot.
There’s a chicken-and-egg type dilemma that sets in when trying, as researchers have for decades, to pinpoint the mysterious stylistic DNA that differentiates male writing from female writing. Even the subtler predictive method seen here seems to be reaching back to an author’s initial decision about subject matter—what and who we choose to focus on in our stories—rather than revealing something more fundamental. It may seem odd that an 80 % accuracy rate or even a 60 % rate in novels is achievable. But consider what the following example says about who authors of different genders choose to write about.
* * *
In his novel Chance, Joseph Conrad wrote, “Being a woman is a terribly difficult trade since it consists principally of dealings with men.”
It’s a line that can be read as both sympathetic and unsympathetic. On the one hand, Conrad is showing awareness of unfair inequality, showing empathy for what women often put up with at the hands of men. Yet at the same time the one-liner suffers from a conceptual block, implying that a woman’s main ends are not self-contained but dependent on men. It reinforces that initial inequality: women as secondary, men as primary.
In each of Conrad’s 14 novels the main character is male. This shows a disparity already. But a book with one main character is limited to making just one choice about that character’s gender. I wanted to find a better metric for counting the balance of male and female characters in a novel.
After deliberating over more complex methods, I settled on a simple one: the ratio of uses of he compared to uses of she. It’s not perfect, but I think it can give you a sense of the gender skew in any book. The count of he versus she gives a rough breakdown of how the actions, thoughts, and descriptions of male characters match up against female characters.
For example, look at The Hobbit. Tolkien used he just under 1,900 times in the book. How many times did he use she? Once. It was toward the beginning, referring to Mrs. Bilbo Baggins. If you’ve read The Hobbit it’s not a stretch to say that it’s 99.9 % male. Everyone we see—all the elves, dwarves, hobbits, and even the birds—is male.
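For readers who want to try the count on a text of their own, here is a minimal sketch in Python. It is not the code behind the numbers in this chapter. The function names are hypothetical, and it assumes that he and she are counted as whole words regardless of capitalization, so that "He" counts but "the" and "her" do not.

```python
import re

# Assumption: whole-word, case-insensitive matching of "he" and "she".
HE_PATTERN = re.compile(r"\bhe\b", re.IGNORECASE)
SHE_PATTERN = re.compile(r"\bshe\b", re.IGNORECASE)


def count_pronouns(text):
    """Return (he_count, she_count) for a piece of text."""
    return len(HE_PATTERN.findall(text)), len(SHE_PATTERN.findall(text))


def he_percentage(he_count, she_count):
    """Share of 'he' among all uses of 'he' and 'she', as a percentage."""
    total = he_count + she_count
    if total == 0:
        raise ValueError("text contains neither 'he' nor 'she'")
    return 100.0 * he_count / total


sample = "He said the door was open. She told her brother that he was wrong."
he_count, she_count = count_pronouns(sample)
print(he_count, she_count)                            # 2 1
print(round(he_percentage(he_count, she_count), 1))   # 66.7
```

Plugging in the approximate figures quoted for The Hobbit, `he_percentage(1900, 1)` rounds to 99.9, in line with the figure given above. The 1,900 is a stand-in for "just under 1,900," so the result is only indicative.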
Joseph Conrad, who wrote his novels at the start of the twentieth century, also skewed male. Across all 14 of his novels, Conrad used the word he three times as often as she. For every three occasions on which he described the actions, thoughts, or qualities of a man, he described a woman just once.
I decided to cast a wider net, looking at all 100 novels on our classic literature list. The following chart shows the he versus she percentage of each book. If, like Conrad, an author uses he three times for every she, the corresponding percentage would come out to 75 %. The dashed line indicates an even 50-50 split.
Books written by men are represented by black bars. Books written by women are represented by purple bars. If you take a minute to look through, it’s clear that the books with the most female mentions are written by women and the books with the most male mentions are written by men.
But saying most authors just prefer to write about their own gender would be an oversimplification. First, the most female-focused books are nowhere near as lopsided as the extreme male-focused books. The Prime of Miss Jean Brodie was 21 % he and 79 % she. That’s the extreme example on the female side. Meanwhile, a book with the opposite split (79 % he and 21 % she) is in the middle of the pack on the male side. There are twenty books with more extreme male ratios.
Within classic literature by men, she was used over 48,000 times, while he was used 108,000 times. There is a huge discrepancy in the characters that male authors are describing. But the reverse is not true. In classic literature by women, she was used 89,000 times while he was used 90,000 times. The near identical rates of pronoun usage illustrate that in books by female authors, men and women are described at close to equal rates. Yet male authors include women less than half as often as they write about men.
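The arithmetic behind these totals can be checked in a few lines. This sketch uses only the rounded figures quoted above ("over 48,000" is taken as 48,000), so the percentages are approximate. The variable and function names are illustrative.

```python
# Rounded pronoun totals quoted in the text for the 100 classic novels.
totals = {
    "male authors": {"he": 108_000, "she": 48_000},
    "female authors": {"he": 90_000, "she": 89_000},
}


def he_share(counts):
    """Percentage of 'he' among all uses of 'he' and 'she'."""
    return 100.0 * counts["he"] / (counts["he"] + counts["she"])


for group, counts in totals.items():
    she_per_he = counts["she"] / counts["he"]
    print(f"{group}: {he_share(counts):.1f}% he, {she_per_he:.2f} she per he")

# male authors: 69.2% he, 0.44 she per he
# female authors: 50.3% he, 0.99 she per he
```

The 0.44 figure is what "less than half as often" refers to: in classics by men, she appears fewer than half as many times as he.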
The degree to which authors prefer writing about their own gender can be seen with the breakdown below.
• Of the 50 classic books by men, 44 used he more than she and 6 did the opposite.
• Of the 50 classic books by women, 29 used she more than he and 21 did the opposite.
Classic literature by men is about men by a quantifiable and overwhelming margin. Classic literature by women is about women more than men, but it’s within a short distance of an even split.
The chart on the opposite page shows the imbalance, looking at the number of books by each gender that fall within a given he:she ratio. For instance, 13 classics by male authors use he over she between 80–90 % of the time while just one classic by a woman does. The average rate for female writers is right around 50 % while the male rate is much higher.
At this point you might be thinking that much of the disparity could be a result of the era when these books were written. Sure, Joseph Conrad wrote three times as much about men as about women, but he wrote more than 100 years ago.
It’s also notable that these “classics” were determined by rankings done by select individuals at organizations like the American Library Association and publications like Library Journal. Not every list was transparent in the selection process, so we don’t know the gender makeup of the people ranking the books. And even if you were to assume there was no existing bias today, books from the female perspective could have had more difficulty gaining “classic” status in the early twentieth century because of a bias then. If they were not critically popular in their day, the books would have to overcome much more to be on the minds of any literary scholar today.
However, in other selections of books that are contemporary, and that are not just curated retrospectively by a small group of individuals, the trend is strikingly similar.
• Of the 50 recent New York Times bestsellers by men, 45 used he more than she and 5 did the opposite.
• Of the 50 recent New York Times bestsellers by women, 17 used she more than he and 33 did the opposite.
• Of the 50 modern literary books by men, 42 used he more than she and 8 did the opposite.
• Of the 50 modern literary books by women, 23 used she more than he and 27 did the opposite.
And once again, in the New York Times bestseller list and the modern literary sample, no female writer ever went so extreme as to use less than 20 % he. The same cannot be said of male writers.
Elmore Leonard once said of his writing, “Sometimes female characters start out as the wife or girlfriend, but then I realize, ‘No, she’s the book,’ and she becomes a main character. I surrender the book to her.” However, digging into the data, I don’t think Leonard did as well living up to this idea as he thought. Leonard wrote 45 novels. In not one book did he write she more than he.
This does not mean Leonard did not write a few strong, original female characters, but in each of 45 cases the book was male dominated. You might be familiar with Leonard’s Rum Punch (or its Tarantino adaptation Jackie Brown), which features the protagonist Jackie Burke. She might be an unforgettable female lead, but since the book has a he/she split of two to one she’s very much still living in a “man’s world,” as all of Leonard’s novels are.
I don’t mean to single out Leonard here. Many other successful writers have written only books that have a male focus. I imagine this list could be as long as you wanted it to be if you kept searching, but by my count, it includes at least the following: Joseph Conrad (14 of 14 novels), Theodore Dreiser (8 of 8), William Faulkner (19 of 19), F. Scott Fitzgerald (4 of 4), Ernest Hemingway (10 of 10), James Joyce (3 of 3), John Steinbeck (19 of 19), Kurt Vonnegut (14 of 14), Salman Rushdie (9 of 9), Jack London (20 of 20), William Gaddis (5 of 5), Elmore Leonard (45 of 45), Jonathan Franzen (4 of 4), Charles Dickens (20 of 20), Michael Chabon (7 of 7), John Cheever (5 of 5), Herman Melville (9 of 9), Cormac McCarthy (10 of 10), and Ray Bradbury (11 of 11).
It’s harder to find anyone who does the opposite. Willa Cather, Toni Morrison, Ayn Rand, Edith Wharton, Alice Walker, Gillian Flynn, Virginia Woolf, Charlotte Brontë, Zadie Smith, Agatha Christie, and Jennifer Egan have all written at least one book using he more than she. Jane Austen, author of six novels, including Pride and Prejudice, is the one writer I could find who never wrote a book with he more than she.
You might be familiar with the Bechdel test, a checklist test to determine if a work of fiction (most often, a movie) shows gender bias. The requirements, on paper, sound simple. In order to “pass” the test, the work must (A) include at least two women, (B) who talk to each other, (C) about something other than a man. The website bechdeltest.com tabulates movies according to the Bechdel test and, as of my writing this, lists 220 movies that were released in 2014. Of these 220 movies a total of 91 failed.
The he:she ratio in novels is revealing enough to unmask bias as well, and I think a rule can be built upon it that shows whether a given book skews too far one way or the other. Inspired in part by the Bechdel test, this metric is meant to be a firm answer to the question of whether a novel has a clear gender imbalance. This he:she ratio is also better fitted to a novel than the Bechdel test. Books, unlike films, aren’t necessarily constructed as a series of scenes with different combinations of characters. Using the pronoun ratio is a single check that can be calculated in an instant.
My test is simple: If a novel describes male actions three times as much as female actions, it fails the quantitative test. If a book describes female actions three times as much as male actions, it fails also.
The three-to-one (or 75 %) barrier is an arbitrary cutoff. The lopsided ratio is in line with Conrad’s extreme imbalance. Knowing this ratio, it feels unsettling to pick up a book and be fully aware that for every three actions or descriptions of a man there will be just one mention of a woman (or vice versa).
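The rule can be written down as a short check. This is a sketch under one assumption the text leaves open: a book sitting exactly on the three-to-one line is treated here as failing. The function name is hypothetical, and the inputs can be raw counts or percentages, since only the ratio matters.

```python
def fails_ratio_test(he_count, she_count):
    """Return True if either pronoun is used at least three times as often as the other.

    Assumption: a ratio of exactly 3:1 (75 % / 25 %) counts as a failure.
    """
    if he_count < 0 or she_count < 0:
        raise ValueError("counts must be non-negative")
    if he_count == 0 and she_count == 0:
        raise ValueError("no uses of 'he' or 'she' to compare")
    return he_count >= 3 * she_count or she_count >= 3 * he_count


# Percentages quoted in this chapter, used as stand-ins for raw counts.
print(fails_ratio_test(99, 1))    # True: The Old Man and the Sea, 99 % he
print(fails_ratio_test(21, 79))   # True: 21 % he and 79 % she
print(fails_ratio_test(2, 1))     # False: a two-to-one he/she split
print(fails_ratio_test(50, 50))   # False: an even split
```

Comparing counts with integer multiplication avoids rounding questions right at the 75 % line.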
Many great books fail this test. I’m aware of that. The chart on pages 42–43 shows the number of books which skew past the 25–75 split one way or another. Two classics, both by women, would fail for being too female heavy (The Prime of Miss Jean Brodie by Muriel Spark and Talk Before Sleep by Elizabeth Berg). Meanwhile 27 classics exceed the 75 % male threshold. Twenty-four are by men and just three (Death Comes for the Archbishop by Willa Cather, The Good Earth by Pearl Buck, and Ordinary People by Judith Guest) are penned by women.
I realize many people reading this might believe that a quantitative he:she test holds no bearing on the success of a novel. The Old Man and the Sea fails. It’s a story with just a few characters. Other than the old man, fishermen, and a marlin there isn’t much else. With 99 % he, it fails the test hard.
But it should be noted that The Old Man and the Sea would not come close to passing the Bechdel test either. Does that mean the book must be considered sexist? No. It’s a book in part about isolation, so the lack of interaction is important. A book can fail either test and still be great, but there should be a justification.
For many of us, novels are a portal, a way of exploring the broader world and understanding how people act within it. We live in a world where one in two people are women. There’s no reason to think that every novel must be in lockstep with this ratio, especially if the setting is unique. But if you are a reader and every book you read doesn’t even achieve a one-in-four ratio, chances are you’re not getting a true reflection of, or gaining a true appreciation for, how other people act in the world.
* * *
Popular crime writer P. D. James has said, “All fiction is largely autobiographical and much autobiography is, of course, fiction.” Word frequencies can’t help us explore the second part, but the first half of her quote, that all fiction is autobiographical, deals with the notion that all writers, consciously or unconsciously, write characters based on some part of themselves. In exploring this possibility, word frequencies offer a portal into an author’s mind.
Her theory, however playfully it was meant, does help explain why the gender balance of male authors is so skewed toward male characters. Writing characters based on yourself does have one advantage: It gives writers a chance to write about what they know.
However, I can’t imagine any writer advising a novelist to include only characters that are based on their own personality. If anything, the skill of a great writer is to create good, believable characters from different backgrounds and with different motivating forces. Gender is one of the biggest dividing lines between characters and one of the trickier challenges for some writers. Writers need to make characters of the opposite sex believable, which means needing to engage with cultural norms and vocabularies that meet with readers’ general expectations. At the same time, playing into stereotypes or oversimplifying is the quickest way to lose readers.