Everybody Lies


by Seth Stephens-Davidowitz




  DEDICATION

  To Mom and Dad

  CONTENTS

  Cover

  Title Page

  Dedication

  Foreword by Steven Pinker

  Introduction: The Outlines of a Revolution

  PART I: DATA, BIG AND SMALL

  1. Your Faulty Gut

  PART II: THE POWERS OF BIG DATA

  2. Was Freud Right?

  3. Data Reimagined

  Bodies as Data

  Words as Data

  Pictures as Data

  4. Digital Truth Serum

  The Truth About Sex

  The Truth About Hate and Prejudice

  The Truth About the Internet

  The Truth About Child Abuse and Abortion

  The Truth About Your Facebook Friends

  The Truth About Your Customers

  Can We Handle the Truth?

  5. Zooming In

  What’s Really Going On in Our Counties, Cities, and Towns?

  How We Fill Our Minutes and Hours

  Our Doppelgangers

  Data Stories

  6. All the World’s a Lab

  The ABCs of A/B Testing

  Nature’s Cruel—but Enlightening—Experiments

  PART III: BIG DATA: HANDLE WITH CARE

  7. Big Data, Big Schmata? What It Cannot Do

  The Curse of Dimensionality

  The Overemphasis on What Is Measurable

  8. Mo Data, Mo Problems? What We Shouldn’t Do

  The Danger of Empowered Corporations

  The Danger of Empowered Governments

  Conclusion: How Many People Finish Books?

  Acknowledgments

  Notes

  Index

  About the Author

  Copyright

  About the Publisher

  FOREWORD

  Ever since philosophers speculated about a “cerebroscope,” a mythical device that would display a person’s thoughts on a screen, social scientists have been looking for tools to expose the workings of human nature. During my career as an experimental psychologist, different ones have gone in and out of fashion, and I’ve tried them all—rating scales, reaction times, pupil dilation, functional neuroimaging, even epilepsy patients with implanted electrodes who were happy to while away the hours in a language experiment while waiting to have a seizure.

  Yet none of these methods provides an unobstructed view into the mind. The problem is a savage tradeoff. Human thoughts are complex propositions; unlike Woody Allen speed-reading War and Peace, we don’t just think “It was about some Russians.” But propositions in all their tangled multidimensional glory are difficult for a scientist to analyze. Sure, when people pour their hearts out, we apprehend the richness of their stream of consciousness, but monologues are not an ideal dataset for testing hypotheses. On the other hand, if we concentrate on measures that are easily quantifiable, like people’s reaction time to words, or their skin response to pictures, we can do the statistics, but we’ve pureed the complex texture of cognition into a single number. Even the most sophisticated neuroimaging methodologies can tell us how a thought is splayed out in 3-D space, but not what the thought consists of.

  As if the tradeoff between tractability and richness weren’t bad enough, scientists of human nature are vexed by the Law of Small Numbers—Amos Tversky and Daniel Kahneman’s name for the fallacy of thinking that the traits of a population will be reflected in any sample, no matter how small. Even the most numerate scientists have woefully defective intuitions about how many subjects one really needs in a study before one can abstract away from the random quirks and bumps and generalize to all Americans, to say nothing of Homo sapiens. It’s all the iffier when the sample is gathered by convenience, such as by offering beer money to the sophomores in our courses.

  This book is about a whole new way of studying the mind. Big Data from internet searches and other online responses are not a cerebroscope, but Seth Stephens-Davidowitz shows that they offer an unprecedented peek into people’s psyches. In the privacy of their keyboards, people confess the strangest things, sometimes (as in dating sites or searches for professional advice) because they have real-life consequences, at other times precisely because they don’t have consequences: people can unburden themselves of some wish or fear without a real person reacting in dismay or worse. Either way, the people are not just pressing a button or turning a knob, but keying in any of trillions of sequences of characters to spell out their thoughts in all their explosive, combinatorial vastness. Better still, they lay down these digital traces in a form that is easy to aggregate and analyze. They come from all walks of life. They can take part in unobtrusive experiments which vary the stimuli and tabulate the responses in real time. And they happily supply these data in gargantuan numbers.

  Everybody Lies is more than a proof of concept. Time and again my preconceptions about my country and my species were turned upside-down by Stephens-Davidowitz’s discoveries. Where did Donald Trump’s unexpected support come from? When Ann Landers asked her readers in 1976 whether they regretted having children and was shocked to find that a majority did, was she misled by an unrepresentative, self-selected sample? Is the internet to blame for that redundantly named crisis of the late 2010s, the “filter bubble”? What triggers hate crimes? Do people seek jokes to cheer themselves up? And though I like to think that nothing can shock me, I was shocked aplenty by what the internet reveals about human sexuality—including the discovery that every month a certain number of women search for “humping stuffed animals.” No experiment using reaction time or pupil dilation or functional neuroimaging could ever have turned up that fact.

  Everybody will enjoy Everybody Lies. With unflagging curiosity and an endearing wit, Stephens-Davidowitz points to a new path for social science in the twenty-first century. With this endlessly fascinating window into human obsessions, who needs a cerebroscope?

  —Steven Pinker, 2017

  INTRODUCTION

  THE OUTLINES OF A REVOLUTION

  Surely he would lose, they said.

  In the 2016 Republican primaries, polling experts concluded that Donald Trump didn’t stand a chance. After all, Trump had insulted a variety of minority groups. The polls and their interpreters told us few Americans approved of such outrages.

  Most polling experts at the time thought that Trump would lose in the general election. Too many likely voters said they were put off by his manner and views.

  But there were some clues that Trump might actually win both the primaries and the general election—on the internet.

  I am an internet data expert. Every day, I track the digital trails that people leave as they make their way across the web. From the buttons or keys we click or tap, I try to understand what we really want, what we will really do, and who we really are. Let me explain how I got started on this unusual path.

  The story begins—and this seems like ages ago—with the 2008 presidential election and a long-debated question in social science: How significant is racial prejudice in America?

  Barack Obama was running as the first African-American presidential nominee of a major party. He won—rather easily. And the polls suggested that race was not a factor in how Americans voted. Gallup, for example, conducted numerous polls before and after Obama’s first election. Their conclusion? American voters largely did not care that Barack Obama was black. Shortly after the election, two well-known professors at the University of California, Berkeley pored through other survey-based data, using more sophisticated data-mining techniques. They reached a similar conclusion.

  And so, during Obama’s presidency, this became the conventional wisdom in many parts of the media and in large swaths of the academy. The sources that the media and social scientists have used for eighty-plus years to understand the world told us that the overwhelming majority of Americans did not care that Obama was black when judging whether he should be their president.

  This country, long soiled by slavery and Jim Crow laws, seemed finally to have stopped judging people by the color of their skin. This seemed to suggest that racism was on its last legs in America. In fact, some pundits even declared that we lived in a post-racial society.

  In 2012, I was a graduate student in economics, lost in life, burnt-out in my field, and confident, even cocky, that I had a pretty good understanding of how the world worked, of what people thought and cared about in the twenty-first century. And when it came to this issue of prejudice, I allowed myself to believe, based on everything I had read in psychology and political science, that explicit racism was limited to a small percentage of Americans—the majority of them conservative Republicans, most of them living in the deep South.

  Then, I found Google Trends.

  Google Trends, a tool that was released with little fanfare in 2009, tells users how frequently any word or phrase has been searched in different locations at different times. It was advertised as a fun tool—perhaps enabling friends to discuss which celebrity was most popular or what fashion was suddenly hot. The earliest versions included a playful admonishment that people “wouldn’t want to write your PhD dissertation” with the data, which immediately motivated me to write my dissertation with it.*

  At the time, Google search data didn’t seem to be a proper source of information for “serious” academic research. Unlike surveys, Google search data wasn’t created as a way to help us understand the human psyche. Google was invented so that people could learn about the world, not so researchers could learn about people. But it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing.

  In other words, people’s search for information is, in itself, information. When and where they search for facts, quotes, jokes, places, persons, things, or help, it turns out, can tell us a lot more about what they really think, really desire, really fear, and really do than anyone might have guessed. This is especially true since people sometimes don’t so much query Google as confide in it: “I hate my boss.” “I am drunk.” “My dad hit me.”

  The everyday act of typing a word or phrase into a compact, rectangular white box leaves a small trace of truth that, when multiplied by millions, eventually reveals profound realities. The first word I typed in Google Trends was “God.” I learned that the states that make the most Google searches mentioning “God” were Alabama, Mississippi, and Arkansas—the Bible Belt. And those searches are most frequently on Sundays. None of which was surprising, but it was intriguing that search data could reveal such a clear pattern. I tried “Knicks,” which it turns out is Googled most in New York City. Another no-brainer. Then I typed in my name. “We’re sorry,” Google Trends informed me. “There is not enough search volume” to show these results. Google Trends, I learned, will provide data only when lots of people make the same search.

  But the power of Google searches is not that they can tell us that God is popular down South, the Knicks are popular in New York City, or that I’m not popular anywhere. Any survey could tell you that. The power in Google data is that people tell the giant search engine things they might not tell anyone else.

  Take, for example, sex (a subject I will investigate in much greater detail later in this book). Surveys cannot be trusted to tell us the truth about our sex lives. I analyzed data from the General Social Survey, which is considered one of the most influential and authoritative sources for information on Americans’ behaviors. According to that survey, when it comes to heterosexual sex, women say they have sex, on average, fifty-five times per year, using a condom 16 percent of the time. This adds up to about 1.1 billion condoms used per year. But heterosexual men say they use 1.6 billion condoms every year. Those numbers, by definition, would have to be the same. So who is telling the truth, men or women?

  Neither, it turns out. According to Nielsen, the global information and measurement company that tracks consumer behavior, fewer than 600 million condoms are sold every year. So everyone is lying; the only difference is by how much.
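
  The arithmetic behind this comparison can be sketched in a few lines of Python. The per-woman average and the three totals are the figures quoted above. The implied number of women is backed out from those figures rather than taken from the survey, and treating the Nielsen number as a ceiling on condoms actually used is an assumption of this sketch.

```python
# Figures quoted in the text (General Social Survey and Nielsen).
SEX_PER_WOMAN_PER_YEAR = 55      # average reported by women
CONDOM_SHARE = 0.16              # share of those times a condom is used
WOMEN_TOTAL_CLAIM = 1.1e9        # condoms per year implied by women's answers
MEN_TOTAL_CLAIM = 1.6e9          # condoms per year implied by men's answers
CONDOMS_SOLD_CEILING = 600e6     # "fewer than 600 million" sold per year

# Condoms per woman per year implied by the survey averages.
condoms_per_woman = SEX_PER_WOMAN_PER_YEAR * CONDOM_SHARE

# Derived, not from the source: the population size that makes
# the per-woman average consistent with the 1.1 billion total.
implied_women = WOMEN_TOTAL_CLAIM / condoms_per_woman


def min_overstatement(claimed, ceiling=CONDOMS_SOLD_CEILING):
    """Lower bound on how far a claimed total exceeds what was sold.

    Assumes no more condoms are used than are sold. Because sales are
    below the ceiling, the true ratio is at least this large.
    """
    return claimed / ceiling


print(f"Condoms per woman per year: {condoms_per_woman:.1f}")
print(f"Implied number of women: {implied_women / 1e6:.0f} million")
print(f"Gap between men and women: {(MEN_TOTAL_CLAIM - WOMEN_TOTAL_CLAIM) / 1e9:.1f} billion")
print(f"Women overstate by at least {min_overstatement(WOMEN_TOTAL_CLAIM):.2f}x")
print(f"Men overstate by at least {min_overstatement(MEN_TOTAL_CLAIM):.2f}x")
```

  The two totals should match, since each act of heterosexual sex involves one man and one woman, and both sit well above the sales ceiling.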

  The lying is in fact widespread. Men who have never been married claim to use on average twenty-nine condoms per year. This would add up to more than the total number of condoms sold in the United States to married and single people combined. Married people probably exaggerate how much sex they have, too. On average, married men under sixty-five tell surveys they have sex once a week. Only 1 percent say they have gone the past year without sex. Married women report having a little less sex but not much less.
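
  To see what the never-married claim requires, here is a minimal Python sketch. It uses only the twenty-nine-per-year figure and the Nielsen ceiling from the text. The head counts passed in at the end are hypothetical inputs chosen for illustration, not census figures.

```python
CLAIMED_PER_MAN = 29                 # condoms per year claimed by never-married men
CONDOMS_SOLD_CEILING = 600_000_000   # "fewer than 600 million" sold per year


def break_even_men(per_man=CLAIMED_PER_MAN, ceiling=CONDOMS_SOLD_CEILING):
    """Number of men at which the claimed total equals the sales ceiling."""
    return ceiling / per_man


def claimed_total(num_men, per_man=CLAIMED_PER_MAN):
    """Total condoms per year implied if every one of num_men uses per_man."""
    return num_men * per_man


threshold = break_even_men()
print(f"Break-even: about {threshold / 1e6:.1f} million never-married men")

# Hypothetical head counts, one below and one above the break-even point.
for num_men in (20_000_000, 25_000_000):
    total = claimed_total(num_men)
    verdict = "exceeds" if total > CONDOMS_SOLD_CEILING else "stays under"
    print(f"{num_men / 1e6:.0f} million men -> {total / 1e6:.0f} million condoms, {verdict} the ceiling")
```

  The statement in the text holds whenever the number of never-married men is above this break-even point.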

  Google searches give a far less lively—and, I argue, far more accurate—picture of sex during marriage. On Google, the top complaint about a marriage is not having sex. Searches for “sexless marriage” are three and a half times more common than “unhappy marriage” and eight times more common than “loveless marriage.” Even unmarried couples complain somewhat frequently about not having sex. Google searches for “sexless relationship” are second only to searches for “abusive relationship.” (This data, I should emphasize, is all presented anonymously. Google, of course, does not report data about any particular individual’s searches.)

  And Google searches presented a picture of America that was strikingly different from that post-racial utopia sketched out by the surveys. I remember when I first typed “nigger” into Google Trends. Call me naïve. But given how toxic the word is, I fully expected this to be a low-volume search. Boy, was I wrong. In the United States, the word “nigger”—or its plural, “niggers”—was included in roughly the same number of searches as the word “migraine(s),” “economist,” and “Lakers.” I wondered if searches for rap lyrics were skewing the results? Nope. The word used in rap songs is almost always “nigga(s).” So what was the motivation of Americans searching for “nigger”? Frequently, they were looking for jokes mocking African-Americans. In fact, 20 percent of searches with the word “nigger” also included the word “jokes.” Other common searches included “stupid niggers” and “I hate niggers.”

  There were millions of these searches every year. A large number of Americans were, in the privacy of their own homes, making shockingly racist inquiries. The more I researched, the more disturbing the information got.

  On Obama’s first election night, when most of the commentary focused on praise of Obama and acknowledgment of the historic nature of his election, roughly one in every hundred Google searches that included the word “Obama” also included “kkk” or “nigger(s).” Maybe that doesn’t sound so high, but think of the thousands of nonracist reasons to Google this young outsider with a charming family about to take over the world’s most powerful job. On election night, searches and sign-ups for Stormfront, a white nationalist site with surprisingly high popularity in the United States, were more than ten times higher than normal. In some states, there were more searches for “nigger president” than “first black president.”

  There was a darkness and hatred that was hidden from the traditional sources but was quite apparent in the searches that people made.

  Those searches are hard to reconcile with a society in which racism is a small factor. In 2012 I knew of Donald J. Trump mostly as a businessman and reality show performer. I had no more idea than anyone else that he would, four years later, be a serious presidential candidate. But those ugly searches are not hard to reconcile with the success of a candidate who—in his attacks on immigrants, in his angers and resentments—often played to people’s worst inclinations.

  The Google searches also told us that much of what we thought about the location of racism was wrong. Surveys and conventional wisdom placed modern racism predominantly in the South and mostly among Republicans. But the places with the highest racist search rates included upstate New York, western Pennsylvania, eastern Ohio, industrial Michigan and rural Illinois, along with West Virginia, southern Louisiana, and Mississippi. The true divide, Google search data suggested, was not South versus North; it was East versus West. You don’t get this sort of thing much west of the Mississippi. And racism was not limited to Republicans. In fact, racist searches were no higher in places with a high percentage of Republicans than in places with a high percentage of Democrats. Google searches, in other words, helped draw a new map of racism in the United States—and this map looked very different from what you may have guessed. Republicans in the South may be more likely to admit to racism. But plenty of Democrats in the North have similar attitudes.

  Four years later, this map would prove quite significant in explaining the political success of Trump.

  In 2012, I was using this map of racism I had developed using Google searches to reevaluate exactly the role that Obama’s race played. The data was clear. In parts of the country with a high number of racist searches, Obama did substantially worse than John Kerry, the white Democratic presidential candidate, had four years earlier. The relationship was not explained by any other factor about these areas, including education levels, age, church attendance, or gun ownership. Racist searches did not predict poor performance for any other Democratic candidate. Only for Obama.

  And the results implied a large effect. Obama lost roughly 4 percentage points nationwide just from explicit racism. This was far higher than might have been expected based on any surveys. Barack Obama, of course, was elected and reelected president, helped by some very favorable conditions for Democrats, but he had to overcome quite a bit more than anyone who was relying on traditional data sources—and that was just about everyone—had realized. There were enough racists to help win a primary or tip a general election in a year not so favorable to Democrats.

  My study was initially rejected by five academic journals. Many of the peer reviewers, if you will forgive a little disgruntlement, said that it was impossible to believe that so many Americans harbored such vicious racism. This simply did not fit what people had been saying. Besides, Google searches seemed like such a bizarre dataset.

 
