Everybody Lies
Now might be a good time to talk a bit about my writing process. I am not a particularly verbose writer. This book is only about seventy-five thousand words, which is a bit short for a topic as rich as this one.
But what I lack in breadth, I make up in obsessiveness. I spent five months on, and wrote forty-seven drafts of, my first New York Times sex column, which was two thousand words. Some chapters in this book took sixty drafts. I can spend hours finding the right word for a sentence in a footnote.
I lived much of my past year as a hermit. Just me and my computer. I lived in the hippest part of New York City and went out approximately never. This is, in my opinion, my magnum opus, the best idea I will have in my life. And I was willing to sacrifice whatever it took to make it right. I wanted to be able to defend every word in this book. My phone is filled with emails I forgot to respond to, e-vites I never opened, Bumble messages I ignored.*
After thirteen months of hard work, I was finally able to send in a near-complete draft. One part, however, was missing: the conclusion.
I explained to my editor, Denise, that it could take another few months. I told her six months was my most likely guess. The conclusion is, in my opinion, the most important part of the book. And I was only beginning to learn what makes a great conclusion. Needless to say, Denise was not pleased.
Then, one day, a friend of mine emailed me a study by Jordan Ellenberg. Ellenberg, a mathematician at the University of Wisconsin, was curious about how many people actually finish books. He thought of an ingenious way to test it using Big Data. Amazon reports how many people quote various lines in books. Ellenberg realized he could compare how frequently quotes were highlighted at the beginning of the book versus the end of the book. This would give a rough guide to readers’ propensity to make it to the end. By his measure, more than 90 percent of readers finished Donna Tartt’s novel The Goldfinch. In contrast, only about 7 percent made it through Nobel Prize economist Daniel Kahneman’s magnum opus, Thinking, Fast and Slow. Fewer than 3 percent, this rough methodology estimated, made it to the end of economist Thomas Piketty’s much discussed and praised Capital in the 21st Century. In other words, people tend not to finish treatises by economists.
One of the points of this book is we have to follow the Big Data wherever it leads and act accordingly. I may hope that most readers are going to hang on my every word and try to detect patterns linking the final pages to what happened earlier. But, no matter how hard I work on polishing my prose, most people are going to read the first fifty pages, get a few points, and move on with their lives.
Thus, I conclude this book in the only appropriate way: by following the data, what people actually do, not what they say. I am going to get a beer with some friends and stop working on this damn conclusion. Too few of you, Big Data tells me, are still reading.
ACKNOWLEDGMENTS
This book was a team effort.
These ideas were developed while I was a student at Harvard, a data scientist at Google, and a writer for the New York Times.
Hal Varian, with whom I worked at Google, has been a major influence on the ideas of this book. As best I can tell, Hal is perpetually twenty years ahead of his time. His book Information Rules, written with Carl Shapiro, basically predicted the future. And his paper “Predicting the Present,” with Hyunyoung Choi, largely started the Big Data revolution in the social sciences that is described in this book. He is also an amazing and kind mentor, as so many who have worked under him can attest. A classic Hal move is to do most of the work on a paper you are coauthoring with him and then insist that your name goes before his. Hal’s combination of genius and generosity is something I have rarely encountered.
My writing and ideas developed under Aaron Retica, who has been my editor for every single New York Times column. Aaron is a polymath. He somehow knows everything about music, history, sports, politics, sociology, economics, and God only knows what else. He is responsible for a huge amount of what is good about the Times columns that have my name on them. Other players on the team for these columns include Bill Marsh, whose graphics continue to blow me away, Kevin McCarthy, and Gita Daneshjoo. This book includes passages from these columns, reprinted with permission.
Steven Pinker, who kindly agreed to write the foreword, has long been a hero of mine. He has set the bar for a modern book on social science—an engaging exploration of the fundamentals of human nature, making sense of the best research from a range of disciplines. That bar is one I will be struggling to reach my entire life.
My dissertation, from which this book has grown, was written under my brilliant and patient advisers Alberto Alesina, David Cutler, Ed Glaeser, and Lawrence Katz.
Denise Oswald is an amazing editor. If you want to know how good her editing is, compare this final draft to my first draft—actually, you can’t do that because I am not going to ever show anyone else that embarrassing first draft. I also thank the rest of the team at HarperCollins, including Michael Barrs, Lynn Grady, Lauren Janiec, Shelby Meizlik, and Amber Oliver.
Eric Lupfer, my agent, saw potential in this project from the beginning, was instrumental in forming the proposal, and helped carry it through.
For superb fact-checking, I thank Melvis Acosta.
Other people from whom I learned a lot in my professional and academic life include Susan Athey, Shlomo Benartzi, Jason Bordoff, Danielle Bowers, David Broockman, Bo Cowgill, Steven Delpome, John Donohue, Bill Gale, Claudia Goldin, Suzanne Greenberg, Shane Greenstein, Steve Grove, Mike Hoyt, David Laibson, A.J. Magnuson, Dana Maloney, Jeffrey Oldham, Peter Orszag, David Reiley, Jonathan Rosenberg, Michael Schwarz, Steve Scott, Rich Shavelson, Michael D. Smith, Lawrence Summers, Jon Vaver, Michael Wiggins, and Qing Wu.
I thank Tim Requarth and NeuWrite for helping me develop my writing.
For help in interpreting studies, I thank Christopher Chabris, Raj Chetty, Matt Gentzkow, Solomon Messing, and Jesse Shapiro.
I asked Emma Pierson and Katia Sobolski if they might give advice on a chapter in my book. They decided, for reasons I do not understand, to offer to read the entire book—and give wise counsel on every paragraph.
My mother, Esther Davidowitz, read the entire book on multiple occasions and helped dramatically improve it. She also taught me, by example, that I should follow my curiosity, no matter where it led. When I was interviewing for an academic job, a professor grilled me: “What does your mother think of this work you do?” The idea was that my mom might be embarrassed that I was researching sex and other taboo topics. But I always knew she was proud of me for following my curiosity, wherever it led.
Many people read sections and offered helpful comments. I thank Eduardo Acevedo, Coren Apicella, Sam Asher, David Cutler, Stephen Dubner, Christopher Glazek, Jessica Goldberg, Lauren Goldman, Amanda Gordon, Jacob Leshno, Alex Peysakhovich, Noah Popp, Ramon Roullard, Greg Sobolski, Evan Soltas, Noah Stephens-Davidowitz, Lauren Stephens-Davidowitz, and Jean Yang. Actually, Jean was basically my best friend while I wrote this, so I thank her for that, too.
For help in collecting data, I thank Brett Goldenberg, James Rogers, and Mike Williams at MindGeek and Rob McQuown and Sam Miller at Baseball Prospectus.
I am grateful for financial support from the Alfred Sloan Foundation.
At one point, while writing this book, I was deeply stuck, lost, and close to abandoning the project. I then went to the country with my dad, Mitchell Stephens. Over the course of a week, Dad put me back together. He took me for walks in which we discussed love, death, success, happiness, and writing—and then sat me down so we could go over every sentence of the book. I could not have finished this book without him.
All remaining errors are, of course, my own.
NOTES
The pagination of this electronic edition does not match the edition from which it was created. To locate a specific entry, please use your e-book reader’s search tools.
INTRODUCTION
2 American voters largely did not care that Barack Obama: Katie Fretland, “Gallup: Race Not Important to Voters,” The Swamp, Chicago Tribune, June 2008.
2 Berkeley pored through: Alexandre Mas and Enrico Moretti, “Racial Bias in the 2008 Presidential Election,” American Economic Review 99, no. 2 (2009).
2 post-racial society: On the November 12, 2009, episode of his show, Lou Dobbs said we lived in a “post-partisan, post-racial society.” On the January 27, 2010, episode of his show, Chris Matthews said that President Obama was “post-racial by all appearances.” For other examples, see Michael C. Dawson and Lawrence D. Bobo, “One Year Later and the Myth of a Post-Racial Society,” Du Bois Review: Social Science Research on Race 6, no. 2 (2009).
5 I analyzed data from the General Social Survey: Details on all these calculations can be found on my website, sethsd.com, in the csv labeled “Sex Data.” Data from the General Social Survey can be found at http://gss.norc.org/.
5 fewer than 600 million condoms: Data provided to the author.
7 searches and sign-ups for Stormfront: Author’s analysis of Google Trends data. I also scraped data on all members of Stormfront, as discussed in Seth Stephens-Davidowitz, “The Data of Hate,” New York Times, July 13, 2014, SR4. The relevant data can be downloaded at sethsd.com, in the data section headlined “Stormfront.”
7 more searches for “nigger president” than “first black president”: Author’s analysis of Google Trends data. The states for which this is true include Kentucky, Louisiana, Arizona, and North Carolina.
9 rejected by five academic journals: The paper was eventually published as Seth Stephens-Davidowitz, “The Cost of Racial Animus on a Black Candidate: Evidence Using Google Search Data,” Journal of Public Economics 118 (2014). More details about the research can be found there. In addition, the data can be found at my website, sethsd.com, in the data section headlined “Racism.”
13 single factor that best correlated: “Strongest correlate I’ve found for Trump support is Google searches for the n-word. Others have reported this too” (February 28, 2016, tweet). See also Nate Cohn, “Donald Trump’s Strongest Supporters: A New Kind of Democrat,” New York Times, December 31, 2015, A3.
13 This shows the percent of Google searches that include the word “nigger(s).” Note that, because the measure is as a percent of Google searches, it is not arbitrarily higher in places with large populations or places that make a lot of searches. Note also that some of the differences in this map and the map for Trump support have obvious explanations. Trump lost popularity in Texas and Arkansas because they were the home states of two of his opponents, Ted Cruz and Mike Huckabee.
13 This is survey data from Civis Analytics from December 2015. Actual voting data is less useful here, since it is highly influenced by when the primary took place and the voting format. The maps are reprinted with permission from the New York Times.
15 2.5 million trillion bytes of data: “Bringing Big Data to the Enterprise,” IBM, https://www-01.ibm.com/software/data/bigdata/what-is-big-data.html.
17 needle comes in an increasingly larger haystack: Nassim M. Taleb, “Beware the Big Errors of ‘Big Data,’ ” Wired, February 8, 2013, http://www.wired.com/2013/02/big-data-means-big-errors-people.
18 neither racist searches nor membership in Stormfront: I examined how internet racism changed in parts of the country with high and low exposure to the Great Recession. I looked at both Google search rates for “nigger(s)” and Stormfront membership. The relevant data can be downloaded at sethsd.com, in the data sections headlined “Racial Animus” and “Stormfront.”
18 But Google searches reflecting anxiety: Seth Stephens-Davidowitz, “Fifty States of Anxiety,” New York Times, August 7, 2016, SR2. Note, while the Google searches do give much bigger samples, this pattern is consistent with evidence from surveys. See, for example, William C. Reeves et al., “Mental Illness Surveillance Among Adults in the United States,” Morbidity and Mortality Weekly Report Supplement 60, no. 3 (2011).
18 search for jokes: This is discussed in Seth Stephens-Davidowitz, “Why Are You Laughing?” New York Times, May 15, 2016, SR9. The relevant data can be downloaded at sethsd.com, in the data section headlined “Jokes.”
19 “my husband wants me to breastfeed him”: This is discussed in Seth Stephens-Davidowitz, “What Do Pregnant Women Want?” New York Times, May 17, 2014, SR6.
19 porn searches for depictions of women breastfeeding men: Author’s analysis of PornHub data.
19 Women make nearly as many: This is discussed in Seth Stephens-Davidowitz, “Searching for Sex,” New York Times, January 25, 2015, SR1.
20 “poemas para mi esposa embarazada”: Stephens-Davidowitz, “What Do Pregnant Women Want?”
21 Friedman says: I interviewed Jerry Friedman by phone on October 27, 2015.
21 sampling of all their data: Hal R. Varian, “Big Data: New Tricks for Econometrics,” Journal of Economic Perspectives 28, no. 2 (2014).
CHAPTER 1: YOUR FAULTY GUT
26 The best data science, in fact, is surprisingly intuitive: I am speaking about the corner of data analysis I know about—data science that tries to explain and predict human behavior. I am not speaking of artificial intelligence that tries to, say, drive a car. These methodologies, while they do utilize tools discovered from the human brain, are less easy to understand.
28 what symptoms predict pancreatic cancer: John Paparrizos, Ryan W. White, and Eric Horvitz, “Screening for Pancreatic Adenocarcinoma Using Signals from Web Search Logs: Feasibility Study and Results,” Journal of Oncology Practice (2016).
31 Winter climate swamped all the rest: This research is discussed in Seth Stephens-Davidowitz, “Dr. Google Will See You Now,” New York Times, August 11, 2013, SR12.
32 biggest dataset ever assembled on human relationships: Lars Backstrom and Jon Kleinberg, “Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook,” in Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (2014).
33 people consistently rank: Daniel Kahneman, Thinking, Fast and Slow (New York: Farrar, Straus and Giroux, 2011).
33 asthma causes about seventy times more deaths: Between 1979 and 2010, on average, 55.81 Americans died from tornados and 4216.53 Americans died from asthma. See Annual U.S. Killer Tornado Statistics, National Weather Service, http://www.spc.noaa.gov/climo/torn/fatalmap.php and Trends in Asthma Morbidity and Mortality, American Lung Association, Epidemiology and Statistics Unit.
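As a quick check of the arithmetic behind this note, the minimal Python sketch below divides the two annual averages quoted above. The variable names are illustrative and do not come from the source; only the two figures do.

```python
# Average annual U.S. deaths, 1979-2010, as quoted in the note above.
tornado_deaths_per_year = 55.81
asthma_deaths_per_year = 4216.53

# How many asthma deaths occur for each tornado death, on average.
ratio = asthma_deaths_per_year / tornado_deaths_per_year
print(f"Asthma deaths per tornado death: {ratio:.1f}")
```

The ratio comes out between 75 and 76, which fits the text's rounded "about seventy times."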
33 Patrick Ewing: My favorite Ewing videos are “Patrick Ewing’s Top 10 Career Plays,” YouTube video, posted September 18, 2015, https://www.youtube.com/watch?v=Y29gMuYymv8; and “Patrick Ewing Knicks Tribute,” YouTube video, posted May 12, 2006, https://www.youtube.com/watch?v=8T2l5Emzu-I.
34 “basketball as a matter of life or death”: S. L. Price, “Whatever Happened to the White Athlete?” Sports Illustrated, December 8, 1997.
34 an internet survey: This was a Google Consumer Survey I conducted on October 22, 2013. I asked, “Where would you guess that the majority of NBA players were born?” The two choices were “poor neighborhoods” and “middle-class neighborhoods”; 59.7 percent of respondents picked “poor neighborhoods.”
36 a black person’s first name is an indication of his socioeconomic background: Roland G. Fryer Jr. and Steven D. Levitt, “The Causes and Consequences of Distinctively Black Names,” Quarterly Journal of Economics 119, no. 3 (2004).
37 Among all African-Americans born in the 1980s: Centers for Disease Control and Prevention, “Health, United States, 2009,” Table 9, Nonmarital Childbearing, by Detailed Race and Hispanic Origin of Mother, and Maternal Age: United States, Selected Years 1970–2006.
37 Chris Bosh . . . Chris Paul: “Not Just a Typical Jock: Miami Heat Forward Chris Bosh’s Interests Go Well Beyond Basketball,” PalmBeachPost.com, February 15, 2011, http://www.palmbeachpost.com/news/sports/basketball/not-just-a-typical-jock-miami-heat-forward-chris-b/nLp7Z/; Dave Walker, “Chris Paul’s Family to Compete on ‘Family Feud,’ ” nola.com, October 31, 2011, http://www.nola.com/tv/index.ssf/2011/10/chris_pauls_family_to_compete.html.
38 four inches taller: “Why Are We Getting Taller as a Species?” Scientific American, http://www.scientificamerican.com/article/why-are-we-getting-taller/. Interestingly, Americans have stopped getting taller. Amanda Onion, “Why Have Americans Stopped Growing Taller?” ABC News, July 3, 2016, http://abcnews.go.com/Technology/story?id=98438&page=1. I have argued that one of the reasons there has been a huge increase in foreign-born NBA players is that other countries are catching up to the United States in height. The number of American-born seven-footers in the NBA increased sixteenfold from 1946 to 1980 as Americans grew. It has since leveled off, as Americans have stopped growing. Meanwhile, the number of seven-footers from other countries has risen substantially. The biggest increase in international players, I found, has been extremely tall men from countries, such as Turkey, Spain, and Greece, where there have been noticeable increases in childhood health and adult height in recent years.
38 Americans from poor backgrounds: Carmen R. Isasi et al., “Association of Childhood Economic Hardship with Adult Height and Adult Adiposity among Hispanics/Latinos: The HCHS/SOL Socio-Cultural Ancillary Study,” PloS One 11, no. 2 (2016); Jane E. Miller and Sanders Korenman, “Poverty and Children’s Nutritional Status in the United States,” American Journal of Epidemiology 140, no. 3 (1994); Harry J. Holzer, Diane Whitmore Schanzenbach, Greg J. Duncan, and Jens Ludwig, “The Economic Costs of Childhood Poverty in the United States,” Journal of Children and Poverty 14, no. 1 (2008).
38 the average American man is 5’9”: Cheryl D. Fryar, Qiuping Gu, and Cynthia L. Ogden, “Anthropometric Reference Data for Children and Adults: United States, 2007–2010,” Vital and Health Statistics Series 11, no. 252 (2012).