Stern, Howard, 157
stock market
data for, 55–56
and examples of Big Data searches, 22
Summers-Stephens-Davidowitz attempt to predict the, 245–48, 251–52
Stone, Oliver, 185
Stoneham, James, 266, 269
Storegard, Adam, 99–101
stories
categories/types of, 91–92
viral, 22, 92
and zooming in, 205–6
See also specific story
Stormfront (website), 7, 14, 18, 137–40
stretch marks, and pregnancy, 188–89
Stuyvesant High School (New York City), 231–37, 238, 240
suburban areas, and origins of notable Americans, 183–84
successful/notable Americans
factors that drive, 185–86
zooming in on, 180–86
suffering, and benefits of digital truth serum, 161
suicide, and danger of empowered government, 266, 267–68
Summers, Lawrence
and Obama-racism study, 243–44
and predicting the stock market, 245, 246, 251–52
Stephens-Davidowitz’s meeting with, 243–45
Sunstein, Cass, 140
Super Bowl games, advertising during, 221–25, 239
Super Crunchers (Gnau), 264
Supreme Court, and abortion, 147
Surowiecki, James, 203
surveys
in-person, 108
internet, 108
and lying, 105–7, 108, 108n
and pictures as data, 97
skepticism about, 171
telephone, 108
and truth about sex, 113, 116
and zooming in on hours and minutes, 193
See also specific survey or topic
Syrian refugees, 131
Taleb, Nassim, 17
Tartt, Donna, 283
TaskRabbit, 212
taxes
cheating on, 22, 178–80, 206
and examples of Big Data searches, 22
and lying, 180
and self-employed people, 178–80
and words as data, 93–95
zooming in on, 172–73, 178–80, 206
teachers, using tests to judge, 253–54
teenagers
adopted, 108n
as gay, 114, 116
lying by, 108n
and origins of political preferences, 169
and truth about sex, 114, 116
See also children
television
and A/B testing, 222
advertising on, 221–26
Terabyte, 264
terrorism, 18, 129–31
tests/testing
of high school students, 231–37, 253–54
and judging teacher, 253–54
and obsessive infatuations with numbers, 253–54
online behavior as supplement to, 278
and small data, 255–56
See also specific test or study
Thiel, Peter, 155
Think Progress (website), 130
Thinking, Fast and Slow (Kahneman), 283
Thome, Jim, 200
Tourangeau, Roger, 107, 108
towns, zooming in on, 172–90
Toy Story (movie), 192
Trump, Donald
elections of 2012 and, 7
and ignoring what people tell you, 157
and immigration, 184
issues propagated by, 7
and origins of notable Americans, 184
polls about, 1
predictions about, 11–14
and racism, 8, 9, 11, 12, 14, 133, 139
See also elections, 2016
truth
benefits of knowing, 158–63
handling the, 158–63
See also digital truth serum; lying; specific topic
Tuskegee University, 183
Twentieth Century Fox, 221–22
Twitter, 151–52, 160–61n, 201–3
typing errors by searchers, 48–50
The Unbearable Lightness of Being (Kundera), 233
Uncharted (Aiden and Michel), 78–79
unemployment
and child abuse, 145–47
data about, 56–57, 58–59
unintended consequences, 197
United States
and Civil War, 79
as united or divided, 78–79
University of California, Berkeley, racism in 2008 election study at, 2
University of Maryland, survey of graduates of, 106–7
urban areas
and life expectancy, 177
and origins of notable Americans, 183–84, 186
vagina, smells of, 19, 126–27, 161
Varian, Hal, 57–58, 224
Vikingmaiden88, 136–37, 140–41, 145
violence
and real science, 273
zooming in on, 190–97
See also murder
voter registration, 106
voter turnout, 9–10, 109–10
voting behavior, and lying, 106, 107, 109–10
Vox, 202
Walmart, 71–72
Washington Post, and words as data, 75, 94
Washington Times, and words as data, 75, 94–95
wealth
and life expectancy, 176–77
See also income distribution
weather, and predictions about wine, 73–74
Weil, David N., 99–101
Weiner, Anthony, 234n
white nationalism, 137–40, 145. See also Stormfront
Whitepride26, 139
Wikipedia, 14, 180–86
wine, predictions about, 72–74
wives
and descriptions of husbands, 160–61, 160–61n
and suspicions about gayness of husbands, 116–17
women
breasts of, 125, 126
butt of, 125–26
genitals of, 126–27
violence against, 121–22
See also girls; wives; specific topic
words
and bias, 74–76, 93–97
and categories/types of stories, 91–92
as data, 74–97
and dating, 80–86
and digital revolution, 278
and digitalization of books, 77, 79
and gay marriage, 74–76
and sentiment analysis, 87–92
and U.S. as united or divided, 78–79
workers’ rights, 93, 94
World Bank, 102
World of Warcraft (game), 220
Wrenn, Doug, 39–40, 41
Yahoo News, 140, 143
yearbooks, high school, 98–99
Yelp, 265
Yilmaz, Ahmed (alias), 231–33, 234, 234n
YouTube, 152
Zayat, Ahmed, 63–64, 65
Zero to One (Thiel), 155
zooming in
on baseball, 165–69, 165–66n, 171, 197–200, 200n, 203, 206, 239
benefits of, 205–6
on counties, cities, and towns, 172–90, 239–40
and data size, 171, 172–73
on doppelgangers, 197–205
on equality of opportunity, 173–75
on gambling, 263–65
on health, 203–5, 275
on income distribution, 174–76, 185
and influence of childhood experiences, 165–71, 165–66n, 206
on life expectancy, 176–78
on minutes and hours, 190–97
and natural experiments, 239–40
and origin of political preferences, 169–71
on pregnancy, 187–90
stories from, 205–6
on successful/notable Americans, 180–86
on taxes, 172–73, 178–80, 206
Zuckerberg, Mark, 154–56, 157, 158, 238–39
ABOUT THE AUTHOR
Seth Stephens-Davidowitz is a New York Times op-ed contributor, a visiting lecturer at The Wharton School, and a former Google data scientist. He received a BA in philosophy from Stanford, where he graduated Phi Beta Kappa, and a PhD in economics from Harvard. His research, which uses new, big data sources to uncover hidden behaviors and attitudes, has appeared in the Journal of Public Economics and other prestigious publications. He lives in New York City.
Discover Great Authors, Exclusive Offers, and more at hc.com.
COPYRIGHT
EVERYBODY LIES. Copyright © 2017 by Seth Stephens-Davidowitz. All rights reserved under International and Pan-American Copyright Conventions. By payment of the required fees, you have been granted the nonexclusive, nontransferable right to access and read the text of this e-book on-screen. No part of this text may be reproduced, transmitted, downloaded, decompiled, reverse-engineered, or stored in or introduced into any information storage and retrieval system, in any form or by any means, whether electronic or mechanical, now known or hereafter invented, without the express written permission of HarperCollins e-books.
FIRST EDITION
Cover design by Lisa Amoroso
Cover photograph of elephant/zebra © Visuals Unlimited, Inc./Victor Habbick
Other zebras © Shutterstock/Aaron Amat
ISBN 978-0-06-239085-1
EPub Edition May 2017 ISBN 9780062390875
ABOUT THE PUBLISHER
Australia
HarperCollins Publishers (Australia) Pty. Ltd.
Level 13, 201 Elizabeth Street
Sydney, NSW 2000, Australia
www.harpercollins.com.au
Canada
HarperCollins Canada
2 Bloor Street East - 20th Floor
Toronto, ON M4W 1A8, Canada
www.harpercollins.ca
New Zealand
HarperCollins Publishers New Zealand
Unit D1, 63 Apollo Drive
Rosedale 0632
Auckland, New Zealand
www.harpercollins.co.nz
United Kingdom
HarperCollins Publishers Ltd.
1 London Bridge Street
London SE1 9GF, UK
www.harpercollins.co.uk
United States
HarperCollins Publishers Inc.
195 Broadway
New York, NY 10007
www.harpercollins.com
* Google Trends has been a source of much of my data. However, since it only allows you to compare the relative frequency of different searches but does not report the absolute number of any particular search, I have usually supplemented it with Google AdWords, which reports exactly how frequently every search is made. In most cases I have also been able to sharpen the picture with the help of my own Trends-based algorithm, which I describe in my dissertation, “Essays Using Google Data,” and in my Journal of Public Economics paper, “The Cost of Racial Animus on a Black Candidate: Evidence Using Google Search Data.” The dissertation, a link to the paper, and a complete explanation of the data and code used in all the original research presented in this book are available on my website, sethsd.com.
* Full disclosure: Shortly after I completed this study, I moved from California to New York. Using data to learn what you should do is often easy. Actually doing it is tough.
* While the initial version of Google Flu had significant flaws, researchers have recently recalibrated the model, with more success.
* In 1998, if you searched “cars” on a popular pre-Google search engine, you were inundated with porn sites. These porn sites had written the word “cars” frequently in white letters on a white background to trick the search engine. They then got a few extra clicks from people who meant to buy a car but got distracted by the porn.
* One theory I am working on: Big Data just confirms everything the late Leonard Cohen ever said. For example, Leonard Cohen once gave his nephew the following advice for wooing women: “Listen well. Then listen some more. And when you think you are done listening, listen some more.” That seems to be roughly similar to what these scientists found.
* Another reason for lying is simply to mess with surveys. This is a huge problem for any research regarding teenagers, fundamentally complicating our ability to understand this age group. Researchers originally found a correlation between a teenager’s being adopted and a variety of negative behaviors, such as using drugs, drinking alcohol, and skipping school. In subsequent research, they found this correlation was entirely explained by the 19 percent of self-reported adopted teenagers who weren’t actually adopted. Follow-up research has found that a meaningful percent of teenagers tell surveys they are more than seven feet tall, weigh more than four hundred pounds, or have three children. One survey found 99 percent of students who reported having an artificial limb to academic researchers were kidding.
* Some may find it offensive that I associate a male preference for Judy Garland with a preference for having sex with men, even in jest. And I certainly don’t mean to imply that all—or even most—gay men have a fascination with divas. But search data demonstrates that there is something to the stereotype. I estimate that a man who searches for information about Judy Garland is three times more likely to search for gay porn than straight porn. Some stereotypes, Big Data tells us, are true.
* I think this data also has implications for one’s optimal dating strategy. Clearly, one should put oneself out there, get rejected a lot, and not take rejection personally. This process will allow you, eventually, to find the mate who is most attracted to someone like you. Again, no matter what you look like, these people exist. Trust me.
* I wanted to call this book How Big Is My Penis? What Google Searches Teach Us About Human Nature, but my editor warned me that would be a tough sell, that people might be too embarrassed to buy a book with that title in an airport bookstore. Do you agree?
* To further test the hypothesis that parents treat kids of different genders differently, I am working on obtaining data from parenting websites. This would include a much larger number of parents than those who make these particular, specific searches.
* I analyzed Twitter data. I thank Emma Pierson for help downloading this. I did not include descriptors of what one’s husband is doing right now, which are prevalent on social media but wouldn’t really make sense on search. Even these descriptions tilt toward the favorable. The top ways to describe what a husband is doing right now on social media are “working” and “cooking.”
* Full disclosure: When I was fact-checking this book, Noah denied that his hatred of America’s pastime is a key part of his personality. He does admit to hating baseball, but he believes his kindness, love of children, and intelligence are the core elements of his personality—and that his attitudes about baseball would not even make the top ten. However, I concluded that it’s sometimes hard to see one’s own identity objectively and, as an outside observer, I am able to see that hating baseball is indeed fundamental to who Noah is, whether he’s able to recognize it or not. So I left it in.
* This story shows how things that seem bad may be good if they prevent something worse. Ed McCaffrey, a Stanford-educated former wide receiver, uses this argument to justify letting all four of his sons play football: “These guys have energy. And, so, if they’re not playing football, they’re skateboarding, they’re climbing trees, they’re playing tag in the backyard, they’re doing paintball. I mean, they’re not going to sit there and do nothing. And, so, the way I look at it is, hey, at least there’s rules within the sport of football. . . . My kids have been to the emergency room for falling off decks, getting in bike crashes, skateboarding, falling out of trees. I mean, you name it . . . Yea, it’s a violent collision sport. But, also, my guys just have the personality, where, at least they’re not squirrel-jumping off mountains and doing crazy stuff like that. So, it’s organized aggression, I guess.” McCaffrey’s argument, made in an interview on The Herd with Colin Cowherd, is one I had never heard before. After reading the Dahl/DellaVigna paper, I take the argument seriously. An advantage of huge real-world datasets, rather than laboratory data, is that they can pick up these kinds of effects.
* You can probably tell by this part of the book I tend to be cynical about good stories. I wanted one feel-good story in here, so I am leaving my cynicism to a footnote. I suspect PECOTA just found out that Ortiz was a steroid user who stopped using steroids and would start using them again. From the standpoint of prediction, it is actually pretty cool if PECOTA was able to detect that—but it makes it a less moving story.
* A famous 1978 paper that claimed that winning the lottery does not make you happy has largely been debunked.
* I have changed his name and a few details.
* In looking for people like Yilmaz who scored near the cutoff, I was blown away by the number of people—in their twenties through their fifties—who remember this test-taking experience from their early teens and speak about missing a cutoff in dramatic terms. This includes former congressman and New York City mayoral candidate Anthony Weiner, who says he missed Stuy by a single point. “They didn’t want me,” he told me, in a phone interview.
* Since everybody lies, you should question much of this story. Maybe I’m not an obsessive worker. Maybe I didn’t work extraordinarily hard on this book. Maybe I, like lots of people, can exaggerate just how much I work. Maybe my thirteen months of “hard work” included full months in which I did no work at all. Maybe I didn’t live as a hermit. Maybe, if you checked my Facebook profile, you’d see pictures of me out with friends during this supposed hermit period. Or maybe I was a hermit, but it was not self-imposed. Maybe I spent many nights alone, unable to work, hoping in vain that someone would contact me. Maybe nobody e-vites me to anything. Maybe nobody messages me on Bumble. Everybody lies. Every narrator is unreliable.