Clearly, many horrific searches never lead to horrible actions.
That said, it is at least theoretically possible that some classes of searches suggest a reasonably high probability of a horrible follow-through. It is conceivable, for example, that data scientists could one day build a model that would have flagged Stoneham's searches related to Donato as significant cause for concern.
In 2014, there were about 6,000 searches for the exact phrase “how to kill your girlfriend” and 400 murders of girlfriends. If all of these murderers had made this exact search beforehand, that would mean 1 in 15 people who searched “how to kill your girlfriend” went through with it. Of course, many, probably most, people who murdered their girlfriends did not make this exact search. This would mean the true probability that this particular search led to murder is lower, probably a lot lower.
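To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The 6,000 and 400 figures come from the text; the share of murderers who actually made this exact search is unknown, so it is treated here as a hypothetical parameter, and the sketch simply shows how the implied risk shrinks as that share falls.

```python
# Back-of-the-envelope version of the calculation in the text.
# The search and murder counts come from the text; the share of murderers who
# actually made this exact search is unknown, so we treat it as a hypothetical
# parameter and show how the implied risk shrinks as that share falls.

searches = 6_000   # searches for the exact phrase, 2014 (from the text)
murders = 400      # murders of girlfriends, 2014 (from the text)

for share_who_searched in (1.0, 0.5, 0.1):  # hypothetical shares
    searchers_who_followed_through = murders * share_who_searched
    risk = searchers_who_followed_through / searches
    print(f"If {share_who_searched:.0%} of murderers searched first: "
          f"roughly 1 in {round(1 / risk)} searchers followed through")
```

With every murderer searching, the rate is 1 in 15; if only half or a tenth of them searched, it falls to 1 in 30 or 1 in 150, which is the sense in which the true probability is "lower, probably a lot lower."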
But if data scientists could build a model that showed that the threat against a particular individual was, say, 1 in 100, we might want to do something with that information. At the least, the person under threat might have the right to be informed that there is a 1-in-100 chance she will be murdered by a particular person.
Overall, however, we have to be very cautious using search data to predict crimes at an individual level. The data clearly tells us that there are many, many horrifying searches that rarely lead to horrible actions. And there has been, as of yet, no proof that the government can predict a particular horrible action, with high probability, just from examining these searches. So we have to be really cautious about allowing the government to intervene at the individual level based on search data. This is not just for ethical or legal reasons. It’s also, at least for now, for data science reasons.
CONCLUSION
HOW MANY PEOPLE FINISH BOOKS?
After signing my book contract, I had a clear vision of how the book should be structured. Near the start, you may recall, I described a scene at my family’s Thanksgiving table. My family members debated my sanity and tried to figure out why I, at thirty-three, couldn’t seem to find the right girl.
The conclusion to this book, then, practically wrote itself. I would meet and marry the girl. Better still, I would use Big Data to meet the right girl. Perhaps I could weave in tidbits from the courting process throughout. Then the story would all come together in the conclusion, which would describe my wedding day and double as a love letter to my new wife.
Unfortunately, life didn’t match my vision. Locking myself in my apartment and avoiding the world while writing a book probably didn’t help my romantic life. And I, alas, still need to find a wife. More important, I needed a new conclusion.
I pored over many of my favorite books, trying to find what makes a great conclusion. The best conclusions, I concluded, bring into the open an important point that has been there all along, hovering just beneath the surface. For this book, that big point is this: social science is becoming a real science. And this new, real science is poised to improve our lives.
In the beginning of Part II, I discussed Karl Popper’s critique of Sigmund Freud. Popper, I noted, didn’t think that Freud’s wacky vision of the world was scientific. But I didn’t mention something about Popper’s critique. It was actually far broader than just an attack on Freud. Popper didn’t think any social scientist was particularly scientific. Popper was simply unimpressed with the rigor of what these so-called scientists were doing.
What motivated Popper’s crusade? When he interacted with the best intellectuals of his day—the best physicists, the best historians, the best psychologists—Popper noted a striking difference. When the physicists talked, Popper believed in what they were doing. Sure, they sometimes made mistakes. Sure, they sometimes were fooled by their subconscious biases. But physicists were engaged in a process that was clearly finding deep truths about the world, culminating in Einstein’s Theory of Relativity. When the world’s most famous social scientists talked, in contrast, Popper thought he was listening to a bunch of gobbledygook.
Popper is hardly the only person to have made this distinction. Just about everybody agrees that physicists, biologists, and chemists are real scientists. They utilize rigorous experiments to find how the physical world works. In contrast, many people think that economists, sociologists, and psychologists are soft scientists who throw around meaningless jargon so they can get tenure.
To the extent this was ever true, the Big Data revolution has changed that. If Karl Popper were alive today and attended a presentation by Raj Chetty, Jesse Shapiro, Esther Duflo, or (humor me) myself, I strongly suspect he would not have the same reaction he had back then. To be honest, he might be more likely to question whether today’s great string theorists are truly scientific or just engaging in self-indulgent mental gymnastics.
If a violent movie comes to a city, does crime go up or down? If more people are exposed to an ad, do more people use the product? If a baseball team wins when a boy is twenty, will he be more likely to root for them when he's forty? These are all clear questions with clear yes-or-no answers. And in the mountains of honest data, we can find the answers.
This is the stuff of science, not pseudoscience.
This does not mean the social science revolution will come in the form of simple, timeless laws.
Marvin Minsky, the late MIT scientist and one of the first to study the possibility of artificial intelligence, suggested that psychology got off track by trying to copy physics. Physics had success finding simple laws that held in all times and all places.
Human brains, Minsky suggested, may not be subject to such laws. The brain, instead, is likely a complex system of hacks—one part correcting mistakes in other parts. The economy and political system may be similarly complex.
For this reason, the social science revolution is unlikely to come in the form of neat formulas, such as E = mc². In fact, if someone is claiming a social science revolution based on a neat formula, you should be skeptical.
The revolution, instead, will come piecemeal, study by study, finding by finding. Slowly, we will get a better understanding of the complex systems of the human mind and society.
A proper conclusion sums up, but it also points the way to more things to come.
For this book, that’s easy. The datasets I have discussed herein are revolutionary, but they have barely been explored. There is so much more to be learned. Frankly, the overwhelming majority of academics have ignored the data explosion caused by the digital age. The world’s most famous sex researchers stick with the tried and true. They ask a few hundred subjects about their desires; they don’t ask sites like PornHub for their data. The world’s most famous linguists analyze individual texts; they largely ignore the patterns revealed in billions of books. The methodologies taught to graduate students in psychology, political science, and sociology have been, for the most part, untouched by the digital revolution. The broad, mostly unexplored terrain opened by the data explosion has been left to a small number of forward-thinking professors, rebellious grad students, and hobbyists.
That will change.
For every idea I have talked about in this book, there are a hundred ideas just as important ready to be tackled. The research discussed here is the tip of the tip of the iceberg, a scratch on the scratch of the surface.
So what else is coming?
For one, a radical expansion of the methodology that was used in one of the most successful public health studies of all time. In the mid-nineteenth century, John Snow, a British physician, was interested in what was causing a cholera outbreak in London.
His ingenious idea: he mapped every cholera case in the city. When he did this, he found the disease was largely clustered around one particular water pump. This suggested the disease spread through germ-infested water, disproving the then-conventional idea that it spread through bad air.
Big Data—and the zooming in that it allows—makes this type of study easy. For any disease, we can explore Google search data or other digital health data. We can find if there are any tiny pockets of the world where prevalence of this disease is unusually high or unusually low. Then we can see what these places have in common. Is there something in the air? The water? The social norms?
We can do this for migraines. We can do this for kidney stones. We can do this for anxiety and depression and Alzheimer's and pancreatic cancer and high blood pressure and back pain and constipation and nosebleeds. We can do this for everything. The analysis that Snow did once, we might be able to do four hundred times (something that, as of this writing, I am already starting to work on).
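As a rough illustration of what Snow-style mapping looks like with digital data, here is a minimal sketch in Python. It assumes we already have search-based prevalence estimates by region; the region names, the rates, and the z-score cutoff are all invented for illustration, not drawn from any real analysis.

```python
# A sketch of "Snow at scale": flag places where measured interest in a condition
# is unusually high or low, then ask what those places have in common.
# All region names and rates below are invented for illustration; real inputs
# would come from search data or other digital health sources.

from statistics import mean, stdev

# Hypothetical searches per 100,000 residents for one condition, by region.
prevalence = {
    "Region A": 42, "Region B": 45, "Region C": 40, "Region D": 44,
    "Region E": 43, "Region F": 95,  # unusually high: worth a closer look
    "Region G": 41, "Region H": 5,   # unusually low: also worth a closer look
}

mu, sigma = mean(prevalence.values()), stdev(prevalence.values())

for region, rate in prevalence.items():
    z = (rate - mu) / sigma
    if abs(z) > 1.5:  # crude cutoff; a real study would be far more careful
        print(f"{region}: rate {rate}, z-score {z:+.1f} -- unusual, investigate")
```

The interesting work, of course, starts after the flagging: figuring out what the unusual places share.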
We might call this—taking a simple method and utilizing Big Data to perform an analysis several hundred times in a short period of time—science at scale. Yes, the social and behavioral sciences are most definitely going to scale. Zooming in on health conditions will help these sciences scale. Another thing that will help them scale: A/B testing. We discussed A/B testing in the context of businesses getting users to click on headlines and ads—and this has been the predominant use of the methodology. But A/B testing can be used to uncover things more fundamental—and socially valuable—than an arrow that gets people to click on an ad.
Benjamin F. Jones is an economist at Northwestern who is trying to use A/B testing to better help kids learn. He has helped create a platform, EDU STAR, which allows schools to randomly test different lesson plans.
Many companies are in the education software business. With EDU STAR, students log in to a computer and are randomly exposed to different lesson plans. Then they take short tests to see how well they learned the material. Schools, in other words, learn what software works best for helping students grasp material.
Already, like all great A/B testing platforms, EDU STAR is yielding surprising results. One lesson plan that many educators were very excited about included software that utilized games to help teach students fractions. Certainly, if you turned math into a game, students would have more fun, learn more, and do better on tests. Right? Wrong. Students who were taught fractions via a game tested worse than those who learned fractions in a more standard way.
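To give a sense of the kind of comparison such a platform makes possible, here is a hedged sketch of a two-group test. The quiz scores below are simulated and the numbers invented; only the direction of the result, with the game-based lesson scoring worse, follows the finding described above.

```python
# A sketch of an EDU STAR-style comparison: students randomly assigned to one of
# two lesson plans take the same short quiz, and we compare average scores.
# The scores are simulated and the numbers invented; only the direction of the
# result mirrors the finding described in the text.

import random
from statistics import mean, stdev

random.seed(0)

# Hypothetical quiz scores (0-100) for two randomly assigned groups of students.
game_based = [random.gauss(68, 10) for _ in range(200)]  # fractions taught via a game
standard   = [random.gauss(73, 10) for _ in range(200)]  # conventional lesson plan

diff = mean(standard) - mean(game_based)
# Standard error of the difference in means for two independent samples.
se = (stdev(standard) ** 2 / len(standard)
      + stdev(game_based) ** 2 / len(game_based)) ** 0.5

print(f"Standard lesson outscored the game-based lesson by {diff:.1f} points "
      f"(about {diff / se:.1f} standard errors)")
```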
Getting kids to learn more is an exciting, and socially beneficial, use of the testing that Silicon Valley pioneered to get people to click on more ads. So is getting people to sleep more.
The average American gets 6.7 hours of sleep every night. Most Americans want to sleep more. But 11 P.M. rolls around, and SportsCenter is on or YouTube is calling. So shut-eye waits. Jawbone, a wearable-device company with hundreds of thousands of customers, performs thousands of tests to find interventions that help its users do what they want to do: go to bed earlier.
Jawbone scored a huge win with a two-pronged approach. First, ask customers to commit to a not-that-ambitious goal, with a message like this: "It looks like you haven't been sleeping much in the last 3 days. Why don't you aim to get to bed by 11:30 tonight? We know you normally get up at 8 A.M." Users then have the option to click "I'm in."
Second, when 10:30 comes, Jawbone sends another message: "We decided you'd aim to sleep at 11:30. It's 10:30 now. Why not start now?"
Jawbone found this strategy led to twenty-three minutes of extra sleep. They didn’t get customers to actually get to bed at 10:30, but they did get them to bed earlier.
Of course, every part of this strategy had to be optimized through lots of experimentation. Set the original goal too early—ask users to commit to going to bed by 11 P.M.—and few will play along. Set it too late, asking users to go to bed by midnight, and little will be gained.
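Here is a minimal sketch of what tuning that bedtime-goal parameter through experimentation might look like. The variant names and the simulated sleep gains are hypothetical; only the twenty-three-minute average for the 11:30 goal echoes the result reported above.

```python
# A sketch of tuning the bedtime-goal parameter through experimentation.
# Variant names and simulated sleep gains are hypothetical; only the 23-minute
# average for the 11:30 goal echoes the result reported in the text.

import random
from statistics import mean

random.seed(1)

# Hypothetical extra minutes of sleep per user, by the bedtime users are asked
# to commit to.
variants = {
    "commit to 11:00 PM": [random.gauss(8, 30) for _ in range(500)],   # too ambitious: few opt in
    "commit to 11:30 PM": [random.gauss(23, 30) for _ in range(500)],  # the sweet spot described above
    "commit to midnight": [random.gauss(5, 30) for _ in range(500)],   # too easy: little gained
}

for name, gains in variants.items():
    print(f"{name}: {mean(gains):+.1f} minutes of extra sleep on average")
```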
Jawbone used A/B testing to find the sleep equivalent of Google’s right-pointing arrow. But instead of getting a few more clicks for Google’s ad partners, it yields a few more minutes of rest for exhausted Americans.
In fact, the whole field of psychology might utilize the tools of Silicon Valley to dramatically improve its research. I'm eagerly anticipating the first psychology paper that, instead of detailing a couple of experiments done with a few undergrads, shows the results of a thousand rapid A/B tests.
The days of academics devoting months to recruiting a small number of undergrads to perform a single test will come to an end. Instead, academics will utilize digital data to test a few hundred or a few thousand ideas in just a few seconds. We’ll be able to learn a lot more in a lot less time.
Text as data is going to teach us a lot more. How do ideas spread? How do new words form? How do words disappear? How do jokes form? Why are certain words funny and others not? How do dialects develop? I bet, within twenty years, we will have profound insights on all these questions.
I think we might consider utilizing kids’ online behavior—appropriately anonymized—as a supplement to traditional tests to see how they are learning and developing. How is their spelling? Are they showing signs of dyslexia? Are they developing mature, intellectual interests? Do they have friends? There are clues to all these questions in the thousands of keystrokes every child makes every day.
And there is another, decidedly nontrivial area where plenty more insights are coming.
In the song "Shattered," by the Rolling Stones, Mick Jagger describes all that makes New York City, the Big Apple, so magical. Laughter. Joy. Loneliness. Rats. Bedbugs. Pride. Greed. People dressed in paper bags. But Jagger devotes the most words to what makes the city truly special: "sex and sex and sex and sex."
As with the Big Apple, so with Big Data. Thanks to the digital revolution, insights are coming in health. Sleep. Learning. Psychology. Language. Plus, sex and sex and sex and sex.
One question I am currently exploring: how many dimensions of sexuality are there? We usually think of someone as gay or straight. But sexuality is clearly more complex than that. Among both gay and straight people, there are types—some men like "blondes," others "brunettes," for instance. Might these preferences be as strong as the preference for gender? Another question I am looking into: where do sexual preferences come from? Just as we can figure out the key years that determine baseball fandom or political views, we can now find the key years that determine adult sexual preferences. To learn these answers, you will have to buy my next book, tentatively titled Everybody (Still) Lies.
The existence of porn—and the data that comes with it—is a revolutionary development in the science of human sexuality.
It took time for the natural sciences to begin changing our lives—to create penicillin, satellites, and computers. It may take time before Big Data leads the social and behavioral sciences to important advances in the way we love, learn, and live. But I believe such advances are coming. I hope you see at least the outlines of such developments from this book. I hope, in fact, that some of you reading this book help create such advances.
To properly write a conclusion, an author should think about why he wrote the book in the first place. What goal is he trying to achieve?
I think the biggest reason I wrote this book traces back to one of the most formative experiences of my life. You see, a little more than a decade ago, the book Freakonomics came out. The surprise bestseller described the research of Steven Levitt, an award-winning economist at the University of Chicago mentioned frequently in this book. Levitt was a "rogue economist" who seemed to be able to use data to answer any question his quirky mind could think to ask: Do sumo wrestlers cheat? Do contestants on game shows discriminate? Do real estate agents get you the same deals they get for themselves?
I was just out of college, having majored in philosophy, with little idea what I wanted to do with my life. After reading Freakonomics, I knew. I wanted to do what Steven Levitt did. I wanted to pore through mountains of data to find out how the world really worked. I would follow him, I decided, and get a Ph.D. in economics.
So much has changed in the intervening twelve years. A couple of Levitt's studies were found to have coding errors. Levitt said some politically incorrect things about global warming. Freakonomics has gone out of favor in intellectual circles.
But I think, a few mistakes aside, the years have been kind to the larger point Levitt was trying to make. Levitt was telling us that a combination of curiosity, creativity, and data could dramatically improve our understanding of the world. There were stories hidden in data that were ready to be told, and this point has been proven right over and over again.
And I hope this book might have the same effect on others that Freakonomics had on me. I hope there is some young person reading this right now who is a bit confused about what she wants to do with her life. If you have a bit of statistical skill and an abundance of creativity and curiosity, enter the data analysis business.
This book, in fact, if I can be so bold, may be seen as next-level Freakonomics. A major difference between the studies discussed in Freakonomics and those discussed in this book is the ambition. In the 1990s, when Levitt made his name, there wasn't that much data available. Levitt prided himself on going after quirky questions for which data did exist. He largely ignored big questions where the data did not exist. Today, however, with so much data available on just about every topic, it makes sense to go after big, profound questions that get to the core of what it means to be a human being.
The future of data analysis is bright. The next Kinsey, I strongly suspect, will be a data scientist. The next Foucault will be a data scientist. The next Freud will be a data scientist. The next Marx will be a data scientist. The next Salk might very well be a data scientist.
Anyway, those were my attempts to do some of the things that a proper conclusion does. But great conclusions, I came to realize, do a lot more. So much more. A great conclusion must be ironic. It must be moving. A great conclusion must be profound and playful. It must be deep, humorous, and sad. A great conclusion must, in one sentence or two, make a point that sums up everything that has come before, everything that is coming. It must do so with a unique, novel point—a twist. A great book must end on a smart, funny, provocative bang.