Although many users tend to distinguish what happens online from what happens IRL (in real life), the data that is generated from their use of social media—from posting reactions to the season finale of a show to liking photos from Saturday night out—is generated from life outside the Internet. In other words, Facebook data is IRL data. And it is only increasing as people live their lives more and more on their phones and on the Internet. This means that, for an analyst, there’s often no need to ask questions: You simply create algorithms that find discrete patterns in a user’s naturally occurring data. And once you do that, the system itself can reveal patterns in the data that you otherwise would have never noticed.
Facebook users curate themselves all in one place in a single data form. We don’t need to connect a million data sets; we don’t have to do complicated math to fill in missing data. The information is already in place, because everyone serves up their real-time autobiography, right there on the site. If you were creating a system from scratch to watch and study people, you couldn’t do much better than Facebook.
In fact, a 2015 study by Youyou, Kosinski, and Stillwell showed that, using Facebook likes, a computer model reigned supreme in judging human personality. With ten likes, the model predicted a person’s personality traits more accurately than one of their co-workers. With 150 likes, better than a family member. And with 300 likes, the model knew the person better than their own spouse. This is in part because friends, colleagues, spouses, and parents typically see only part of your life, where your behavior is moderated by the context of that relationship. Your parents may never see how wild you can get at a 3 A.M. rave after dropping two hits of MDMA, and your friends may never see how reserved and deferential you are in the office with your boss. They all have slightly different impressions of who you are. But Facebook peers into your relationships, follows you around in your phone, and tracks what you click and buy on the Internet. This is how data from the site becomes more reflective of who you “really are” than the judgments of friends or family. In some respects, a computer model can know a person’s habits better than they even know themselves—a finding that compelled the researchers to add a warning. “Computers outpacing humans in personality judgment,” they wrote, “presents significant opportunities and challenges in the areas of psychological assessment, marketing, and privacy.”
With access to enough Facebook data, it would finally be possible to take the first stab at simulating society in silico. The implications were astonishing: You could, in theory, simulate a future society to create problems like ethnic tension or wealth disparity and watch how they play out. You could then backtrack and change inputs, to figure out how to mitigate those problems. In other words, you could actually start to model solutions to real-world issues, but inside a computer. For me, this whole idea of society as a game was super epic. I was obsessed with the idea of the institute that Kogan suggested to me, and became extremely eager to somehow make it happen. And it wasn’t just our pet obsession; professors all over were getting just as enthused. After meetings at Harvard, Kogan emailed me about their feedback, saying, “The operative term is game changing and revolutionizing social science.” And at first, it seemed like Stillwell and Kosinski were excited, too. Then Kogan let slip to them that CA had a budget of $20 million. And all the academic camaraderie ground immediately to a halt.
Kosinski sent Kogan an email saying they wanted half a million dollars up front, plus 50 percent of all “royalties” for the use of their Facebook data. We had not even proven this could work at scale in a field trial yet, and they were already demanding huge amounts of money. Nix told me to refuse, and this made Kogan panic that the project was going to fall apart before it even began. So the day after we rejected Kosinski’s demand for cash, Kogan said he could do it on his own, on his original terms—he would help us get the data, CA would pay for it at cost, and he would get to use it for his research. Kogan said he had access to more apps that had the same friends-collection permission from Facebook and that he could use those apps. I was immediately wary, thinking that Kogan might have just been planning to use Stillwell and Kosinski’s app under the radar. But Kogan insisted to me that he’d built his own. “Okay,” I said. “Prove it. Give me a dump of data.” To make sure these were not just pulled from the other app, we gave Kogan $10,000 to pilot his new app with a new data set. He agreed and did not ask for any money for himself, so long as he could keep a copy of the data.
Although he never told me this at the time, Kosinski has since said that he intended to give the money from the licensing of the Facebook data to the University of Cambridge. However, the University of Cambridge also strongly denies that it was involved with any Facebook data projects, so it is unclear whether the university was aware of this potential financial arrangement, or would have accepted the funds if offered.
The following week, Kogan sent SCL tens of thousands of Facebook profiles, and we did some tests to make sure the data was as valuable as we’d hoped. It was even better. It contained complete profiles of tens of thousands of users—name, gender, age, location, status updates, likes, friends—everything. Kogan said his Facebook app could even pull private messages. “Okay,” I told him. “Let’s go.”
* * *
WHEN I STARTED WORKING with Kogan, we were eager to set up an institute that would warehouse the Facebook, clickstream, and consumer data we were collecting for use by psychologists, anthropologists, sociologists, data scientists—any academics who were interested. Much to the delight of my fashion professors at UAL, Kogan even let me add several clothing-style and aesthetic items that I could test for my Ph.D. research. We planned to go to different universities around the world, continuing to build up the data set so we could then start modeling things in the social sciences. After some professors at Harvard Medical School suggested we could access millions of their patient genetic profiles, even I was surprised at how this idea was evolving. Imagine the power, Kogan told me, of a database that linked up a person’s live digital behavior with a database of their genes. Kogan was excited—with genetic data, we could run powerful experiments unpacking the nature-vs.-nurture debate. We knew we were on the cusp of something big.
We got our first batch of data through a micro-task site called Amazon MTurk. Originally, Amazon built MTurk as an internal tool to support an image-recognition project. Because the company needed to train algorithms to recognize photographs, the first step was to have humans label them manually, so the AI would have a set of correctly identified photos to learn from. Amazon offered to pay a penny for each label, and thousands of people signed up to do the work.
Seeing a business opportunity, Amazon spun out MTurk as a product in 2005, calling it “artificial artificial intelligence.” Now other companies could pay to access people who, in their spare time, were willing to do micro-tasks—such as typing in scans of receipts or identifying photographs—for small amounts of money. It was humans doing the work of machines, and even the name MTurk played on this. MTurk was short for “Mechanical Turk,” an eighteenth-century chess-playing “machine” that had amazed crowds but was actually a small man hiding in a box, manipulating the chess pieces through specially constructed levers.
Psychologists and university researchers soon discovered that MTurk was a great way to leverage large numbers of people to fill out personality tests. Rather than have to scrounge for undergraduates willing to take surveys, which never gave a truly representative sample anyway, researchers could draw all kinds of people from all over the world. They would invite MTurk members to take a one-minute test, paying them a small fee to do so. At the end of the session, there would be a payment code, which the person could input on their Amazon page, and Amazon would transfer payment into the person’s account.
Kogan’s app worked in concert with MTurk: A person would agree to take a test in exchange for a small payment. But in order to get paid, they would have to download Kogan’s app on Facebook and input a special code. The app, in turn, would take all the responses from the survey and put those into one table. It would then pull all of the user’s Facebook data and put it into a second table. And then it would pull all the data for all the person’s Facebook friends and put that into another table.
Users would fill out a wide battery of psychometric inventories, but it always started with a peer-reviewed and internationally validated personality measure called the IPIP NEO-PI, which presented hundreds of items, like “I keep others at a distance,” “I enjoy hearing new ideas,” and “I act without thinking.” When these responses were combined with Facebook likes, reliable inferences could then be made. For example, extroverts were more likely to like electronic music and people scoring higher in openness were more likely to like fantasy films, whereas more neurotic people would like pages such as “I hate it when my parents look at my phone.” But it wasn’t simply personality traits we could infer. Perhaps not surprisingly, American men on Facebook who liked Britney Spears, MAC Cosmetics, or Lady Gaga were slightly more likely to be gay. Although each like taken in isolation was almost always too weak to predict anything on its own, when those likes were combined with hundreds of other likes, as well as other voter and consumer data, then powerful predictions could be made. Once the profiling algorithm was trained and validated, it would then be turned onto the database of Facebook friends. Although we did not have surveys for the friend profiles, we had access to their likes page, which meant that the algorithm could ingest the data and infer how they likely would have responded to each question if they had taken a survey.
As the project grew over the summer, more constructs were explored, and Kogan’s suggestions began to match exactly what Bannon wanted. Kogan outlined that we should begin examining people’s life satisfaction, fair-mindedness (fair or suspicious of others), and a construct called “sensational and extreme interests,” which has been used increasingly in forensic psychology to understand deviant behavior. This included “militarism” (guns and shooting, martial arts, crossbows, knives), “violent occultism” (drugs, black magic, paganism), “intellectual activities” (singing and making music, foreign travel, the environment), “occult credulousness” (the paranormal, flying saucers), and “wholesome interests” (camping, gardening, hiking). My personal favorite was a five-point scale for “belief in star signs,” which several of the gays in the office joked we should spin off into an “astrological compatibility” feature and link it to the gay dating app Grindr.
Using Kogan’s app, we would not only get a training set that gave us the ability to create a really good algorithm—because the data was so rich, dense, and meaningful—but we also got the extra benefit of hundreds of additional friend profiles. All for $1 to $2 per app install. We finished the first round of harvesting with money left over. In management, they always say there is a golden rule for running any project: You can get a project done cheap, fast, or well. But the catch is you can choose only two, because you’ll never get all three. For the first time in my life, I saw that rule totally broken—because the Facebook app Kogan created was faster, better, and cheaper than anything I could have imagined.
* * *
THE LAUNCH WAS PLANNED for June 2014. I remember it was hot: Even though the summer was coming, Nix kept the air-conditioning off to lower the office bills. We had spent several weeks calibrating everything, making sure the app worked, that it would pull in the right data, and that everything matched when it injected the data into the internal databases. One person’s response would, on average, produce the records of three hundred other people. Each of those people would have, say, a couple hundred likes that we could analyze. We needed to organize and track all of those likes. How many possible items, photos, links, and pages are there to like across all of Facebook? Trillions. A Facebook page for some random band in Oklahoma, for example, might have twenty-eight likes in the whole country, but it still counts as its own like in the feature set. A lot of things can go wrong with a project of such size and complexity, so we spent a lot of time testing the best way to process the data set for when it scaled. Once we were confident that everything worked, it was time to launch the project. We put $100,000 into the account to start recruiting people via MTurk, then waited.
We were standing by the computer, and Kogan was in Cambridge. Kogan launched the app, and someone said, “Yay.” With that, we were live.
At first, it was the most anticlimactic project launch in history. Nothing happened. Five, ten, fifteen minutes went by, and people started shuffling around in anticipation. “What the fuck is this?” Nix barked. “Why are we standing here?” But I knew that it would take a bit of time for people to see the survey on MTurk, fill it out, then install the app to get paid. Not long after Nix started complaining, we saw our first hit.
Then the flood came. We got our first record, then two, then twenty, then a hundred, then a thousand—all within seconds. Jucikas added a random beeping sound to a record counter, mostly because he knew Nix had a thing for stupid sound effects, and he found it amusing how easy it was to impress Nix with gimmicky tech clichés. Jucikas’s computer started going boop-boop-boop as the numbers went insane. The increments of zeroes just kept building, growing the tables at exponential rates as friend profiles were added to the database. This was exciting for everyone, but for the data scientists among us, it was like an injection of pure adrenaline.
Jucikas, our suave chief technology officer, grabbed a bottle of champagne. He was always full of bonhomie, the life of the party, and he made sure we had a case of champagne in the office at all times for just such occasions. He had grown up extremely poor on a farm in the waning days of the Lithuanian SSR, and over the years he had remade himself into a Cambridge elite, a dandy whose motto seemed to be live it up today, because tomorrow you might die. With Jucikas, everything was extreme and over the top. That’s why he’d bought for the office an antique saber from the Napoleonic Wars, which he now intended to use. Why open champagne the normal way when you can use a saber?
He grabbed a bottle of Perrier-Jouët Belle Epoque (his favorite), loosened the cage holding the cork, held the bottle at an angle, and elegantly swiped the saber down the side. The entire top snapped clean off, and champagne gushed out. We filled the flutes and toasted our success, enjoying the first of many bottles we would drink that night. Jucikas went on to explain that sabering champagne is not about brute force; it’s about studying the bottle and hitting the weakest spot with graceful precision. Done correctly, this requires very little pressure—you essentially let the bottle break itself. You hack the bottle’s design flaw.
* * *
WHEN MERCER FIRST MADE the investment, we assumed we had a couple of years to get the project fully running. But Bannon shot that notion down right away. “Have it ready by September,” he said. When I suggested that was too quick, he said, “I don’t care. We just gave you millions, and that’s your deadline. Figure it out.” The 2014 midterms were coming, and he wanted what he now started referring to as Project Ripon—named after the small town in Wisconsin where the Republican Party was formed—to be up and running. Many of us rolled our eyes at Bannon, who started to get weirder and weirder after the investment. But we thought we just had to placate his niche political obsessions to achieve our potential at creating something revolutionary in science. The ends would justify the means, we kept telling ourselves.
He started traveling to London more frequently, to check on our progress. One of those visits happened to be not long after we launched the app. We all went into the boardroom again, with the giant screen at the front of the room. Jucikas made a brief presentation before turning to Bannon.
“Give me a name.”
Bannon looked bemused and gave a name.
“Okay. Now give me a state.”
“I don’t know,” he said. “Nebraska.”
Jucikas typed in a query, and a list of links popped up. He clicked on one of the many people who went by that name in Nebraska—and there was everything about her, right up on the screen. Here’s her photo, here’s where she works, here’s her house. Here are her kids, this is where they go to school, this is the car she drives. She voted for Mitt Romney in 2012, she loves Katy Perry, she drives an Audi, she’s a bit basic…and on and on and on. We knew everything about her—and for many records, the information was updated in real time, so if she posted to Facebook, we could see it happening.
And not only did we have all her Facebook data, but we were merging it with all the commercial and state bureau data we’d bought as well. And imputations made from the U.S. Census. We had data about her mortgage applications, we knew how much money she made, whether she owned a gun. We had information from her airline mileage programs, so we knew how often she flew. We could see if she was married (she wasn’t). We had a sense of her physical health. And we had a satellite photo of her house, easily obtained from Google Earth. We had re-created her life in our computer. She had no idea.
“Give me another,” said Jucikas. And he did it again. And again. And by the third profile, Nix—who’d hardly been paying attention at all—suddenly sat up very straight.
Mindfuck Page 13