The Numerati

by Stephen Baker

As far as I can figure, unless the political data miners tie me to my wife’s donations (she has a different last name but the same address), I blend into a large crowd of voters. That means they’ll have to dig around for proxies. Which other ones tell the story? The traditional route is to focus on neighborhoods, ethnicity, income level, gender, and church attendance—all of the items that remained unchanged in my parents’ case and failed to signal their political sea change. I return to Yankelovich’s Smith. Which bits of data are most telling, I ask, these age-old variables or the far more exciting stuff about cats and cooking magazines? It doesn’t have to be a choice, he tells me. He starts with the traditional groupings and then uses the newer data to highlight those of us who don’t fit the mold. Of all the details in my file, the one Smith would seize on first is that I live in Democratic Essex County, New Jersey. Counties across the country are blue or red, he says, and that’s a good place to start. This is laughably crude. I tell him that while my county may vote Democratic, it’s extremely diverse. It extends north from the rough Democratic wards of Newark, through liberal latte enclaves like Montclair, and into Republican suburbs. He responds that this is not the end of the analysis, but the beginning. For Democratic counties, Spotlight runs an algorithm looking for possible Republicans. These are people they’re eager to bring back into the fold, or convert. In red counties, they plug in a different formula looking for signs of blue. In the early stages of this sorting process, they’re hunting for exceptions to the rule. Those who look different are potential swing voters.

What makes them different? The data miners dig down into each county and take a look. Money is key. If our neighbors two houses down make three times as much as the average income on our block, or if they spent twice as much as I did for my house, they stand out. Why don’t they live with their own kind? It might be a signal that they have different values. The Spotlight formula looks at their age and whether they have children living at home. “All these things have a bearing,” Smith says. “Roughly 40 percent to 50 percent of the variance in people’s values can be explained just by knowing their life stage and household characteristics.” (And following my case, all of those details only confirm that my wife and I fit neatly into the patterns of a Montclair neighborhood teeming with Still Waters.) But as they dig deeper into the data, maybe they’ll find something to set us apart. It’s at this point that they start looking at more specifics—including the newer behavioral data.

I should point out that this protracted process, which might read like the work of a single analyst burrowing tirelessly through file cabinets, is carried out by a computer in the tiniest fraction of a second. It zips through our neighbors, our genders and ethnic groups; it looks at our magazine subscriptions, our credit ratings. It notes whether we live in a mobile home or a row house, whether we’ve ever taken a cruise. Altogether, it digests more than 100 different pieces of data for each voter. Bobbing up and down in that sea are clues that lead them—at least in theory—to profile each one of us as a political animal and to predict our behavior. They do scores of these analyses per second.

Let’s look at how Spotlight might identify the techno-oriented Right Clicks, who account for 6 percent of the electorate. (Why the name Right Clicks? Folks who know their way around computers know all the extra tricks you can do by clicking the right side of the mouse. The rest of us click mainly on the left.) Right Clicks are the more committed half of the family-oriented segment, which they share with Civic Sentries. They lean Republican. But if a Democratic congressional candidate can come up with a list of, say, 10,000 of them, she can hit them with a direct mail appeal to their techno-libertarian values. It might stress, for example, that the government’s broad surveillance of Internet communications is a Big Brotherly intrusion on our privacy—and that she would push for a more technologically sophisticated approach to finding terrorists.

The statistical method Spotlight uses to ferret out this tribe is known by data geeks as multiple discriminant analysis. Using the original test group, researchers create a model of a Right Click that they will then apply to the voting population at large. The model encapsulates a ranking of the details most likely to set each group apart. Assume, Smith tells me, that most of the Right Clicks in Spotlight’s surveyed population are male, most have a broadband connection to the Internet, and most are white. Which of these three variables is most likely to distinguish them from the other tribes? Given the nature of the group, it’s likely to be the broadband. Standing by itself, it’s the roughest of predictors. Plenty of us have broadband. But the focus is on the statistical gap, or variance, in broadband subscriptions. How much more often do Right Clicks have them than the rest of us? Smith’s team calculates that number and uses it to build the first piece of the model. The process continues. The team finds the second most important variable, and the third, and keeps feeding them to the computer. The researchers don’t quit, Smith says, until they finally get down to a category—it might be the 50th, or 60th—that they deem statistically irrelevant. Maybe the fact that a particular voter is dogless doesn’t make much of a difference. While the researchers have been introducing the ever less predictive variables, the machine has quietly been digesting all of the different probability rankings and turning them into a mathematical prototype of a Right Click. This is a preliminary model. Using it, the machine can theoretically sift through other consumer records and successfully pick out Right Clicks. The team tests it. If the model falls short and places too many Barn Raisers or Crossing Guards in the Right Click pool, they tweak and test again.
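For readers who want to see the gist in code, here is a rough sketch of the ranking step Smith describes. It is not Spotlight's model and it is not true discriminant analysis; it swaps in a much simpler stand-in (rank each trait by the gap in how often members and non-members have it, drop the traits with negligible gaps, then add up log-odds). The survey rows, the trait names, and the 0.1 cutoff are all invented for illustration.

```python
import math

# Hypothetical survey rows: (is_right_click, traits). All values are invented.
SURVEY = [
    (True,  {"broadband": True,  "male": True,  "dog_owner": True}),
    (True,  {"broadband": True,  "male": True,  "dog_owner": False}),
    (True,  {"broadband": True,  "male": True,  "dog_owner": True}),
    (True,  {"broadband": True,  "male": False, "dog_owner": False}),
    (False, {"broadband": False, "male": True,  "dog_owner": True}),
    (False, {"broadband": False, "male": False, "dog_owner": False}),
    (False, {"broadband": True,  "male": True,  "dog_owner": True}),
    (False, {"broadband": False, "male": False, "dog_owner": False}),
    (False, {"broadband": False, "male": True,  "dog_owner": True}),
    (False, {"broadband": False, "male": False, "dog_owner": False}),
]


def split(survey):
    """Separate the surveyed group members from everyone else."""
    members = [traits for is_member, traits in survey if is_member]
    others = [traits for is_member, traits in survey if not is_member]
    return members, others


def trait_rate(rows, trait):
    """Share of rows with the trait, with add-one smoothing to avoid 0 and 1."""
    hits = sum(1 for traits in rows if traits[trait])
    return (hits + 1) / (len(rows) + 2)


def rank_traits(survey):
    """List (trait, gap) pairs, widest gap between members and others first."""
    members, others = split(survey)
    gaps = {
        name: trait_rate(members, name) - trait_rate(others, name)
        for name in survey[0][1]
    }
    return sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)


def score(traits, survey, min_gap=0.1):
    """Sum log-odds over traits whose gap is at least min_gap (assumed cutoff)."""
    members, others = split(survey)
    total = 0.0
    for name, gap in rank_traits(survey):
        if abs(gap) < min_gap:
            continue  # too weak a signal to keep, like the dogless voter
        p_in = trait_rate(members, name)
        p_out = trait_rate(others, name)
        if traits[name]:
            total += math.log(p_in / p_out)
        else:
            total += math.log((1 - p_in) / (1 - p_out))
    return total
```

With this toy data, `rank_traits(SURVEY)` puts broadband first, male second, and dog ownership last with a gap of zero, so the score ignores it. A higher score means a record looks more like the surveyed Right Clicks; where to draw the line is exactly the part the team would tweak and retest.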

These models, when loosed, can sniff out voters everywhere. Imagine a pack of bloodhounds roaming America’s cities, suburbs, and farmlands. Their heads aren’t swimming with the scent of suspected murderers and rapists (those are sensory data). Instead they’re packed with mathematical profiles. The neighborhoods they prowl exist in a database. Every time they pass the home of someone who appears to match one of these profiles, whether it’s a conservative Bootstrapper (committed to individual initiative and “alloyed by a strong belief in a divine hand in human affairs”) or the nostalgic Stand Pats (who long for a return to past values and feel that modern ambiguities “menace a lifestyle committed to patriotism, faith, family, community, and morality”), these hounds paw, whine, bark, or whatever it takes, in this matrix of numbers, to leave a record. (Good dog. Smart dog.) Yankelovich, in fact, has now run Spotlight’s values models through every name in its jumbo database. So now, some 175 million of us are pegged as unwitting members of one or another of Spotlight’s ten tribes.

As Gotbaum describes this method to me, I’m thinking far beyond these fanciful dogs. If political sleuths can build models of certain types of people, how far can others go? American jails and prisons house an ever-so-rich and varied population of criminals. As I write, other facilities, from the steamy barracks at Guantánamo Bay to holding tanks in the Middle East, house hundreds of suspected terrorists. What if researchers trawled through the personal data of convicted child molesters and then built a mathematical model of a pedophile? Would it be okay for schools or churches to screen job applicants by using this measure? If the tool had a proven correlation—say 50 percent, or 85 percent—would they be meeting their obligation to protect children if they ignored it? Would they be legally liable? What about those of us who are innocent but turn up as false positives in these analyses? Can we sue? As the Numerati advance, they’re going to be measuring and profiling countless aspects of human behavior. This is going to raise torturous moral questions, ones that until now we never knew enough to ask.

Gotbaum tells me that his project was a success. His political sniffers, he says, managed to tag the three groups of swing voters with an accuracy rate of 75 percent. In many realms, getting one out of four wrong would be abject failure. But for a politician in these early days of microtargeting, reaching 7,500 voters with a targeted message is cause for celebration, even if the message spills over to another 2,500. That’s a far higher hit ratio than broadcast TV can achieve. To reach my community in North Jersey with a political ad, for example, candidates often have to buy airtime on expensive New York stations. This means that their message spills to millions in New York and neighboring Connecticut who can never vote for them. They’re also paying to reach loads of children, illegal immigrants, and the significant crowd of eligible voters who don’t bother going to the polls. For campaigns accustomed to such staggering degrees of waste, reaching a targeted voter on three out of four tries sounds almost too good to be true.

Looking at it the other way, one quarter of us—43.75 million American voters—are pegged to the wrong tribe. Gotbaum says that the errors put voters into a neighboring group. In other words, the system is almost right and doesn’t mistake Bible-thumping conservatives for Communists. But still, what kind of science gets it wrong a quarter of the time? In two words: this one. Here, as on the other stomping grounds of the Numerati, the key is to forget about the truth—or at least put it to one side. While truth is vital and highly relevant in the world of machines (aviation engineers swear by it), the kind of statistical analysis we’re discussing here, whether it’s predicting our behavior as house hunters or wine shoppers, is by its nature approximate. It’s based on probability. It involves all kinds of proxies in the place of real evidence. Truth is not a make-or-break test for the Numerati. They triumph if they come up with better, quicker, or cheaper answers than the status quo. Google, for example, doesn’t provide definitive answers. It simply leads us to promising Web pages. In less than a second, it usually plunks us in the right neighborhood. And because the earlier standard in Internet searching often left us lost and rudderless, Google rocketed to the top. Its approximations, crude as they were, turned a crew of algorithm-writing Numerati into a juggernaut.
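The arithmetic behind those figures is simple enough to check. The snippet below applies the 75 percent accuracy rate to a 10,000-piece mailing and to the 175 million names in the database; the function name is mine, and the numbers come straight from the text.

```python
def split_by_accuracy(total, accuracy):
    """Return (correctly tagged, wrongly tagged) counts, rounded to whole people."""
    correct = round(total * accuracy)
    return correct, total - correct


print(split_by_accuracy(10_000, 0.75))       # (7500, 2500)
print(split_by_accuracy(175_000_000, 0.75))  # (131250000, 43750000)
```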

The same goes in politics. Can the Numerati build models that connect candidates with voters at the right price? Are there areas where they can whip today’s status quo, the precinct chiefs and TV advertising? Increasingly, both parties are concluding that the answer is yes. That’s why political microtargeting—the domain of the Numerati—is the latest rage.

IN THE EARLY DAYS of 2001, President Bush’s chief political strategist, Karl Rove, was still wondering what had gone wrong. Going into the last 72 hours of the previous November’s elections, the Bush team had been leading in all the polls—and yet Al Gore had won the popular vote. And he had come within a few dangling chads and one Supreme Court vote of winning the whole shebang. How could the Bush team guarantee that this wouldn’t happen again?

Rove gathered strategists over the following months into the so-called 72-Hour Task Force. Later the task force condensed their conclusions into a PowerPoint presentation that circulated within the Republican Party. It called for all kinds of improvements, from getting out a coherent message to recruiting election-day volunteers. But one of the most prominent was microtargeting. “It’s the oldest adage in advertising,” say the notes accompanying the slides. “It is always easiest to sell people what they want to buy.” To that end, Rove’s task force urged party activists to “take every list you can get your hands on, and add the information to your voter file. This can be everything from lists of realtors, lists of chamber of commerce members, church directories, professional associations . . . We must adopt the idea that no list is too small.”

In Applebee’s America, Bush’s campaign strategist Dowd and his coauthors detail the Republican approach. The goal was to map the political DNA of voters in Michigan, an important swing state. Like Spotlight, the Bush team combined surveying with large consumer-behavior databases. But the approach was different. The questions hewed much closer to political issues. Instead of searching for core values, the team measured responses to political issues that had already arisen in public debate. It was just a matter of figuring out which ones moved them. Were voters upset by the prospect of gay marriage? Did they fear terror attacks? Were they outraged by smog or child porn on the Internet? When the team got through the surveying, they combined demographics and survey data to create 31 finely tuned segments, such as Terrorism Moderates and Middle-Aged Female Weak Republicans. Then they used every bit of data they could get their hands on, from magazine subscriptions to voting records, to peg the state’s voters. “If John Doe earned $150,000, drove a Porsche, subscribed to a golf magazine, paid National Rifle Association dues, and told a Bush pollster he was a pro–tax cut conservative who backed President Bush’s war on terrorism, the Bush team figured that anybody with similar lifestyle tastes would hold similar political views,” according to Dowd and his team.

Let’s imagine a voter who’s harder to pigeonhole than that gun-toting suburbanite. Maybe he lives on the very same cul-de-sac and makes piles of money, but he voted twice in the past decade, according to the records, in a Democratic primary. And he drives a ten-year-old Subaru, a liberal car if there ever was one. Hmmm. Political analysts increasingly look at voters and give them numerical grades, much like Fair Isaac’s credit-risk scores. In a 2005 governor’s race in Virginia, for example, every single voter in the state received a “likelihood score,” from zero to 100, that he or she would vote for the Democratic candidate, Tim Kaine. The inscrutable Subaru driver I just described might rate a 50. This scoring system made targeting easy. The Kaine campaign wrote off the voters with low scores. And they hardly bothered with those registering scores of 90 or above (except as potential donors). That would be preaching to the choir and a waste of resources. Instead, they focused on promising swing voters, those with scores from 55 to 75. “If you were a 60, you were getting communicated with. We were all over you,” recalls Kaine’s victorious campaign manager, Mike Henry.
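As a rough illustration of how such a score turns into a to-do list, the sketch below sorts voters into the bands Henry describes. Only the 55-to-75 swing band and the 90-and-above donor band come from the text. The text gives no cutoff for a "low" score and says nothing about scores from 76 to 89, so the sketch lumps everything else into one catch-all label. The function and label names are hypothetical.

```python
def outreach_band(score):
    """Map a 0-100 likelihood score to a hypothetical outreach label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "donor prospect"      # already persuaded; approached for money only
    if 55 <= score <= 75:
        return "persuasion target"   # the swing band the campaign focused on
    return "not prioritized"         # catch-all: the text gives no other cutoffs


for voter_score in (50, 60, 95):
    print(voter_score, outreach_band(voter_score))
```

By these bands, a voter rated exactly 50 would land just outside the group the campaign pursued hardest.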

But what message were they delivering to that pool of swing voters in Virginia? Josh Gotbaum would argue that those voters represented a big stew of Barn Raisers, Civic Sentries, and Hearth Keepers, garnished perhaps with a few Right Clicks and Inner Compasses. His approach would call for each group to receive a different stream of mail and telephone calls. Even the same issue—raising the minimum wage, for instance—would be framed differently. Community-minded Inner Compasses might hear that their neighbors needed a decent wage to live a healthy life, while the more conservative Civic Sentries would learn that a higher minimum wage would give hardworking families what they needed to make it on their own. In the Virginia race, Kaine’s team had to pick issues that would appeal to all of these swing groups. Following a large voter survey, they focused on better schools and wider roads. But that was 2005. Henry says that in coming elections, the targeting will be far more sophisticated. He and others—especially those working for Democratic presidential nominee Barack Obama—are gearing up for an unprecedented explosion of political warfare fueled by data.

They’ll be wrestling with layer upon layer of statistical complexity far beyond the tribes I’ve mentioned. Think for a second about one of those Virginia voters. How much is a 90 worth if he votes only once a decade? How about a 55 who braves blizzards and floods to make it to the polls? Two variables, level of support and likelihood of voting, are both crucial. And now the political Numerati are reaching into the tool kits of economists to calculate a projected rate of return for every advertising and promotional dollar spent on each one of us. In other words, how much will it cost to turn you into a vote for their side?

“I was working with a theoretical economist I went to graduate school with,” Mark Steitz is telling me. Steitz, a longtime Democratic consultant, operates out of a townhouse just up Connecticut Avenue from the cafés and bookshops of Dupont Circle. “We started thinking abstractly about the best way to formulate this problem,” he says. “And we came up with this triangle.” He clicks his computer, and a red-and-blue image appears on the screen. This so-called simplex triangle represents the universe of voters in an election. The position of each voter on the triangle is determined by two calculations: the probability that the voter favors Republicans or Democrats, and the probability that he or she will vote. Steitz draws a vertical line up the triangle, a so-called isoquant. Each voter along this line is of equal value, he says. A voter who leans to the Democrats 75 percent of the time and votes in every election is on the same isoquant—and has the same value—as a voter who’s 100 percent Democrat and votes three out of four elections. Those two voters, Steitz says, “are indistinguishable to me.” His triangle, at this stage, is largely theoretical. Yet as politicos learn more about voters, they’ll be able to plug us into more of these types of mathematical formulas.
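One simple reading of that example, and it is only an assumption, is that a voter's value is the probability of supporting the party multiplied by the probability of showing up. Steitz's actual formula is not given here. Under that reading, the two voters he calls indistinguishable do come out the same:

```python
def expected_votes(p_support, p_turnout):
    """Expected votes per election from one voter, assuming value = support x turnout."""
    for p in (p_support, p_turnout):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    return p_support * p_turnout


leaner_who_always_votes = expected_votes(0.75, 1.0)
loyalist_who_sometimes_votes = expected_votes(1.0, 0.75)
print(leaner_who_always_votes, loyalist_who_sometimes_votes)  # 0.75 0.75
```

Every pair of probabilities with the same product would sit on the same isoquant under this assumption.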

As this happens, the calculations grow only more complicated—a trend that plays to the strengths of the political Numerati. Some votes, it turns out, are worth far more than others. Each side in an election needs only 50 percent of the votes, plus one. That one vote at the end could be worth millions of dollars. Just think of the handful of contested Florida votes in the 2000 election between George W. Bush and Al Gore. And yet a vote that lifts a candidate to 60 percent, or to 40 percent, has only symbolic value. And that last wavering voter, according to Steitz’s triangle, will be the most expensive to acquire. As the Numerati develop tools to model voters and measure the effectiveness of campaign spending—its “yield,” in economic terms—political parties will be able to look at each election as a marketplace. As the polls swing, the relative value of each voter rises and falls. Some of us are cheap, virtual throw-aways. Some will be prohibitively expensive, not worth the investment. But those of us who shape up as the difference makers will be the targets of increasingly customized come-ons. Analysts will know which of us are wrestling with college tuition and which of us fear that our jobs will leave for India. Some might even voice a concern about an outbreak of rabies that threatens our cats. If the politicians get it right—which is no sure thing—the campaign messages will address our concerns and reflect our values. It’ll be as though they finally understand us. Who knows? Maybe they’ll even learn not to call us at dinnertime. We’ll feel, if only for a few short weeks of a frantic campaign, appreciated.
