The Numerati

by Stephen Baker

As merchants learn more about us, it’s going to be easier for them to figure out which customers to reward and which ones to punish. This won’t make much difference to butterfly shoppers. They’re oblivious. But in the age of the retailing Numerati, life for barnacles might get grim.

WITH ALL THIS TALK of butterflies and buckets, I ask Ghani, where is the individual? I expected to see myself modeled as a shopper, and here I am, sitting in buckets with other frozen-fish buyers and brand traitors. What’s become of customization? Where’s the fully formed mathematical model of the cheapskate who never pays the extra buck for yellow or red bell peppers? I’m talking about the reluctant clothes shopper, the one rushing through the mall with a tightening back who always takes two laps around the garage before finding his car. In short, where am I in all this data?

Ghani smiles as he delivers the bad news. There’s no fully formed “me” in that data. There’s no you, at least not yet. We exist in these databases as shards of our behavior, my hang-up with the bell peppers, your habit of casually tossing a bag of M&Ms onto the pile as you wait at the checkout. (By the way, those seemingly impulsive purchases, often accompanied by a what-the-hell shrug, are no afterthoughts, Ghani’s data shows. Many shoppers buy the candy bars and breath mints more predictably than they purchase milk or toilet paper.) In any case, all of those pieces of our shopping selves reside in endless buckets with other people’s slivers. Much as we might find it flattering to sit in a unique bucket all by our lonesome, for retailers there’s no point. They don’t have a customized marketing campaign for me or for you. They want to sell pork or crew-neck sweaters. And for this, they’d like to bring together 1,000 or 50,000 people. Just because they like to microtarget doesn’t mean that they wouldn’t rather reach lots of people with the same message. They still love big numbers. They just prefer to target customers more intelligently. It would be easy to mistake these new buckets for the demographic groupings marketers have worked with for decades: Hispanics, yuppies, soccer moms, the super rich who inhabit the 90210 zip code in Los Angeles. Those are buckets too. But there’s a world of difference.

In the old days, marketers knew next to nothing about the individual, so they assumed that he or she shared values and urges with similar people—those who also made six-figure salaries or had a last name with a vowel at the end. This was a crude indicator. But given the information they had, it was the best they could do. And in the decades of industrialized consumption, in the 1950s and ’60s, it wasn’t half bad. Choices were limited. Why bother learning about a person if, chances were, he had little choice but to watch The Honeymooners, eat one of three different kinds of peanut butter on his sandwich, or buy a car that looked pretty much like a Chevy? We have thousands more choices now, from the supermarket shelves to the remote on the TV, not to mention the Internet. So marketers, as Tacoda’s Dave Morgan demonstrates, can shift their focus from who we are to how we behave. For this, they need the new buckets.

To see how different these new groupings are, consider the demographics of these buckets we inhabit. Start by looking at the skinflints who, like me, forgo the pleasures of red and yellow bell peppers. In this green-pepper bucket I’ll wager that I’m surrounded by people of all races. Both genders are represented (though I’d imagine, based on my family sample, that more of us are guys). We drive all kinds of cars. Some of us hunt; others would just as soon outlaw guns. The district attorney might be in there, sharing bucket space with the FBI’s most-wanted killer. You could say we have nothing in common, and you’d be absolutely right—except for one thing: our behavior when it comes to buying bell peppers.

These bits of our behavior sit in thousands of buckets, all of them created automatically by machines. Most of them—like my green-pepper bucket—are never used. If you strung all of your buckets one after the next, you’d see your own special combination, your unique shopping genome. Spend time with microtargeting marketers these days, and you’ll hear them refer to these behavior patterns as a consumer’s DNA. This comparison is not fair or accurate, though it sounds temptingly simple. Unlike our genetic code, our behavior changes all the time. We learn. (Who knows? After one tasty Moroccan meal I might be inspired to spring for a basket of exorbitant red peppers imported from Holland.)

Still, forget those technicalities for a minute. Think of buckets as genes. Each base pair of a gene (which provides instructions to produce amino acids) is described by combinations of two of four chemicals known as nucleotides. They’re represented by the letters A, G, T, and C. That basic code is pretty simple. But there are key variations, both in the DNA code for individual genes and in the 3.2 billion base pairs in the genome. To a large degree, those differences shape our bodies and our lives, distinguishing us not only from other plants and animals, but also from each other.

Since the 1990s, thousands of the world’s leading mathematicians and computer scientists have been drawing up algorithms to comb through vast databases of DNA and other health data. They’re looking for patterns in those billions of base pairs that might point to a proclivity for leukemia, creative genius, alcoholism, or perhaps a deadly allergy to peanuts. The research is still at an early stage, but scientists have built an enormous mathematical toolbox for linking symptoms to variations in the four building blocks of DNA.

Why does that matter to a grocer? For now, it doesn’t. But let’s say that a supermarket, a few years down the road, organizes each aspect of our shopping data into four groups. For example, we buy candy at the checkout:

More than 90 percent of the time

From 25 to 89 percent

From 1 to 24 percent

Never
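To make the four-way sort concrete, here is a minimal sketch in Python. The function name, the labels, and the shopper counts are assumptions made for illustration. The bands as listed also leave small gaps (exactly 90 percent, or anything between 24 and 25), so this version treats 90 and 25 as the lower edges of their bands and puts any nonzero share below 25 percent in the lowest one.

```python
def checkout_candy_bucket(trips_with_candy: int, total_trips: int) -> str:
    """Sort one shopper into one of four bands by how often candy is in the cart."""
    if total_trips <= 0 or not 0 <= trips_with_candy <= total_trips:
        raise ValueError("need total_trips > 0 and 0 <= trips_with_candy <= total_trips")
    if trips_with_candy == 0:
        return "never"
    # Integer comparisons avoid floating-point surprises at the band edges.
    if trips_with_candy * 100 >= 90 * total_trips:
        return "90 percent or more"  # assumption: exactly 90 counts as the top band
    if trips_with_candy * 100 >= 25 * total_trips:
        return "25 to 89 percent"
    return "1 to 24 percent"  # assumption: any nonzero share under 25 lands here


# Hypothetical shoppers: (trips with candy at checkout, total trips)
shoppers = {"ana": (19, 20), "ben": (5, 20), "cy": (1, 20), "dee": (0, 20)}
buckets = {name: checkout_candy_bucket(*counts) for name, counts in shoppers.items()}
```

Each shopper ends up with one of four labels for this habit, much as a position in a gene holds one of four letters.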

With modern computing, it wouldn’t be that hard to organize thousands, or even millions, of our grocery-shopping habits into similar groups of four. They’ll be arbitrary, much like the census or the categories on insurance forms. The point here, however, isn’t to model one entire person accurately but instead to decode the patterns of human behavior. Consider the people who buy luxury chocolates. Is there anything in their purchasing behavior that appears to trigger chocolate lust? Grocers have wrestled with these questions for centuries. They make sensible correlations. Chocolate lovers might be interested in almonds. Catch them at the holidays and before Valentine’s Day. But how about the correlations that humans wouldn’t think to look for, such as the romance-movie lovers who clicked on Alamo car rental ads? How do grocers unearth those hidden links?

This is where the data-mining algorithms could come in and lead to randomized experiments with shoppers, Ghani says. Once the retailers have our behaviors grouped into four variables, they can retool one of these genomic algorithms and feed our shopping data to it. The computers whir through our purchases, looking at literally billions of combinations. The great majority are utterly senseless. Do people who buy both Brussels sprouts and sugared cereal also buy Swiss chocolates more than the mean? No sane person would bother looking for such a connection. That’s why it’s the perfect job for computers. Set them on a hunt, and they might find correlations we humans would never think to consider. Just as they’ve helped medical researchers find genetic markers pointing to certain types of breast cancer and Huntington’s disease, they might tell grocers what kinds of fruit to promote to buyers of canned food or what types of magazines dog-food buyers tend to read. These suggestions may sound frivolous. But if a retailer can tweak promotions, bucket by bucket, and gain a boost of even 2 percent of sales, it’s cause to rush down aisle seven and pop a magnum of Mumm’s. They measure profit margins in this industry by the tenth of a percent.
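A toy version of that hunt might look like the sketch below. It is not one of the genomic algorithms Ghani mentions. It simply checks every pair of traits and compares how often those shoppers buy a target item with how often everyone does, a ratio often called lift. The shopper data, the trait names, and the `min_shoppers` cutoff are all invented for illustration.

```python
from itertools import combinations


def pair_lifts(baskets, target, min_shoppers=2):
    """Rank trait pairs by how much more often their shoppers buy `target`."""
    overall = sum(target in b for b in baskets) / len(baskets)
    if overall == 0:
        return []  # nobody buys the target, so there is nothing to compare
    traits = sorted({t for b in baskets for t in b if t != target})
    results = []
    for first, second in combinations(traits, 2):
        group = [b for b in baskets if first in b and second in b]
        if len(group) < min_shoppers:
            continue  # too few shoppers to say anything
        rate = sum(target in b for b in group) / len(group)
        results.append(((first, second), rate / overall))
    # Highest lift first: these pairs buy the target most above the mean.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hypothetical shoppers, each reduced to a set of things they buy.
baskets = [
    {"brussels_sprouts", "sugared_cereal", "swiss_chocolate"},
    {"brussels_sprouts", "sugared_cereal", "swiss_chocolate"},
    {"brussels_sprouts", "dog_food"},
    {"sugared_cereal", "dog_food"},
    {"dog_food", "canned_soup"},
    {"canned_soup", "sugared_cereal"},
]
ranked = pair_lifts(baskets, "swiss_chocolate")
```

With billions of combinations, some pairs will look promising by chance alone, which is one reason a retailer would want to confirm a find with a randomized experiment before acting on it.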

As Ghani talks about shopping patterns and genomic researchers, I think about putting all the people we’ve been talking about—the grocers, the microtargeting advertisers, the mathematical geneticists—into one room. They wouldn’t seem to have much in common. Yet they do. In nearly every industry, the data we produce is represented by ones and zeros. It all travels through the same networks and vies for space in the same computers. This means that the mathematical tools used to analyze this data can cross disciplines and industries, from the barnyard to the aisles of Saks, almost effortlessly. This has a nearly miraculous multiplier effect—the brains working in one industry can power breakthroughs in many others. Researchers long isolated in different fields, different departments on campus, different industries are now solving the same problems. The analysis of networks, for example, extends from physics to sociology. In a sense, all of these scientists are working in one global networked laboratory.

All of which is to say that researchers whose tools will one day decipher the secrets of your shopping—perhaps the subconscious patterns you don’t even know about yet—may not be working for Wal-Mart or Google or Ghani’s team at Accenture. Today they might be studying earthworms or nanotechnology, or maybe the behavior of Democratic voters in swing states. For example, one researcher at Microsoft, David Heckerman, was hard at work building a program to comb e-mail traffic and identify spam. He knew that spammers systematically altered their mailings to break through ever more sophisticated defenses. He was dealing with a phenomenon similar in nature to biological mutations. His system had to anticipate these variations. Heckerman, a physician as well as a computer scientist, knew that if his tool could detect mutations in spam, it might also work in medicine. Sure enough, in 2003, he shifted his focus to HIV, the virus that causes AIDS. His tools, with their legacy in spam, could eventually lead to an AIDS vaccine. “It’s the very same [software] code,” he says. In the Numerati’s world, breakthroughs can come from any direction.

CONSIDER FOR a moment the clothes you put on this morning. If Rayid Ghani and his colleagues had a picture of you as you made your way down to breakfast or out the door, would they know from your clothing what tribe to put you in? Chances are, they could come pretty close. Humans have specialized in tribal recognition since we climbed down from trees. It’s a survival skill.

But how does Ghani teach that skill to a machine? Computers, after all, have to figure out what kind of clothing we’re buying if they’re going to classify us as dweebs, business drones, hip-hoppers, earth mothers, or whatever other fashion buckets the marketers create. It is true, of course, that armies of people could flip through these garments, giving each one a tribal tag. But this procedure would cost a bundle, and the workers (who themselves come from different tribes) would surely disagree on what’s sexy, fashion-forward, or retro. Humans are just too subjective. This is a job for computers. However, when it comes to classifying clothing, Ghani says, machines fare no better than the most clueless of humans—at least for now. So the Accenture team in the Chicago lab has to cheat.

Here’s how. They hire a group of people to teach the computer. These trainers slog through a questionnaire from an online department store catalog. For several hundred garments, they answer a series of multiple-choice questions. Is it formal or casual? Is it business attire? On a scale of one to ten, how sporty is it? How trendy is it? What age group is it for? On and on. Several people evaluate each item. This smooths over their individual quirks and produces a consensus. As the humans answer these questions, the computer learns about each piece of clothing. If it were human, perhaps it would be able to develop an eye for what’s sporty and what’s hip, and then be able to classify the rest of the fashion universe by itself. But computers don’t yet have such discerning eyes. Instead, the machine focuses on the promotional language that accompanies each picture. Zesty! Hot! Spring fever! It learns to associate those words with the values spelled out by its human trainers.

In the end, the computer builds up a matrix of words, all of them defined by their statistical relationships to each category of clothing. Bra, to cite an obvious example, would have a near-zero probability of belonging in men’s wear. In every example marked by the humans, it shows up for women. But that doesn’t tell the computer whether a certain bra is sporty, casual, or Gen Y. For that, it must find clues in other words.

Ghani shows me the vocabulary his system has mastered. He calls up “conservative” words. The computer spits out trouser, classic, blazer, Ralph, and Lauren. Words that rank low on the conservative scale? Ghani calls it up and laughs. “Leopard! That’s a good one.” Others are rose, chemise, straps, flirty, spray, silk, and platform. I’d say the computer has figured out a thing or two. When Ghani asks it for “high brand appeal,” DKNY and imported show up, along with that now familiar duo of Ralph and Lauren. (This system, Ghani explains, has no fancy understanding of context. Unlike other artificial intelligence programs, it is unburdened by grammar. It just plows through the English words it has encountered and pegs each one to a set of probabilities.)
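A grammar-free word table of this kind can be sketched in a few lines of Python. This is an illustration of the general idea, not Accenture's system: the catalog copy, the function names, and the add-one smoothing (which keeps a word seen only once or twice from scoring exactly 0 or 1) are all assumptions.

```python
from collections import Counter


def word_scores(labeled_copy, attribute):
    """Estimate, per word, the chance that a garment described with it has `attribute`."""
    seen = Counter()  # garments whose copy contains the word
    hits = Counter()  # of those, garments the trainers tagged with the attribute
    for text, tags in labeled_copy:
        for word in set(text.lower().split()):
            seen[word] += 1
            if attribute in tags:
                hits[word] += 1
    # Add-one smoothing (an assumption of this sketch, not described in the text).
    return {word: (hits[word] + 1) / (seen[word] + 2) for word in seen}


def score_garment(text, scores):
    """Average the scores of the known words in a new description."""
    known = [scores[w] for w in text.lower().split() if w in scores]
    return sum(known) / len(known) if known else 0.5  # 0.5 means no evidence


# Invented catalog copy paired with the trainers' consensus tags.
catalog = [
    ("classic wool blazer", {"conservative"}),
    ("classic pleated trouser", {"conservative"}),
    ("silk blazer", {"conservative"}),
    ("flirty leopard chemise", set()),
    ("leopard platform sandal", set()),
]
conservative = word_scores(catalog, "conservative")
```

Sorting `conservative` by value puts words like classic and blazer near the top and leopard near the bottom, and `score_garment` can then rate copy the trainers never saw.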

Figuring out that a certain white blouse is business attire for a female baby boomer is merely step one for the computer. The more important task is to build a profile of the shopper who buys that blouse. Let’s say it’s my wife. She goes to Macy’s and buys four or five items for herself. Underwear, pants, a couple of blouses, maybe a belt. All of the items fit that boomer profile. She’s coming into focus. Then, on the way out she remembers to buy a birthday present for our 16-year-old niece. Last time we saw her, this girl was wearing black clothing with a lot of writing on it, most of it angry. She told us she was a goth. So my wife goes into an “alternative” section and—what the hell?—picks up one of those dog collars bristling with sharp spikes.

How does Ghani’s system interpret this surprising deviation? Jaime Carbonell, a professor of machine learning at Carnegie Mellon, thinks about these issues a lot. In the early days, he says, consumers were often averaged. He noticed that Amazon.com, for example, saw that he was interested in Civil War history and in computational biology. So it combined them. He got recommendations for the history of biology and the north-south divide on some scientific question. “The average modeling doesn’t work well,” he says. “We’re not the average of our interests.” The newer approach is to use clustering software. This divides his interests into different groups and gives him recommendations based on each one.
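The difference between averaging and clustering shows up even in a toy example. In the sketch below, each purchase is a point scored on two invented axes, history and biology. The scores, the function names, and the bare-bones two-cluster routine (a simplified form of k-means with a fixed starting guess) are assumptions for illustration, not the software Carbonell describes.

```python
def mean(points):
    """Coordinate-wise average of a list of equal-length tuples."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(len(points[0])))


def dist2(a, b):
    """Squared distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_clusters(points, rounds=10):
    """Split points into two groups and return the center of each."""
    # Deterministic start: the first point and the point farthest from it.
    centers = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    for _ in range(rounds):
        groups = [[], []]
        for p in points:
            nearest = 0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical purchases scored as (history, biology).
purchases = [(0.9, 0.1), (1.0, 0.0), (0.8, 0.1), (0.1, 0.9), (0.0, 1.0), (0.1, 0.8)]
averaged = mean(purchases)          # one blended profile
clustered = two_clusters(purchases)  # two separate profiles
```

The averaged profile sits halfway between the two interests, where none of the actual purchases are. The two cluster centers each sit close to one interest, so recommendations can be made for each separately.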

Let’s say my wife’s purchases were clustered. The system could look at most of her purchases and conclude that she’s a female boomer. The dog collar? It’s what statisticians call an outlier. In these early days, it’s something that’s safer to ignore. But as analysis gets more sophisticated, it will latch onto those bits of our lives that appear to be deviations. After all, which details are more likely to lay us bare, our day-to-day behavior that appears “normal” or the apparent quirks that we often work to hide? A detective will opt for the outlier in a New York second. The marketer might too. But it’s tough to make sense of such data with automatic systems.

In any case, suppose that next week my wife returns to the same store and buys piercing tools and green hair dye. At that point, the software might turn the spike collar she purchased, that apparent outlier, into its own cluster. So what would that new cluster tell us about her? Hard to say. Is she a middle-aged professional who commutes Monday through Friday in sober attire and then, on weekends, straps on the spiked collar and goes goth? Could be. Or perhaps she’s buying for two people. Ghani says that some systems in grocery stores look at the different clusters and try to come to conclusions about the composition of a family. Others look at the different signals as varying dimensions of one person. Sometimes, though, “mutually exclusive” purchases in the same cart—small socks and big shoes—indicate that more than one person is involved.

Accenture’s automatic fashion maven isn’t yet grappling with such subtle distinctions. It’s still in the research phase. But once this type of technology is in the marketplace, stores will have strong signals as to what types of shoppers we are. At the same time, they’ll be compiling ever more detailed and valuable customer lists. As we’ll see, plenty of other marketers, such as those in dating services or political groups, would pay richly for, say, a list of 10,000 trendy Gen Y women in Seattle, Chicago, or Miami. And yes, there will be lively markets, no doubt, for assorted varieties of goths.

LET’S SAY YOU go to a department store with a shopping list. If you come back missing a couple of items, the store has failed an important test. Even if you locate and buy everything on the list, your visit, from the store’s perspective, falls short of an unqualified success. No, they want you to stumble upon countless temptations as you make your way up and down the aisles. In their dreams, you teeter up to the checkout under such a pile of serendipitous finds that you have to pay a young assistant or two to help you lug it to the car.

How to make that happen? The first step is to map our migrations through the store. In the old days, some store managers and museum curators would gauge foot traffic by the wear on the floor tiles. Then they would redeploy their offerings to draw customers off the beaten paths. But that approach is a tad slow for the Numerati.

Ghani and his team have another idea. As we walk around the Accenture office, cameras hanging from the ceiling are tracking our every move. There are about 40 of them, Ghani says matter-of-factly. From my perspective, it’s insidious workplace surveillance. With this kind of spy network installed in my skyscraper offices in New York, I think I’d find myself rationing my trips to the bathroom. But Ghani and his colleagues view the cameras as just one more experiment, this one to track workers and customers. The Accenture workers are offering themselves as specimens, and they don’t seem to mind a bit.

This type of monitoring system isn’t that relevant to Accenture’s lab setting, where the flow of information counts for more than bodily movements. But Ghani sees growing numbers of cameras tracking the movements of customers and employees in big stores, hotels, and casinos. They could also find a home in factories. Such cameras are already installed as a security measure, Ghani says. So now it’s just a matter of giving the camera another job.
