Chapter 4
* * *
Blogger
“NOW LET ME tell you of my long tale which I have taken time to tell.” These words draw me in. I sip my coffee and scroll down the page.
It’s a long blog entry written by a woman who calls herself “Tears of Lust.” She writes about driving all night to Columbus, Ohio, with her boyfriend, Kenny, and her sidekick, Lizzy. The three of them are headed to an anime convention. As they drive through the night, Kenny starts to feel sick. By the time they reach Columbus, he’s “passed out on the [hotel] bed.”
Kenny ends up in Columbus’s Grant Medical Center and faces an emergency appendectomy. “Kenny made his phone calls to relatives before he went under,” Tears of Lust writes. “And me and Lizzy went to the waiting area. We watched Kill Bill and read a magazine to each other before returning to the café which was closed but they had the best vending machines like the ones that rotated and the ones that were refrigerated. It was awesome.”
The story goes on. It’s a medical saga woven with the wanderings and cravings of a consumer. “At one point,” she writes, “[Kenny] thought he heard his dead aunt talking to him so he assumed he was going to die during surgery.” But Kenny survives, and he recuperates as the writer and Lizzy happily watch Memoirs of a Geisha and The Da Vinci Code on DVD. Later, we learn some details about his stomach problems, his nearly collapsed lungs, and an abscessed wound. He’s so tender that when he and his girlfriend make love a few days later, Tears of Lust has to climb gingerly on top, “like a naughty nurse.”
That’s just a quick dip into the blogosphere, an enormous and expanding pool brimming with some of our most personal data. Up to now, we’ve seen how employers can track our procrastination and our e-mails, and how they’ll be able, increasingly, to optimize us as workers. We’ve seen how advertisers attempt to turn our mouse clicks and movements into mathematical models that anticipate our every urge. In what we’ve seen so far, it’s others who have their way with our growing mountain of data. They grab it, they analyze it, they use it. Whether we’re shopping or taking out a loan, we’re laboring for the Numerati in much the way a drosophila fly works for a white-coated lab technician. Sometimes we get discounts and prizes. Sometimes we can say no. But once we agree to an offer, we’re specimens. And yet, in the world of blogs and YouTube and social networking sites like MySpace, millions of people broadcast their lives voluntarily. They pile up details by the shovel load. Privacy often looks like an afterthought, if it’s considered at all. People like Tears of Lust aren’t pawns. They’re running the blogosphere. But that doesn’t mean they can’t be used.
On a frosty winter morning in New Jersey, I take the laptop to a coffee shop and call up Technorati, a blog search engine. There I look for a post brimming with the kind of private details most of my friends and acquaintances would rather keep to themselves. To limit the field to informal writers who hold nothing back, I type a misspelled “diahhrea” in the search box. The first post that pops up is by Tears of Lust.
For market researchers, blog posts like this one open a window onto a consumer’s life. Blogs and social networks offer up-to-the-minute intelligence—something marketers have long dreamed of. For decades, soap makers, brewers, and movie studios have attempted to simulate the marketplace, at great expense, by bringing together focus groups. These small groups of people, usually numbering from eight to a dozen, agree to try out the latest jelly beans or toothpaste, watch competing ads, or view a Hollywood release. Marketers watch to see if group members squirm or yawn during horror movies, if they nod or recoil while watching a political attack commercial. They have to make the most of each gathering because putting together focus groups is expensive and budgets are tight.
Now that people like Tears of Lust are publishing their feelings about a host of products, it’s as if a universe of focus groups is forming online. Tens of millions of people participate. Many write copiously. And from a marketer’s view, many are gloriously indiscreet about practically everything. True, some of them, like Tears of Lust, shield their identity, or at least change their name. Marketers don’t care. What they relish is the unfiltered peek at the moving gears and conveyor belts of peer pressure, bias, and desire.
Bloggers tend to be younger than the average consumer and a bit more tech savvy. Statistically speaking, they don’t reflect society at large. Still, it’s a big and surprisingly diverse pool of people, numbering more than 20 million. Grandmothers blog. CEOs blog. Marketers can dive into these online journals to find opinions about nearly everything and to track trends. The only trouble is that no one employs big enough teams of readers to keep up with the blogs. No one could. It’s too much text for human eyes. And the themes of the blogs, much like our lives, wander all over the place. They’re hell to organize. The only way to harvest and file the customer insights streaming from blogs is to turn over the work to computers.
IT’S A WHITE winter in Colorado. Every weekend, it seems, another blizzard blows in. Little surprise, then, that I get a fabulous rental deal on a convertible. The wind whips the cloth top as I drive it west through Denver and up toward the snowy mountains, to the university town of Boulder and the home of Umbria Communications. It’s a company that mines the millions of words pouring into blogs every hour. Its purpose is to learn what you and I and everyone else in the consuming world are thinking and, especially, what we’re hankering for.
Howard Kaushansky, Umbria’s president and founder, got my attention early on when describing Umbria’s business. “We turn the world of blogs into math,” he said one day, while visiting New York. “And then we turn you into math.” A colleague and I had just launched our own blog. The idea of turning it into math sounded like a lot of work. And turning me into math? I supposed such a thing was possible, but I had only a vague idea why Umbria’s team would bother and no clue as to what kind of equation I would become. Truth be told, as I drive into Boulder, I’m still struggling with the concept. I’ve made my way through entire chapters on hidden Markov models and Bayesian analysis. I’ve even wrestled briefly with so-called support vector machines. But I’m still not sure what you and I look like in numerical form. That’s one of the things I’ve come to Boulder to learn.
Kaushansky has smooth rounded features on a wiry frame, neatly coifed graying hair, and the restless mien of a marketer. In fact, he’s a lawyer by training, and he’s been running businesses in analytics and data mining for the past 15 years. He builds and sells them. The last one was Athene Software, a predictive analytics company he sold in 2001. With Umbria, he’s focusing the analytics on blogs.
A bicycle wheel leans against the wall in Kaushansky’s office. I ask if he rides. He does, he says, brightening. Extensively. I do too, I tell him. The area around Boulder is a “dream” for cyclists, he says. (I hold back from my spiel about the unexpected biking jewels in New Jersey. People don’t care.) I ask him where he lives. He points out the window toward an anvil-shaped mountain, Flatiron. He lives on the other side of it. He sees bear out there, herds of elk some mornings, coyotes. Leave your domestic pets outside after dark, he says, and they’ll get devoured. He spends weeks on end away from this mountain spread, flying around the country. He’s trying to convince nearly every type of company to tune in—through Umbria—to what their customers are saying on the blogs.
Kaushansky founded Umbria in 2004. Since then, the company has built a system that automatically reads millions of blog posts every day. The first step, he tells me, is to learn something about the author of each blog. Is the person a male or a female? A teenager? A twenty-something? A boomer? The computer looks for clues such as sentence structure, word choice, quirks in punctuation. How many middle-aged men do you know, for example, who end a sentence like this: !!!!!!!!!!!!!!!? Sometimes the computer reads through a post, shrugs its digital shoulders, and gives up. It doesn’t see the telltale signs. That post goes unclassified. Despite such setbacks, says Kaushansky, Umbria’s computer is able to build up large piles of posts for each gender and generation. They sort the authors into those categories.
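To make that first step concrete, here is a toy sketch in Python of a classifier that guesses an author's generation from word choice and punctuation, and gives up when the clues are weak. It is not Umbria's method, which the chapter does not detail. The cue words, the scoring, the `min_margin` threshold, and the function name `guess_generation` are all assumptions invented for illustration.

```python
import re

# Hypothetical cue lists, invented for illustration only.
TEEN_CUES = {"omg", "lol", "homework", "prom"}
BOOMER_CUES = {"retirement", "grandkids", "mortgage", "pension"}


def guess_generation(post, min_margin=2):
    """Return 'teen', 'boomer', or None when the signals are too weak."""
    words = re.findall(r"[a-z']+", post.lower())
    teen_score = sum(w in TEEN_CUES for w in words)
    boomer_score = sum(w in BOOMER_CUES for w in words)

    # Assumption: a run of three or more exclamation marks is a youthful quirk.
    if re.search(r"!{3,}", post):
        teen_score += 2

    margin = teen_score - boomer_score
    if abs(margin) < min_margin:
        return None  # the digital shrug: the post goes unclassified
    return "teen" if margin > 0 else "boomer"


if __name__ == "__main__":
    print(guess_generation("omg the prom was amazing!!!!!!"))
    print(guess_generation("I had a sandwich for lunch."))
```

The detail worth noticing is the `None` branch. A system like this is allowed to leave a post unclassified rather than force a guess, which matches the shrug described above.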
The next step is to figure out what each group of authors is writing about. In a decade or two, automatic readers like Umbria’s will likely dive deep into the content of written documents, perhaps analyzing an author’s mood, income, and educational level. Maybe the computer will come to conclusions about the individual’s circle of friends or be able to predict his or her behavior. For now, though, with only a tiny sliver of a second to devote to each blog post, Umbria is delivering far simpler fare. It’s looking to see if the writers have opinions about services or products—a new cell phone, for example, or the call center for a large bank. The only conclusion it reaches is whether the blogger has a favorable or unfavorable opinion. Thumbs up, thumbs down.
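A thumbs-up, thumbs-down verdict of this kind can be sketched in a few lines of Python. What follows is a deliberately crude word-counting approach, offered only as an illustration and not as a description of Umbria's system. The word lists and the function name `thumbs` are hypothetical.

```python
import re

# Hypothetical opinion words, invented for illustration only.
POSITIVE = {"love", "great", "awesome", "best"}
NEGATIVE = {"hate", "awful", "ugh", "disgusting"}


def thumbs(post, product):
    """Return 'up', 'down', or None for one post about one product."""
    text = post.lower()
    if product.lower() not in text:
        return None  # the post never mentions the product

    words = re.findall(r"[a-z']+", text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score == 0:
        return None  # no clear opinion either way
    return "up" if score > 0 else "down"


if __name__ == "__main__":
    print(thumbs("I love my new cell phone, best purchase ever", "cell phone"))
    print(thumbs("Ugh, the call center was awful", "call center"))
```

Counting words like this misses sarcasm, negation, and context, which is part of why the result sounds crude. Its appeal is that it can run over millions of posts in very little time.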
It sounds crude. What makes the blog world especially valuable to marketers, though, is not its precision but its unfiltered immediacy. Opinions change day by day, sometimes hour by hour. Let’s say that one of Umbria’s customers launches a new deodorant on Tuesday and promotes it with $4 million of TV advertising over the following week. How can marketers find out if the advertising has reached the target audience? Most of us don’t rush out to buy deodorant, no matter how compelling the ad. We might have another two or three months left on the stick in the medicine cabinet. So sales figures won’t provide quick feedback. Traditional Web pages, the kind that search engines like Google comb through, are static, a bit like a library. They’re sorted by relevance, not timeliness. Chances are, the most “relevant” Web page is the company’s own press release. That doesn’t help one bit. To learn what we’re thinking, the deodorant company must reach beyond the more formal Web to what bloggers and social networks are saying about the product.
This may sound like an outlandish example. People blogging about deodorant? But now that every single person online can become a global publisher in the five minutes it takes to set up a free blog, the sorts of details that people publish might surprise you. I search on Technorati for “deodorant,” and within minutes I find a post from Jeff, a 46-year-old “ex-touring musician turned husband/dad,” who lives in St. Cloud, Minnesota. He takes us on a tour of his bathroom cabinet, presenting his views on everything from floss (“It has to be the really thin stuff since my back teeth are very close together”) to mouthwash (“I’ve never been a mouthwash kind of guy. And that disgusting commercial where that guy swooshes hot, spitty mouthwash around . . . for ten minutes doesn’t help much either”). He informs us that he stopped buying cologne after marrying, since he no longer needed “to wear bait.” And yes, he weighs in on deodorant: “If any of you have teenage boys, I hope for your sake they don’t find out about Axe, or any of the other popular body sprays. My wife and I have had to intervene to let them know that they don’t need to spray on a whole can at once when a half a can will get the job done just fine. Ugh.”
By rounding up this kind of consumer insight, Umbria can provide the advertiser with a report showing how much buzz their ads generated the first day, or the first week, of the campaign. It can determine whether the response was favorable and how it matched up with the competition. (In this example, the demographic details are crucial. If the company is marketing the deodorant to Jeff’s teenage kids, the “Ugh” from their father might not even be a negative.) Jeff makes it easy for Umbria’s computer by putting his age and gender on the blog. (We even learn that he’s a Leo.) This type of research turns traditional surveying on its head. Unprompted by marketers, bloggers like Jeff volunteer the answers to millions of potential questions. “In a sense, we’re very similar to the game show Jeopardy!” Kaushansky says. “People have already said that they like a certain car or dislike a movie. It’s our job to formulate the questions.”
Kaushansky’s team is also starting to divide bloggers into different groups, or tribes. Kaushansky envisions nearly endless tribal affiliations. Doritos munchers, bikers for Obama, MINI Cooper enthusiasts. Once the company has sorted bloggers into tribes, it can start digging for correlations between tribes and products. It was through analysis of blogs, for example, that Kaushansky learned that the Gatorade tribe includes not only athletes and fitness nuts but also heavy drinkers on college campuses. Many use it as a mixer in hopes that electrolytes in the drink will soften hangovers. If this was news to Gatorade executives (and to their credit, it wasn’t), the company could consider extending its promotional partnerships beyond the likes of Nike and Cannondale, perhaps reaching out to Bacardi or Absolut vodka.
Sometimes this tribal knowledge helps marketers draw distinctions between consumers. One cell phone company, Kaushansky tells me, started charging extra for Bluetooth data connections—radio signals that replace wires. The people who wear the blinking phone clip on their ear use Bluetooth to relay their conversation to their handset. News of the extra charge on the phone bill sent bloggers into a fury. But Umbria, Kaushansky says, studied the blogs and discovered that almost all of the anger came from one tribe: the “power users.” Those are the folks who spend lots of time and money hunched over their handsets, sending e-mails and photos and fooling around with spreadsheets. The other tribes—the fashionistas, the music lovers, the cheapskates—shrugged off the Bluetooth charge. Many of them probably didn’t know what it was. With this intelligence, the phone company, conceivably, could raise the price a few bucks on the handsets favored by power users, and then offer them “free” Bluetooth. Meanwhile, they could continue charging everyone else.
This is a new stage in market intelligence. While still primitive, it’s easy to see where it’s headed. The Numerati are training computers to digest our words automatically. They are coming to conclusions about who we are and what we think. And as these computer systems gain in speed and ability, they’ll feast on more of our communications, extending far beyond blogs. Automatic readers like Umbria’s can stretch into social networks like MySpace and Facebook, the meeting places of entire generations. They can surf the comments on interactive video games and dig deeper into our e-mails, picking out our hobbies and passions and selling that information to advertisers. Given technology like Umbria’s, scores of companies with access to our words will be positioned to track, minute by minute, the shifting patterns of human thought. Umbria and its competitors, from Nielsen BuzzMetrics to Google, are betting that marketers, government officials, and politicians will pay richly for the insights they come up with. And this analysis of our words may be speeding ahead even faster in the shadows. Following the terrorist attack of 2001, intelligence officials in the United States gained access to enormous flows of Internet and telephone traffic. The National Security Agency, which has the largest staff of mathematicians in the country, is mining the traffic hour by hour.
Until recently, our words, whether spoken, typed, sung, or scrawled, performed their magic beyond the range of mathematicians. It wasn’t just that language, with its countless shades of tone and shifting nuances, resisted the rigid hierarchies of geometers and computer scientists. (That’s still an issue, as we’ll see.) No, the problem was more fundamental. Our words didn’t hang around for analysis. The sentences we spoke traveled through the air or along copper wires before alighting, ever so briefly, in forgetful minds. They faded faster than cut flowers. Our written words moldered on pages, a select few of them stuffed away in envelopes and notebooks. Most of them weren’t in the public domain, much less on the hard drives of powerful computers.
That has changed. For starters, our queries to search engines provide a detailed timeline of what online humanity is interested in—what we’re looking for, what we would like to buy. But those queries, most of them just three or four words long, are bare bones. They point in a direction but divulge only odds and ends about the people who write them. Think about what you searched for online over the past week. Those queries might trace your pursuit of a high-definition TV or your research for a geology project on the Pleistocene era. But they could easily miss important events in a person’s life—the death of a parent, perhaps, or a battle against addiction. Outfits like Umbria are working to glean new marketing insights from these online rambles. Imagine that they might want to create a bucket of several thousand female bloggers trying to quit smoking. It wouldn’t be hard. Now, do they appear more interested than average in chocolates or in white wine? In these early days, Umbria is focusing on simpler stuff. But a wide-ranging sampler of human life is circulating on blogs, ready to be harvested. It’s as if humanity itself were squeezed right into Umbria’s offices, sheltered from the winter winds outside, typing on command. Once the words show up, they’re available for eternity, to be matched, compared, crunched, parsed, and repackaged as marketing intelligence.
Maybe you don’t write a blog and you stay off social networks. If so, you could be lulled into thinking that companies like Umbria learn about other people, not you. But that’s not exactly the case. Umbria and other analytics companies are going to school on blogs. Once this new generation of automatic reading and learning machines scopes out the blog world, they can broaden their focus to everything else we write. In fact, it’s already happening. Spam-fighting companies such as Postini—a division of Google since mid-2007—sift through millions of e-mails issuing from Fortune 500 companies. They check them for signs that employees are leaking company secrets or carrying out insider deals. Other companies sweep through the hard drives of corporate flotillas of personal computers, scanning words to make sure that employees aren’t using the equipment for their own sordid or selfish ends.