Super Crunchers

by Ian Ayres


  Harrah’s uses this information to predict how much a particular gambler can lose and still enjoy the experience enough to come back for more. It calls this magic number the “pain point.” And once again, the pain point is calculated by plugging customer attributes into a regression formula. Given that Shelly, who likes to play the slots, is a thirty-four-year-old white female from an upper-middle-class neighborhood, the system might predict her pain point for an evening of gambling is a $900 loss. As she gambles, if the database senses that Shelly is approaching $900 in slot losses, a “luck ambassador” is dispatched to pull her away from the machine.
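
  Here, roughly, is what such a calculation might look like in code. The coefficients below are invented for illustration only (Harrah's actual formula and attributes are not public); they are chosen simply so that a customer like Shelly lands near the $900 figure.

```python
# Hypothetical pain-point regression; every coefficient below is invented.
def predict_pain_point(age, income_index, plays_slots):
    # pain_point = b0 + b1*age + b2*income_index + b3*plays_slots
    b0, b_age, b_income, b_slots = 100.0, 8.0, 215.0, 95.0
    return b0 + b_age * age + b_income * income_index + b_slots * plays_slots

# Shelly: thirty-four, upper-middle-class neighborhood (index 2.0), slot player.
shelly_pain_point = predict_pain_point(age=34, income_index=2.0, plays_slots=1)
print(round(shelly_pain_point))  # roughly 900

def send_luck_ambassador(current_losses, pain_point, buffer=0.9):
    # Dispatch the "luck ambassador" once losses approach the predicted pain point.
    return current_losses >= buffer * pain_point
```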

  “You come in, swipe your card, and are sitting at a slot,” Teradata’s Gnau said. “When you get close to that pain point, they come out and say, ‘I see you’re having a rough day. I know you like our steakhouse. Here, I’d like you to take your wife to dinner on us right now.’ So it’s no longer pain. It becomes a good experience.”

  To some, this kind of manipulation is the science of diabolically separating as many dollars from a customer as possible on a repeated basis. To others, it is the science of improving customer satisfaction and loyalty—and of making sure the right customers get rewarded. It’s actually a bit of both. I’m troubled that Harrah’s is making what can be an addictive and ruinous experience even more pleasurable. But because of Harrah’s pain-point predictions, its customers tend to leave happier.

  The Harrah’s strategy of targeting benefits is being adopted in other retail markets. Teradata found, for example, that one of its airline clients was giving perks to its frequent fliers based solely on how many miles they flew each year, with Platinum customers getting the most benefits. But the airline hadn’t taken account of how profitable these customers were. It hadn’t plugged in other available information, such as how much Platinum fliers paid for tickets, where they bought them, whether they called customer service, and, most important, whether they traveled on flights where the airline actually made money. After Teradata crunched the numbers taking these bottom-line attributes into account, the airline found that almost all of its Platinum fliers were unprofitable. Teradata’s Scott Gnau summed it up: “So they were giving people an incentive to make them lose money.”

  The advent of tera mining means that the era of the free lunch is over. Instead of having more profitable customers subsidizing the less profitable, firms will be able to target rewards to their most profitable customers. But caveat emptor! In this brave new world, you should be scared when a firm like Harrah’s or Continental becomes particularly solicitous of your business. It probably means you have been paying too much. Airlines are learning to give upgrades and other favorable treatment to the customers that make them the most money, not just the ones that fly the most. Airlines can then “encourage people to become more profitable,” Gnau explains, by charging you more, for example, for buying tickets through a call center than for buying them online.

  This hyper-individualized segmentation of consumers also lets firms offer new personalized services that clearly benefit society. Progressive insurance capitalizes on the new capabilities of data mining to define extremely narrow groups of customers, e.g., motorcycle riders ages thirty and above, with college educations, credit scores over a certain level, and no accidents. For each of these narrow cells, the company runs regressions to identify the factors that most closely correlate with that group’s losses. Super Crunching on this radically expanded set of factors lets it set prices for types of consumers who were traditionally written off as uninsurable.
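
  As a rough sketch of the per-cell approach, you might group policies into narrow cells and fit a separate loss regression within each one. The variable names and numbers below are invented, not Progressive's actual factors.

```python
# Sketch of per-cell loss regressions on made-up policy data.
import numpy as np
import pandas as pd

policies = pd.DataFrame({
    "cell": ["moto_30plus_college"] * 4 + ["moto_under30"] * 4,
    "credit_score": [720, 680, 750, 700, 610, 650, 590, 640],
    "prior_claims": [0, 1, 0, 0, 2, 1, 3, 1],
    "annual_loss": [120, 480, 90, 150, 900, 620, 1300, 700],
})

# Fit an ordinary least-squares regression separately for each narrow cell.
for cell, group in policies.groupby("cell"):
    X = np.column_stack([np.ones(len(group)), group["credit_score"], group["prior_claims"]])
    y = group["annual_loss"].to_numpy()
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(cell, dict(zip(["intercept", "credit_score", "prior_claims"], coefs.round(2))))
```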

  Super Crunching has also created a new science of extraction. Data mining increases firms’ ability to charge individualized prices that predict our individualized pain points. If your walk-away price is higher than mine, tera mining will lead firms to take a bigger chunk out of you one way or another. In a Super Crunching world, consumers can’t afford to be asleep at the wheel. It’s no longer safe to rely on the fact that other consumers care about price. Firms are figuring out more and more sophisticated ways to treat the price-oblivious differently than the price-conscious.

  Tell Me What You Know About Me

  Tera mining sometimes gives businesses a decided information advantage over their customers. Hertz, after analyzing terabytes of sales data, knows a lot more than you do about how much gas you’re likely to leave in the tank if you prepay for the gas. Cingular knows the probability that you will go beyond your “anytime minutes” or leave some unused. Best Buy knows the probability that you will make a claim on an extended warranty. Blockbuster knows the probability that you will return the rental late.

  In each of these cases, the companies not only know the generalized probability of some behavior; they can make incredibly accurate predictions about how individual customers will behave. The power of corporate tera mining creepily suggests the opening lines of Psalm 139:

  You have searched me and you know me.

  You know when I sit and when I rise; you perceive my thoughts from afar.

  You discern my going out and my lying down; you are familiar with all my ways.

  We may have free will, but data mining can let businesses emulate a kind of aggregate omniscience. Indeed, because of Super Crunching, firms sometimes may be able to make more accurate predictions about how you’ll behave than you could ever make yourself.

  But instead of trying to prohibit statistical analysis, we might react to the possibility of advantage-taking by simply making sure that consumers know that the number crunching is going on. The rise of these predictive models suggests the possibility of a new kind of disclosure duty. Usually government only requires firms to tell a consumer about their products or services (“made in Japan”). Now firms sometimes know more about consumers than the consumers know about themselves. We could require firms to educate consumers about themselves. It might be helpful if Avis told you, before you agree to prepay for gasoline, that other people like you tend to leave more than a third of a tank full when they return the car—so that the effective price for prepaid gas is really four bucks per gallon. Or Verizon might be asked to tell you when their statistical model predicts that you’re on the wrong phone plan.
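
  The prepaid-gas arithmetic is easy to check. Here is a back-of-the-envelope version; the tank size and posted prepay price are assumptions for illustration, since the text gives neither.

```python
# Assumed numbers: a 15-gallon tank prepaid at $2.70 per gallon.
tank_gallons = 15
prepaid_price_per_gallon = 2.70
gallons_actually_used = tank_gallons * (2 / 3)   # a third of the tank comes back unused

effective_price = prepaid_price_per_gallon * tank_gallons / gallons_actually_used
print(round(effective_price, 2))   # about 4.05 per gallon actually burned
```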

  Government could also Super Crunch some of its enormous datasets to inform citizens about themselves. Indeed, Super Crunching could truly help reinvent government. The IRS nowadays is almost universally disliked. Yet the IRS has tons of information that could help people if only it would analyze and disseminate the results. Imagine a world where people looked to the IRS as a source for useful information. The IRS could tell a small business that it might be spending too much on advertising or tell an individual that the average taxpayer in her income bracket gave more to charity or made a larger IRA contribution. Heck, the IRS could probably produce fairly accurate estimates about the probability that small businesses (or even marriages) would fail. In fact, I’m told that Visa already does predict the probability of divorce based on credit card purchases (so that it can make better predictions of default risk). Of course, this is all a bit Orwellian. I might not particularly want to get a note from the IRS saying my marriage is at risk. (A little later on, we will take on whether all this Super Crunching is really a good idea. Just because it’s possible to make accurate predictions about intimate matters doesn’t mean that we should.) But I might at least want the option of having the government make predictions about various aspects of my life. Instead of thinking of the IRS as solely a taker, we might also think of it as an information provider. We could even change its name to the “Information & Revenue Service.”

  Consumers Fight Back

  Even without government’s help, entrepreneurs are bringing new services to market which use Super Crunching as a consumer advocacy tool. Coming to the aid of consumers, these firms are using data-crunching to counteract the excesses of seller-side price extraction. The airline industry is especially fertile ground for such advocacy, because airlines engage in increasingly bewildering pricing shenanigans—trying to find in their databases any crevice of an opportunity to enhance their “revenue yield.”

  What’s a consumer to do? Enter Oren Etzioni, a professor of computer science at the University of Washington. On a fateful day in 2002, Etzioni was peeved to learn that the people sitting next to him on an airplane had paid much less for their tickets simply because they had bought them later. He had a student go out and try to forecast whether particular airline fares would rise or fall as the travel date approached. With just a little data, the student could make pretty accurate forecasts about whether it was a good idea to buy early or wait.

  Etzioni ran with the idea in a big way. What he did is a prime example of how consumer-oriented Super Crunching can counteract the number-crunching price manipulations of sellers. He created Farecast.com, a travel website that lets you search for the lowest current fare. Farecast goes further than other fare-search sites; it adds an arrow that simply points up or down telling you which way Farecast predicts fares are headed. Even a prediction that the fare is likely to go up is valuable, because it lets consumers know that they should hurry up and pull the trigger.

  “We’re doing the same thing the weatherman does,” said Hugh Crean, Farecast’s chief executive. “We haven’t achieved clairvoyance, nor will we. But we’re doing travel search with a real level of advocacy for the consumer.” Henry H. Harteveldt, a vice president and principal travel analyst at Forrester Research in Cambridge, says Farecast is trying to level the informational playing field for travelers. “Farecast provides guidance, much like a stockbroker, about whether you should take action now, or whether you should wait.”

  The company (which was originally named Hamlet and had the motto “to buy or not to buy”) is based on a serious Super Crunch. In a five-terabyte database, it keeps fifty billion prices that it purchased from ITA Software, a company that sells price data to travel agents, websites, and computer reservation services. Farecast has information on nearly all the major carriers except JetBlue and Southwest (which do not provide data to ITA). Farecast can indirectly account for and even predict JetBlue and Southwest pricing by looking at how other airlines on the same routes react to price changes by the two missing competitors.

  Farecast bases its predictions on 115 indicators that are reweighted every day for every market. It pays attention not just to historical pricing patterns, but also to a host of factors that will shift the demand or supply of tickets—things like the price of fuel, the weather, or who wins the National League pennant. It turns all this information into an up-arrow if it predicts the price will go up, or a down-arrow if it predicts the price will go down. “It’s like going to the ballet,” Harteveldt says. “We don’t see the many years of practice and toil and blood and sweat and strain that the ballet dancer has experienced. We’re only there in the auditorium watching them as they dance gracefully on the stage. With Farecast, we see the graceful dancing onstage. We don’t see the data-crunching; we don’t really care about the data-crunching.”
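
  A toy version of the up-arrow/down-arrow idea might look like the little classifier below. It uses three invented indicators instead of Farecast's 115, and made-up training data, just to show how historical fares can be turned into a directional call with a confidence attached.

```python
# Toy fare-direction classifier on invented data; not Farecast's actual model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# features: [days_until_departure, fuel_price_index, fraction_of_seats_left]
X_train = np.array([
    [45, 1.0, 0.80], [30, 1.1, 0.60], [14, 1.2, 0.40],
    [7, 1.2, 0.25], [3, 1.3, 0.10], [60, 0.9, 0.90],
])
y_train = np.array([0, 0, 1, 1, 1, 0])   # 1 = the fare rose afterward

model = LogisticRegression().fit(X_train, y_train)

# Predict for a new itinerary: 10 days out, pricey fuel, few seats left.
p_up = model.predict_proba([[10, 1.2, 0.30]])[0, 1]
print("up-arrow" if p_up > 0.5 else "down-arrow", f"({p_up:.0%} confident)")
```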

  Farecast turns the tera-crunching tables on the airlines. It uses the same databases and even some of the same statistical techniques that airlines have been using to extract money from consumers. But Farecast isn’t the only service that has been crunching numbers to help the little guy.

  There are a bunch of other services popping up that crunch large datasets to predict prices. In just a few months, Zillow.com has become one of the most visited real estate sites on the net. Zillow crunches a dataset of over sixty-seven million home prices to help both buyers and sellers price their homes.

  And if you can predict the selling price of a house, why not the selling price of a PDA? Accenture is doing just that. Rayid Ghani, a researcher in Accenture’s Information Technology group, has spent the past two years mining the data from 50,000 eBay auctions to predict the price that PalmPilots and other PDAs will ultimately sell for. He hopes to convince insurance companies or eBay itself to offer sellers price-protection insurance that guarantees a minimum price they’ll receive. Explains Ghani, “You’ll put a nice item on eBay. Then if you pay me ten dollars, I’ll guarantee it will go for at least a thousand dollars. And if it doesn’t, I’ll pay you the difference.” Of course, auction bidders will also be interested in these predictions. Bidcast software that suggests whether you should bid now or wait for the next item is sure to be coming to a web portal near you.

  Sometimes Super Crunching is helping consumers just get through the day. Inrix’s “Dust Network” crunches data on the speed of a half million commercial vehicles to predict traffic jams. Today’s large commercial fleets of taxis and delivery vans are equipped with global positioning systems that in real time can relay information not just about their position but about how fast they’re going. Inrix combines this traffic-flow information with information about the weather, accidents, and even when schools and rock concerts are letting out, to provide instantaneous advice on the fastest way to get from point A to point B.

  Meanwhile, Ghani is working to use Super Crunching to personalize our shopping experience further. Soon, supermarkets may ask us to swipe our loyalty cards as we enter the store—at which point the store will data mine through our previous shopping trips and make a prediction of what foods we’re running out of. Ghani sees a day when the supermarket will become a food shopping advisor, telling us what we need to buy and offering special deals for the day’s shopping trip.

  The simple predictive power of a good data crunch can be applied to almost any activity where people do the same thing again and again. Super Crunching can be used to give one side an edge in a commercial transaction, but there’s no reason why it has to be the seller. As more and more data becomes freely available, consumer services like Farecast and Zillow will step forward and crunch it.

  In Regressions We Trust

  These services not only tell you which way the price is going to move, they also tell you how confident they are in their estimates. So with Farecast a consumer might learn not only that the fare is expected to drop, but also that this type of prediction turns out to be correct 80 percent of the time. Farecast knows that it doesn’t always have enough data to make a very precise prediction. Other times it does. So it lets you know not only its best guess, but how confident it is in that guess. Farecast not only tells you how confident it is, but it puts its money where its mouth is. For $10, it will provide you with “Fareguard” insurance—which guarantees that an offered airfare will remain valid for a week, or Farecast will make up the difference.

  This ability to report a confidence level in predictions underscores one of the most amazing things about the regression technique. The statistical regression not only produces a prediction, it also simultaneously reports how precisely it was able to predict. That’s right—a regression tells you how accurate the prediction is. Sometimes there are just not enough historical data to make a very precise estimate and the output of the regression technique tells you just this. Indeed, it gets even better, because the regression tells you not only the precision of the regression equation on the whole, it also tells you the precision with which it was able to estimate the impact of each individual term in the regression equation.
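
  A small illustration of a regression reporting its own precision is shown below, using synthetic data. The numbers and variable names are made up; the point is simply that standard regression output includes, alongside each estimate, a measure of how precisely it was estimated.

```python
# Ordinary least squares on synthetic data; the fit reports its own precision.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 200)    # true intercept 2.0, true slope 0.5

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

print(fit.params)       # the best-guess coefficients
print(fit.bse)          # the standard error of each coefficient
print(fit.conf_int())   # a 95% confidence interval for each coefficient

# The precision of a single prediction, not just of the coefficients:
pred = fit.get_prediction([1.0, 6.0])          # constant term and x = 6
print(pred.predicted_mean, pred.conf_int())    # point estimate and its interval
```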

  So Wal-Mart learns three different kinds of things from its employment test regression. First, it learns how long a particular applicant is likely to stay on the job. Second, it learns how precisely it made this prediction. The predicted longevity of an applicant might be thirty months, but the regression will separately report the probability that the applicant will work less than fifteen months. If the thirty-month prediction is fairly accurate, the probability that the applicant will work only fifteen months will be pretty small, but for inaccurate predictions this probability might begin to balloon. A lot of people want to know whether they can really trust a regression prediction. If the prediction is imprecise (say, because of poor or incomplete data), the regression itself will be the first one to tell you not to rely on it. When was the last time you heard a traditional expert tell you the precision of his or her estimate?

  And finally, the regression output tells Wal-Mart how precisely it was able to measure the impact of individual parts of the regression equation. Wal-Mart isn’t about to report the results of its regression formula. However, the regression output might tell Wal-Mart that applicants who think “there is room in every corporation for a non-conformist” are likely to work 2.8 months less than people who disagree. The prediction associated with that specific question is 2.8 fewer months, holding everything else about the applicant constant. The regression output can go even further and tell Wal-Mart the chance that “non-conformist” applicants will actually end up working longer. Depending on the accuracy of the 2.8-month estimate, this probability of a contrary influence might be 2 percent or 40 percent. The regression begins the process of validating itself. It tells you the impact of more rainfall on wine, and whether that particular influence is really valid.
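
  One rough way to turn a coefficient and its standard error into a "chance of a contrary influence" is sketched below. The 2.8-month effect comes from the text, but the standard error is an assumption chosen for illustration, and reading the one-sided tail probability this way is only an approximation.

```python
# Chance that the true effect runs the other way, given an estimated
# coefficient and its standard error (the standard error is invented).
from scipy.stats import norm

coef, se = -2.8, 1.6                         # months of tenure; se is assumed
p_contrary = 1 - norm.cdf(abs(coef) / se)    # roughly 4% with these numbers
print(f"{p_contrary:.0%}")                   # a noisier estimate pushes this toward 40%
```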

  All the World’s a Mine

  Tera mining of customer records, airline prices, and inventories is peanuts compared to Google’s goal of organizing all the world’s information. Google reportedly has five petabytes of storage capacity. That’s a whopping 5,000 terabytes (or five quadrillion bytes). At first, it may not seem that a search engine really has much to do with data mining. Google makes a concordance of all the words used on the Internet, and then if you search for “kumquat,” it simply sends you a list of all the web pages that use that word the most times. Yet Google uses all kinds of Super Crunching to help you find the kumquat pages you really want to see.
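
  Stripped to its bones, the concordance idea is just an inverted index: count each word on each page, then rank the pages containing the query word by how often they use it. Here is a toy sketch with invented pages (real search engines, of course, weigh far more than raw counts).

```python
# Bare-bones concordance lookup over made-up pages.
from collections import Counter, defaultdict

pages = {
    "citrusfacts.example/kumquat": "kumquat kumquat citrus fruit kumquat",
    "recipes.example/marmalade": "kumquat marmalade recipe citrus",
    "news.example/today": "weather traffic sports",
}

index = defaultdict(dict)                 # word -> {page: count}
for url, text in pages.items():
    for word, count in Counter(text.split()).items():
        index[word][url] = count

# Pages mentioning "kumquat", ranked by how many times they use the word.
results = sorted(index["kumquat"].items(), key=lambda kv: -kv[1])
print([url for url, _ in results])
```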

  Google has developed a Personalized Search feature that uses your past search history to further refine what you really have in mind. If Bill Gates and Martha Stewart both Google “blackberry,” Gates is more likely to see web pages about the email device at the top of his results list, while Stewart is more likely to see web pages about the fruit. Google is pushing this personalized data mining into almost every one of its features. Its new web accelerator dramatically speeds up access to the Internet—not by some breakthrough in hardware or software technology—but by predicting what you are going to want to read next. Google’s web accelerator is continually pre-fetching web pages from the net. So while you’re reading the first page of an article, it’s already downloading pages two and three. And even before you fire up your browser tomorrow morning, simple data mining helps Google predict what sites you’re going to want to look at (hint: it’s probably the same sites that you look at most days).
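
  The pre-fetching prediction can be sketched just as simply: look at which page readers usually open after the one they are on now, and fetch the likeliest candidates ahead of time. The session data below is invented purely for illustration.

```python
# Toy "predict what you'll read next" prefetcher on made-up browsing sessions.
from collections import Counter

history = [
    ("article_p1", "article_p2"), ("article_p1", "article_p2"),
    ("article_p1", "comments"), ("article_p2", "article_p3"),
]

def pages_to_prefetch(current_page, history, k=2):
    # Rank candidate next pages by how often they followed the current page.
    follow_counts = Counter(nxt for cur, nxt in history if cur == current_page)
    return [page for page, _ in follow_counts.most_common(k)]

print(pages_to_prefetch("article_p1", history))   # ['article_p2', 'comments']
```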

 
