You May Also Like
Assigning stars to a cultural product is itself a curious—and long contentious—enterprise. It seemed to kick off in books, actually, with Edward O’Brien’s inaugural edited volume, The Best Short Stories of 1915. As he described in an introduction, the stories he selected “fell naturally into four groups” (my italics). These were denoted by asterisks, the more the better, all the way up to three (for stories that deserve “a position of some permanence in our literature”). With the vision of the disinterested critic, he declared, “I have permitted no personal preference or prejudice to influence my judgment consciously for or against a story” (later in the book we shall see just how difficult that is). O’Brien’s star system—and indeed the very act of choosing the “best” stories of the year—itself came under some withering criticism.*1 Reviewing The Best Short Stories of 1925, a critic for The New York Times, chiding O’Brien’s “dogmatic” valuation system, declared, “A great many people will believe almost anything that any one tells them positively enough.” Star history gets a bit murky but seems to finally pop up in film in a review by Irene Thirer in the July 31, 1928, edition of the New York Daily News. She writes, “Judging movies via the star system, as we’re going to do henceforth as a permanent thing”—implying it was already under way. She then pans Port of Missing Girls with a single star.*2
People have been quibbling over stars ever since. One obvious problem is that because people’s tastes are different, what one person thinks is a three-star movie may be for you a five-star flick. This is why Netflix distinguishes between the overall number of stars and the metric “Our best guess for you.” This lays taste right out on the table: You liked this movie 0.7 more than others. While we might take this to be some purer expression of “our” taste, one complication is that, as with all recommendation engines, that number is partially derived from what other people are doing. Another problem is that you may just rate differently—with a high or low bias—regardless of what you actually thought about the movie. “Some people I know are very selective on giving high ratings,” says Amatriain. “So two or three stars for them is not necessarily a bad rating.”
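One way to picture the rating-bias problem is to measure each person's stars against that person's own average rather than against an absolute scale. The sketch below is only an illustration under invented assumptions: the two raters, the film titles, and the numbers are hypothetical, and nothing here is meant to describe how Netflix actually computes its "best guess."

```python
from statistics import mean

# Hypothetical ratings (1-5 stars); the raters, titles, and numbers are invented.
ratings = {
    "selective_rater": {"Film A": 2, "Film B": 3, "Film C": 1},
    "generous_rater": {"Film A": 4, "Film B": 5, "Film C": 3},
}

def center_ratings(user_ratings):
    """Subtract the user's own average, so a score reads as 'above or below my usual'."""
    baseline = mean(user_ratings.values())
    return {title: stars - baseline for title, stars in user_ratings.items()}

centered = {user: center_ratings(r) for user, r in ratings.items()}

for user, scores in centered.items():
    print(user, scores)
```

Seen this way, the selective rater's three stars and the generous rater's five stars for Film B say the same thing: each liked it one star more than his or her usual.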
This points to something interesting about Netflix and its ratings. Perhaps as a holdover from the days when we received our opinions largely from reviewers, who had their own rating systems, we might think of a star rating as a kind of stable measure of quality, or at least of one’s taste. At both the individual and the aggregate level, however, Netflix stars are far from fixed. Rather, they are like free markets: prone to corrections, bubbles, hedges, inflation, and other forms of statistical “noise.”
In early 2004, to take one case, there was a “sudden rise in the average movie rating” on Netflix. Did Hollywood films suddenly get better? Actually, the recommendation system did. “Users are increasingly rating movies that are more suitable for their own taste,” wrote Yehuda Koren, a researcher who participated in the Netflix Prize. In other words, the movies got better because they were chosen by more people who thought they were better. Depending on how you look at it, this could be thought of as a kind of selection bias—the people who were likely to like a movie were rating it more favorably—or as a kind of market equilibrium in taste: People were more accurately finding the movies (that is, the supply) they were more likely to like (that is, the demand).
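The arithmetic behind that rise can be shown with a toy example. The viewers, films, and star values below are hypothetical, chosen only to show that the average can climb when nobody's opinion of any film changes; it is a sketch of the selection effect Koren describes, not a reconstruction of his analysis.

```python
from statistics import mean

# Hypothetical "true" star ratings each viewer would give each film.
would_rate = {
    "viewer_1": {"Drama": 5, "Comedy": 2},
    "viewer_2": {"Drama": 2, "Comedy": 5},
}

def average_rating(pairs):
    """Average the stars over the (viewer, film) pairs that actually get rated."""
    return mean(would_rate[viewer][film] for viewer, film in pairs)

# Poor matching: every viewer watches and rates every film.
everyone_rates_everything = [(v, f) for v in would_rate for f in would_rate[v]]

# Better matching: each viewer only watches (and rates) the film that suits them.
well_matched = [("viewer_1", "Drama"), ("viewer_2", "Comedy")]

print(average_rating(everyone_rates_everything))
print(average_rating(well_matched))
```

The films are the same in both cases; only who ends up rating them has changed.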
Things are even messier at the individual level. Ask someone to re-rate a movie he has already seen, and more likely than not he will rate it differently. Simply by altering a user’s initial rating, experiments have shown, you can affect how that same person re-rates it later. People seem to rate things differently when they rate a bunch of films en masse (training their algorithms) versus a single film. People rate television shows differently than films. “The average rating on a TV show tends to be much higher than on a movie,” Yellin said. Has television gotten better than film? “My intuition is that there’s selection,” he said. “Who’s likely to rate The Sopranos? Not someone who watched five minutes and didn’t like it because it wasn’t really part of their life. It’s the person who committed to it and spent a hundred hours of their life watching it.” On the other hand, “who will rate Paul Blart: Mall Cop? It might not be a very good movie, but it’s ninety minutes long. Your bar or criteria might be different.”
Similarly, the same movie seen on streaming versus DVD might have different ratings. “Especially if a movie is much more visceral,” Yellin said—like a “very emotional” Spielberg title. “It’s going to have an impact on you, but that impact might be ephemeral. So if you rated it right at the credits, you might give it a higher rating. A week later, it might not have that effect on you.” Watching a movie alone might yield a lower rating than watching a movie with enthusiastic friends.
And so on. “I was deep into the ratings game for years,” Yellin said gravely, sounding like a jaded gangster reflecting on his unsavory past on the streets. I sensed he was striving for some purity in those ratings, a Platonic ideal of what we like. “You question how much hair I have? I tore my hair out trying to understand these kinds of things.” Ratings, in the end, were not as potent a signal of what people would watch as one might think. Neither are things like gender and geography. “If you know nothing else, it will help a tiny bit,” Yellin said. “But if they watch five things on Netflix, we will know magnitudes more about them than age, gender, where they live.” You are what you watch.
—
All this talk of how ratings have been deemphasized does not mean that recommendations are any less important. They are indeed more central than ever to Netflix’s algorithmic work, driving some 75 percent of all viewing.
Now, though, they are more implicit. Rather than tell you what you like, Netflix now in essence shows you what you like, in “personalized” rows whose architecture has essentially been created by your own behavior. “Everything is a recommendation,” as Amatriain liked to say of the new, “beyond the five stars” thinking. Even searching for things—a sign that “we are not able to show them what to watch”—feeds into the recommendation engine. Knowing what you are looking for betrays what you might like. Doing anything on Netflix is itself a kind of meta-recommendation: The site, like much of the Internet, is one big constant experiment in preferences, a series of “A/B tests” you probably participated in without being aware. Did moving the search box to the left or the right of the online shoe retailer lead you to buy more products? Did putting a row on your splash page titled “Foreign Dramas from the 1980s” get you to watch more foreign dramas from the 1980s?
The rows reflect a kind of middle ground between two extremes of signals that in and of themselves are not wholly useful: The first is your stated likes. These can lead into a kind of taste cul-de-sac, full of obscure, interesting films that you rarely get around to watching. “Overfitting” is the algorithmic word: The engine makes recommendations that are, in a sense, too perfect—and perfectly sterile.
The second is popularity. This is the antithesis of “personalization,” Amatriain told me; then again, if you are trying to optimize consumption, “a member is most likely to watch what most others are watching.” This can lead to the Shawshank Redemption Problem, or the rather superfluous recommendation of something the whole world has seen. The Shawshank Redemption is Netflix’s highest-ever-rated film, a film so universally lauded on the site it has almost no predictive power beyond its own seemingly inherent likability. “People love that film all over the frickin’ place,” Yellin marveled, shaking his head.
Perhaps as a concession to the inexorable noisiness of human taste, Netflix does not rely entirely on the behavior of users themselves to make recommendations. It also has a paid army of human “taggers” erecting a labyrinth of cinematic meta-data. Rather than trying to figure out what makes two people’s taste similar, Netflix has found it is often easier to ascertain what makes two films similar. This can lead to curious discoveries. The presence of the director Pedro Almodóvar may forge a link between two films, no matter how different they may be, where nothing else would. But meta-data by themselves can mislead. Recommending Dogville—a film as polarizing as Napoleon Dynamite—to people who watched The Hours or Moulin Rouge, simply because Nicole Kidman was in both of them, could be disastrous.
But meta-data can also tease out things we might not have discovered ourselves. The often quirkily specific, human-generated genre rows remind us, as I have noted, of how categories can influence our preferences. We like things as something, even if, with a film like The Big Lebowski, it can take a while to figure out what “it” is. Netflix’s quirky genres try to shape meaning from what might otherwise seem capricious suggestions. “Recommendations can be too out-there,” Yellin said. “You’re like, ‘Wow, why would it say that just because I rated Raise the Red Lantern five stars that I’m going to really like this Japanese kids’ movie?’” Yellin pointed to his laptop. On his Netflix page was an array of recommendations: Gomorrah, Valhalla Rising, Enter the Void, and Un Chien Andalou. They were all contained in a genre dubbed Mind-Bending Foreign Dramas. “I got psyched looking at this,” he said, “but if you had shown it to me without any context, it might not be as compelling.” As the writer Alexis Madrigal described it, “It’s not just that Netflix can show you things you might like, but that it can tell you what kinds of things those are.”
That these two things can influence each other is not only one of the curious forms of quantum entanglement found in the Big Data of recommendation systems but a fact of human taste.
EVERYONE’S A CRITIC: LET A THOUSAND KVETCHES BLOOM
My husband and I found this “off-the-beaten-path” place one night while driving on a dark desert highway. Our room was a bit dated (mirrors on the ceiling LOL!) but we were pleasantly surprised to find that we had been upgraded—our room even had champagne on ice! But the place has a serious noise problem: we were woken up in the middle of the night by voices coming from somewhere down the hallway. While I would agree with the previous reviewer that it is “Such a Lovely Place!” I have very mixed feelings. The worst thing, however, were the checkout policies, which I found to be completely unacceptable.
You may recognize the above as my mashing up of two familiar narratives: the lyrics to the Eagles’ “Hotel California” and a review on the travel Web site TripAdvisor.com. You know “Hotel California” because you have heard it to death on FM radio. And if you have spent any time on TripAdvisor.com, you will, after reading the twenty-eighth review of a hotel, have begun to absorb its gentle cadences: the casual, confessional tone; the banter with other reviewers; the personality that seems to come across at once as both the relatable everyman being wronged and the aggrieved diva with a heightened sense of entitlement. Then there is the “but”—a hallmark of the “speech act” known as a complaint. As the linguist Harvey Sacks once noted, complaints tend to follow a standard pattern: “a piece of praise plus ‘but’ plus something else.” The praise typically comes first, as if to say, “This is not me being unreasonable.”
Reading these sorts of reviews, I cannot help but wonder, where did people, previously, before the Internet and social media, channel this torrent of opinion? If the hotel shower’s water pressure was not quite to one’s liking, where was there, besides the captive audience at the front desk, to channel this disquietude? Then, as now, a person having a poor experience might simply have vowed never to visit the place again. He could have told friends and family about this experience, and this casual griping might have rippled out to a few people. But how could he warn that stranger, down the road, heading toward the proverbial Hotel California, that it might not be worth her money?
It may already seem difficult to remember, but in the days before the Internet, and then smartphones, to do something like eat at an unknown restaurant meant relying on a clutch of quick-and-dirty heuristics. The presence of a lot of truck drivers or cops at a lonely diner was a supposed claim to its quality (though it might simply have been the only option around). For “ethnic” food, there was the classic “We were the only non-[insert ethnicity] people in there.” Or one spent anxious minutes on the sidewalk, under the watchful gaze of the host, reading curling, yellowed reviews from local weeklies, wondering if the opinion of a critic who passed by one afternoon in 1987 still held.
We lived in an information-poor environment. To choose a hotel in an unfamiliar city, we might have paged through a guidebook. But what if that guidebook only covered a few hotels and was not recently updated? We might have relied simply on brands: I stayed at this hotel in Akron, so I will stay at the one in Davenport. But what if the Akron one was much better run?
“The difficulty of distinguishing good quality from bad is inherent in the business world,” wrote the economist George Akerlof. His famous “lemon problem” took the used-car market as the quintessential case of information asymmetry: The seller knew much more about the quality of the car than the buyer. This could lead to the buyer’s being cheated. Because of that very danger, however, the price the seller was able to offer could be depressed. Brands, for Akerlof, were one way for consumers to “retaliate” in the face of a poor product, by not giving it their business, across the board, in the future. And chains, brands writ upon the landscape, could offer that same assurance. The customer knew what to expect. However modest that expectation was, it was better than having an expectation violated.
Eating at the chain restaurant on the highway posed its own information problem. “The customers are seldom local,” Akerlof wrote. “The reason is that these well-known chains offer a better hamburger than the average local restaurant; at the same time, the local customer, who knows his area, can usually choose a place he prefers.”
Let us call this the “lemon chicken problem.” That local consumer, having more information than you, was always going to eat better. In an information-poor environment, you could settle for a series of blandly average experiences but never find that one transcendent place to which, as travel magazines like to say, “locals flock.” Of course, by the time tourists began to flock, that business probably began caring less about what locals thought, and maybe the quality slipped—because how many would be back anyway? In a pinch, you could simply go with your gut. Sometimes you left clutching it.
The arrival of Web sites like Yelp and TripAdvisor and Amazon fundamentally altered things. That mold on the shower curtain in room 224? Tell the world about it! That hidden place on Route 51 that made the amazing doughnuts? A touch of GPS, an aggregate of “user-generated content,” and you were suddenly privy to an experience you might have previously missed.
That “electronic word of mouth” can move markets is beyond question. On Amazon.com, a National Bureau of Economic Research study has found, an increase in the “average star” of a book gives that book a “higher relative share” of all books on the site. A group of researchers in Ireland, meanwhile, found a “TripAdvisor effect”: After the service was introduced in Ireland, hotels’ TripAdvisor aggregate ratings rose over a two-year period. Hotels were either responding to online feedback or indeed trying to earn higher ratings. Either way, guests got better rooms. Hotels in Las Vegas, meanwhile, where TripAdvisor was already known, saw no change. Like some version of the “efficient market” hypothesis, all information had already been “priced” into the Vegas hotels’ reviews.
On Yelp, Michael Luca, an economist at Harvard University, found in the Seattle market that a one-star increase in a restaurant’s rating lifted its revenue as much as 9 percent. The effect was “being driven entirely from independent restaurants.” This makes sense: Because chains, in essence, fill in the gaps in word of mouth, they do not depend on it for their business. What could you say about a chain that someone would not already know? Does the world care if you do not happen to like the secret sauce on a McDonald’s Big Mac? No—because billions of others apparently do.
Luca also found that chains, after Yelp was introduced in the market he was studying, began to lose market share to independent restaurants. Picture Akerlof’s prototypical customer in 1963, eating his slightly better-than-average hamburger at a roadside chain, magically granted a smartphone: Suddenly he could learn where to get a great hamburger. As Luca notes, the “utility” of going to an independent restaurant was higher. Eaters had nothing to lose but the chains. It was that “efficient market” again: When all “known information” has already been built into stock prices (or restaurant ratings), amateur investors (or amateur diners) can do as well as experts. One might even argue that Yelp, and the broader transmission of online taste, have helped drive the emergence of better chain restaurant options.
But electronic word of mouth introduces its own problems. Instead of a paucity of information, you may these days encounter the reverse problem: too much information. You wade into a Yelp entry for a simple diagnosis of whether a place is worth your money. You find zigzagging polarities of experience: A meal was “to die for”; the same meal was “pretty lame.” Or you find yourself pulled into the narrow channels of people’s proclivities—a dislike of the music, a digression into the flatware design. Having sifted through a morass of reviews, you may begin to feel a kind of hangover. Either you quit the place altogether, or by the time you arrive, you already feel weighted by a certain exhaustion of expectation, as if you had already consumed the experience and were now simply going through the motions.
Reading through the reviews of a restaurant, you may find yourself reviewing the reviewers. For as important as the question of whether they liked it is, Are they like us? One looks for signals of authority and a shared outlook. A red flag for me, for example, is the word “awesome.” It is not simply that I think the word has lost most of its connotation. It is that I place less trust in the opinion of someone who uses it (for example, “awesome margaritas”*3)—and you may trust me less for not trusting it. The word “anniversary” or “honeymoon” in a review portends people with inflated expectations for their special night. Their complaint with any perceived failure by the restaurant or hotel to rise to this solemn occasion is not necessarily ours. I reflexively downgrade reviewers writing with syrupy dross picked up from hotel brochures (“It was a vision of perfection”) or employing such trite abominations as “sinfully delicious!”