You May Also Like


by Tom Vanderbilt


  This idea that we are interested as much in what a person’s review says about him as in what it says about the restaurant or hotel is, in one respect, not new. Our previous choices were informed either by friends we trusted or by critics whose voices seemed to carry authority. But suddenly the door has been opened to a multitude of voices, each bearing no preexisting authority or social trust. Critics have always been suspected of having their own preferences and biases, but on the Internet a thousand critics have bloomed. The messy, complicated, often hidden dynamics of taste and preference, and the battles over them, are suddenly laid out in front of us.

  —

  The rise of this crowd-sourced aggregate of amateur reviewers is generally seen as an egalitarian blossoming, freeing consumers from the tyranny of individual, elite mandarins, each harboring his own agenda and tastes. “The excising of the expert review is happening right across the board,” declared the writer Suzanne Moore in The Guardian. “Who needs expertise when every Tom, Dick and Harriet reviews everything for free anyway. Isn’t this truly democratic? The nature of criticism is changing, so this hierarchy of expertise is crumbling.”

  One can almost hear the anticipatory echoes of something like Yelp in the context of the Spanish philosopher José Ortega y Gasset’s 1930 tract The Revolt of the Masses. The multitude, he wrote, once “scattered about the world in small groups,” appears “as an agglomeration,” it has “suddenly become visible,” and where it once occupied the “background of the social stage,” it “now has advanced to the footlights and is the principal character.” The disgruntled and disenfranchised diner is now able to make or break a restaurant through sheer collective will. Against this leveling of critical power, the old guard fulminates. Ruth Reichl, the former editor of Gourmet, raised the clarion: “Anybody who believes Yelp is an idiot. Most people on Yelp have no idea what they’re talking about.”

  There are complications with this idea that the Internet has done away with the need for experts and for critical authority. For one, much reviewing energy on Yelp is precisely an effort to establish one’s bona fides. A reviewer for an Indian restaurant in midtown Manhattan lays down a triple claim of authority: “I am a foodie and my love for Indian food (as an Indian) is tough to match. I eat at this restaurant at least once a week. Really innovative mix of ingredients, and yet extremely authentic.” Not only is he a foodie; he is an Indian foodie who, like all true food critics, has eaten here more than once. And we will not unpack that thorny word “authentic.” Slippery though it may be, the word “authentic,” or synonyms thereof, seems to lead to higher ratings for Yelp restaurants.

  Yelp is filled with this sort of signaling, as economists call it, subtle references conferring one’s authority in an effort to rise above the masses of similar reviewers. (“I knew the chef from his previous stint at…” or, “Of all the Henan cuisine places I’ve eaten, this is one of the best.”) It is “conventional signaling”: There is nothing in the signal itself to verify what you are saying beyond the fact that you are saying it. If you wear a T-shirt saying, “I ♥ NYC,” who are we to doubt your ardor? There is little “cost,” in money or energy, involved in the signal; hence, there is little reliability. What is there, actually, to keep these signals from losing all reliability? As Judith Donath argues, their honesty may in essence come simply “because there is little motivation to produce them dishonestly.” There is also little motivation to doubt their veracity. So online, where anonymity rules and, as Donath argues, “everything is a signal,” how can you quickly assess the quality of a review?

  Even as it aggregates its democratic horde, Yelp itself strives to reintroduce hierarchy, through its class of “elite” reviewers. They wear badges—a kind of signal—and are picked by a team known as the Council. “We don’t share how it’s done,” a Yelp spokesperson declared, as if describing the covert hiring of Michelin inspectors. This is a bit of a paradox. We are said to live in a world where traditional expert authority—from the media to the government to the health-care establishment—is now suspect. But have the online review sites (with Amazon’s “Top Reviewer” and TripAdvisor’s “Top Contributor” designations) simply reconstituted a new form of expertise, the curious phenomenon of “lay expertise”?

  How much trust do we put in this new class of experts? When you glance at a restaurant or hotel or book review online, do you simply look at the aggregate number of stars, or do you skim down into the thicket of individual opinions? If the power of online word of mouth comes from the ability to quantify a collective mass of opinion—liberating us from the narrowness of one person’s perspective—what is the value of reading any one review?

  In his Yelp study, Luca reported examples of “Bayesian learning.” In other words, people reacted more strongly to reviews that seemed to carry more information. Elite Yelp reviewers, he found, had twice the statistical impact of nonelites. Another group with an outsized effect on Yelp was users of Groupon, the online coupon site. Once on Yelp, Groupon users write longer reviews that are better liked than those of the average Yelp user, research has shown. This influence has real weight: They also seem to bring down the average review for a restaurant. Curiously, it is not that they are critical per se. In fact, Groupon users on Yelp, the authors pointed out, are more “moderate.”

  The idea of the masses liberating the objects of criticism from the tyranny of critics is clouded by the number of reviewers who seem to turn toward petty despotism. Reading Yelp or TripAdvisor reviews, particularly of the one-star variety, one quickly senses the particular axe being ground: the hostess who shot the “wrong” look to the “girls’ night” group; the waiter who did not respond with enthusiasm to the cuteness of the diner’s toddler; the “judgmental attitude” of a server; a greeting that is too effusive or insufficiently so; the waiter deemed “too uneasy with being a waiter”; or any number of episodes (these are all actual examples I have gleaned from the site) that have little to do with food. They are labor disputes: between patrons’ capital and the endlessly subjective expectation of what they should receive.

  As so much of the service economy now revolves around “affective labor”—the enforced smiles that organizations induce their employees to give “guests”—evaluations of the “product” turn increasingly subjective and interpersonal. The writer Paul Myerscough has observed, “Work increasingly isn’t, or isn’t only, a matter of producing things, but of supplying your energies, physical and emotional, in the service of others.” For those who feel they did not receive the right kind of emotional energy, Yelp becomes a place to catalog these litanies of complaint. How are we to know the reviewer was not simply having a bad day?

  —

  At the extreme end of the trust problem with online reviews are those that are actually fake: planted by the rival restaurateur, the jealous author, the jilted hotel guest. Nearly one-fourth of Yelp reviews are rejected by the site’s own authenticity filters. The frequency of these false ratings, as Luca and Georgios Zervas have found, tends to follow fairly predictable patterns. The more negative a restaurant’s reputation, or the fewer the reviews, the greater the chance of a false positive review. When restaurants are of a similar type (for example, “Thai” or “Vegan”) and geographically closer, the odds go up for a false negative review. Similar patterns are observed on TripAdvisor.

  Sometimes the reasons for deception are rather unclear: A study by Eric Anderson and Duncan Simester of one online apparel site found that in 5 percent of all reviews customers had not actually purchased the item (but had purchased many other things at the site). These reviews tended to be more negative, and the authors hypothesized the customers were acting as de facto “brand managers”—a form of that customer “retaliation” that Akerlof described.

  But for whatever reason it is done, how does one know a review is false? Consider these snippets of two reviews:

  I have stayed at many hotels traveling for both business and pleasure and I can honestly say that The James is tops. The service at the hotel is first class. The rooms are modern and very comfortable.

  My husband and I stayed at the James Chicago Hotel for our anniversary. This place is fantastic! We knew as soon as we arrived we made the right choice! The rooms are BEAUTIFUL and the staff very attentive and wonderful!!

  As it turns out, the second of these reviews is fake. A group of Cornell University researchers created a machine-learning system that can tell, with accuracy near 90 percent, whether a review is authentic or not. This is far better than trained humans typically achieve; among other problems, we tend to suffer from “truth bias”—a wish to assume people are not lying.

  To create the algorithm, the Cornell team largely relied on decades of research into the way people talk when they are confabulating. In “invented accounts,” people tend to be less accurate with contextual details, because they were not actually there. Fake hotel reviews, they found, had less detailed information about things like room size and location. Prevaricating reviewers used more superlatives (the best! The worst!). Because lying takes more mental work, false reviews are usually shorter. When people lie, they also seem to use more verbs than nouns, because it is easier to go on about things you did than to describe how things were. Liars also tend to use personal pronouns less than truth tellers do, presumably to put more “space” between themselves and the act of deception.
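  The stylistic tells described above can be read as simple text features. Here is a toy illustration in Python, computing a few of them for condensed versions of the two reviews quoted earlier. The word lists and thresholds are invented for the sketch; this is not the Cornell team’s actual model, which was trained on hundreds of labeled reviews.

```python
import re

# Invented word lists for illustration only.
SUPERLATIVES = {"best", "worst", "amazing", "fantastic", "beautiful", "perfect", "wonderful"}
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}

def style_features(review: str) -> dict:
    """Compute a few crude stylometric features of a review."""
    words = re.findall(r"[a-z']+", review.lower())
    n = len(words)
    return {
        "length": n,
        "superlative_rate": sum(w in SUPERLATIVES for w in words) / n,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
    }

real = ("I have stayed at many hotels traveling for both business and "
        "pleasure and I can honestly say that The James is tops.")
fake = ("This place is fantastic! The rooms are BEAUTIFUL and the staff "
        "very attentive and wonderful!!")

print(style_features(real))
print(style_features(fake))
```

  Even on these two snippets, the fake review scores higher on superlatives and is shorter; a real classifier would feed many such features, over many reviews, into a learning algorithm.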

  But doesn’t the fake example above have plenty of personal pronouns? Indeed, the Cornell team found that people actually referred to themselves more in fake reviews, in hopes of making the review sound more credible. Curiously, the researchers noted that people used personal pronouns less in fake negative than in fake positive reviews, as if the distancing were more important when the lie was meant to sound nasty. Lying in general is arguably easier online, absent the interpersonal and time pressures of trying to make up something on the spot in front of someone. How easy? When I ran my imagined “Hotel California” review through Review Skeptic, a Web site created by a member of the Cornell team, it was declared “truthful.”

  Fake reviews do exist and undoubtedly have economic consequences. But the enormous amount of attention they have received in the media, and all the energy dedicated to automatically sniffing out deceptive reviews, may leave one with the comfortable assumption that all the other reviews are, simply, “true.” While they may not be knowingly deceptive, there are any number of ways they are subject to distortion and biases, hidden or otherwise.

  The first problem is that hardly anyone writes reviews. At one online retailer, fewer than 5 percent of customers did—hardly democratic. And the first reviewers of a product are going to differ from people who chime in a year later; for one thing, there are existing reviews to influence the later ones. Merely buying something from a place may tilt you positive; people who rated but did not buy a book on Amazon, as Simester and Anderson discovered, were twice as likely to dislike it. Finally, customers are often moved to write a review because of an inordinately positive or negative experience. So ratings tend to be “bimodal”—not evenly distributed across a range of stars, but clustered at the top and the bottom. This is known as a “J-shaped distribution” or, more colorfully, the “brag and moan phenomenon.”

  The curve is J-shaped, not reverse-candy-cane-shaped, because of another phenomenon in online ratings: a “positivity bias.” On Goodreads.com, the average is 3.8 stars out of 5. On Yelp, one analysis found, the reviews suffer from an “artificially high baseline.” The average of all reviews on TripAdvisor is 3.7 stars; when a similar property is listed on Airbnb, it does even better, because owners can review guests. Similarly, on eBay, hardly anyone leaves negative feedback, in part because, in a kind of variant of the famed “ultimatum game,” both buyer and seller can rate each other. Positivity bias was so rampant that in 2009 eBay overhauled its system. Now vendors, rather than needing to reach a minimum threshold of stars to ensure they were meeting the site’s “minimum service standard,” needed to have a certain number of negative reviews. They had to be bad to be good.
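  Both tendencies, the clustering at the extremes and the positive skew, can be sketched with a toy histogram. The counts below are invented for illustration, not drawn from any of the sites mentioned:

```python
from collections import Counter

# Invented "brag and moan" ratings: piled up at 5 stars, a smaller
# pile at 1 star, and a sparse middle -- a J-shaped distribution.
ratings = [5] * 60 + [4] * 15 + [3] * 5 + [2] * 5 + [1] * 15

counts = Counter(ratings)
for stars in range(1, 6):
    print(stars, "*" * counts[stars])

mean = sum(ratings) / len(ratings)
print(f"mean = {mean:.2f}")  # a middling average almost no individual gave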

  A few years ago, YouTube had a problem: Everyone was leaving five-star reviews. “Seems like when it comes to ratings,” the site’s blog noted, “it’s pretty much all or nothing.” The ratings, the site’s engineers reasoned, were primarily being used as a “seal of approval,” a basic “like,” not as some “editorial indicator” of overall quality (the next most popular rating was one star, for all the dislikers). Faced with this massively biased, nearly meaningless statistical regime, they switched to a “thumbs up/thumbs down” rating system. Yet the binary system is hardly without flaws. The kitten video that has a mildly cute kitten—let us be honest, a fairly low bar—is endowed with the same sentiment as the world’s cutest kitten video. But in the heuristic, lightning-fast world of the Internet, where information is cheap and the cost of switching virtually nil, people may not want an evaluation system that takes as much time as the consumption experience. And so all likes are alike.

  And then there is the act of reviewing the review—or the reviewer. The most helpful reviews actually make people more likely to buy something, particularly when it comes to “long tail” products. But these reviews suffer from their own kinds of curious dynamics. Early reviews get more helpfulness votes, and the more votes a review has, the more votes it tends to attract. On Amazon, reviews that themselves were judged more “helpful” helped drive more sales—regardless of how many stars were given to the product.

  What makes a review helpful? A team of Cornell University and Google researchers, looking at reviewing behavior on Amazon.com, found that a review’s “helpfulness” rating falls as the review’s star rating deviates from the average number of stars. Defining “helpfulness” is itself tricky: Did a review help someone make a purchase, or was it being rewarded for conforming with what others were saying? To explore this, they identified reviews in which text had been plagiarized, a “rampant” practice on Amazon, they note, in which the very same review is used for different products. They found, with these pairs, that the review whose star rating sat closer to the product’s average was deemed more helpful than the other. In other words, regardless of its actual content, a review was better when it was more like what other people had said.

  —

  Taste is social comparison. As Todd Yellin had said to me at Netflix, “How many times have you seen someone in an unfamiliar situation—like ‘I’m at an opera and I’ve never been before’—they’ll look right, they’ll look left, they’ll look around. ‘Is this a good one?’ ” When the performance is over, whether a person joins in a standing ovation may have as much to do with what the surrounding crowd is doing as with whether he actually liked it. By contrast, when we cannot see what someone has chosen, as studies have shown, odds are we will choose differently.

  Small wonder, then, that on social media, where the opinion of many others is ubiquitous and rather inescapable, we should find what Sinan Aral, a professor of management at MIT, has called “social influence bias.” Aral and his colleagues wanted to know if the widespread positivity bias in rating behavior was due to previous ratings. How much of that four-and-a-half-star restaurant rating is down to the restaurant itself, and how much to previous people voting it four and a half stars? Does that first Instagram “like” lead to more likes than a picture with no likes?

  So Aral and his colleagues devised a clever experiment, using a Digg-style “social news aggregation” site where users post articles, make comments on articles, and then “thumb up” or “thumb down” those comments. They divided some 100,000 comments into three groups. There was a “positive” group, in which comments had been artificially seeded with an “up” vote. Then there was a “negative” group, where comments were seeded “down.” A third, control group of comments was left unseeded.

  As with other sites, things kicked off with an initial positivity bias: People were already 4.6 times more likely to vote up than down. When the first vote was artificially made “up,” however, it led to an even greater cascade of positivity. Not only was the next vote more likely to be positive, but the ones after that were too. When the first vote was negative, the next vote was more likely to also be negative. But eventually, those negatives would be “neutralized” by a counterforce of positive reviewers, like some cavalry riding in to the rescue.

  What was happening? The researchers argued that up or down votes per se were not bringing out the people who generally like to vote up or down. It was that the presence of a rating on a comment encouraged more people to rate—and to rate even more positively than might be expected. Even people who were negative on control comments (the ones that had no ratings) tended to be more positive on comments seeded with a “down” vote. As Aral describes it, “We tend to herd on positive opinions and remain skeptical of negative ones.”
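  The herding dynamic Aral describes can be caricatured in a few lines of code. This is a minimal sketch with invented parameters, not the study’s model: each simulated voter leans positive at baseline, herds harder when the running tally is positive, and stays only mildly skeptical when it is negative.

```python
import random

def simulate(seed_vote: int, n_voters: int = 200, seed: int = 0) -> float:
    """Return the final share of positive votes after one seeded vote.

    All probabilities are invented for illustration: a 0.6 baseline
    positivity bias, a +0.2 herding boost when the tally is positive,
    and only a -0.1 penalty (skepticism) when it is negative.
    """
    rng = random.Random(seed)
    ups, downs = (1, 0) if seed_vote > 0 else (0, 1)
    for _ in range(n_voters):
        p_up = 0.6 + (0.2 if ups >= downs else -0.1)
        if rng.random() < p_up:
            ups += 1
        else:
            downs += 1
    return ups / (ups + downs)

up_seeded = simulate(+1)
down_seeded = simulate(-1)
print(f"share positive after an up seed:  {up_seeded:.2f}")
print(f"share positive after a down seed: {down_seeded:.2f}")
```

  With these made-up numbers, the up-seeded comment ends with the larger positive share, while the down-seeded one is eventually pulled positive anyway, echoing the “neutralizing” counterforce the study observed.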

  The stakes are higher than just a few clicks on Digg temporarily lifting an article above the tide. An early positive review, authentic or not, can send a subtle ripple through all later reviews. Aral’s study found that seeding a review positively boosted overall opinion scores by 25 percent, a result that persisted. Early positive reviews can create path dependence. Even if one went through and removed false reviews, the damage would have been done; those reviews might have influenced “authentic” reviews. “These ratings systems are ostensibly designed to give you unbiased aggregate opinion of the crowd,” Aral told me. But, as with that standing ovation, can we find our own opinion amid the roar of the crowd?

 
