You May Also Like


by Tom Vanderbilt


  As Moormann studied the cats, I would study the owners. The woman with cat paw prints painted on her fingernails. The one who apologized as she struggled to restrain her charge. “I was once bit through the fingernail by a Persian,” Moormann said wearily. I cannot help thinking there is some truth in the old cliché of owners taking on the appearance of their pets. The woman clutching an Oriental short-hair, when seen in profile, turned out to have a similarly long, sloping nose. As Moormann stroked the tail of one Persian, I caught its owner absently running her hand through her own hair.

  For Moormann, there are two ways of judging cats. There is the “analytic style,” in which “the whole is the sum of its parts.” Each cat breed has a range of “points” that are assigned for certain attributes: eyes, color, tail. The best cat has the most points across these categories. This seems objective, but the judge, Moormann has written, “forgets that there is no objective measuring device available other than his/her own brain.” In the “holistic style,” by contrast, “the whole can be more than the sum of its parts.” The judge in this case begins with “an ideal mental image of the cat,” and the closer that cat seems to be to that image, the more the judge will like it. “The whole should have something special, something charismatic,” he says. “Something that feels good, but you cannot describe it completely. All the parts fit together and make something additional which is very beautiful.” The danger here, Moormann warns, is that the judge may lose the trees for the forest, overlooking flaws because of the “halo effect” of some larger impression.

  As each cat was hoisted off the table, Moormann would scrawl a number of stars next to each cat’s name. Sometimes he would write “BV,” for “best variety.” The stars are his own scaled system, meant less as some Michelin-style indication of quality than simply as a way to distinguish, and remember, individual cats in what can be a long succession of fur, claws, and arched backs. “If I have too many cats in one day, I am not able to do this; there’s too much interference. Cats are cats.” Judging them is no simpler than herding them.

  —

  Memory may be the most important skill for any judge. A “trained eye” may tell someone where to look. To make a quality judgment, however, means not only remembering what the judge has seen that day but measuring that against all the other cats or figure skaters he has ever seen. We remember what we like, but perhaps even more accurately we like what we remember.

  There are many ways a judge in a competition might be biased. The gymnastics judge who shares a language with a competitor might give that gymnast a higher ranking (and this is why a Dutch judge is at a French cat show). The judge in an American Idol–style show who favors pop might be less enthusiastic about the heavy metal group. Or a judge with a strong personality at a table might sway the group. A Belgian study—and I must note here a distinct Low Countries bias in expert judging research—looked at competitive rope jumping (yes, it exists!). The researchers found that when judges were shown video clips of performances whose scores had been artificially manipulated upward, they voted higher. When the scores were falsely set lower, they followed suit. Judges, it seems, want to be judged well by their fellow judges.

  One of the simplest and most innocent forms of bias, however, is memory itself. It has been found, for example, in various types of competitions, that people who performed later seemed to do better. You might think, as you headed to a job interview or some other competition with a number of candidates, that going later might be a liability. The judges, you would reason, may be tired. They might have already, in effect, decided. And yet, in studies that have looked at everything from classical music competitions to synchronized swim meets, researchers have found a clear and compelling pattern: The later contestants appeared, the higher they scored.

  The Dutch (!) researcher Wändi Bruine de Bruin analyzed several decades’ worth of voting data from the Eurovision Song Contest—an arguably more palatable task than actually listening to all the songs. She first controlled for potential “home advantage.” Not only do German judges, for example, like German acts a bit more; they also like the acts from countries that border Germany a bit more than others. She emerged with another strong, linear relationship: Performers who appeared later were judged higher. “Judges,” she concluded, “may base their final rating of a performance on how well they remember it.”

  In competitions where judges watch all of the contestants before finally issuing scores, this makes intuitive sense. It echoes findings of “primacy bias” and “recency bias” in so-called list memory: We seem to remember the first and last entrants in any kind of list or series. This is either because we shift those items into short- and long-term memory or because the first and last things are themselves distinctive: Nothing comes before or after. There is a reason the “first” of a thing (a car, a pet, and so on) is used as a prompt in computer security questions: It stands out more in your memory than your third. Poets and songwriters do not reminisce over fourth loves.

  What happens when judges make their judgments just after each contestant has made an appearance, when the performance will still be clear in their memory? Curiously, the “later is better” effect seems to show up here as well. In looking at data from the World and European Figure Skating Championships, which are judged on a “step by step” basis, Bruine de Bruin again found an upward, linear pattern of scores, even when the appearance order of contestants was randomly drawn. What was going on? Bruine de Bruin suggests that judges may consider the first performance as its own discrete thing. With each successive performance, however, judges began to look for what was better and different from the previous performance.

  This has been called, after work by the psychologist Amos Tversky, the “direction of comparison effect.” Later performances are compared only with earlier ones; the earlier ones, as they are happening, cannot be compared with later performances. So the scores tend to travel in one direction, with one important qualifier, which I will shortly return to: Judges need to be looking for instances of positive difference.

  Another dynamic troubles serially judged competitions; let us call it “The Best Is Yet to Come Effect.” Scores tend to get more extreme toward the end. Judges may be unsure how good or bad early competitors are and vote conservatively, reserving their strongest judgments for the final entrants. Later contestants, in turn, having seen what they are up against, may be motivated to perform at a higher level. Not uncommon are comments like that of the English gymnast Louis Smith: “If my main rival…goes through his routine and puts in a high score, it gives me the opportunity to think, ‘Okay, maybe I need to try my harder routine.’ ” Athletes may intuit that an eye-catching move, strikingly different from what their immediate rival has done, will net them a higher score. Indeed, one analysis of gymnastics data, taking advantage of the different scores awarded for “difficulty” and “execution” (a system created after the notorious judging fiasco at the 2004 Olympics in Athens), finds what it calls a “difficulty bias.” Even though the two metrics are supposed to be independent, the analysis found that when contestants try harder moves, their execution scores are “artificially inflated.”

  But a novel series of experiments by the German researchers Thomas Mussweiler and Lysann Damisch shows why judging bias, and not simply athletes rising to the occasion, may be behind score inflation. They begin by observing that athletes’ scores tended to be higher “if the preceding gymnast presented a good rather than a flawed performance.” This could simply be athletes adjusting: A gymnast coming on the heels of a terrible performance might decide to “play it safe” and get a respectably high score, rather than “going for broke.”

  Mussweiler and Damisch, however, argue that something else is going on. When we make comparison judgments, we instinctively look for similarities between things, or differences. Typically, we favor similarity—“one of the building blocks of human cognition,” suggests Mussweiler—because sensing similarity is not only extremely useful, but quick and easy (children, after all, are not asked in puzzles to “spot the similarities”). You meet a new person, you immediately think how he or she reminds you of someone you know, not all the ways they are unlike someone you know. Even the search for differences tends to happen after this initial establishing of similarities. But this initial, often subconscious, decision (whether things feel more similar or more different) then goes on to profoundly influence how we feel about those things. When we perceive things to be similar, we tend toward “assimilation”—which will typically make us like something more: A good wine is lifted when it comes after a great wine. But if we emphasize differences among things to be judged, “contrast” will result. Judges will, in essence, be looking for things not to like.

  In another experiment, Mussweiler and Damisch gathered a group of experienced German gymnastics judges and showed them clips of two low-vault routines. Judges were broken into two groups: One saw a high-quality routine, the other a low-quality routine. Then everyone saw a “moderate”—pretty good—routine. The groups were split another way: To one group of judges, the gymnasts in the two routines were both presented as “Australian.” But another set of judges saw “Australian” gymnasts followed by “Canadian” gymnasts (in reality the same gymnasts in both clips). The researchers noted a curious effect: When both gymnasts were “Australian,” the following gymnast benefited, scorewise, by following the good performance, but when he followed the “poor” performance, his score was actually brought down. Being Australian connected him, in the judges’ mind, to the previous performer—good or bad. But when the second gymnast was thought to be “Canadian,” the reverse pattern was found: Now the “Canadian” gymnast got a lower score when he followed a good “Australian” score—and better when he followed a poor one. In other words, the same performance was judged differently depending upon what came before—and how those things were connected by the judges. As much as by the strength of their routines, the gymnasts were being subtly compared by nationality, and they were either suffering or benefiting by the comparison.

  —

  The German gymnastics judges were judging even before they were judging, by deciding how similar the two gymnasts were. Even if the fact of noticing the gymnasts’ “different” nationalities was not intended as a qualitative judgment, merely making the observation seems to have influenced how the judges felt about the performance.

  Humans seem to operate under a “similarity bias,” a kind of presumptive desire that people we meet are more like us than not. When we think things are similar, they literally become more similar. In what is known as the “cheerleader effect,” a person asked to rate the attractiveness of individuals gives them a higher score when they are in a group versus when they are alone. Any idiosyncrasies that, in isolation, might trigger someone’s dislike seem, in a grouping, to be averaged out, or less noticeable. For similar reasons, people are rated as more attractive when they are seen in videos versus a static image—because the judgment is not based on one make-or-break image.

  These effects do not show up only in contests. We are making comparisons all the time, and these influence how we feel about things, even ourselves. We seem to make comparisons even when we are not aware we are doing so. In another study by Mussweiler, students were asked “to reflect upon their athletic abilities” for one minute. As they did, images were subliminally flashed on a computer screen for about fifteen milliseconds. While the students did not recall seeing the images of Michael Jordan, Bill Clinton, or others who were shown, the answers they gave on their own athletic ability seemed directly influenced by whom they were unwittingly comparing themselves with. The more “extreme” the comparison—that is, Jordan—the worse they got. But a subconscious glimpse of someone like Bill Clinton seemed to turn them into better athletes. “Participants compared themselves with potential standards,” wrote Mussweiler, “even if they were unaware of them.”

  What we are comparing things with matters. A study by Tversky offered subjects a choice of six dollars or an “elegant Cross pen” (he never mentions the value, but assume it’s more than six dollars). Nearly a third of subjects opted for the pen, with the rest taking the cash. A second group could pick from the Cross pen, the cash, or a second pen that was “distinctly less attractive.” Only 2 percent of the subjects wanted the cheaper pen. Suddenly, though, more people were clamoring for the Cross pen. The presence of the less attractive pen made the more attractive pen even more attractive. The reverse can happen as well. Research of actual speed-dating trials showed that potential daters (men, it turns out) became less interested in dating a woman, no matter how attractive she was perceived to be, when she followed a more attractive woman in the dating rounds.

  The way we are making comparisons also matters. As mentioned earlier, when people are looking for the good things that distinguish each successive option in a list of choices, the later items fare better. But when they are making comparisons based on what is uniquely bad about each choice, suddenly the early option looks better.

  One study presented subjects with a list of attributes of potential blind dates. When the second choice presented had positive qualities that were not shared by the first candidate, the subjects preferred the choice that came later. But when the second choice had negative qualities that were different from the first, they actually preferred the first. As the study’s authors described it, qualities that are shared by the candidates are essentially recalled with equal clarity and thus cancel each other out. What is different about the second candidate suddenly stands out in memory. So what is uniquely good about the second candidate seems better than what is uniquely good about the first; conversely, the negative qualities of the second candidate seem worse than those of the first, so we reverse our preference.

  —

  As the authors of a Carnegie Mellon University study note, “Judging one experience can unduly influence our judgment of subsequent events and thus ‘color’ the entire sequence of experiences.” What we might think of as our fairly hard-set preferences are often subtly manipulated on the fly, like some kind of “choose your own adventure” game.

  Consider “the 11th Person Game.” This is an “admittedly objectifying” thought exercise devised by the interaction designer Chris Noessel. The next time you are in a public place, point to a random doorway and ask a friend to choose one of the next ten people who walk through the door as a potential romantic partner. There are two rules: You cannot return to any previous person you passed up, and if, when the tenth person comes through the door, you have not chosen anyone, the eleventh becomes your de facto choice.

  This is, as you might have noted, a serially judged competition; the fact that you cannot “go back” makes it different from most contests. In fact, as the psychological work on judged competitions shows, it is often hard for judges to “go back” and honestly reevaluate earlier candidates in the face of later ones. It gets even more difficult as the list grows longer and as each new entry “resets” the comparison standard.

  In the beginning of the 11th Person Game, Noessel noted, players tend to robustly reject people. But over time, as the potential eleventh person looms, and choices begin to dwindle, players stop looking for flaws in each new person and start looking for “what’s right about a given person.” The slightly awkward grin becomes an entrancingly winning smile. A person’s preference set, and search strategy, are suddenly reordered by the structure of choice. Standards change.

  In Paris, Moormann was well aware of the potential pitfalls in making comparisons between cats, particularly among a host of entrants that might be, to the average eye, virtually indistinguishable. The first task is to group them into levels: good, very good, excellent. This is a natural “chunking” exercise that helps memory and discrimination. But merely grouping them might make them more similar to each other than they actually are. As Tversky notes, “Similarity serves as a basis for the classification of objects, but it is also influenced by the adopted classification.” That is, the best “good” cat may not be that qualitatively far from the worst “excellent” cat, but each may be pulled “down” or “up” by being placed in a grouping with others.

  What if there are a number of very good cats that are quite similar? “It is not very easy,” Moormann said with a sigh. Cats are awarded weighted points for various features. The “subdimensions” of one cat, he has written, “are simultaneously compared with all the subdimensions of all the other cats within a group of cats.” This is a “gigantic mental enterprise.” Cat shows occur in real life, with cats that move, owners that kvetch, spectators who gawp, with all the bedlam of the show humming in the background. When one inspects cat after cat, “it seems likely the average judge cannot handle more than three dimensions simultaneously.” Some judges, he suggested, might make choices on “the type of head alone.”

  Hovering over it all is the standard. This is the written description laying out, at formidable length, what each breed should actually look like. As Moormann says, judges are looking for universal qualities—“whether the cat is pleasing, whether the lines are good”—but each breed has its specific qualities. Throughout the afternoon, I paged through the breed standard book by Moormann’s side. This is a curious document. There are achingly specific aesthetic prescriptions: The Chausie may have “some flecking or speckling” that “may occur on the stomach” but, it warns, “not to the degree of belly spots.” With the Burmese, the guide admonishes, “there should be no evidence of obesity, paunchiness, weakness, or apathy.” “Long, whippy tails” are bad for some cats, good for others.

 
