The Enigma of Reason: A New Theory of Human Understanding


by Hugo Mercier and Dan Sperber


  He unearths inconsistencies in the incriminating arguments. One witness claims to have seen the murder from the other side of the street, through the windows of a passing elevated train. Another witness claims to have heard the boy threaten his father—“I’ll kill him!”—and then a body fall a few seconds later. But how could the second witness have heard anything amid the deafening sound of the train?

  More inconsistencies emerge in other jurors’ arguments. The boy’s fingerprints cannot be found on the knife. Not a problem for juror four: the boy is a cold-blooded murderer who wiped the knife, still tainted with his father’s blood. The defendant was caught by the police walking home three hours after the crime. Why come back to the crime scene? Juror four has an answer: “He ran out in a state of panic after he killed his father, and then, when he finally calmed down, he realized that he had left the knife there.” But then, points out the more skeptical juror, how do you square that with the fact that “he was calm enough to see to it that there were no fingerprints on the knife?”

  Inconsistencies can take the form of double standards. The boy is from the slums, and juror ten knows they’re “born liars”; “you can’t believe a word they say.” Yet he has no problems accepting the testimony of the witness who claims she saw the boy commit the murder, even though “she’s one of them too,” as our juror points out.

  Sometimes the inconsistency surfaces so blatantly that it doesn’t even need pointing out. Juror three has vehemently defended the guilty verdict, relying in large part on the testimony of the man who said he heard the fight and saw the kid leave the apartment right after. To have seen this, however, the witness would have had to get up and cross the whole length of the building in fifteen seconds—something impossible for this old man with a limp. But that is not an issue for our juror three: the witness must have been mistaken in his estimate—after all, “he was an old man, half the time he was confused. How could he be positive about anything?”

  These inconsistencies slowly sway most of the jurors toward reasonable doubt, but not all of them. Juror ten remains impervious to rational considerations. In the end, they must shame him to make him relent. Still, most of the work is done by argumentation. It is argumentation that allows the holes in the prosecution’s case to surface. It is argumentation that highlights double standards. It is argumentation that lays bare inconsistencies. It is argumentation that raises doubt in the jurors’ minds. In 12 Angry Men, argumentation saves a boy’s life.1

  Argumentation Is Underrated

  In Chapters 11 through 14, we have emphasized the “bad” sides of reason, those that have given rise to an enigma in the first place: reason is biased; reason is lazy; reason makes us believe crazy ideas and do stupid things. If we put reason in an evolutionary and interactionist perspective, these traits make sense: a myside bias is useful to convince others; laziness is cost-effective in a back-and-forth; reason may lead to crazy ideas when it is used outside of a proper argumentative context. We have also repeatedly stressed that all of this is for the best—in the right context, these features of reason should turn into efficient ways to divide cognitive labor.

  Crucially, this defense of argumentative reasoning depends on how people evaluate others’ reasons: they have to be sensitive to what the philosopher Jürgen Habermas has called the “forceless force of the better argument.”2 They have to be able to reject weak arguments while accepting strong enough ones, even if that means completely changing their minds. In Chapter 12, we presented evidence that people are good at evaluating others’ arguments, but these experiments still targeted solitary reasoners provided with a single argument to evaluate—a far cry from the back-and-forth of an argumentative exchange.

  It is now time to make good on our promise and to show that in the right interactive context, reason works. It allows people to change each other’s minds so they end up endorsing better beliefs and making better decisions.

  For more than twenty years, Dave Moshman, psychologist and educational researcher, had asked his students to solve the Wason four-card selection task (the one from Figure 17) individually and then in small groups. While individual performance was at its usual low—around 15 percent correct answers—something extraordinary was happening in the course of group discussions. More than half of the groups were getting it right. When Moshman teamed up with Molly Geil to conduct a controlled version of this informal experiment, groups reached 80 percent correct answers.
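  For readers who don’t have Figure 17 at hand, here is a minimal sketch, in Python, of the logic behind the standard version of the task (the exact letters and numbers vary from study to study). The rule to be tested is “if a card has a vowel on one side, it has an even number on the other,” and a card needs to be turned over only if its hidden side could falsify that rule:

    def must_turn(visible_face):
        # A card matters only if its hidden side could falsify the rule.
        if visible_face.isalpha():
            # A vowel could hide an odd number (a falsifier);
            # whatever a consonant hides, the rule is untouched.
            return visible_face in "AEIOU"
        # An odd number could hide a vowel (a falsifier);
        # whatever an even number hides, the rule is untouched.
        return int(visible_face) % 2 == 1

    cards = ["A", "K", "4", "7"]
    print([card for card in cards if must_turn(card)])  # -> ['A', '7']

  The common mistake is to pick the vowel and the even number; the even number cannot falsify the rule, whereas the odd number can.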

  It may be difficult for someone who hasn’t read article after article showing pitiful performance on the Wason four-card selection task to realize just how staggering this result is. No sample of participants had ever come anywhere close to 80 percent correct answers on the standard version of the task. Students at the best American universities barely reach 20 or 25 percent correct answers when solving the task on their own.3 Participants paid to get it right still fail abysmally.4

  What Moshman and Geil have achieved is the equivalent of getting sprinters to run a 100-meter race in five seconds by making them run together.

  You’d think that such an extraordinary result would get the attention of psychologists. Not at all: it was almost completely ignored. Perhaps no one really knew what to do with it. The only researchers who paid attention to Moshman and Geil’s result were those whose theories it called into question. They asked for replications5—not unfairly, given the suspicious tendency of many psychology experiments not to replicate.6 While not always as dramatic as in the original experiment, the improvement in performance with group discussion has proven very robust.7 It also works very well with other tasks, such as the Paul and Linda problem we introduced in Chapter 12.8 Try the experiment with friends, colleagues, or students—it works unfailingly.

  Skeptical researchers also suggested that argumentation had little to do with the improvement in performance. Rather than paying attention to the content of each other’s arguments, group members, they suggested, rely on superficial attributes to decide which answer to adopt. Perhaps people simply follow the most confident group member.9 This alternative explanation makes some sense: confidence can be an important determinant of the outcome of group discussion, for better or worse.10

  This lower-level interpretation, however, offers a very poor description of what happens when groups discuss a reasoning task. Looking at the transcripts, it is apparent that those whose views prevail are not just saying “I know that for a fact” with a confident tone. They put forward one argument after the other.11 We also know that a single participant with the correct answer can convince a group that unanimously embraces the wrong answer, even if she is initially less confident than the other group members.12

  How does the exchange of arguments fare when it’s impossible to demonstrate, in the strict logical sense, that a given answer is correct and that the others are mistaken? When argumentation lacks such demonstrative force, other factors become more relevant in convincing people or in evaluating claims, such as who seems more competent, or how many people support a given opinion. Still, even for problems that do not have a single definite solution, group performance is generally above that of the average group member. In some cases it is even superior to that of the best individual in the group.13 Incidentally, even when groups fail to surpass the answer of the best individual performer, one is still better off going with the answer of the group unless there is a clear way to tell who is the best performer in the first place.

  When Groups Work and When They Don’t

  Skepticism toward group efficiency is not entirely misplaced. When argumentation is not involved, group performance is disappointing. A hundred years ago, the agricultural engineer Maximilien Ringelmann noticed a weird pattern: tractors, horses, and humans seemed to be less efficient when performing a task jointly.14 For instance, people pushed less hard to move a cart when they were doing it together.

  Since the decrease in performance held for machines and animals as well as humans, Ringelmann assigned most of the blame to coordination problems: individual forces are not applied simultaneously, which decreases the total force exerted at any given time. However, observing prisoners powering a flour mill, he also noted that motivation could be an important factor for humans: “the result was mediocre because after only a little while, each man, trusting in his neighbor to furnish the desired effort, contented himself by merely following the movement of the crank, and sometimes even let himself be carried along by it.”15 Several decades later, social psychologists would show that such motivational factors are often the main culprits for group underperformance, labeling this phenomenon “social loafing.”16

  Groups can have disappointing performance not only when pooling physical force but also on a variety of cognitive problems. Brainstorming is a typical example. In a typical brainstorming session, participants are told not to voice their criticisms, so that they feel free to suggest even wild ideas. By and large, this doesn’t work: a brainstorming group typically generates fewer and worse ideas than the same number of individuals working in isolation whose ideas are then pooled.17 By contrast, telling people that “most studies suggest that you should debate and even criticize each other’s ideas” allows them to produce more ideas.18

  That group performance should be disappointing in many domains only makes the successes of argumentation even more remarkable. When people argue, even about seemingly dull mathematical or logical tasks, there is no social loafing or cognitive disruption. Instead, their motivation is increased by the dialogical context. They respond to each other’s arguments and build on them. Many great thinkers have noted the importance of a lively debate to fuel their intellect. Here is Montaigne:

  The study of books is a languishing and feeble motion that heats not, whereas conversation teaches and exercises at once. If I converse with a strong mind and a rough disputant, he presses upon my flanks, and pricks me right and left; his imaginations stir up mine; jealousy, glory, and contention, stimulate and raise me up to something above myself; and acquiescence is a quality altogether tedious in discourse.19

  For a wide variety of tasks, argumentation allows people to reach better answers. The results reviewed so far, however, stem from laboratory experiments conducted in a highly controlled setting with participants who have not met before and will not see each other after the experiment. In the real world, things are different. Problems can be tremendously difficult and may not have a complete and satisfactory solution. Scientists look for the principles that govern the universe. Politicians try to get laws passed in deeply divided and confrontational parliaments. Judges search for a way to give due respect to legitimate but conflicting interests. In these situations, personal biases and affinities interfere or even take precedence. Strongly held convictions and values are attacked and staunchly defended. Does argumentation still have a positive role to play in managing more complex problems and overcoming emotional convictions?

  How to Make Better Predictions

  Prediction is hard, especially about the future. (Ironically, this common aphorism also illustrates the difficulty of learning about the past, since it has been attributed, in one form or another, to everyone and their cousin, from Confucius to Yogi Berra.)20 Philip Tetlock, the expert of expert political judgment, wanted to find out just how hard it is to make good predictions in politics.21 In the late 1980s he recruited 300 political experts, many with PhDs and years of experience, and asked them to do their job: make predictions about political events. Fast-forward fifteen years. The predictions are compared to the actual outcomes. How do the experts perform? Very poorly. They barely beat the proverbial dart-throwing chimp—random answers—and they are easily topped by simple statistical extrapolations from existing data.22 Prediction is hard.23

  In a way, the main problem faced by the experts wasn’t so much that they weren’t accurate but that they weren’t aware that they weren’t accurate. The world is a complicated place, and even experts face stringent cognitive limitations on the amount of information they can acquire and process. But that didn’t stop them from making extreme forecasts: the experts often declared a given event nearly certain to happen or nearly certain not to happen.24 Experts were much too confident in the power of their pet theories to predict the future.

  In line with the experiments on polarization and overconfidence described in Chapter 14, Tetlock found that reasoning biases were responsible for these extreme and, more often than not, mistaken predictions. He observed that when making their predictions, experts have “difficulty taking other points of view seriously”25 and that their “one-sided justifications are pumping up overconfidence.”26

  Reason also creates distortions in the way experts revise their beliefs. When an event happens as predicted by their favored theory, the experts grow more confident. But when things don’t happen as they were expected to, our political experts turn into expert excuse finders. The war that failed to erupt is just about to be declared. A small accident of history prevented their predictions from coming true. Politics is too complicated anyway … 27 Some experts were so skilled at finding excuses that they became even more convinced that their theories were right after the events had proven them wrong.28 Would the experts have been better off if they had been able to talk things over?

  Predictions may never have mattered more to our species than during the Cold War, when the risk of an all-out atomic war was held in check only by an “equilibrium of terror.” The U.S. Air Force was one of the actors looking for better forecasts about the effects of nuclear war, and for this it turned to the RAND Corporation. Averaging the opinions of several experts offers a simple and efficient way to improve on their individual forecasts. Yet two of RAND’s researchers, Norman Dalkey and Olaf Helmer, thought they could do better by giving each expert information about the other experts’ answers—the average answer, for instance. Experts made new predictions based on this information; the predictions were averaged and again provided to all the participants, who got to make another prediction; and so forth for a few rounds.

  This reiterated averaging technique, known as Delphi, was first used to figure out how many bombs the Russians would have to drop on U.S. industrial targets to reduce their output by three-quarters.29 Fortunately, Dalkey and Helmer never found out whether their method had yielded accurate forecasts in this specific case, but since the 1950s, many studies have shown that it can improve a variety of predictions, from defense issues to medical diagnoses.30
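  To make the loop concrete, here is a minimal sketch in Python. The mechanical “revise” rule below is purely illustrative: in the actual method, each expert revises by judgment, on an anonymous questionnaire, not by formula.

    import statistics

    def delphi(initial_estimates, revise, n_rounds=3):
        # Each round, every expert sees the group average and submits
        # a revised estimate; the final answer is the last average.
        estimates = list(initial_estimates)
        for _ in range(n_rounds):
            group_average = statistics.mean(estimates)
            estimates = [revise(own, group_average) for own in estimates]
        return statistics.mean(estimates)

    # Four experts, each moving halfway toward the group average at
    # every round (a crude stand-in for human judgment).
    forecasts = [120.0, 250.0, 400.0, 900.0]
    print(delphi(forecasts, revise=lambda own, avg: (own + avg) / 2))

  Note that with this toy revision rule the group average never moves; only the spread of opinions shrinks. Any improvement in the estimate itself has to come from experts revising asymmetrically, some giving more ground than others.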

  The Delphi method offers several advantages over face-to-face discussions. In a face-to-face discussion, providing the best forecast may matter less than pleasing a senior colleague or going along with the consensus. Face-to-face discussions also require getting a group of busy experts in one room at the same time, which is not always easy to arrange. Delphi’s anonymous questionnaires solve both problems.

  Yet if the argumentative theory is right, the original form of Delphi is missing out on a major way of improving predictions: the exchange of reasons. If Zoe believes Italy has an 80 percent chance of winning the next soccer World Cup, and Michael tells her he thinks it’s only 20 percent, what should Zoe do? Balance the two opinions and adjust the odds to 50/50? On average, people put in this situation only go part of the way toward the other opinion. Zoe could settle on 60 percent, for instance. After all, she knows why she thinks Italy should have good odds, but she doesn’t know the reasons for Michael’s opinion.31 If she knew why Michael is giving Italy low odds, she might be more inclined to change her mind.
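  The arithmetic of Zoe’s options can be written as a simple linear pool; this is just a sketch, and the weight parameter is our illustrative assumption, not something measured in these studies:

    def pooled(own, other, weight_on_other):
        # Move a fraction of the way from one's own estimate toward
        # the other person's estimate.
        return own + weight_on_other * (other - own)

    zoe, michael = 0.80, 0.20
    print(round(pooled(zoe, michael, 0.5), 2))    # full averaging: 0.5
    print(round(pooled(zoe, michael, 1 / 3), 2))  # partial adjustment: 0.6

  A weight of one-half recovers the 50/50 compromise; a weight of one-third yields the 60 percent that Zoe, knowing her own reasons but not Michael’s, might settle on.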

  In the Delphi method, instead of receiving only an average of others’ forecasts, the experts can also be given the reasons behind others’ forecasts. Gene Rowe and George Wright looked at the difference this makes to the predictions.32 As it turned out, the reasons did not make the experts change their minds more often. If anything, the experts were more likely to cling to their initial opinion than when provided only with averages. But reasoning was not failing; it was merely revealing its discriminating power.

  Not all reasons are good reasons. If Michael tells Zoe he thinks Italy has a 20 percent chance to win based on the predictions of an octopus,33 she would be crazy to update her estimate at all. But if Michael tells Zoe he has insider information about the failing health of Italy’s key forward, she might simply adopt Michael’s odds. This is exactly what the participants in Rowe and Wright’s experiment were doing. They were not changing their minds more often, but they were changing them more discriminately. They were changing their minds when they should, in the direction they should. And they were making better predictions.34

  Providing a one-time summary of reasons is a good step forward, but it falls short of making the best of reasoning. Reasoning thrives in the back-and-forth of conversation, when people can exchange arguments and counterarguments. Online communication enables groups of experts to exchange arguments at a distance, opening up prospects for even better forecasts.

  Twenty years after his original study of expert political judgment, Philip Tetlock, together with Barbara Mellers and other colleagues, launched an even more ambitious experiment.35 Over 1,300 participants were recruited and asked to make geopolitical predictions. Participants from a first group worked alone, kept in the dark about other participants’ forecasts so as to maintain the independence of their judgments. Participants assigned to the second group also worked alone, but, as in the early versions of the Delphi method, they were provided with statistical information about others’ forecasts.

  The third group was divided into teams of about twenty people who were allowed to discuss the forecasts together, online. Nearly all of their predictions proved more accurate than those of the independent forecasters, and they also beat the second group on two-thirds of the forecasts. Exchanging arguments had allowed them to produce significantly better predictions.

 
