So, at an impasse, the prosecution brought in an instructor in mathematics from the local state college. He explained that he could prove identity through probability, by multiplying together the individual probabilities of each salient characteristic. The prosecution started with these assumptions, written out in a table:
Where do these figures come from? You may well wonder. Could anyone honestly say that exactly this proportion of each population of possible cars, wearers or non-wearers of mustaches, or girls with hair in ponytails would have been likely to pass through San Pedro that morning? And, incidentally, that figure for “interracial couple in car”—was this in distinction to intraracial couples in cars or to interracial couples on the sidewalk? No such questions were raised.
When the premises are flawed, only worse can follow, and it did. The prosecution represented A through F as independent events; but could you, for instance, genuinely assume that “man with mustache” and “Negro man with beard” are independent? Having assumed independence, the prosecution merrily multiplied all these probabilities, coming up with a chance of 1 in 12 million that all these characteristics would occur together.
Even taking the individual probabilities to be correct and the assumption of their independence to be justified, the prosecution is still not on firm ground. If the chance is 1 in 12 million that any one couple will have this combination of characteristics, does that mean you would need to go through 12 million other couples to find a match? Not quite. Let’s look at the problem: we’re trying to establish the chance of a match of all these characteristics between two couples drawn at random out of a population—that is, the chance of a random event occurring at least twice.
Does this sound familiar? We are back in the dice game of the Chevalier de Méré. Had Pascal been miraculously reanimated at the L.A. County Courthouse, his testimony would have been very useful. He could point out that Los Angeles is a big place: once the population from which you draw your couples gets up around 4 million, the probability of two occurrences of these “1 in 12 million” characteristics (using the same power law calculation we used in Chapter 2) actually rises to about 1 in 3.
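The arithmetic behind that claim can be sketched in a few lines. This is only an illustration under stated assumptions: it takes the "power law calculation" to be the de Méré formula, 1 − (1 − p)^n, for at least one occurrence in n independent trials, and it uses the 4 million population and 1-in-12-million figure quoted above. The function and variable names are invented for the example.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance that an event of probability p turns up at least once in n independent trials."""
    return 1.0 - (1.0 - p) ** n


# Figures taken from the text; treating couples as independent draws is an assumption.
p_match = 1 / 12_000_000   # prosecution's chance that one couple fits the description
n_couples = 4_000_000      # size of the population the couples are drawn from

chance_of_another_match = prob_at_least_one(p_match, n_couples)
print(f"{chance_of_another_match:.2f}")
```

On these assumptions the result comes out near 0.28, which is of the same order as the "about 1 in 3" quoted above and a very long way from 1 in 12 million.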
To give injustice its final varnishing, the prosecution made one of the most fundamental and common mistakes of probability—which, for obvious reasons, is known as the Prosecutor’s Fallacy. This is the leap from the probability of a random match to the probability of innocence: having first assumed that the chance that another couple had the characteristics of the Collinses was 1 in 12 million, the prosecutor then assumed a 1-in-12-million chance that they had not robbed Mrs. Brooks (indeed, he cranked up the odds against innocence to “one in a billion” in his closing statement). Why is this a fallacy? Well, let’s simplify it: what if the only identifying criteria were, say, being black and male? The chance of a random match over the U.S. population is around 1 in 20; does that, therefore, make any given black male defendant 19/20 guilty? Obviously not; but why not? Because there are uncounted ways of being innocent and only one way of being guilty; our legal system presumes innocence not just because the emperor Vespasian decreed it, but because of the underlying probabilities.
Even the fair-minded can fall into the Prosecutor’s Fallacy, however, because they are bamboozled by the way these probabilities are presented. Suppose that we say instead: “Out of every 20 Americans, 1 is black and male. Two hundred people passed the crime scene that day—so we can expect, on average, 10 of them to be black males. Therefore, in the absence of other evidence, the chance that the defendant is the culprit is 1 in 10.” Then, probability evidence would be a help rather than a hindrance. Juries, like doctors, find percentages much less comprehensible than frequencies—and are therefore much more likely to accept the prosecutor’s view unquestioningly when told that the chance that, say, a DNA match is accidental is 0.1 percent rather than 1 in 1,000.
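The frequency version of the argument is simple enough to write out. The sketch below uses only the numbers in the passage (1 in 20, and 200 passersby) and assumes, as the passage does, that there is no other evidence to separate one matching passerby from another. The variable names are invented for the example.

```python
from fractions import Fraction

# Figures from the text.
share_matching = Fraction(1, 20)   # 1 in 20 people fits the description
passersby = 200                    # people who passed the crime scene that day

# Expected number of passersby who fit the description.
expected_matches = share_matching * passersby

# With nothing else to go on, each matching passerby is equally likely to be the culprit.
chance_defendant_is_culprit = 1 / expected_matches

print(expected_matches, chance_defendant_is_culprit)
```

The same 1-in-20 figure that sounded like "19/20 guilty" now yields a 1-in-10 chance of guilt, because the calculation counts everyone else who also fits.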
The Collins jury was swept away by the numbers and voted guilty; the conviction was overturned on appeal, and the case is taught in every law school as a primer in errors to avoid—but these are persistent errors which, though chased out the door, return through the window.
In late 1999 Sally Clark, an English lawyer, was tried for murdering her two infant sons. Eleven-week-old Christopher had died in 1996, of what doctors believed at the time was a lung infection; a little over a year later, 8-week-old Harry died suddenly at home. The medical evidence presented at the trial was complex, confusing, sometimes contradictory, and generally inconclusive, revolving around postmortem indications that might suggest shaking, suffocation, or attempts at resuscitation—or might even be artificial products of the autopsy. Sally Clark’s defense was that both children had died naturally. The phrase “sudden infant death syndrome” arose sometime during the pretrial discovery—and with it, probability stalked into the courtroom.
One of the most important prosecution witnesses was Professor Sir Roy Meadow—not a statistician but a well-known pediatric consultant. His principal reason for being there was to give medical evidence and to point out the suspicious similarities between the babies’ deaths.
Meadow gave no probabilities for these apparent coincidences; but he had recently written the preface for a well-constructed, government-sponsored study of sudden infant death syndrome (SIDS) in which the odds of death were calculated against certain known factors (smoking, low income, young mothers). Sally Clark did not smoke; she was well paid and over 27—so the statistically determined likelihood of a death from SIDS in her family was 1 in 8,543. Following the lead of the study’s authors, Meadow went on to speculate about the likelihood of two SIDS deaths appearing in the same family: “Yes, you have to multiply 1 in 8,543 times 1 in 8,543 . . . it’s approximately a chance of 1 in 73 million.”
He repeated the figure and added: “In England, Wales, and Scotland there are about say 700,000 live births a year, so it is saying by chance that happening will occur about once every hundred years.”
The prosecuting barrister pounced: “So is this right, not only would the chance be 1 in 73 million but in addition in these two deaths there are features which would be regarded as suspicious in any event?” Professor Meadow replied, “I believe so.”
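The figures in that testimony can be reproduced directly. The sketch below only repeats the witness's arithmetic, including his step of multiplying the two probabilities as though the deaths were independent. It does not show that the step is justified. The variable names are invented for the example.

```python
# Figures quoted in the testimony.
one_in_single = 8_543          # odds against one SIDS death in a family like the Clarks
births_per_year = 700_000      # live births per year in England, Wales, and Scotland

# Squaring treats the two deaths as independent events (the witness's assumption).
one_in_double = one_in_single ** 2

# How many years of births it would take, on average, to see one such family.
years_between_cases = one_in_double / births_per_year

print(one_in_double, round(years_between_cases))
```

The product is 72,982,849, which rounds to the quoted 73 million, and dividing by 700,000 births a year gives a little over a century.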
The voice, once loosed, has no way to return. Did the professor realize that the numbers he was reciting would not only help imprison someone wrongly for four years—but would bring his own career to an ignominious end?
What was so wrong in what he said? Three things: first, SIDS is not the null hypothesis, the generic assumption if murder is ruled out. Nor is it a specific disease—it is by definition a death for which there is no apparent cause. Meadow himself later pointed this out: “All it is is a ‘don’t know.’” Ignorance, though, is not the same as randomness. We can certainly say that, in the UK, 1 family in 8,543 with no risk factors will, on average, suffer a death from SIDS: that is a statistical figure; it comes from observation. But to postulate that probability is at work—that these deaths result from rolling some vast 8,543-sided die—is not justified. Indeed, where any cause can be identified, such as a genetic predisposition or an environmental pollutant, the probability of two similar deaths in the same family would be much higher.
The second problem was the implication that a low probability of SIDS implied the guilt of Sally Clark. This prosecutor avoided committing his eponymous fallacy, but his use of “in addition” created a strong sense that any further evidence against the babies’ mother merely reduced the already minuscule probability that their deaths could have happened naturally.
The third and most important flaw was that SIDS actually had nothing to do with the case. Sally Clark’s defense team had never claimed that the babies’ deaths were SIDS; it claimed they were natural. The postmortem examinations had revealed that something was wrong: these were not inexplicable deaths, they were unexplained. There was therefore no reason to discuss the probability of SIDS at all—except that the prosecution had assumed it would be the basis of the defense and had therefore spent time and effort in securing expert testimony on it. Unfortunately, none of these three objections was brought up in the cross-examination of Professor Meadow.
Who can know what will sway the heart of a jury? The medical evidence was complex and equivocal: there was nothing conclusive, nothing memorable. Now, out of a peripheral issue that should not even have been discussed, there arose this dramatic figure—73 million to 1. In his summing up, the judge did indeed warn against excessive reliance on these numbers: “However compelling you may find them to be, we do not convict people in these courts on statistics. It would be a terrible day if that were so.” But convict they did.
Four years later, Sally Clark’s second appeal succeeded. There were evidentiary grounds, but the court also found that: “putting the evidence of 1 in 73 million before the jury with its related statistic that it was the equivalent of a single occurrence of two such deaths in the same family once in a century was tantamount to saying that without consideration of the rest of the evidence one could be just about sure that this was a case of murder.”
In a remarkable development, the Royal Statistical Society itself had issued an announcement to protest against “a medical expert witness making a serious statistical error, one which may have had a profound effect on the outcome of the case.” Sir Roy Meadow was eventually struck off the medical rolls. The police and prosecution guidelines for infant mortality that he had helped develop—popularly described as “one is tragic, two suspicious, three murder”—were scrapped; even the accepted standards for medical evidence came under suspicion. The press, which had reveled in morbid images of monster mothers, swiveled around to attack witchfinder doctors. The odds, always a dangerous way to deal with uncertainty, were reversed.
Yet even had Professor Meadow’s 1-in-73 million calculation been relevant to the case, Bayes’ theorem might have prevented a miscarriage of justice—because it would have made clear what likelihoods were actually being compared. If, in the absence of all other evidence, we agree to odds of 1 in 73 million against two cases of SIDS in the same family, what—using the same methods of calculation—are the odds against two cases of infanticide? One in 8.4 billion. Numbers need not only serve the prosecution; the statistical knife cuts both ways.
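A rough sketch of the comparison Bayes' theorem would force makes the point concrete. It takes both figures from the text at face value and assumes, purely for illustration, that double SIDS and double infanticide are the only two explanations on the table. The variable names are invented for the example.

```python
# Figures from the text, taken at face value.
p_double_sids = 1 / 73_000_000          # two SIDS deaths in one family
p_double_murder = 1 / 8_400_000_000     # two infanticides in one family

# How many times more likely the innocent explanation is than the guilty one.
odds_sids_to_murder = p_double_sids / p_double_murder

# Probability of the innocent explanation, ASSUMING these are the only two candidates.
prob_sids_given_two_deaths = p_double_sids / (p_double_sids + p_double_murder)

print(round(odds_sids_to_murder), round(prob_sids_given_two_deaths, 3))
```

On these numbers the natural explanation is favored by roughly 115 to 1. The tiny figure of 1 in 73 million means little until it is set beside the even tinier figure for the alternative.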
The likelihood of seeing Bayes make regular appearances in court is low. Juries are supposed to be ordinary people, using common sense in making their decisions, and judges are naturally dubious about anything that tends to replace common sense with the mysterious mechanism of calculation. The fear is that people may have an inflated respect for what they do not understand and convict an innocent suspect because “you can’t argue with the figures.”
Unfortunately, bad probability has tended to drive out good. There are particular kinds of evidence for which Bayes’ theorem would be a relatively uncontroversial guide for the perplexed—for example, in evaluating identifications from fingerprint evidence, paternity tests, and DNA matching. Here, the “learning term” in Bayes’ equation is not a figure taken from the air: there are solid statistical reasons behind the numbers describing how likely a given identification is, or how reliable a positive result should be, given a genuine association.
Galton first proposed fingerprint evidence as a forensic tool, and more than a century’s experience has failed to disprove the statistical uniqueness of everyone’s individual set. Of course, matching the blurred partial print from the crime scene to the neat inked file card is a different matter—you would expect there to be known error rates that could be included in Bayesian calculation. But no; although all other expert testimony is now required by the Supreme Court to include its intrinsic error rates—the so-called Daubert ruling—fingerprint evidence is presented as absolute: a 100 percent sure yes or no. Indeed, the world’s oldest and largest forensic professional organization forbids its members to make probabilistic statements about fingerprint identification, deeming it to be “conduct unbecoming.”
Even where error rates are permitted, the courts are uneasy. In R. v. Adams, a recent British rape case, a positive DNA match was the only basis for identification; all the other evidence pointed away from the accused. The prosecution’s expert witness gave the ratio of likelihood for the match, given the two hypotheses of guilt and innocence, as “at least 1/2,000,000 and possibly as high as 1/200,000,000.” The defense’s expert witness told the jury that the correct way to combine this ratio with the prior probabilities was through Bayes’ theorem. Thus far, all was uncontroversial.
The defense went on to explain how Bayes’ theorem would combine the probabilities based on the other evidence (that the victim did not recognize the accused in a lineup, that the accused was fifteen years older than the victim’s description of her attacker, that the accused had an alibi for the time of the attack) before applying the likelihood ratio from the DNA match. Using conservative numbers for these independent probabilities results in a prior probability of guilt of around 1/3,600,000. This makes the likelihood ratio for the DNA match a critical question, because if it’s only 1/2,000,000, Bayes’ theorem produces a likelihood of guilt of .36. If it’s 1/200,000,000, the likelihood is .98. Knowing those two numbers should have focused the jury’s deliberation on the key question: what was the real likelihood ratio of the DNA evidence?
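The two results can be checked with the odds form of Bayes' theorem: posterior odds equal prior odds times the likelihood ratio. The sketch below assumes that the quoted "1/2,000,000" is the chance of a match if the accused is innocent, so that the ratio in favor of guilt is 2,000,000 to 1 (and likewise for the larger figure). The function name is invented for the example.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability with a likelihood ratio, using the odds form of Bayes' theorem."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Prior probability of guilt from the non-DNA evidence, as given in the text.
prior_guilt = 1 / 3_600_000

# Likelihood ratios in favor of guilt (assumed reading of the expert's two figures).
for ratio in (2_000_000, 200_000_000):
    print(ratio, round(posterior_probability(prior_guilt, ratio), 2))
```

The lower ratio gives a probability of guilt of about .36 and the higher one about .98, matching the figures above.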
Whatever it was the jury took into consideration, Adams was convicted—and the appeals court roundly condemned the use of Bayes’ theorem:
“The percentages [sic] chosen are matters of judgement: that is inevitable. But the apparently objective numerical figures used in the theorem may conceal the element of judgement on which it entirely depends . . . to introduce Bayes’ Theorem, or any similar method, into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity deflecting them from their proper task.”
One might ask what is the jurors’ proper task, if it is not to use every means to clarify and define how they apply their common sense? But this is the current view: the technique has been forbidden, not because it doesn’t work, but because the court may not understand it. We can continue to err, as long as we err in ways we find familiar.
In 1913, John Henry Wigmore, Dean of the Northwestern School of Law in Chicago, proposed “a novum organum for the study of Judicial Evidence.” The reference to Francis Bacon cut two ways: it implied both that a science of evidence was possible and that it had not yet been achieved—that law still languished in medieval obscurity. Wigmore had come across a sentence in W. S. Jevons’ Principles of Science that summed up the problems of shaping a mass of evidence to produce a just decision: “We are logically weak and imperfect in respect of the fact that we are obliged to think of one thing after another.”
Dean Wigmore set out to arrange evidence not in time, but in space: not as a sequence, but as a network, showing “the conscious juxtaposition of detailed ideas for the purpose of producing rationally a single final idea.” Like all worthy tasks, this meant starting with a clean sheet of paper and a well-sharpened pencil.
From right to left, the hypotheses—the “to be proved’s”—range from those favoring the prosecutor to those favoring the defendant; each stands at the head of a chain of evidence, direct or circumstantial. The symbols at the nodes in the chain plot the 14 different types of fact—including facts as alleged, as seen by the court, and as generally accepted. Each fact can be marked with its inherent credibility, from “provisional credit” through belief and doubt to strong disbelief. Explanatory or corroborative evidence stands next to its relevant fact, drawing off or adding on credibility. Hearsay, contradiction, and confused testimony are neither resolved nor excluded but given their weight and allowed to make their contribution to the “net persuasive effect of a mixed mass of data.”
The ingenuity of Wigmore’s chart is twofold: first, it keeps the whole case under the eye at once. We do not have to run back in search of discarded evidence. Second, it preserves what has been well described as the “granularity” of fact. Our duty, as judge or juror, is to postpone judgment until we hear all—but that is almost impossible; once we hear what we consider the clinching fact, we tend to measure all further facts against it, rather than weighing all together. The chart dissects and pins out evidence so that we can judge the local relevance and credibility of each fact—including the apparent clincher—before we move on to the case as a whole. As Wigmore said, it doesn’t tell us what our belief ought to be; it tells us what our belief is and how we reached it.
No one, though, seemed willing to learn the complicated symbols, and Wigmore’s novum organum fell lifeless from the press. Insomniac lawyers read it, legal scholars admired it, but it never revolutionized the analysis of evidence. Northwestern’s course in Wigmore charting declined, once the Dean had retired, from required to elective—and then to the hazy limbo of summer school. The parallel with Bacon was closer than Wigmore had thought: it would take sixty years for his ideas, too, to be generally accepted.
Once to every man and nation comes the moment to decide,
In the strife of truth with falsehood, for the good or evil side.
Well, actually, it is somewhat more than once: in even the simplest Wigmore chart, we can expect something like 2n moments to decide for n pieces of evidence. It doesn’t scan as well, though; nor does it make it easy for us to maintain an open mind through the journey across that jungle of choices. Computers, however, have no such difficulty: they happily gorge on data; they willingly cross-reference facts; they fastidiously maintain the essential distinction between what is known and the weight attached to it; and they can perform Bayesian calculations perfectly. These abilities have brought computers to the doors of the courtroom and created a new form of investigative agent: the forensic programmer.