But is this really true? It is only true if all men are equally likely to have committed the crime - for example, if they all had the same access to the crime scene.
The choice of appropriate comparison population also matters. How did the expert witness estimate the random match probability? What is the real prevalence of this profile? And does the random match probability mean that this profile would occur only once in 20,000 individuals? No, the calculated frequency is only an estimate that may be wrong in either direction.
DNA evidence is also easier to plant at a crime scene than for example fingerprints (DNA evidence is easier to manufacture or distort). In Scientific Conversations by Claudia Dreifus, forensic mathematician Charles Brenner says about the O.J. Simpson case and DNA evidence: "The defense did something very clever from the DNA point of view: They said the evidence was planted. Their basic strategy was even if it matches, it was a plant. They gave up on the
strategy of disproving the DNA evidence. There obviously was a match in the blood. They never denied it."
In the O.J. Simpson trial, the defense argued that fewer than one out of 1,000 wife abusers kill their wives, and that evidence of abuse is therefore irrelevant and should not be admissible in a murder trial. But the appropriate probability is not the probability that a man who abuses his wife kills her. The relevant comparison population to consider is wives who have been abused by their husbands and thereafter been murdered by someone. The relevant question is therefore: What is the probability that a man killed his wife given that he abused her and given that she was actually killed by someone? And Nicole Brown Simpson was killed, not just abused.
John Allen Paulos says in Innumeracy that given reasonable facts about murder and abuse, it has been shown that if a man abuses his wife or girlfriend and she is later killed, the abuser is the killer more than 80 percent of the time. But this doesn't mean that the probability that the husband or boyfriend is guilty of murder is 80%. It is just one piece of evidence among many that needs to be considered.
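A rough sketch of this conditioning, using purely hypothetical frequencies (not Paulos's actual figures), shows why asking the right conditional question matters:

```python
# Hedged sketch with hypothetical frequencies - not Paulos's actual numbers.
# Imagine 100,000 abused women followed over some period.
abused_women = 100_000
killed_by_abuser = 40        # hypothetical: well under 1 in 1,000 abusers kill
killed_by_someone_else = 5   # hypothetical: murdered by someone other than the abuser

# The misleading number: most abusers never kill.
p_abuser_kills = killed_by_abuser / abused_women
print(f"P(abuser kills wife) = {p_abuser_kills:.2%}")  # tiny

# The relevant number: among abused women who WERE killed, how often
# was the abuser the killer?
killed_total = killed_by_abuser + killed_by_someone_else
p_abuser_given_killed = killed_by_abuser / killed_total
print(f"P(abuser was the killer | wife was killed) = {p_abuser_given_killed:.0%}")  # high
```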
Do errors happen in testing?
Yes, one type of error is a false positive and another is a false negative. A false positive is akin to a false alarm. A false negative is missing a real effect. For example, some factors that can influence the reliability of medical test results and cause false positives are: the clinical accuracy of the test method (compared with some "gold standard"), patient preparation, medical conditions, medications and laboratory errors. People may also make errors when collecting and handling samples, when interpreting test results or when reporting them.
John tested positive for a rare disease with a mortality rate of 80%. How scared should he be?
What is the chance that anyone (a person chosen randomly and belonging to the same risk group as John) actually has the disease given that they tested positive? The predictive value of the test depends both on the clinical accuracy of the test and on the prior probability - the proportion of individuals with the disease in the population we are testing at a given time (prevalence). Clinical accuracy is composed of sensitivity (the frequency of positive test results among people who have the disease) and specificity (the frequency of negative test results among people who don't have the disease).
Assume a population of 100,000 people.
The frequency of people with the disease in the population is 0.1%, i.e. one person in 1,000 has the disease. Before the test John had a 0.1% chance of having
the disease and a 99.9% chance of not having the disease. If the test was 100% accurate, 100 people would test positive and 99,900 would test negative. These are the prior probabilities.
The test has a 97% sensitivity or true positive rate. This means that 97 out of 100 people with the disease correctly test positive. It also means that 3 people out of 100 with the disease wrongly test negative (false negatives).
The test has a 95% specificity or true negative rate. This means that 95 out of 100 people without the disease correctly test negative, and that 5% of the time the test is incorrect. 5% of the people without the disease, or 4,995 people, wrongly test positive (false positives).
Since John is told he tested positive, the information he needs is the frequency of people that test positive and have the disease (true positives) and the frequency of people that test positive but don't have the disease (false positives).
Test positive:
Given disease: 97 (100 × 0.97)
Given no disease: 4,995 (99,900 × 0.05)
Total: 5,092
Out of every 1,000 people in the same risk group as John who test positive, we can expect that about 19 have the deadly disease (97/5,092). The probability that John has the deadly disease given that he tested positive is about 1.9% - very low. Out of 5,092 positive tests, most are false positives, indicating the disease where there is no disease.
What if a randomly tested person tests negative? There are 3 false negatives and 94,905 true negatives, meaning that there is more than a 99.9% chance that the person doesn't have the disease.
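The same arithmetic can be written out as a short calculation. Below is a minimal Python sketch of the numbers used above; the variable names are mine and chosen only for illustration:

```python
# Minimal sketch of the worked example: 100,000 people, 0.1% prevalence,
# 97% sensitivity, 95% specificity.
population = 100_000
prevalence = 0.001       # 1 person in 1,000 has the disease
sensitivity = 0.97       # true positive rate
specificity = 0.95       # true negative rate

sick = population * prevalence               # 100 people
healthy = population - sick                  # 99,900 people

true_positives = sick * sensitivity          # 97
false_negatives = sick - true_positives      # 3
true_negatives = healthy * specificity       # 94,905
false_positives = healthy - true_negatives   # 4,995

# Probability of disease given a positive test (positive predictive value)
ppv = true_positives / (true_positives + false_positives)
# Probability of no disease given a negative test (negative predictive value)
npv = true_negatives / (true_negatives + false_negatives)

print(f"P(disease | positive test) = {ppv:.1%}")      # about 1.9%
print(f"P(no disease | negative test) = {npv:.3%}")   # about 99.997%
```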
The label "tested positive" can be scary but remember that the test is not the disease. A test could fall in the group of false positives. But which is worst? To belong to the group of false positives - diagnosed as having the disease without having it, or false negatives - diagnosed as not having the disease but having it? The higher the prior probability or the more common a disease, the more reliable the outcome of the test. Conversely, the lower the prior probability or the more rare the disease, the less reliable the outcome of the test. Even a highly accurate test can yield an unreliable result if it tests for a rather uncommon disease. This assumes that the individual tested does not belong to a group of
people at higher risk of having the disease.
Ask: What is the frequency of people with the disease in the relevant comparison population before I consider specific case evidence? How accurate is the medical test?
The above reasoning can be used to help evaluate the reliability of diagnostic tests or screening procedures. Some examples are screening for or diagnosis of breast cancer, prostate cancer, colorectal cancer, HIV or drug use.
Estimating the frequency of false positives and false negatives is also important when evaluating the reliability of polygraph tests (used in criminal investigations or for screening employees) and identification systems.
In polygraph tests, false positives occur when innocent persons are found deceptive. False negatives occur when guilty persons are found non-deceptive.
In identification systems, a false positive occurs when a system accepts a match where there is none. A false negative occurs when a system fails to recognize a match where there is one.
The probability of false positives is also a factor to consider when evaluating the value of DNA profile evidence. This means that the jury in Bill's case also needs to consider the false positive rate. The jury needs to ask: What is the probability that the laboratory reports a match between two samples that don't match? A reported match doesn't necessarily mean a true match. Errors happen. A possible explanation for the forensic match may be error due to contamination (accidental or deliberate), mishandling the evidence, or switching the samples. For example, in one rape case, technicians from the Houston police crime laboratory told the jury that they found a DNA match between a rapist's DNA and a male suspect. The man was convicted in 1999 and sent to prison for 25 years. In 2003, the Houston Police Department said that the DNA was not from the convicted man.
When evaluating case evidence we must consider the prior probability, the
probability of a random match, and the probability of a false positive.
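To see how these three numbers interact, here is a hedged sketch of the kind of calculation involved. The prior probability and the laboratory false positive rate below are purely hypothetical illustrations; only the 1-in-20,000 random match probability comes from the example discussed earlier.

```python
# Hedged illustration - not an actual case calculation.
prior = 1 / 10_000           # hypothetical prior probability that the suspect is the source
random_match = 1 / 20_000    # random match probability from the earlier example
false_positive = 1 / 1_000   # hypothetical laboratory false positive rate

# Assume the lab almost certainly reports a match if the suspect IS the source.
p_report_if_source = 1.0
# If the suspect is NOT the source, a match can still be reported through
# a coincidental match or a laboratory error.
p_report_if_not_source = random_match + false_positive

posterior = (prior * p_report_if_source) / (
    prior * p_report_if_source + (1 - prior) * p_report_if_not_source
)
print(f"P(suspect is the source | reported match) = {posterior:.1%}")
# With these illustrative numbers the posterior is under 10% - far weaker than the
# "1 in 20,000" figure alone suggests - because the assumed lab error rate dominates.
```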
- NINE -
MISREPRESENTATIVE EVIDENCE
Conditions, environments and circumstances change
People like to look for systems that have worked over the past 20 years or so. If you could make money based on what has worked the past 20 years, all of the richest people would be librarians.
- Warren Buffett
Bertrand Russell said in Problems of Philosophy: "The man who has fed the chicken every day throughout its life at last wrings its neck instead." Often the past is a good guide to the future - but not always. Statistics are a record of the past, not a prediction of the future. We can't automatically assume that the future will mirror the past. Processes and circumstances change. Warren Buffett says: "Conditions relating to technology and all aspects of human behavior can make the future a lot different than the past."
We need to consider changes in conditions before using past evidence to predict likely future outcomes. For example, it's in the nature of businesses and economic conditions to change. Competition and demand change. If there are more ways to create competition, or less demand, we have to change the equation. Ask: Why was past experience the way it was? What reason is there to suppose that the future will resemble the past? Has the environment changed? Are the conditions similar? Are the context and circumstances that caused the past still present?
We also make mistakes if we ignore that past performance may have been achieved under far different circumstances than today. We would then be making, as Warren Buffett says, "the same mistake that a baseball manager would were he to judge the future prospects of a 42-year-old center fielder on the basis of his lifetime batting average." Management performance may also be conditioned on the environment. What makes an individual successful in one environment does not guarantee success in another. Ask: What is the company's or manager's ability to handle adversity?
"We can sell more if we market the illness rather than the drug,"said the manager of TransCorp's pharmaceutical division.
Is the frequency of a disease really increasing? We need to consider other factors
before we conclude that the frequency of an event has changed. For example, a disease may be more correctly diagnosed than it was in the past. Often we only see what we have names for - a disease that was formerly classified as "disease X" or "cause unknown" may now be re-classified or get a name. There may also be technological improvements in the collection and reporting of data. And there may be business incentives at work - for example, widening a market by creating a new condition, redefining a disease or exaggerating a minor one, thereby having more people labeled as having a disease.
If conditions change, we must update our assumptions to reflect the present
environment. Before we use the change as evidence for what is likely to happen, ask: What has changed? Are there more ways for some undesirable event to happen? Is the change permanent or temporary?
The single case or unrepresentative samples
Four out of five doctors recommend the drug.
This statement doesn't tell us anything if we don't know how many doctors were observed. Maybe it was just 10 - an observation that can't be extrapolated to include all doctors. A small sample size has no predictive value. The smaller the sample, the larger the statistical fluctuations and the more likely it is that we find chance events. We need a representative comparison group, a large enough sample size, and long enough periods of time.
Small samples can cause us to believe a risk is lower or higher than it really is. Why? A small sample increases the chance that we won't find a particular relationship where it exists, or that we find one where it doesn't exist.
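A quick simulation illustrates the point; the 60% "true" recommendation rate and the sample sizes below are arbitrary choices for illustration.

```python
# Illustrative simulation (arbitrary numbers): suppose 60% of all doctors
# recommend the drug, and we poll random samples of different sizes.
import random

random.seed(1)
TRUE_RATE = 0.60

def observed_rate(sample_size):
    """Fraction of a random sample of doctors who recommend the drug."""
    return sum(random.random() < TRUE_RATE for _ in range(sample_size)) / sample_size

for n in (5, 10, 100, 10_000):
    estimates = [f"{observed_rate(n):.0%}" for _ in range(5)]
    print(f"sample size {n:>6}: {estimates}")
# Samples of 5 or 10 doctors regularly produce "four out of five" (80%) or worse
# extremes purely by chance; samples of 10,000 cluster tightly around 60%.
```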
Charles Munger gives an example of the importance of getting representative data - even if it's approximate:
The water system of California was designed looking at a fairly short period of weather history. If they'd been willing to take less perfect records and look an extra hundred years back, they'd have seen that they weren't designing it right to handle drought conditions which were entirely likely.
You see that again and again - that people have some information they can count well and they have other information much harder to count. So they make the decision based only on what they can count well. And they ignore much more important information because its quality in terms of numeracy is less - even though it's very important in terms of reaching the right cognitive result. All I can tell you is that around Wesco and Berkshire, we try not to be like that. We have Lord Keynes' attitude, which Warren quotes all the time: "We'd rather be roughly right than precisely wrong." In other words, if something
is terribly important, we'll guess at it rather than just make our judgment based on what happens to be easily countable.
Chance and performance
No victor believes in chance.
- Friedrich Wilhelm Nietzsche
Past performance is no guarantee of future results. Consider evidence that describes what happens in most similar situations or to most people.
Sometimes a track record is not a good indicator of what is likely to happen in the future. It may show up by luck. Imagine a room filled with 1,000 monkeys. Each is trying to predict the direction (up or down) of interest rates. At the end of 10 predictions, one monkey has a perfect record of predicting the direction of interest rates. He is considered a genius and the greatest economist in history - even if it was just by chance. As soon as we have a large population of forecasters that predict events where chance plays a role, someone will be right, get press coverage and be presented as a hero. He will give lectures and offer sensible-sounding explanations.
Sometimes we only see the good performers, partly because winners have a tendency to show up (one monkey) while losers don't (999 monkeys). Often we aren't interested in the losers anyway. But we shouldn't be amazed to see winners if there is a large population to choose from. With 10,000 monkeys we would find 10 "geniuses." When we measure performance we must consider the number of successes (one monkey), the number of failures (999 monkeys), and the size of the relevant comparison population they came from (1,000 monkeys). The more people (or monkeys) involved in something where chance plays a role, the more likely it is that some of them have great performances just by chance. An exception is a group of high performers where we can observe some common characteristics that may be a causal factor and not luck.
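The arithmetic behind the monkey example is straightforward; a short sketch:

```python
# Each monkey guesses the direction of interest rates (up or down) 10 times.
# A perfect record by pure chance has probability (1/2)^10.
monkeys = 1_000
predictions = 10

p_perfect_record = 0.5 ** predictions           # 1 in 1,024
expected_geniuses = monkeys * p_perfect_record  # about 1 "genius" by luck alone

print(f"P(perfect record by chance) = 1 in {round(1 / p_perfect_record):,}")
print(f"Expected perfect records among {monkeys:,} monkeys: {expected_geniuses:.2f}")
# With 10,000 monkeys we should expect roughly 10 perfect records, purely by chance.
```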
The same mistakes may happen when people base their conclusions on mere effects and ignore the influence of chance. Think about 100 monkeys. They each roll a die once. Select those 16 monkeys (1/6 × 100) who rolled a six. As a cure for their "roll-a-six" tendency we give them a new drug. After taking the drug they roll the die again. Now only 2 or 3 monkeys (1/6 × 16) roll a six. The rest were "cured." Our false conclusion: "The drug obviously worked."
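A small simulation of this example makes the regression to chance visible:

```python
# Simulation of the "roll-a-six" example: the apparent cure is nothing but chance.
import random

random.seed(2)
monkeys = 100

first_roll = {m: random.randint(1, 6) for m in range(monkeys)}
rolled_six = [m for m, roll in first_roll.items() if roll == 6]   # roughly 16 monkeys

# "Treat" only the monkeys that rolled a six, then let them roll again.
second_roll = {m: random.randint(1, 6) for m in rolled_six}
still_six = [m for m, roll in second_roll.items() if roll == 6]   # roughly 2-3 monkeys

print(f"Rolled a six the first time: {len(rolled_six)}")
print(f"Still rolled a six after the 'drug': {len(still_six)}")
# Most monkeys look "cured" even though the drug did nothing at all.
```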
A con artist sets up a trap. He calls John with a tip. "Watch this stock. It will go up." After 3 correct predictions, John sends him his money. The con artist disappears.
What John didn't know was that the con artist made the same call to 80 people.
He told half of them the stock would go up, and the other half that it would go down - so one of the two predictions was sure to be right. 40 people were impressed. After the second call 20 people were impressed, and after his third and last call he was considered a genius by 10 people, who all sent him their money.
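The funnel behind the trick is just repeated halving; a minimal sketch:

```python
# The con artist's funnel: 80 people, opposite predictions to each half,
# so half of the remaining audience sees a correct call every round.
people = 80
for call in range(1, 4):
    people //= 2
    print(f"After call {call}: {people} people have seen only correct predictions")
# After three calls, 10 people have seen three correct calls in a row
# and mistake a halving funnel for foresight.
```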
Ignoring failures
Evidence must be drawn from the frequency of both success and failure over time. Often we only consider the successful and supporting outcomes. The epidemiological literature refers to this as survival bias. Only the characteristics of the survivors of a disease or outcome under study are included in the study. Those who have died before the end of the study are excluded. If these are the patients with more severe risk factors, the study will understate the association between risk factors and outcomes. Survival bias is also common in studies made after the outcomes have occurred (including back testing), since they only focus on surviving cases or patients. The people who have died are not in the sampling pool. People may also select or omit certain information by only publishing positive outcomes and omitting negative ones.
If we only study successes or survivors, a performance record may look better than it really is. Charles Munger says that
we give too little attention to failures:
It is assumed by many business school graduates, and by almost all consultants, that a corporation can easily improve its outcome by purchasing unrelated or tenuously related businesses. According to this widely shared view, if only the obvious steps had been taken, if the right "mission statement" had been adopted and the right "experts" hired, then each railroad, instead of remaining bound in chains by new forms of competition and obsolete and hostile laws and union rules, would have become another Federal Express, another United Parcel Service, or even another brilliant performer in the mode of Emerson Electric.
Our experience, both actual and vicarious, makes us less optimistic about easy solutions through business acquisition. We think undue optimism arises because successful records draw too much attention. Many people then reason as I would if I forecasted good prospects in big-time tennis after observation limited to Ivan Lendl and Steffi Graf, or good prospects in the California lottery after limiting observations to winners. The converse is also true, only more so. Far too little attention is given to the terrible effects on shareholders (or other owners) of the worst examples of corporate acquisitions such as CBS-DuMont, Xerox-Scientific Data Systems, General Electric-Utah International, Exxon-Reliance Electric... and Avon Products.