How Not to Be Wrong: The Power of Mathematical Thinking
by Jordan Ellenberg

  It’s the very same effect that makes political polls less reliable when fewer voters are polled. And it’s the same, too, for brain cancer. Small states have small sample sizes—they are thin reeds whipped around by the winds of chance, while the big states are grand old oaks that barely bend. Measuring the absolute number of brain cancer deaths is biased toward the big states; but measuring the highest rates—or the lowest ones!—puts the smallest states in the lead. That’s how South Dakota can have one of the highest rates of brain cancer death while North Dakota claims one of the lowest. It’s not because Mount Rushmore or Wall Drug is somehow toxic to the brain; it’s because smaller populations are inherently more variable.
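
  To see the effect in numbers, here is a minimal simulation sketch (mine, not the book’s, with made-up populations and rates): two states share the same underlying brain-cancer death rate, but the smaller one’s observed rate bounces around far more from year to year.

```python
import numpy as np

# Hypothetical illustration: identical true risk, very different population sizes.
TRUE_RATE = 5e-5          # assumed underlying annual death rate per person
SMALL_POP = 800_000       # roughly South Dakota-sized (assumption)
LARGE_POP = 38_000_000    # roughly California-sized (assumption)

rng = np.random.default_rng(0)

def observed_rates(population, n_years=20):
    """Simulate yearly death counts; return observed rates per 100,000 people."""
    deaths = rng.binomial(population, TRUE_RATE, size=n_years)
    return 100_000 * deaths / population

print("small-state rates:", np.round(observed_rates(SMALL_POP), 1))
print("large-state rates:", np.round(observed_rates(LARGE_POP), 1))
# The small state's yearly rates swing widely around 5 per 100,000;
# the large state's barely move, even though the true risk is the same.
```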

  That’s a mathematical fact you already know, even if you don’t know you already know it. Who’s the most accurate shooter in the NBA? A month into the 2011−12 season, five players were locked in a tie for the highest shooting percentage in the league: Armon Johnson, DeAndre Liggins, Ryan Reid, Hasheem Thabeet, and Ronny Turiaf.

  Who?

  That’s the point. These were not the five best shooters in the NBA. These were people who barely ever played. Armon Johnson, for instance, appeared in one game for the Portland Trail Blazers. He took one shot. He made it. The five guys on the list took thirteen shots between them and hit them all. Small samples are more variable, so the leading shooter in the NBA is always going to be somebody who’s only taken a handful of shots and who got lucky every time. You would never declare that Armon Johnson was a more accurate shooter than the highest-ranking full-time player on the list, Tyson Chandler of the Knicks, who made 141 out of 202 shots over the same time period.* (Any doubt on this point can be put to rest by looking at Johnson’s 2010−11 season, when he shot a steadfastly ordinary 45.5% from the field.) That’s why the standard leaderboard doesn’t show guys like Armon Johnson. Instead, the NBA restricts the rankings to players who’ve reached a certain threshold of playing time; otherwise, part-time nobodies with their small sample sizes would dominate the list.

  But not every ranking system has the quantitative savvy to make allowances for the Law of Large Numbers. The state of North Carolina, like many others in this age of educational accountability, instituted incentive programs for schools that do well on standardized tests. Each school is rated on the average improvement of student test scores from one spring to the next; the top twenty-five schools in the state on this measure get a banner to hang in the gym and bragging rights over the surrounding towns.

  Who wins this kind of contest? The top scorer in 1999, with a 91.5 “performance composite score,” was C. C. Wright Elementary in North Wilkesboro. That school was on the small side, with 418 students in a state where elementary schools average almost 500 kids. Not far behind Wright were Kingswood Elementary, with a score of 90.9, and Riverside Elementary, with 90.4. Kingswood had just 315 students, and tiny Riverside, in the Appalachian town of Newland, had only 161.

  In fact, the small schools cleaned up on North Carolina’s measure in general. A study by Thomas Kane and Douglas Staiger found that 28% of the smallest schools in the state made the top twenty-five at some point in the seven-year window they studied; among all schools, only 7% ever got the banner in the gym.

  It sounds like small schools, where teachers really know the students and their families and have time to deliver individualized instruction, are better at raising test scores.

  But maybe I should mention that the title of Kane and Staiger’s paper is “The Promise and Pitfalls of Using Imprecise School Accountability Measures.” And that smaller schools did not show any tendency, on average, to have significantly higher scores on the tests. And that the schools that were assigned state “assistance teams” (read: that got a dressing-down from state officials for low test scores) were also predominantly smaller schools.

  In other words, as far as we know, Riverside Elementary is no more one of the top elementary schools in North Carolina than Armon Johnson is the sharpest shooter in the league. The reason small schools dominate the top twenty-five isn’t because small schools are better, but because small schools have more variable test scores. A few child prodigies or a few third-grade slackers can swing a small school’s average wildly; in a large school, the effect of a few extreme scores will simply dissolve into the big average, hardly budging the overall number.

  So how are we supposed to know which school is best, or which state is most cancer-prone, if taking simple averages doesn’t work? If you’re an executive managing a lot of teams, how can you accurately assess performance when the smaller teams are more likely to predominate at both the top and bottom tier of your rankings?

  There is, unfortunately, no easy answer. If a tiny state like South Dakota experiences a rash of brain cancer, you might presume that the spike is in large measure due to luck, and you might estimate that the rate of brain cancer in the future is likely to be closer to the overall national number. You could accomplish this by taking some kind of weighted average of the South Dakota rate with the national rate. But how to weight the two numbers? That’s a bit of an art, involving a fair amount of technical labor I’ll spare you here.
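
  Purely as an illustration of the idea (this sketch and its weighting are mine, not the author’s formula), one simple choice is to weight by population, so that big states keep their observed rate and small states get pulled toward the national figure:

```python
def shrunk_rate(state_deaths, state_pop, national_rate, prior_weight=1_000_000):
    """Blend a state's observed rate toward the national rate.

    prior_weight (in "equivalent people") sets how much trust we place in
    the national figure; its value here is an arbitrary illustrative choice.
    """
    observed = state_deaths / state_pop
    w = state_pop / (state_pop + prior_weight)     # big states: w near 1
    return w * observed + (1 - w) * national_rate

# Hypothetical numbers, purely for illustration:
national = 5.0 / 100_000
blended = shrunk_rate(state_deaths=70, state_pop=800_000, national_rate=national)
print(100_000 * blended)   # about 6.7 per 100,000: the observed 8.75
                           # pulled partway back toward the national 5.0
```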

  One relevant fact was first observed by Abraham de Moivre, an early contributor to the modern theory of probability. De Moivre’s 1756 book The Doctrine of Chances was one of the key texts on the subject. (Even then, the popularization of mathematical advances was a vigorous industry; Edmond Hoyle, whose authority in matters of card games was so great that people still use the phrase “according to Hoyle,” wrote a book to help gamblers master the new theory, called An Essay Towards Making the Doctrine of Chances Easy to those who Understand Vulgar Arithmetic only, to which is added some useful tables on annuities.)

  De Moivre wasn’t satisfied with the Law of Large Numbers, which said that in the long run the proportion of heads in a sequence of flips gets closer and closer to 50%. He wanted to know how much closer. To understand what he found, let’s go back and look at those coin flip counts again. But now, instead of listing the total number of heads, we’re going to record the difference between the number of heads actually flipped and the number of heads you might expect, 50% of the flips. In other words, we’re measuring how far off we are from perfect head-tail parity.

  For the ten-coin trials, you get:

  1, 1, 0, 1, 0, 1, 2, 2, 1, 0, 0, 4, 2, 0, 2, 1, 0, 2, 2, 4 . . .

  For the hundred-coin trials:

  4, 4, 2, 5, 2, 1, 3, 8, 10, 7, 4, 4, 1, 2, 1, 0, 10, 7, 5 . . .

  And for the thousand-coin trials:

  14, 1, 11, 28, 37, 26, 8, 10, 22, 8, 7, 11, 11, 10, 30, 10, 3, 38, 0, 6 . . .

  You can see that the discrepancies from 50-50 get bigger in absolute terms as the number of coin flips grows, even though (as the Law of Large Numbers demands) they’re getting smaller as a proportion of the number of flips. De Moivre’s insight is that the size of the typical discrepancy* is governed by the square root of the number of coins you toss. Toss a hundred times as many coins as before and the typical discrepancy grows by a factor of 10—at least, in absolute terms. As a proportion of the total number of tosses, the discrepancy shrinks as the number of coins grows, because the square root of the number of coins grows much more slowly than does the number of coins itself. The thousand-coin flippers sometimes miss an even distribution by as many as 38 heads; but as a proportion of total throws, that’s only 3.8% away from 50-50.
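
  A quick simulation sketch (mine, assuming a fair coin) makes the square-root scaling visible: multiply the number of flips by a hundred and the typical absolute discrepancy grows about tenfold, while the proportional discrepancy shrinks about tenfold.

```python
import numpy as np

rng = np.random.default_rng(1)

def typical_discrepancy(n_flips, n_trials=10_000):
    """Average absolute difference between the head count and n_flips / 2."""
    heads = rng.binomial(n_flips, 0.5, size=n_trials)
    return np.mean(np.abs(heads - n_flips / 2))

for n in (10, 100, 1_000, 10_000, 100_000):
    d = typical_discrepancy(n)
    print(f"{n:>7} flips: typical discrepancy {d:8.1f}   ({100 * d / n:.2f}% of the flips)")
# The absolute discrepancy grows in proportion to the square root of n,
# so the percentage discrepancy shrinks as n grows.
```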

  De Moivre’s observation is the same one that underlies the computation of the standard error in a political poll. If you want to make the error bar half as big, you need to survey four times as many people. And if you want to know how impressed to be by a good run of heads, you can ask how many square roots away from 50% it is. The square root of 100 is 10. So when I got 60 heads in 100 tries, that was exactly one square root away from 50-50. The square root of 1,000 is about 31; so when I got 538 heads in 1,000 tries, I did something even more surprising, even though I got only 53.8% heads in the latter case and 60% heads in the former.
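
  Here is that last computation as a tiny sketch (mine), just to put numbers on “how many square roots away”:

```python
import math

def square_roots_away(heads, flips):
    """How many 'square roots of the flip count' separate us from an even split?"""
    return abs(heads - flips / 2) / math.sqrt(flips)

print(square_roots_away(60, 100))      # 10 / 10   = 1.0 square root away
print(square_roots_away(538, 1_000))   # 38 / 31.6 = about 1.2 square roots away
# So 538 heads out of 1,000 is the more surprising run, even though 53.8%
# is closer to 50% than 60% is.
```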

  But de Moivre wasn’t done. He found that the discrepancies from 50-50, in the long run, always tend to form themselves into a perfect bell curve, or, as we call it in the biz, the normal distribution. (Statistics pioneer Francis Ysidro Edgeworth proposed that the curve be called the gendarme’s hat, and I have to say I’m sorry this didn’t catch on.)

  The bell curve/gendarme’s hat is tall in the middle and very flat near the edges, which is to say that the farther a discrepancy is from zero, the less likely it is to be encountered. And this can be precisely quantified. If you flip N coins, the chance that you’ll end up being off by at most the square root of N from 50% heads is about 95.45%. The square root of 1,000 is about 31; indeed, eighteen of our twenty big thousand-coin trials above, or 90%, were within 31 heads of 500. If I kept playing the game, the fraction of times I ended up somewhere between 469 and 531 heads would get closer and closer to that 95.45% figure.*
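
  That 95.45% is the familiar “two standard deviations” of the normal curve: for a fair coin, the standard deviation of the head count is half the square root of N, so being within the square root of N of an even split means being within two standard deviations. A simulation sketch (mine) comes out close to the same figure:

```python
import numpy as np

rng = np.random.default_rng(2)

N = 1_000
TRIALS = 100_000
heads = rng.binomial(N, 0.5, size=TRIALS)

# "Within sqrt(N) of half heads" means within sqrt(1000), about 31.6, of 500:
# roughly the range 469 to 531, i.e. within two standard deviations.
within = np.abs(heads - N / 2) <= np.sqrt(N)
print(within.mean())   # prints a fraction close to 0.95 (the smooth
                       # normal-curve figure is 95.45%)
```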

  It feels like something is making it happen. Indeed, de Moivre himself might have felt this way. By many accounts, he viewed the regularities in the behavior of repeated coin flips (or any other experiment subject to chance) as the work of God’s hand itself, which turned the short-term irregularities of coins, dice, and human life into predictable long-term behavior, governed by immutable laws and decipherable formulae.

  It’s dangerous to feel this way. Because if you think somebody’s transcendental hand—God, Lady Luck, Lakshmi, doesn’t matter—is pushing the coins to come up half heads, you start to believe in the so-called law of averages: five heads in a row and the next one’s almost sure to land tails. Have three sons, and a daughter is surely up next. After all, didn’t de Moivre tell us that extreme outcomes, like four straight sons, are highly unlikely? He did, and they are. But if you’ve already had three sons, a fourth son is not so unlikely at all. In fact, you’re just as likely to have a son as a first-time parent.

  This seems at first to be in conflict with the Law of Large Numbers, which ought to be pushing your brood to be split half and half between boys and girls.* But the conflict is an illusion. It’s easier to see what’s going on with the coins. I might start flipping and get 10 heads in a row. What happens next? Well, one thing that might happen is you’d start to suspect something was funny about the coin. We’ll return to that issue in part II, but for now let’s assume the coin is fair. So the law demands that the proportion of heads must approach 50% as I flip the coin more and more times.

  Common sense suggests that, at this point, tails must be slightly more likely, in order to correct the existing imbalance.

  But common sense says much more insistently that the coin can’t remember what happened the first ten times I flipped it!

  I won’t keep you in suspense—the second common sense is right. The law of averages is not very well named, because laws should be true, and this one is false. Coins have no memory. So the next coin you flip has a 50-50 chance of coming up heads, the same as any other. The way the overall proportion settles down to 50% isn’t that fate favors tails to compensate for the heads that have already landed; it’s that those first ten flips become less and less important the more flips we make. If I flip the coin a thousand more times, and get about half heads, then the proportion of heads in the first 1,010 flips is also going to be close to 50%. That’s how the Law of Large Numbers works: not by balancing out what’s already happened, but by diluting what’s already happened with new data, until the past is so proportionally negligible that it can safely be forgotten.
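
  The dilution story is easy to check with a sketch (mine): force the first ten flips to come up heads, flip a fair coin from then on, and watch the running proportion settle toward 50% with nothing compensating for the early run.

```python
import random

random.seed(3)

flips = [1] * 10                                          # ten heads already on the books
flips += [random.randint(0, 1) for _ in range(100_000)]   # a fair coin from here on

for n in (10, 100, 1_000, 10_000, 100_000):
    proportion = sum(flips[:n]) / n
    print(f"after {n:>6} flips: {proportion:.3f} heads")
# The later flips are plain 50-50; nothing "corrects" for the early heads.
# The overall proportion drifts toward 0.5 only because those first ten
# flips become a smaller and smaller share of the total.
```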

  SURVIVORS

  What applies to coins and test scores applies to massacres and genocides, too. If you rate your bloodshed by proportion of national population eliminated, the worst offenses will tend to be concentrated in the smallest countries. Matthew White, author of the agreeably morbid Great Big Book of Horrible Things, ranked the bloodlettings of the twentieth century in this order, and found that the top three were the massacre of the Herero of Namibia by their German colonists, the slaughter of Cambodians by Pol Pot, and King Leopold’s war in the Congo. Hitler, Stalin, Mao, and the big populations they decimated don’t make the list.

  This bias toward less populous nations presents a problem—where is our mathematically certified rule for figuring out precisely how much distress to experience when we read about the deaths of people in Israel, Palestine, Nicaragua, or Spain?

  Here’s a rule of thumb that makes sense to me: if the magnitude of a disaster is so great that it feels right to talk about “survivors,” then it makes sense to measure the death toll as a proportion of total population. When you talk about a survivor of the Rwandan genocide, you could be talking about any Tutsi living in Rwanda; so it makes sense to say that the genocide wiped out 75% of the Tutsi population. And you might be justified in saying that a catastrophe that killed 75% of the population of Switzerland was the “Swiss equivalent” of what befell the Tutsi.

  But it would be absurd to call someone in Seattle a “survivor” of the World Trade Center attack. So it’s probably not useful to think of deaths at the World Trade Center as a proportion of all Americans. Only about one in a hundred thousand Americans, or 0.001%, died at the World Trade Center that day. That number is too close to zero for your intuition to grasp hold of it; you have no feeling for what that proportion means. And so it’s dicey to say that the Swiss equivalent to the World Trade Center attacks would be a mass murder that killed 0.001% of the Swiss, or eighty people.

  So how are we supposed to rank atrocities, if not by absolute numbers and not by proportion? Some comparisons are clear. The Rwanda genocide was worse than 9/11 and 9/11 was worse than Columbine and Columbine was worse than one person getting killed in a drunk-driving accident. Others, separated by vast differences in time and space, are harder to compare. Was the Thirty Years’ War really more deadly than World War I? How does the horrifyingly rapid Rwanda genocide stack up against the long, brutal war between Iran and Iraq?

  Most mathematicians would say that, in the end, the disasters and atrocities of history form what we call a partially ordered set. That’s a fancy way of saying that some pairs of disasters can be meaningfully compared, and others cannot. This isn’t because we don’t have accurate enough death counts, or firm enough opinions as to the relative merits of being annihilated by a bomb versus dying of war-induced famine. It’s because the question of whether one war was worse than another is fundamentally unlike the question of whether one number is bigger than another. The latter question always has an answer. The former does not. And if you want to imagine what it means for twenty-six people to be killed by terrorist bombings, imagine twenty-six people killed by terrorist bombings—not halfway across the world, but in your own city. That computation is mathematically and morally unimpeachable, and no calculator is required.

  FIVE

  MORE PIE THAN PLATE

  Proportions can be misleading even in simpler, seemingly less ambiguous cases.

  A recent working paper by economists Michael Spence and Sandile Hlatshwayo painted a striking picture of job growth in the United States. It’s traditional and pleasant to think of America as an industrial colossus, whose factories run furiously night and day producing the goods the world demands. Contemporary reality is rather different. Between 1990 and 2008, the U.S. economy gained a net 27.3 million jobs. Of those, 26.7 million, or 98%, came from the “nontradable sector”: the part of the economy including things like government, health care, retail, and food service, which can’t be outsourced and which don’t produce goods to be shipped overseas.

  That number tells a powerful story about recent American industrial history, and it was widely repeated, from The Economist to Bill Clinton’s latest book. But you have to be careful about what it means. Ninety-eight percent is really, really close to 100%. So does the study say that growth is as concentrated in the nontradable part of the economy as it could possibly be? That’s what it sounds like—but that’s not quite right. Jobs in the tradable sector grew by a mere 620,000 between 1990 and 2008, that’s true. But it could have been worse—they could have declined! That’s what happened between 2000 and 2008; the tradable sector lost about 3 million jobs, while the nontradable sector added 7 million. So the nontradable sector accounted for 7 million jobs out of the total gain of 4 million, or 175%!
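
  The arithmetic behind that 175% is worth seeing in one place; a minimal sketch (mine, using the rounded figures from the passage):

```python
# Job changes, 2000-2008, in millions (rounded figures from the passage):
tradable = -3.0      # the tradable sector lost about 3 million jobs
nontradable = 7.0    # the nontradable sector added about 7 million

net_gain = tradable + nontradable     # 4.0 million
share = nontradable / net_gain        # 1.75, i.e. 175%
print(f"nontradable 'share' of net job growth: {share:.0%}")
# A sector can account for more than 100% of the net change whenever another
# sector's change is negative, which is exactly why percentages of possibly
# negative totals are treacherous.
```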

  The slogan to live by here is:

  Don’t talk about percentages of numbers when the numbers might be negative.

  This may seem overly cautious. Negative numbers are numbers, and as such they can be multiplied and divided like any others. But even this is not as trivial as it first appears. To our mathematical predecessors, it wasn’t even clear that negative numbers were numbers at all—they do not, after all, represent quantities in exactly the same way positive numbers do. I can have seven apples in my hand, but not negative seven. The great sixteenth-century algebraists, like Cardano and François Viète, argued furiously about whether a negative times a negative equaled a positive; or rather, they understood that consistency seemed to demand that this be so, but there was real division about whether this had been proved factual or was only a notational expedient. Cardano, when an equation he was studying had a negative number among its solutions, had the habit of calling the offending solution ficta, or fake.

  The arguments of Italian Renaissance mathematicians can at times seem as recondite and irrelevant to us as their theology. But they weren’t wrong that there’s something about the combination of negative quantities and arithmetic operations like percentage that short-circuits one’s intuition. When you disobey the slogan I gave you, all sorts of weird incongruities start to bubble up.

  For example, say I run a coffee shop. People, sad to say, are not buying my coffee; last month I lost $500 on that part of my business. Fortunately, I had the prescience to install a pastry case and a CD rack, and those two operations made a $750 profit each.

 
