Freakonomics Revised and Expanded Edition


by Steven D. Levitt


  1b2a34d4ac42d23b141acd24a3a12dadbcb4a2134141

  db2abad1acbdda212b1acd24a3a12dadbcb400000000

  d43a3a24acb1d32b412acd24a3a12dadbcb422143bc0

  1142340c2cbddadb4b1acd24a3a12dadbcb43d133bc4

  d43ab4d1ac3dd43421240d24a3a12dadbcb400000000

  dba2ba21ac3d2ad3c4c4cd40a3a12dadbcb400000000

  144a3adc4cbddadbcbc2c2cc43a12dadbcb4211ab343

  3b3ab4d14c3d2ad4cbcac1c003a12dadbcb4adb40000

  d43aba3cacbddadbcbca42c2a3212dadbcb42344b3cb

  214ab4dc4cbdd31b1b2213c4ad412dadbcb4adb00000

  313a3ad1ac3d2a23431223c000012dadbcb400000000

  d4aab2124cbddadbcb1a42cca3412dadbcb423134bc1

  dbaab3dcacb1dadbc42ac2cc31012dadbcb4adb40000

  db223a24acb11a3b24cacd12a241cdadbcb4adb4b300

  d122ba2cacbd1a13211a2d02a2412d0dbcb4adb4b3c0

  1423b4d4a23d24131413234123a243a2413a21441343

  db4abadcacb1dad3141ac212a3a1c3a144ba2db41b43

  db2a33dcacbd32d313c21142323cc300000000000000

  1b33b4d4a2b1dadbc3ca22c000000000000000000000

  d12443d43232d32323c213c22d2c23234c332db4b300

  d4a2341cacbddad3142a2344a2ac23421c00adb4b3cb

  Take a look at the answers in bold. Did fifteen out of twenty-two students somehow manage to reel off the same six consecutive correct answers (the d-a-d-b-c-b string) all by themselves?

  There are at least four reasons this is unlikely. One: those questions, coming near the end of the test, were harder than the earlier questions. Two: these were mainly subpar students to begin with, few of whom got six consecutive right answers elsewhere on the test, making it all the more unlikely they would get right the same six hard questions. Three: up to this point in the test, the fifteen students’ answers were virtually uncorrelated. Four: three of the students (numbers 1, 9, and 12) left more than one answer blank before the suspicious string and then ended the test with another string of blanks. This suggests that a long, unbroken string of blank answers was broken not by the student but by the teacher.
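  The kind of pattern described here can be hunted for mechanically. What follows is only a minimal sketch, not the algorithm used in the Chicago study: it slides a fixed-width window across a handful of answer strings and reports any block that several students share at the same position. The function name, the window width, and the threshold are hypothetical, and the sketch assumes that letters mark correct answers, digits mark wrong ones, and 0 marks a blank.

```python
from collections import Counter

# Four answer strings copied from the listing above (a sample, not the full class).
# Assumed encoding: letters = correct answers, digits = wrong answers, 0 = blank.
SAMPLE = [
    "1b2a34d4ac42d23b141acd24a3a12dadbcb4a2134141",
    "db2abad1acbdda212b1acd24a3a12dadbcb400000000",
    "1423b4d4a23d24131413234123a243a2413a21441343",
    "db2a33dcacbd32d313c21142323cc300000000000000",
]


def shared_blocks(sheets, width, min_count):
    """Return (start, block, count) for every block of `width` answers that
    at least `min_count` sheets share at the same position.
    Blocks containing a blank (0) are ignored."""
    hits = []
    shortest = min(len(s) for s in sheets)
    for start in range(shortest - width + 1):
        counts = Counter(s[start:start + width] for s in sheets)
        for block, count in counts.items():
            if "0" not in block and count >= min_count:
                hits.append((start, block, count))
    return hits


if __name__ == "__main__":
    for start, block, count in shared_blocks(SAMPLE, width=6, min_count=2):
        print(start, block, count)
```

  On a real class, a shared block would matter only if it were long, hard, and common to many more students than chance would allow. Those are judgments this sketch does not attempt to make.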

  There is another oddity about the suspicious answer string. On nine of the fifteen tests, the six correct answers are preceded by another identical string, 3-a-1-2, which includes three of four incorrect answers. And on all fifteen tests, the six correct answers are followed by the same incorrect answer, a 4. Why on earth would a cheating teacher go to the trouble of erasing a student’s test sheet and then fill in the wrong answer?

  Perhaps she is merely being strategic. In case she is caught and hauled into the principal’s office, she could point to the wrong answers as proof that she didn’t cheat. Or perhaps—and this is a less charitable but just as likely answer—she doesn’t know the right answers herself. (With standardized tests, the teacher is typically not given an answer key.) If this is the case, then we have a pretty good clue as to why her students are in need of inflated grades in the first place: they have a bad teacher.

  Another indication of teacher cheating in classroom A is the class’s overall performance. As sixth graders who were taking the test in the eighth month of the academic year, these students needed to achieve an average score of 6.8 to be considered up to national standards. (Fifth graders taking the test in the eighth month of the year needed to score 5.8, seventh graders 7.8, and so on.) The students in classroom A averaged 5.8 on their sixth-grade tests, which is a full grade level below where they should be. So plainly these are poor students. A year earlier, however, these students did even worse, averaging just 4.1 on their fifth-grade tests. Instead of improving by one full point between fifth and sixth grade, as would be expected, they improved by 1.7 points, nearly two grades’ worth. But this miraculous improvement was short-lived. When these sixth-grade students reached seventh grade, they averaged 5.5—more than two grade levels below standard and even worse than they did in sixth grade. Consider the erratic year-to-year scores of three particular students from classroom A:

  The three-year scores from classroom B, meanwhile, are also poor but at least indicate an honest effort: 4.2, 5.1, and 6.0. So an entire roomful of children in classroom A suddenly got very smart one year and very dim the next, or more likely, their sixth-grade teacher worked some magic with her pencil.
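  The arithmetic behind that comparison is simple enough to write down. The sketch below uses only the class averages quoted above; the 1.5-point "spike" threshold and the function names are hypothetical choices made for illustration, not part of the original analysis.

```python
# Class averages for grades 5, 6, and 7, as quoted in the text.
CLASSROOM_A = [4.1, 5.8, 5.5]
CLASSROOM_B = [4.2, 5.1, 6.0]


def yearly_gains(scores):
    """Year-over-year change in grade-level score, rounded to one decimal."""
    return [round(later - earlier, 1) for earlier, later in zip(scores, scores[1:])]


def looks_suspicious(scores, spike=1.5):
    """Flag a gain larger than `spike` (a hypothetical threshold)
    that is followed by an outright decline the next year."""
    gains = yearly_gains(scores)
    return any(first > spike and second < 0 for first, second in zip(gains, gains[1:]))


if __name__ == "__main__":
    for name, scores in (("A", CLASSROOM_A), ("B", CLASSROOM_B)):
        print(name, yearly_gains(scores), looks_suspicious(scores))
```

  Classroom A gains 1.7 points and then loses 0.3; classroom B gains 0.9 in each year. Only the first pattern trips the flag.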

  There are two noteworthy points to be made about the children in classroom A, tangential to the cheating itself. The first is that they are obviously in poor academic shape, which makes them the very children whom high-stakes testing is promoted as helping the most. The second point is that these students (and their parents) would be in for a terrible shock once they reached the seventh grade. All they knew was that they had been successfully promoted due to their test scores. (No child left behind, indeed.) They weren’t the ones who artificially jacked up their scores; they probably expected to do great in the seventh grade—and then they failed miserably. This may be the cruelest twist yet in high-stakes testing. A cheating teacher may tell herself that she is helping her students, but the fact is that she would appear far more concerned with helping herself.

  An analysis of the entire Chicago data reveals evidence of teacher cheating in more than two hundred classrooms per year, roughly 5 percent of the total. This is a conservative estimate, since the algorithm was able to identify only the most egregious form of cheating—in which teachers systematically changed students’ answers—and not the many subtler ways a teacher might cheat. In a recent study among North Carolina schoolteachers, some 35 percent of the respondents said they had witnessed their colleagues cheating in some fashion, whether by giving students extra time, suggesting answers, or manually changing students’ answers.

  What are the characteristics of a cheating teacher? The Chicago data shows that male and female teachers are equally prone to cheating. A cheating teacher tends to be younger and less qualified than average. She is also more likely to cheat after her incentives change. Because the Chicago data ran from 1993 to 2000, it bracketed the introduction of high-stakes testing in 1996. Sure enough, there was a pronounced spike in cheating in 1996. Nor was the cheating random. It was the teachers in the lowest-scoring classrooms who were most likely to cheat. It should also be noted that the $25,000 bonus for California teachers was eventually revoked, in part because of suspicions that too much of the money was going to cheaters.

  Not every result of the Chicago cheating analysis was so dour. In addition to detecting cheaters, the algorithm could also identify the best teachers in the school system. A good teacher’s impact was nearly as distinctive as a cheater’s. Instead of getting random answers correct, her students would show real improvement on the easier types of questions they had previously missed, an indication of actual learning. And a good teacher’s students carried over all their gains into the next grade.

  Most academic analyses of this sort tend to languish, unread, on a dusty library shelf. But in early 2002, the new CEO of the Chicago Public Schools, Arne Duncan, contacted the study’s authors. He didn’t want to protest or hush up their findings. Rather, he wanted to make sure that the teachers identified by the algorithm as cheaters were truly cheating—and then do something about it.

  Duncan was an unlikely candidate to hold such a powerful job. He was only thirty-six when appointed, a onetime academic all-American at Harvard who later played pro basketball in Australia. He had spent just three years with the CPS—and never in a job important enough to have his own secretary—before becoming its CEO. It didn’t hurt that Duncan had grown up in Chicago. His father taught psychology at the University of Chicago; his mother ran an afterschool program for forty years, without pay, in a poor neighborhood. When Duncan was a boy, his afterschool playmates were the underprivileged kids his mother cared for. So when he took over the public schools, his allegiance lay more with schoolchildren and their families than with teachers and their union.

  The best way to get rid of cheating teachers, Duncan had decided, was to readminister the standardized exam. He only had the resources to retest 120 classrooms, however, so he asked the creators of the cheating algorithm to help choose which classrooms to test.

  How could those 120 retests be used most effectively? It might have seemed sensible to retest only the classrooms that likely had a cheating teacher. But even if their retest scores were lower, the teachers could argue that the students did worse merely because they were told that the scores wouldn’t count in their official record—which, in fact, all retested students would be told. To make the retest results convincing, some non-cheaters were needed as a control group. The best control group? The classrooms shown by the algorithm to have the best teachers, in which big gains were thought to have been legitimately attained. If those classrooms held their gains while the classrooms with a suspected cheater lost ground, the cheating teachers could hardly argue that their students did worse only because the scores wouldn’t count.

  So a blend was settled upon. More than half of the 120 retested classrooms were those suspected of having a cheating teacher. The remainder were divided between the supposedly excellent teachers (high scores but no suspicious answer patterns) and, as a further control, classrooms with mediocre scores and no suspicious answers.

  The retest was given a few weeks after the original exam. The children were not told the reason for the retest. Neither were the teachers. But they may have gotten the idea when it was announced that CPS officials, not the teachers, would administer the test. The teachers were asked to stay in the classroom with their students, but they would not be allowed to even touch the answer sheets.

  The results were as compelling as the cheating algorithm had predicted. In the classrooms chosen as controls, where no cheating was suspected, scores stayed about the same or even rose. In contrast, the students with the teachers identified as cheaters scored far worse, by an average of more than a full grade level.

  As a result, the Chicago Public School system began to fire its cheating teachers. The evidence was only strong enough to get rid of a dozen of them, but the many other cheaters had been duly warned. The final outcome of the Chicago study is further testament to the power of incentives: the following year, cheating by teachers fell more than 30 percent.

  You might think that the sophistication of teachers who cheat would increase along with the level of schooling. But an exam given at the University of Georgia in the fall of 2001 disputes that idea. The course was called Coaching Principles and Strategies of Basketball, and the final grade was based on a single exam that had twenty questions. Among the questions:

  How many halves are in a college basketball game?

  a. 1 b. 2 c. 3 d. 4

  How many points does a 3-pt. field goal account for in a basketball game?

  a. 1 b. 2 c. 3 d. 4

  What is the name of the exam which all high school seniors in the state of Georgia must pass?

  a. Eye Exam

  b. How Do the Grits Taste Exam

  c. Bug Control Exam

  d. Georgia Exit Exam

  In your opinion, who is the best Division I assistant coach in the country?

  a. Ron Jirsa

  b. John Pelphrey

  c. Jim Harrick Jr.

  d. Steve Wojciechowski

  If you are stumped by the final question, it might help to know that Coaching Principles was taught by Jim Harrick Jr., an assistant coach with the university’s basketball team. It might also help to know that his father, Jim Harrick Sr., was the head basketball coach. Not surprisingly, Coaching Principles was a favorite course among players on the Harricks’ team. Every student in the class received an A. Not long afterward, both Harricks were relieved of their coaching duties.

  If it strikes you as disgraceful that Chicago schoolteachers and University of Georgia professors will cheat—a teacher, after all, is meant to instill values along with the facts—then the thought of cheating among sumo wrestlers may also be deeply disturbing. In Japan, sumo is not only the national sport but also a repository of the country’s religious, military, and historical emotion. With its purification rituals and its imperial roots, sumo is sacrosanct in a way that American sports will never be. Indeed, sumo is said to be less about competition than about honor itself.

  It is true that sports and cheating go hand in hand. That’s because cheating is more common in the face of a bright-line incentive (the line between winning and losing, for instance) than with a murky incentive. Olympic sprinters and weightlifters, cyclists in the Tour de France, football linemen and baseball sluggers: they have all been shown to swallow whatever pill or powder may give them an edge. It is not only the participants who cheat. Cagey baseball managers try to steal an opponent’s signs. In the 2002 Winter Olympic figure-skating competition, a French judge and a Russian judge were caught trying to swap votes to make sure their skaters medaled. (The man accused of orchestrating the vote swap, a reputed Russian mob boss named Alimzhan Tokhtakhounov, was also suspected of rigging beauty pageants in Moscow.)

  An athlete who gets caught cheating is generally condemned, but most fans at least appreciate his motive: he wanted so badly to win that he bent the rules. (As the baseball player Mark Grace once said, “If you’re not cheating, you’re not trying.”) An athlete who cheats to lose, meanwhile, is consigned to a deep circle of sporting hell. The 1919 Chicago White Sox, who conspired with gamblers to throw the World Series (and are therefore known forever as the Black Sox), retain a stench of iniquity among even casual baseball fans. The City College of New York’s championship basketball team, once beloved for its smart and scrappy play, was instantly reviled when it was discovered in 1951 that several players had taken mob money to shave points—intentionally missing baskets to help gamblers beat the point spread. Remember Terry Malloy, the tormented former boxer played by Marlon Brando in On the Waterfront? As Malloy saw it, all his troubles stemmed from the one fight in which he took a dive. Otherwise, he could have had class; he could have been a contender.

  If cheating to lose is sport’s premier sin, and if sumo wrestling is the premier sport of a great nation, cheating to lose couldn’t possibly exist in sumo. Could it?

  Once again, the data can tell the story. As with the Chicago school tests, the data set under consideration here is surpassingly large: the results from nearly every official match among the top rank of Japanese sumo wrestlers between January 1989 and January 2000, a total of 32,000 bouts fought by 281 different wrestlers.

  The incentive scheme that rules sumo is intricate and extraordinarily powerful. Each wrestler maintains a ranking that affects every slice of his life: how much money he makes, how large an entourage he carries, how much he gets to eat, sleep, and otherwise take advantage of his success. The sixty-six highest-ranked wrestlers in Japan, comprising the makuuchi and juryo divisions, make up the sumo elite. A wrestler near the top of this elite pyramid may earn millions and is treated like royalty. Any wrestler in the top forty earns at least $170,000 a year. The seventieth-ranked wrestler in Japan, meanwhile, earns only $15,000 a year. Life isn’t very sweet outside the elite. Low-ranked wrestlers must tend to their superiors, preparing their meals, cleaning their quarters, and even soaping up their hardest-to-reach body parts. So ranking is everything.

  A wrestler’s ranking is based on his performance in the elite tournaments that are held six times a year. Each wrestler has fifteen bouts per tournament, one per day over fifteen consecutive days. If he finishes the tournament with a winning record (eight victories or better), his ranking will rise. If he has a losing record, his ranking falls. If it falls far enough, he is booted from the elite rank entirely. The eighth victory in any tournament is therefore critical, the difference between promotion and demotion; it is roughly four times as valuable in the rankings as the typical victory.

  So a wrestler entering the final day of a tournament on the bubble, with a 7–7 record, has far more to gain from a victory than an opponent with a record of 8–6 has to lose.

  Is it possible, then, that an 8–6 wrestler might allow a 7–7 wrestler to beat him? A sumo bout is a concentrated flurry of force and speed and leverage, often lasting only a few seconds. It wouldn’t be very hard to let yourself be tossed. Let’s imagine for a moment that sumo wrestling is rigged. How might we measure the data to prove it?

  The first step would be to isolate the bouts in question: those fought on a tournament’s final day between a wrestler on the bubble and a wrestler who has already secured his eighth win. (Because more than half of all wrestlers end a tournament with either seven, eight, or nine victories, hundreds of bouts fit these criteria.) A final-day match between two 7–7 wrestlers isn’t likely to be fixed, since both fighters badly need the victory. A wrestler with ten or more victories probably wouldn’t throw a match either, since he has his own strong incentive to win: the $100,000 prize for overall tournament champion and a series of $20,000 prizes for the “outstanding technique” award, “fighting spirit” award, and others.

  Let’s now consider the following statistic, which represents the hundreds of matches in which a 7–7 wrestler faced an 8–6 wrestler on a tournament’s final day. The left column tallies the probability, based on all past meetings between the two wrestlers fighting that day, that the 7–7 wrestler will win. The right column shows how often the 7–7 wrestler actually did win.

  So the 7–7 wrestler, based on past outcomes, was expected to win just less than half the time. This makes sense; their records in this tournament indicate that the 8–6 wrestler is slightly better. But in actuality, the wrestler on the bubble won almost eight out of ten matches against his 8–6 opponent. Wrestlers on the bubble also do astonishingly well against 9–5 opponents:

  As suspicious as this looks, a high winning percentage alone isn’t enough to prove that a match is rigged. Since so much depends on a wrestler’s eighth win, he should be expected to fight harder in a crucial bout. But perhaps there are further clues in the data that prove collusion.
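  The comparison described above, predicted versus actual wins for the wrestler on the bubble, can be sketched in a few lines. The record layout, the field names, and the toy bouts below are all hypothetical and invented purely for illustration. They are not the sumo data, and the figures they produce say nothing about real matches.

```python
from dataclasses import dataclass


@dataclass
class Bout:
    day: int                  # tournament day, 1 through 15
    record_a: tuple           # (wins, losses) of wrestler A before the bout
    record_b: tuple           # (wins, losses) of wrestler B before the bout
    a_won: bool               # True if wrestler A won this bout
    a_past_win_share: float   # share of earlier meetings won by A


def bubble_summary(bouts, opponent_record=(8, 6), final_day=15):
    """Average predicted and actual win rates of 7-7 wrestlers who meet an
    opponent with `opponent_record` on the final day.
    Returns (predicted, actual, number_of_bouts), or None if no bout qualifies."""
    predicted, actual = [], []
    for b in bouts:
        if b.day != final_day:
            continue
        if b.record_a == (7, 7) and b.record_b == opponent_record:
            predicted.append(b.a_past_win_share)
            actual.append(1.0 if b.a_won else 0.0)
        elif b.record_b == (7, 7) and b.record_a == opponent_record:
            # The bubble wrestler is B, so flip the perspective.
            predicted.append(1.0 - b.a_past_win_share)
            actual.append(0.0 if b.a_won else 1.0)
    n = len(actual)
    if n == 0:
        return None
    return sum(predicted) / n, sum(actual) / n, n


# Invented toy bouts, for illustration only.
TOY = [
    Bout(15, (7, 7), (8, 6), True, 0.5),
    Bout(15, (8, 6), (7, 7), False, 0.5),
    Bout(15, (7, 7), (8, 6), False, 0.25),
    Bout(15, (7, 7), (7, 7), True, 0.5),   # both on the bubble: excluded
    Bout(14, (7, 6), (8, 5), True, 0.5),   # not the final day: excluded
    Bout(15, (7, 7), (8, 6), True, 0.75),
]

if __name__ == "__main__":
    print(bubble_summary(TOY))
```

  Passing opponent_record=(9, 5) would run the same comparison against 9–5 opponents. A wide gap between the two rates is what raises suspicion, though it cannot by itself separate collusion from extra effort.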

 
