Koretz retested students in a district that had shown impressive gains; he found that the gains disappeared when the students took a different test of similar material, a test that had been used by the district in the recent past. Clearly the reported gains were illusory. The skills the students had learned were specific to the test and were not generalizable to unexpected situations. The scores had gone up, but the students were not better educated.20
Of what value is it to the student to do well on a state reading test if he cannot replicate the same success on a different reading test or transfer these skills to an unfamiliar context?
Excessive test preparation distorts the very purpose of tests, which is to assess learning and knowledge, not just to produce higher test scores. Koretz demonstrates that the problem with high-stakes testing—that is, test-based accountability—is that it corrupts the tests as measures of student performance. Koretz cites a well-known aphorism in social science known as Campbell’s Law: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”21 Written by sociologist Donald T. Campbell in 1975, this saying has become legendary as a description of the way organizations in every field change their behavior to meet external measures. As Koretz shows, the changes induced by accountability pressures corrupt the very purpose of schooling by causing practitioners to focus on the measure rather than on the goals of education.
Koretz offers many examples of goal distortion drawn from medicine, job training, industry, and other fields. Most cardiologists in New York stopped performing surgery on critically ill cardiac patients, he writes, after the state began issuing scorecards that reported mortality rates. To avoid getting a bad score, many doctors refused to operate on risky patients; some patients who might have survived surgery were turned away. Similarly, when airlines were required to report on-time arrivals, they manipulated the statistics by changing the expected duration of flights; as a result, the on-time statistics became meaningless.22 Richard Rothstein has described how test-based accountability has corrupted education, narrowed the curriculum, and distorted the goals of schooling. When teachers are held accountable only for test scores in reading and mathematics, he writes, schools pay less attention to students’ health, physical education, civic knowledge, the arts, and enrichment activities.23 When faced with demands to satisfy a single measure, people strive to satisfy that measure but neglect the other, perhaps more important, goals of the organization.
The pressure to increase test scores is likely to produce higher scores, whether by coaching or cheating or manipulating the pool of test takers. As long as the state or district superintendent continues to report good news about student performance, the public seems satisfied, and the media usually sees no reason to investigate whether the gains are real. State and local leaders want to claim credit for improvement, rather than determine whether the improvement was meaningful.
The starkest display of score inflation is the contrast between the state-reported test scores, which have been steadily (and sometimes sharply) rising since the passage of NCLB, and the scores registered by states on NAEP, the federal assessment program. NCLB permitted the states to write their own standards, pick their own tests, and decide for themselves how to define proficiency. The decision to let the states decide how well they were doing was a bow to federalism and local control, but it created a bizarre situation in which every state was left to determine what would be its passing mark and what percent of its students had reached it. Given the necessity to report gains, many states reported steady—and sometimes amazing—progress toward the mandated goal of 100 percent proficiency. Texas, for example, reported in 2007 that 85.1 percent of its students in grades four and eight were proficient readers, but on NAEP tests, only 28.6 percent were. Tennessee claimed that 90 percent of its students were proficient readers, but NAEP reported that 26.2 percent were. Similarly, Nebraska told the public that 90.5 percent of students in these grades were proficient, but NAEP said the number was 34.8 percent.24
NAEP monitors trends; if the state says its scores are rising but its scores on NAEP are flat, then the state reports are very likely inflated. In a choice between the state’s self-reported scores and an audit test, the public should trust the audit test.
Most states were content to report their impressive gains to the public, congratulating themselves for their wise planning and implementation of standards-based reform. Once in a great while, however, a dissident voice was heard. This happened in 2007 in North Carolina, when a state commission was created to review the state’s policies on testing and accountability. That commission reported that “there is too much time spent on testing” and recommended a reduction in the number of tests students were required to take. Sam Houston, chairman of the commission, said, “We’re testing more but we’re not seeing the results. We’re not seeing graduation rates increasing. We’re not seeing remediation rates decreasing. Somewhere along the way testing isn’t aligning with excellence.”25
However, few states or districts challenged the dominant paradigm of test-based accountability. It was the federal law, and they had to comply. NCLB fueled a growing demand for accountability, as well as a booming testing industry. Journalists referred to proponents of tough accountability as “reformers.” These reformers, the new breed of corporate-style superintendents, were hailed for their willingness to crack down on teachers and principals and to close schools if their students’ scores did not go up. Some states and districts introduced merit pay plans, which tied teacher compensation to their students’ test scores. Some districts, such as Chicago, New York City, and Washington, D.C., closed schools in response to students’ test scores; these same districts even gave cash payments to students in pilot programs if they increased their grades or scores. Others gave bonuses to principals or fired them, depending on their school’s test scores.
One problem with test-based accountability, as currently defined and used, is that it removes all responsibility from students and their families for the students’ academic performance. NCLB neglected to acknowledge that students share in the responsibility for their academic performance and that they are not merely passive recipients of their teachers’ influence. Nowhere in the federal accountability scheme are there measures or indicators of students’ diligence, effort, and motivation. Do they attend school regularly? Do they do their homework? Do they pay attention in class? Are they motivated to succeed? These factors affect their school performance as much as or more than their teachers’ skill.26
Similarly, the authors of the law forgot that parents are primarily responsible for their children’s behavior and attitudes. It is families that do or do not ensure that their children attend school regularly, that they are in good health, that they do their homework, and that they are encouraged to read and learn. But in the eyes of the law, the responsibility of the family disappears. Something is wrong with that. Something is fundamentally wrong with an accountability system that disregards the many factors that influence students’ performance on an annual test—including the students’ own efforts—except for what teachers do in the classroom for forty-five minutes or an hour a day.
Accountability as we know it now is not helping our schools. Its measures are too narrow and imprecise, and its consequences too severe. NCLB assumes that accountability based solely on test scores will reform American education. This is a mistake. A good accountability system must include professional judgment, not simply a test score, and other measures of students’ achievement, such as grades, teachers’ evaluations, student work, attendance, and graduation rates. It should also report what the school and the district are providing in terms of resources, class sizes, space, well-educated teachers, and a well-rounded curriculum. Furthermore, a good accountability system might include an external inspection of schools by trained observers to evaluate their quality on a regular schedule, though not necessarily every single year. In a state or a large district, low-performing schools might be reviewed frequently, while schools that consistently get good reports might get a visit every few years. The object of inspection should not be to assay the school as a prelude to closing it or to impose a particular way of teaching, but to help the school improve.
Consider the distinction between what we might think of as “positive accountability,” where low scores trigger an effort to help the school, and “punitive accountability,” where low scores provide a reason to fire the staff and close the school. In a strategy of positive accountability, district officials take decisive and consistent steps to improve low-performing schools. One example was the Chancellor’s District in New York City, established by Chancellor Rudy Crew in 1996. Crew placed fifty-eight of the city’s lowest-performing schools into a noncontiguous district and targeted them for intensive assistance. He saturated them with additional services and resources. He reduced class size, with no more than twenty students in kindergarten through third grade, and no more than twenty-five students in grades four through eight. He lengthened the school day. Students who needed extra help could get tutoring every afternoon. After-school activities extended the school day to 6 p.m. Each school was required to follow a prescribed curriculum and instructional program, with a heavy dose of literacy and mathematics. Extra pay was awarded to draw certified teachers to the Chancellor’s District. Students in these schools registered significant improvement in reading, but not in mathematics (as compared to students in other low-performing schools). Eleven of the fifty-eight schools did not improve and were closed. The Chancellor’s District was singled out for commendation by the Council of the Great City Schools for raising student achievement in the lowest-performing schools. After the school system was transferred to mayoral control in 2002, the district was abolished.27
Another (albeit mixed) example of positive accountability can be found in Florida, where the state gives a single letter grade, ranging from A to F, to all public schools. This is a practice I abhor, as I think it is harmful to stigmatize a complex institution with a letter grade, just as it would be ridiculous to send a child home with a report card that contained only a single letter grade to summarize her performance in all her various courses and programs. That said, after the grades are handed out, the state quickly steps in to help the D and F schools with technical support, consultants, coaches, and materials. As a result of the state’s supportive response, most of the low-rated schools have improved. For nearly seven years, the state sanctioned F-rated schools by giving vouchers to their students, who could use them to attend a private or better-performing public school. In 2006, a Florida court declared the voucher program unconstitutional.28
The third example of positive accountability is Atlanta, where Superintendent Beverly Hall established a series of interventions to help struggling schools. The district, whose students are 91 percent African American and Hispanic and three-quarters low-income, was known for low performance before her arrival. But since her appointment in 1999, Atlanta’s public schools have steadily improved. She raised the quality of the professional staff by careful hiring, “meaningful evaluations, and consistent job-embedded professional development.” Before the enactment of NCLB, Hall established accountability targets for every school, including the percentage of students who meet standards and the percentage who exceed them, as well as student attendance and enrollment in higher-level courses. When a school meets 70 percent of its targets, the entire staff receives a bonus, including cafeteria workers, bus drivers, the school nurse, and teachers. Hall replaced 89 percent of the principals, who had allegedly been hired on the basis of personal connections. She closed some schools whose enrollments had dropped when Atlanta demolished its public housing projects. Her strategy was slow and steady, and it paid off. Not only did Atlanta see strong improvement in its state test scores and graduation rates (which are not always meaningful indicators), but Atlanta showed impressive gains on the National Assessment of Educational Progress. It was the only one of eleven cities tested from 2003 to 2007 that showed significant progress in both reading and mathematics in fourth and eighth grades. The American Association of School Administrators selected Hall as its National Superintendent of the Year in 2009.29
In the NCLB era, when the ultimate penalty for a low-performing school was to close it, punitive accountability achieved a certain luster, at least among the media and politicians. Politicians and non-educator superintendents boasted of how many schools they had shuttered. Their boasts won them headlines for “getting tough” and cracking down on bad schools. But closing down a school is punitive accountability, which should happen only in the most extreme cases, when a school is beyond help. Closing schools should be considered a last step and a rare one. It disrupts lives and communities, especially those of children and their families. It destroys established institutions, in the hope that something better is likely to arise out of the ashes of the old, now-defunct school. It accelerates a sense of transiency and impermanence, while dismissing the values of continuity and tradition, which children, families, and communities need as anchors in their lives. It teaches students that institutions and adults they once trusted can be tossed aside like squeezed lemons, and that data of questionable validity can be deployed to ruin people’s lives.
The goal of accountability should be to support and improve schools, not to heedlessly destroy careers, reputations, lives, communities, and institutions. The decision to close a school is a death sentence for an institution; it should be recognized as a worst-case scenario. The abject failure of a school represents the failure of those in charge of the district, not just the people who work in the school.
The trouble with test-based accountability is that it imposes serious consequences on children, educators, and schools on the basis of scores that may reflect measurement error, statistical error, random variation, or a host of environmental factors or student attributes. None of us would want to be evaluated—with our reputation and livelihood on the line—solely on the basis of an instrument that is prone to error and ambiguity. The tests now in use are not adequate by themselves to the task of gauging the quality of schools or teachers. They were designed for specific purposes: to measure whether students can read and can do mathematics, and even in these tasks, they must be used with awareness of their limitations and variability. They were not designed to capture the most important dimensions of education, for which we do not have measures.
This issue was addressed in 1988, when a group of esteemed members of the National Academy of Education, led by psychologist Robert Glaser, commented on the value of the National Assessment of Educational Progress. They worried that NAEP might measure only reading, mathematics, and writing. They wrote, “While these competencies are important prerequisites for living in our modern world and fundamental to general and continuing education, they represent only a portion of the goals of elementary and secondary schooling.” They represent neither the humanities nor the “aesthetic and moral aims of education” that cannot be measured. The scholars warned that “when test results become the arbiter of future choices, a subtle shift occurs in which fallible and partial indicators of academic achievement are transformed into major goals of schooling. . . . Those personal qualities that we hold dear—resilience and courage in the face of stress, a sense of craft in our work, a commitment to justice and caring in our social relationships, a dedication to advancing the public good in our communal life—are exceedingly difficult to assess. And so, unfortunately, we are apt to measure what we can, and eventually come to value what is measured over what is left unmeasured.”30
Tests are necessary and helpful. But tests must be supplemented by human judgment. When we define what matters in education only by what we can measure, we are in serious trouble. When that happens, we tend to forget that schools are responsible for shaping character, developing sound minds in healthy bodies (mens sana in corpore sano), and forming citizens for our democracy, not just for teaching basic skills. We even forget to reflect on what we mean when we speak of a good education. Surely we have more in mind than just bare literacy and numeracy. And when we use the results of tests, with all their limitations, as a routine means to fire educators, hand out bonuses, and close schools, then we distort the purpose of schooling altogether.
CHAPTER NINE
What Would Mrs. Ratliff Do?
MY FAVORITE TEACHER was Mrs. Ruby Ratliff. She is the teacher I remember best, the one who influenced me most, who taught me to love literature and to write with careful attention to grammar and syntax. More than fifty years ago, she was my homeroom teacher at San Jacinto High School in Houston, and I was lucky enough to get into her English class as a senior.
Mrs. Ratliff was gruff and demanding. She did not tolerate foolishness or disruptions. She had a great reputation among students. When it came time each semester to sign up for classes, there was always a long line outside her door. What I remember most about her was what she taught us. We studied the greatest writers of the English language, not their long writings like novels (no time for that), but their poems and essays. We read Shakespeare, Keats, Shelley, Wordsworth, Milton, and other major English writers. Now, many years later, in times of stress or sadness, I still turn to poems that I first read in Mrs. Ratliff’s class.
The Death and Life of the Great American School System Page 21