Unions have many critics, including some within their own ranks who complain that their leaders fail to protect teachers against corporate reformers. Other critics want the unions to become more assertive in policing their own ranks and getting rid of incompetents and malingerers. But the critics most often quoted in the media see unions as the major obstacle to education reform. They fault the unions for their resistance to using test scores to evaluate teachers. They want administrators to have the freedom to fire teachers whose students’ test scores do not improve and to replace them with new teachers who might raise those scores. They want to use test scores as the decisive tool of evaluation. Their goal is a school system in which scores go up every year and in which teachers who don’t contribute to that result can be promptly removed. In response to NCLB, which required steady improvement in test scores each year, many, including President Barack Obama, endorsed the idea of using students’ test scores to evaluate their teachers.
In the NCLB era, the media attached the term “reformer” to those educators and officials who turned to market-based, data-driven reforms to produce higher scores. These free-market reformers advocated testing, accountability, merit pay, and charter schools, and most were notably hostile to unions. The unions objected to the reformers’ efforts to judge teachers solely by their students’ test scores, and the reformers sought to break the power of the unions. The reformers said that having “great” teachers or “effective” teachers was the key to their goals, and they wanted the union to get out of their way.
In the decade before NCLB, reformers agreed that the teacher was the key to educational improvement, but they pursued a different path. In 1996, five years before the passage of No Child Left Behind, the National Commission on Teaching & America’s Future issued a report called What Matters Most: Teaching for America’s Future. The chairman of the commission was Governor James B. Hunt Jr. of North Carolina. It included the presidents of the two major teachers’ unions, business leaders, university presidents, and other educators. Its executive director was Linda Darling-Hammond, then of Teachers College, Columbia University.
The commission set a goal that by 2006, all children would be taught by excellent teachers. To reach this goal, the commission proposed higher standards for teacher education programs, high-quality professional development, more effective recruitment practices, a greater commitment to professionalism, and schools that support good teaching. The commission recommended additional compensation for teachers who won national board certification, received licenses to teach in another subject, or demonstrated greater pedagogical skills and content knowledge. But it specifically rejected schemes to connect teacher pay to students’ test scores. The scores, the report warned, are only “crude measures” that “do not take into account the different backgrounds and prior performances of students, the fact that students are not randomly distributed across schools and classrooms, the shortcomings in the kinds of learning measured by current standardized tests, and the difficulty in sorting out which influences among many—the home, the community, the student him- or herself, and multiple teachers—are at play.” It noted further that “attempts to link student test scores to rewards for teachers and schools have led to counterproductive incentives for keeping out or pushing out low-achieving students, retaining them in a grade so their scores look higher, or assigning them to special education where their scores don’t count, rather than teaching them more effectively.”11
After the passage of NCLB, however, everything changed. Efforts to improve teacher professionalism were swept away by the law’s singular focus on raising test scores. Schools that did not meet this demand faced public humiliation and possible closure. Superintendents and principals were commanded by the law to get test scores higher every year until every student was proficient. The idea of teacher professionalism became an antique notion; far more compelling was the search for teachers who would get the scores up, especially in urban districts, where superintendents pledged to close the achievement gap between African American/Hispanic students and white/Asian students.
NCLB required that the scores rise in reading and mathematics in every grade from third through eighth, which meant that this year’s fourth grade had to get a higher score than last year’s fourth grade. It didn’t take long for school officials to realize that they needed what were called “growth models,” so the progress of individual children could be tracked over time. This way of measuring academic improvement was known as “value-added assessment” (VAA), a technique that was developed mainly by William Sanders of the University of Tennessee. A statistician and (at that time) adjunct professor in the university’s College of Business Administration, Sanders had worked as a statistical consultant to agricultural, manufacturing, and engineering industries. His value-added method aimed to calculate the extent to which teachers contributed to the gains made by their students, as compared to other factors. Drawing on his studies, which were purely statistical in nature (i.e., not involving classroom observations), Sanders concluded that “the most important factor affecting student learning is the teacher. In addition, the results show wide variation in effectiveness among teachers. The immediate and clear implication of this finding is that seemingly more can be done to improve education by improving the effectiveness of teachers than by any other single factor. Effective teachers appear to be effective with students of all achievement levels, regardless of the level of heterogeneity in their classrooms.”12 Sanders contrasted his method, which involved calculating the rate of progress that students make on standardized tests over a period of years, with what he called “a laissez faire approach,” that is, “appropriate more resources and free educators to utilize their own professionalism.”13 The “laissez faire approach” sounded very much like the remedies proposed by the National Commission on Teaching & America’s Future, although the commission would not have characterized its proposals as “laissez faire.” In Sanders’ view, this approach had created huge variability among schools and had failed. What was needed, Sanders insisted, was a rigorous, data-based analysis such as his own.
The idea of value-added assessment made sense, at least on the surface. If you compare the test scores of specific students from year to year, or from September to June, then you can pinpoint which students got the biggest gains and which made no gains at all. The scores of the students can be matched to their teachers, and patterns begin to emerge, making it possible to identify which teachers regularly get large gains in their classes, and which get few or none. Using the value-added scores, districts would be able to rank their teachers by their ability to increase gains. Those at the top would be considered the superstars, and those at the bottom would improve or get fired.
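The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of a simple gain-score ranking, not Sanders’ actual statistical model, which the text does not spell out. The records, the student and teacher labels, and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (student, teacher, fall score, spring score).
# None of these names or numbers come from the source.
RECORDS = [
    ("s1", "Teacher A", 40, 52),
    ("s2", "Teacher A", 55, 63),
    ("s3", "Teacher B", 48, 49),
    ("s4", "Teacher B", 60, 59),
    ("s5", "Teacher C", 35, 41),
    ("s6", "Teacher C", 70, 74),
]


def rank_teachers_by_mean_gain(records):
    """Return (teacher, mean gain) pairs, largest mean gain first."""
    gains = defaultdict(list)
    for _student, teacher, before, after in records:
        # A student's gain is simply the later score minus the earlier one.
        gains[teacher].append(after - before)
    means = {t: sum(g) / len(g) for t, g in gains.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_teachers_by_mean_gain(RECORDS)
for teacher, mean_gain in ranking:
    print(f"{teacher}: mean gain {mean_gain:+.1f}")
```

Note that a ranking built this way credits the teacher with the entire gain and ignores everything else that might affect a student’s scores.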
Value-added assessment is the product of technology; it is also the product of a managerial mind-set that believes that every variable in a child’s education can be identified, captured, measured, and evaluated with precision. Computers make it possible to assemble the annual test scores of thousands of students and quickly analyze which students gained the most, which gained nothing, and which lost ground on standardized tests. Sanders the statistician soon became Sanders the educational measurement guru. As the methodology gained adherents, education policy increasingly became the domain of statisticians and economists. With their sophisticated tools and their capacity to do multivariate longitudinal analysis, they did not need to enter the classroom, observe teachers, or review student work to know which teachers were the best and which were the worst, which were effective and which were ineffective. Discussions of what to teach and what constituted a quality education receded into the background; those issues were contentious and value-laden, not worthy of the attention of the data-minded policy analysts. Using value-added models, the technical experts could evaluate teachers and schools without regard to the curriculum or the actual lived experiences of their students. What mattered most in determining educational quality was not curriculum or instruction, but data.
NCLB did not incorporate value-added assessment, and its failure to do so was grounds for frequent criticism. Of what value was it to know whether this year’s fourth grade did better on the state test than last year’s fourth grade? Wasn’t it more important to determine whether the students in this year’s fourth grade learned more by the time they moved to fifth grade? And wasn’t it better still to be able to measure how much the scores of specific children had gone up or down over time? Even better was to link the scores of specific students to specific teachers. Of course, the missing consideration in the debates among economists and policymakers was the quality of the assessments. If the assessments were low-level, multiple-choice tests, and if teachers were intensely prepping their students for the tests, then could it really be said that these were measures of learning? Or that they were indicators of better teaching? Or were they instead measures of how well children had been drilled to respond to low-level questions?
Eric Hanushek of Stanford University studied the problem of how to increase the supply of high-quality teachers. Hanushek is a friend of mine, and one of the nation’s best economists of education. In 2004, I invited him and his colleague Steven Rivkin to present a paper at a conference at the Brookings Institution. Reviewing a large number of studies, they noted that teachers’ salaries, certification, education, and additional degrees had little impact on student performance. The variables that mattered most in the studies they reviewed were teachers’ experience and their scores on achievement tests, but most studies found even these variables to be statistically insignificant. They cited studies showing that teachers in their first year of teaching, and to some extent their second as well, “perform significantly worse in the classroom” than more experienced teachers. Hanushek and Rivkin concluded that the best way to improve teacher quality was to look at “differences in growth rates of student achievement across teachers. A good teacher would be one who consistently obtained high learning growth from students, while a poor teacher would be one who consistently produced low learning growth.” Since the current requirements for entry into teaching are “imprecise” or not consistently correlated with teaching skill, they argued, it made no sense to tighten up the credentialing process. Instead, “If one is concerned about student performance, one should gear policy to student performance.” Hanushek and Rivkin projected that “having five years of good teachers in a row” (that is, teachers at the 85th percentile) “could overcome the average seventh-grade mathematics achievement gap between lower-income kids (those on the free or reduced-price lunch program) and those from higher-income families. In other words, high-quality teachers can make up for the typical deficits seen in the preparation of kids from disadvantaged backgrounds.” In light of these findings, Hanushek and Rivkin recommended that states “loosen up” the requirements for entering teaching and pay more attention to whether teachers are able to get results, that is, better student performance on tests.14
At the conference, Richard Rothstein responded that the policy implications of the Hanushek-Rivkin paper were “misleading and dangerous.” He objected to the authors’ view that school reform alone could overcome the powerful influence of family and social environment. He dismissed their claims about closing the achievement gap between low-income students and their middle-class peers in five years, an assertion similar to one previously advanced by Sanders. Sanders said that students with teachers in the top quintile of effectiveness for three consecutive years would gain 50 percentile points as compared to those who were assigned to the lowest quintile. Rothstein said their reasoning was circular: “good teachers can raise student achievement, and teachers are defined as good if they raise student achievement.” Thus, one cannot know which teachers are effective until after they have produced consistent gains for three to five straight years. But, said Rothstein, if these top teachers were then assigned to low-income schools, middle-income schools would necessarily have less effective teachers. Rothstein found it hard to imagine how such a policy might be implemented.15
Yet there was something undeniably appealing about the idea that a string of “effective” or “top-quintile” teachers could close the achievement gap between low-income students and their middle-income peers and between African American students and white students. And there was something appalling about the idea that a string of mediocre or bad teachers would doom low-performing students to a life of constant failure, dragging them down to depths from which they might never recover. The bottom line was that the teacher was the key to academic achievement. A string of top-quintile teachers could, on their own, erase the learning deficits of low-income and minority students, or so the theory went.
This line of reasoning appealed to conservatives and liberals alike; liberals liked the prospect of closing the achievement gap, and conservatives liked the possibility that it could be accomplished with little or no attention to poverty, housing, unemployment, health needs, or other social and economic problems. If students succeeded, it was the teacher who did it. If students got low scores, it was the teacher’s fault. Teachers were both the cause of low performance and the cure for low performance. The solution was to get rid of bad teachers and recruit only good ones. Of course, it was difficult to know how to recruit good teachers when the determination of their effectiveness required several years of classroom data.
A 2006 paper by Robert Gordon, Thomas J. Kane, and Douglas O. Staiger, titled “Identifying Effective Teachers Using Performance on the Job,” took the argument a step further. Like Hanushek and Rivkin, these authors maintained that “paper qualifications,” such as degrees, licenses, and certification, do not predict who will be a good teacher. The differences, they said, between “stronger teachers” and “weaker teachers” become clear only after teachers have been teaching for “a couple of years.” Their solution was to recruit new teachers without regard to paper credentials and to measure their success by their students’ test scores. They agreed that value-added measures of student performance were essential in identifying effective teachers. They recommended that school districts pay bonuses to effective teachers who teach in high-poverty schools. And they recommended that the federal government provide grants to states to build data systems to “link student performance with the effectiveness of individual teachers over time.” These recommendations were of more than academic interest, because one of the authors, Robert Gordon of the Center for American Progress, a Washington-based think tank, was subsequently selected by the Obama administration to serve as deputy director for education in the Office of Management and Budget, where he was able to promote his policy ideas. And sure enough, President Obama’s education program included large sums of money for states to build data systems that would link student test scores to individual teachers, as well as funds for merit pay plans that would reward teachers for increasing their students’ test scores. In choosing his education agenda, President Obama sided with the economists and the corporate-style reformers, not with his chief campaign adviser, Linda Darling-Hammond.16
The Gordon, Kane, and Staiger study followed teachers in their first, second, and third years. It concluded that students assigned to a teacher in the bottom quartile of all teachers (ranked according to their students’ gains) lost on average 5 percentile points compared to similar students. Meanwhile, a student who was assigned to a teacher in the top quartile gained 5 percentile points. Thus, the difference between being assigned to a low- or high-rated teacher was 10 percentile points. Noting that the black-white achievement gap is estimated to be 34 percentile points, they reached this startling conclusion: “Therefore, if the effects were to accumulate, having a top-quartile teacher rather than a bottom-quartile teacher four years in a row would be enough to close the black-white test score gap.”17
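The arithmetic behind that conclusion is easy to check. The sketch below uses only the figures quoted above (a 10-point swing per year and a 34-point gap) and makes the authors’ own “if the effects were to accumulate” condition explicit: each year’s swing is simply added to the last, with no fade-out. The function name is hypothetical.

```python
import math


def years_to_close_gap(gap_points, yearly_swing_points):
    """Whole years needed if each year's swing simply adds to the last."""
    return math.ceil(gap_points / yearly_swing_points)


yearly_swing = 5 - (-5)   # top quartile (+5) versus bottom quartile (-5)
gap = 34                  # estimated black-white gap, in percentile points

years = years_to_close_gap(gap, yearly_swing)
print(years)                  # 4
print(years * yearly_swing)   # 40, which exceeds the 34-point gap
```

Three years would yield only 30 points under the same assumption, which is why the authors’ figure is four.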
So, depending on which economist or statistician one preferred, the achievement gap between races, ethnic groups, and income groups could be closed in three years (Sanders), four years (Gordon, Kane, and Staiger), or five years (Hanushek and Rivkin). Over a short period of time, this assertion became an urban myth among journalists and policy wonks in Washington, something that “everyone knew.” This particular urban myth fed a fantasy that schools serving poor children might be able to construct a teaching corps made up exclusively of superstar teachers, the ones who produced large gains year after year. This is akin to saying that baseball teams should consist only of players who hit over .300 and pitchers who win at least twenty games every season; after all, such players exist, so why should not such teams exist? The fact that no such team exists should give pause to those who believe that almost every teacher in almost every school in almost every district might be a superstar if only school leaders could fire at will.
The teacher was everything; that was the new mantra of economists and bottom-line school reformers. And not only was the teacher the key to closing the achievement gap, but the most effective teachers did not need to have any paper credentials or teacher education. There was no way to predict who would be a good teacher.18 So there was no reason to limit entry into teaching; anyone should be able to enter the profession and show whether she or he could raise test scores.
Some scholars questioned whether value-added assessment should be used for consequential personnel decisions. Economist Dale Ballou wrote in 2002 that value-added assessment was “useful when viewed in context by educators who understand local circumstances,” but that it was potentially dangerous when used for accountability and high-stakes personnel decisions. The tests were not accurate enough to serve as the basis for high-stakes decisions. Test scores, he wrote, were affected not only by students’ ability and by random influences (such as the weather or students’ emotional state), but also by statistical properties such as measurement error and random error. These errors affect student scores, and they get “noisier” (less reliable) when used to calculate gain scores and then to attribute the gains to a specific teacher. Gain scores, he pointed out, are influenced by factors other than teachers and schools; social and demographic factors affect not only the starting point but the “rate of progress” that students make. Yet, he noted, most value-added methods do not control for those non-school factors. Also problematic was that gain scores are not necessarily comparable, because test questions are not of equal difficulty. If the gains are not comparable, then the results are meaningless, he said. Ballou, who subsequently wrote articles with Sanders, warned that “there are too many uncertainties and inequities to rely on such measures for high-stakes personnel decisions.”19
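Ballou’s point that scores get “noisier” when turned into gain scores can be illustrated with a standard statistical identity. The identity is an assumption brought in here, not a calculation taken from his paper: if the measurement errors on two tests are independent, the error in their difference has a variance equal to the sum of the two error variances. The error sizes below are invented for illustration.

```python
import math


def gain_error_sd(pre_error_sd, post_error_sd):
    """Standard deviation of the error in (post - pre), assuming the two
    measurement errors are independent of each other."""
    return math.sqrt(pre_error_sd ** 2 + post_error_sd ** 2)


# Hypothetical error sizes, in scale-score points.
single_test_sd = 3.0
gain_sd = gain_error_sd(single_test_sd, single_test_sd)

print(round(gain_sd, 2))                    # 4.24
print(round(gain_sd / single_test_sd, 2))   # 1.41
```

Under these assumptions the gain carries roughly 41 percent more error than either score on its own.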
The Death and Life of the Great American School System Page 23