The next step is to determine what data can be collected to prove the big goal has been achieved. The data will almost always be test scores, from either a state standardized test, a district test, or a test the teacher finds or creates on her own. A sixth-grade teacher might know that students need at least an 85 on the state’s end-of-year English test to be considered for competitive high schools. So she will pore over exam questions from previous years and will target every lesson plan, homework assignment, and student assessment toward building the skills that will enable her students to do well on that test. If she has to, she will host evening tutoring sessions at McDonald’s, tempting her students with free food. She will write a weekly class newsletter with celebratory “shout-outs” to students who perform well on quizzes. When she calls or visits a student’s home, she will seek to “invest” parents in the big goal.
During the recruitment and selection process, TFA seeks corps members who are likely to embrace this backward-planning, data-driven mind-set. The organization constantly tracks which recruits produce the largest test score gains for their students, reviews those teachers’ characteristics, then looks for new candidates who display similar achievements and behaviors during the interview process.
Teach for America was founded during an era of teacher shortages and promoted on the basis that it was filling a great need. Today, with teacher layoffs and high unemployment, the organization cannot justify itself on the same grounds and instead explicitly advertises its corps members as more effective than veteran teachers. The research consensus on TFA suggests that corps members are about equally effective at raising students’ test scores as teachers from all other pathways, though better in math than in reading and writing. A September 2013 study from Mathematica Policy Research found that TFA middle and high school math teachers outperform other math teachers in their schools, though only by the equivalent of students gaining 3 points on a 100-point test. The researchers could not discern exactly why. Like other research on teacher credentials, the study found that regardless of the pathway into the classroom, teachers who majored in math or who had attended selective colleges did not seem to significantly outperform other teachers with less impressive résumés.*2 So if the “best and brightest” theory isn’t true—if traditional meritocratic credentials aren’t the reason that TFA teachers are good at their jobs—then what accounts for many corps members’ success?
The work of John Hattie provides some clues. He is an education researcher from New Zealand who has reviewed eight hundred meta-analyses that summarize the results of over fifty thousand education studies. Hattie has found that completely separate from a teacher’s demographic traits, a few specific teacher behaviors—including some emphasized by TFA—powerfully influence student achievement. A wealth of research indicates that one of the best things a teacher can do for her students is to set high, individualized expectations for each one of them, regardless of a child’s past performance or whether he comes to class with a label such as low-income, special education, or learning disabled. Effective teachers believe all children can learn—a fundament of the TFA philosophy too—and reject the idea of intelligence as an inborn trait, instead seeing it as something a teacher can develop in every student. In general, academically ineffective teachers are those who set the bar too low; some evidence suggests that half of what is taught in most classes is already known by most students. That brings us to another teacher behavior Hattie identifies as potentially transformative, and which TFA promotes: formative assessment. To avoid teaching children what they have previously learned, teachers should assess students at the beginning of the school year and at the beginning of new units, to identify their strengths and weaknesses. Students should be quizzed again when units end, to determine if concepts and skills have been successfully taught. A cache of studies from cognitive scientists confirms that students score higher on end-of-course standardized tests when they have been periodically quizzed along the way.
Although education research seems to confirm some of TFA’s practices and mind-sets, it calls others into question, particularly those having to do with student discipline. When teachers provide constant, controlling behavioral feedback, as Arpino and Walmsley were being taught to do, they waste precious time they could be spending giving feedback related to the academic content of the lesson, which is far more powerful in terms of raising student achievement. One of the challenges of training a new teacher, Hattie writes, is convincing her that “developing a strong desire to control student behavior can be inconsistent with implementing many conceptual approaches to teaching.”
There has been little convincing research done on those “no excuses” teaching strategies: incentive systems (pizza for good behavior and high test scores); the focus on children’s posture and eye contact as teachers read or lecture; and the school uniforms and silent hallways. Yet this type of teaching has exploded in prominence since the mid-1990s, driven in large part by the strategies and rhetoric used at the KIPP charter schools and adopted by TFA and many of the other charters at which its corps members and alumni increasingly work.*3
The KIPP schools (pronounced “kip”) are the most celebrated in America. They were founded in 1994 by two TFA alums, Dave Levin and Mike Feinberg, who, as struggling first-year teachers in Houston, became entranced by the classroom strategies of Harriet Ball. A magnetic, six-foot-one African American woman, Ball seemed to work miracles with her African American fourth graders. Her students exuberantly sang songs—mnemonics about what they were learning, as well as exhortations of why they were learning (’cause knowledge is power, and power is money, and I want it!)—but when Ball snapped her fingers, they went dead silent. If children acted up or didn’t do their homework, she threatened to transfer them to another teacher’s classroom. They were lucky, she told them, to have her as a teacher, and they better not waste the opportunity.
Ball emphasized education as a privilege and literacy as the pathway to personal financial empowerment—an ideology with a long, proud history in the African American community, traceable to Booker T. Washington. Her high expectations for her students echoed the promise of Rhody McCoy, the superintendent in Ocean Hill-Brownsville, that children would learn if teachers set the right tone. In the 1990s a multiracial group of Generation X education reformers began to adopt and translate these strategies. Levin and Feinberg named their schools KIPP—the Knowledge Is Power Program—after the refrain in Ball’s song. Her “no excuses” methods supposedly proved, as Wendy Kopp has written, that “education can trump poverty,” as long as a teacher accepts her responsibility as the “key variable” driving student outcomes. “We”—not parents, not neighborhoods, not school funding or health care or racism or stable housing—“control our students’ success and failure,” states Teaching as Leadership.
This ideal of the all-powerful individual teacher, solely responsible for raising student achievement in measurable ways, soon transcended start-ups like Teach for America and KIPP to become the foundation of national education policy making during the Obama years. It received a big boost from a new way of evaluating teachers and schools, called value-added measurement.
On May 20, 2003, Kati Haycock of the Education Trust appeared before Congress to share her views on how No Child Left Behind could be improved. Haycock had been instrumental in pushing for NCLB. Now she had a new message for lawmakers: It was clear that individual teachers, even more than standards or schools themselves, were “the number one ingredient of high achievement” for kids. She cited a body of research from a University of Tennessee statistician named William Sanders. Using a technique called value-added measurement, Haycock said, Sanders had proved that a child with a sequence of good teachers could demonstrate up to 50 points of gain on a 100-point standardized test, “the difference between entry into a selective college and a lifetime working at McDonald’s.” Teachers should be evaluated, she said, on whether they “produce student learning gains.”
Sanders’s claims were stunning. The most important education research of the 1950s and 1960s had been conducted not by testing experts, but by psychologists and sociologists. Kenneth Clark and James Coleman had looked at a broad range of factors that influenced children’s school performance and overall well-being: how many books their parents owned, what toys they played with, whether schools had science laboratories or libraries. When the older generation of researchers tried to pinpoint what made a teacher successful, they often looked for particular personality traits like warmth, extroversion, and conscientiousness. But the explosion of state testing programs since the 1970s provided researchers like Sanders with an unprecedented data trove. Statisticians and economists have used this achievement data to ask a much narrower question: Which teachers raise or lower a child’s test scores?
Value-added measurement is the method researchers developed to find an answer. In its relatively crude, early form, value-added simply used a student’s score on an end-of-year standardized test to predict her score on the following year’s exam. Teachers who presided over larger-than-expected jumps in scores earned above-average value-added ratings. (For example: Sarah scored an 89 in third-grade math. The typical child who scores 89 gets a 91 next year, but Sarah scored a 93 in fourth grade. Those 2 points of unpredicted achievement gain are attributed to her fourth-grade teacher and are computed into the teacher’s value-added rating.)
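To make that mechanism concrete, here is a minimal Python sketch of the crude early form, using the Sarah example from the parenthetical above; the one-entry prediction table is an invented stand-in for the district-wide score histories a real model would be estimated from.

```python
# Crude value-added, per the Sarah example: actual score minus the score
# predicted from last year's result. The prediction table is a toy
# assumption standing in for district-wide data.
PREDICTED_NEXT_SCORE = {89.0: 91.0}  # a typical child scoring 89 scores 91 next year

def value_added_points(prior_score: float, actual_score: float) -> float:
    """Achievement gain beyond the prediction, credited to the teacher."""
    return actual_score - PREDICTED_NEXT_SCORE[prior_score]

# Sarah: 89 in third-grade math, 93 in fourth grade.
print(value_added_points(89.0, 93.0))  # 2.0 points toward her teacher's rating
```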
A more sensitive early value-added formula was developed in Dallas in the mid-1990s, where statisticians recognized that disadvantaged students tend to experience slower academic growth than their middle-class peers, no matter how good their teachers. That’s because poor children are more likely to experience out-of-school disruptions, such as poor nutrition, a move, or homelessness, which can affect learning. The Dallas research team created a value-added equation that included controls for children’s demographic traits, such as parental income and proficiency in English, essentially giving teachers who worked with disadvantaged kids bonus points. This technique found smaller, yet still significant, teacher effects on kids’ test scores.
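As a rough sketch of how such controls work, the Python fragment below fits a district-wide regression on invented data and then credits each teacher with her students’ average unexplained gain; the scores, flags, and class assignments are all hypothetical, not drawn from the Dallas model itself.

```python
import numpy as np

# Value-added with demographic controls, in the spirit of the Dallas model.
# All data below are invented for illustration.
# Columns: intercept, prior-year score, low-income flag, English-learner flag.
X = np.array([
    [1, 78, 1, 0], [1, 85, 0, 0], [1, 62, 1, 1],
    [1, 90, 0, 0], [1, 71, 1, 1], [1, 80, 0, 0],
], dtype=float)
y = np.array([83, 88, 67, 92, 76, 84], dtype=float)  # this year's scores
teacher = np.array(["A", "A", "A", "B", "B", "B"])   # class assignments

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit district-wide
residuals = y - X @ beta                      # gains the controls cannot explain

# Averaging residuals by class means a teacher of disadvantaged students
# is compared against what similar students typically achieve.
for t in ("A", "B"):
    print(t, round(float(residuals[teacher == t].mean()), 2))
```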
Value-added measurement changed pretty much everything in our national conversation about student achievement. To assess a school’s improvement or decline, No Child Left Behind compared the “snapshot” score of one group of third graders on an end-of-year math test to the scores of the children who were in third grade the previous year. These snapshots made the teachers and schools that serve poor children look especially bad, because those schools earned low scores year after year. Snapshots obscured whether any individual student was doing better or worse over time. Growth measures that track one group of children over the course of several years, like value-added, present a more nuanced picture. But in 2001, when NCLB was designed, most policy makers in Washington hadn’t heard about value-added. While the law’s real-world consequences were playing out in schools across the country, value-added research grew much more sophisticated. Economists created experiments that randomly assigned students within one school to various teachers, and then measured differences in test score growth. That method eliminated the bias caused by principals clustering the most challenging or most able students in particular classrooms. Researchers also identified more sensitive controls for the factors that influence a child’s test score but are not related to his classroom teacher’s performance. A value-added model developed by the University of Wisconsin for New York City included controls not only for family income and English proficiency, but also for a student’s race, gender, disability status, how often he was absent from class, whether he had been enrolled in summer school, and whether he had recently moved, been suspended, or repeated a grade. The New York City value-added model also compared teachers only to other teachers who taught similar-sized classes, and who had the same number of years of experience.
Using these methods, labor economists produced a massive body of research. It suggested that a teacher’s pathway into the classroom—whether through a traditional teachers college, a graduate-level program in teaching, or an alternative program like Teach for America—hardly mattered with regard to how well she raised student test scores, nor did her college major. There was more value-added variation between teachers within a school than across all the schools in a district—a hopeful finding confirming what many urban teachers had long argued: that even “failing” schools employ some excellent educators. First-year teachers were not very good, but they made major leaps in effectiveness by the end of their second year on the job, and they continued improving steadily for five to ten years, after which their measurable performance generally flatlined.
The results of these experiments remain “noisy,” as social scientists say. When value-added is calculated for a teacher using just a single year’s worth of test score data, the error rate is 35 percent—meaning more than one in three teachers who are average will be misclassified as excellent or ineffective, and one in three teachers who excel or are terrible will be called average. Even with three years of data, one in four teachers will be misclassified. It is difficult, if not impossible, to compute an accurate value-added score for teachers who work in teams within a single classroom—a method rapidly growing in popularity—or for the two-thirds of teachers who teach grades or classes not subject to standardized tests.
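A small simulation illustrates why a single noisy year misclassifies so many teachers. In the Python sketch below, the measurement noise is an invented parameter set equal to the spread of true teacher effects, chosen to show the mechanism rather than to reproduce the 35 percent figure.

```python
import random

random.seed(0)

def classify(score: float, cut: float = 1.0) -> str:
    # Cutoffs in standard-deviation units; labels mirror the text's categories.
    return "excellent" if score > cut else "ineffective" if score < -cut else "average"

trials, mismatches = 10_000, 0
for _ in range(trials):
    true_effect = random.gauss(0, 1)                      # a teacher's real value-added
    one_year_estimate = true_effect + random.gauss(0, 1)  # plus one year of noise
    if classify(one_year_estimate) != classify(true_effect):
        mismatches += 1

print(f"misclassified: {mismatches / trials:.0%}")
```

Averaging several years of estimates shrinks the noise term, which is why the misclassification rate falls, though it does not vanish, with three years of data.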
Some advocates of value-added downplayed these problems and made huge claims based on the technique. The Stanford economist Eric Hanushek, a fellow at the conservative Hoover Institution and proponent of cutting school funding, advanced the hypothesis that if poor children were assigned five “good teachers in a row”—those with value-added scores in the top 15 percent—it would completely close the academic achievement gap between the poor and the middle class. In a 2006 paper for the Brookings Institution, three economists, Robert Gordon, Thomas Kane, and Douglas Staiger, used similar logic to estimate that firing the bottom 25 percent of first-year teachers annually, as determined by a single year’s worth of value-added data, could create $200 billion to $500 billion in economic growth for the country, by enabling poor children to earn higher test scores and go on to obtain better jobs.
The most important thing to realize about these claims, which appear frequently in the media, is that they are untested. According to Tulane University economist Doug Harris, another leading value-added scholar, no experiment has ever been conducted in which poor children are randomly assigned to multiple high value-added teachers in a row, to test if the achievement gap totally closes. “It’s still purely hypothetical,” he told me, “and it would be an incredibly tough experiment to pull off.” Even if such an experiment did take place, Harris guesses that it would fail to confirm the hypothesis that teachers alone can close achievement gaps. Here’s why: The Hanushek theory is that five teachers who each add 10 points to a child’s test score will move that child from the fortieth to the ninetieth achievement percentile over the course of five years. But in real-world conditions, value-added gains tend to fade out over time; next year the average child will lose 50 percent of the test score gains she made this year, and by three years from now she will have lost 75 percent of this year’s gains. According to Harris, that means the academic and economic effects of having multiple above-average teachers in a row have been inflated by more than half. Effective teachers can narrow, but not close, achievement and employment gaps that reflect broader income, wealth, and racial inequalities in American society.
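Harris’s point can be checked with a few lines of Python. The 10-point annual gain is Hanushek’s hypothetical, the one-year and three-or-more-year retention rates follow from the fade-out figures just quoted, and the two-year rate is an interpolated assumption.

```python
# Fade-out arithmetic: five teachers each add 10 points, but each year's
# gain decays. Retention after one year is 50% and after three or more
# years 25%, per the text; the two-year value is interpolated.
ANNUAL_GAIN = 10
RETENTION = {0: 1.00, 1: 0.50, 2: 0.375}  # share of a gain kept N years later

def surviving_gain(years_elapsed: int) -> float:
    return ANNUAL_GAIN * RETENTION.get(years_elapsed, 0.25)

# Measured at the end of year 5, after five consecutive top teachers:
total = sum(surviving_gain(5 - year) for year in range(1, 6))
print(f"no fade-out: {5 * ANNUAL_GAIN} points; with fade-out: {total:.1f} points")
# Less than half of the hypothesized 50 points survives, consistent with
# Harris's estimate that such claims are inflated by more than half.
```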
This reality was demonstrated by the most celebrated value-added study ever conducted. Economists Raj Chetty, John Friedman, and Jonah Rockoff tried to figure out if teachers who were good at raising test scores were also good at improving their students’ long-term life outcomes—in other words, if value-added was a good proxy for some of the other goals, aside from raising test scores, that we want teachers to fulfill. Using tax returns and school district records from an unnamed large city, they examined twenty years of data from more than one million children and their teachers, tracking the students from third grade through young adulthood. One finding was that the current achievement gap is driven much more by out-of-school factors than by in-school factors; differences in teacher quality account for perhaps 7 percent of the gap. But it turned out that the group of students who had been assigned to just one top value-added teacher—a teacher one standard deviation more effective than the norm—experienced small, yet observable, differences in life outcomes. These students earned, on average, 1.3 percent more per year, the difference between a salary of $25,000 and $25,325. They were 2.2 percent more likely to be enrolled in college at age twenty, and were 4.6 percent less likely to become teen mothers.
The researchers posited that if there were a way to systematically move the top value-added teachers to the lowest-performing schools, perhaps 73 percent of the test score achievement gap could be closed. That, however, is a gargantuan policy challenge: When a separate Department of Education/Mathematica trial offered more than one thousand high value-added teachers $20,000 to transfer to a low-income school, less than a quarter chose to apply for the jobs. (Those who did transfer produced test score gains among elementary school kids, but not among middle schoolers.) There was another major caveat, which Chetty, Friedman, and Rockoff acknowledged: Like almost every other major value-added study ever conducted, this one took place in a low-stakes setting, meaning that teachers were not being evaluated or paid according to their students’ test scores. It was possible, the three economists noted, that in a higher-stakes setting, test scores would lose their predictive power, for instance in cases where they reflected not students’ true learning but rather teaching to the test or cheating.