Human Diversity
Convincing evidence on the longer-term impacts of scaled-up pre-K programs on academic outcomes and school progress is sparse, precluding broad conclusions. The evidence that does exist often shows that pre-K-induced improvements in learning are detectable during elementary school, but studies also reveal null or negative longer-term impacts for some programs.25
Whether the glass is half full or half empty is a matter of perspective. My own view is that if a four-year-old who is experiencing pain or deprivation at home spends some hours of the day in a warm and nurturing environment, that is a good in itself that does not need to be justified by continued impact 20 years later. Pre-K can also be a positive socializing experience for children who aren’t experiencing pain and deprivation at home—that’s why pre-K programs are so popular with upper-middle-class parents for their own children. On both counts, I do not oppose spending money on pre-K programs that provide warm and nurturing environments for children in need. However, ascertaining the proportion of programs that actually do provide warm and nurturing environments for children in need is a neglected research topic.
The cautionary aspect of the two “consensus statements” is that they are consistent with Proposition #10. Recall that the mean effect size for programs since 1980 was just 0.16 in the Duncan and Magnuson meta-analysis. That’s the effect on exit tests—which then fades out. Teachers in those pre-K programs are using the same methods they’ve used for the last 50 years, and nothing gives reason to expect some dramatic new pedagogy is in our future. If the potential for helping children in early childhood is to be realized, new tools will have to be found.
“The First Premise Is Wrong When It Comes to Self-Concept”
Over the last half century, psychologists and educators have spent immense effort experimenting with ways in which achievement in life can be enhanced by changing the way people think about their own abilities and potential—self-concept. The three major manifestations of this effort have involved self-esteem, stereotype threat, and growth mindset.
The Self-Esteem Movement
The self-esteem movement came first, with its origins often attributed to Nathaniel Branden’s The Psychology of Self-Esteem, published in 1969.26 Branden himself, who had first come to public attention as Ayn Rand’s principal disciple, treated self-esteem as an internalized sense of self-responsibility and self-sufficiency. But the self-esteem movement took on a life of its own. It soon discarded those core conditions of proper self-esteem and focused instead on having a favorable opinion of oneself, independently of any objective justification for that favorable opinion. Children were to be praised because praise fosters self-esteem. Criticism was to be avoided because criticism undermines self-esteem. Classroom competitions were to be avoided because they damage the self-esteem of the losers.
From the 1970s through the 1990s, low self-esteem took on the aura of a meta-explanation for many of society’s major problems.27 And since low self-esteem was the problem, high self-esteem was the solution. Psychological health, high educational performance, earnings as an adult—whatever the desired outcome, higher self-esteem would help produce it.
The empirical underpinnings of the self-esteem movement came crashing down in the early 2000s. A team of scholars led by Roy Baumeister, formerly an advocate for self-esteem interventions, reviewed 15,000 studies that had been written on the relationship of self-esteem to the development of children and concluded that improving self-esteem does not raise grades or career achievement, or have any other positive effect.28 You can still find remnants of the enthusiasm for self-esteem in the public schools (for example, in the persistence of the “everyone gets a trophy” mindset), but the scholarly standing of simple self-esteem as a way to improve childhood outcomes has declined precipitously.
Stereotype Threat
The label stereotype threat and the concept itself were introduced in a seminal article by Claude Steele and Joshua Aronson in 1995.29 The authors administered the same test to two sets of African American students. The test was described as a problem-solving exercise (a neutral description) to one sample and as an IQ test (activating a negative stereotype of the intelligence of African Americans) to the other. The authors concluded that the threatening condition raised concerns about being judged by the stereotype and thereby degraded the performance of the experimental sample. The study got widespread publicity and the concept caught on. Soon stereotype threat was extended to negative stereotypes about women. Its popularity as a concept rose as rapidly as that of self-esteem had climbed a quarter of a century earlier. By 2003, only eight years after the initial article, stereotype threat was covered in two-thirds of introductory psychology textbooks.30
From the beginning, the effects of stereotype threat have been widely misunderstood. The original paper by Steele and Aronson was interpreted in the media as showing that once stereotype threat had been removed, the ethnic difference in test scores disappeared.31 What Steele and Aronson actually showed is that the ethnic gap can be increased when stereotype threat is activated. That’s not the same as evidence that the gap shrinks when the threat is removed.
The most commonly studied form of stereotype threat involves women and math, using the hypothesis that activating the negative stereotype “women aren’t good at math” depresses women’s math scores. Five meta-analyses of such studies were published from 2008 through 2016. Despite a host of methodological issues that have been treated differently by the different authors, the estimated effect sizes have clustered within a fairly narrow range, as indicated below.
Author                       Effect size (d)
Nguyen and Ryan (2008)       –0.21
Stoet and Geary (2012)       –0.17
Picho et al. (2013)          –0.24
Flore and Wicherts (2015)    –0.22
Doyle and Voyer (2016)       –0.29
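To check how tightly the five estimates cluster, the following minimal Python sketch computes their simple average and range. It is a descriptive illustration only, not a pooled estimate: an unweighted mean ignores the meta-analyses’ differing sample sizes and the fact that they draw on many of the same underlying studies. The variable and function names are hypothetical, chosen for illustration.

```python
# Effect sizes (Cohen's d) reported by the five meta-analyses in the table above.
effect_sizes = {
    "Nguyen and Ryan (2008)": -0.21,
    "Stoet and Geary (2012)": -0.17,
    "Picho et al. (2013)": -0.24,
    "Flore and Wicherts (2015)": -0.22,
    "Doyle and Voyer (2016)": -0.29,
}

def summarize(values):
    """Return the unweighted mean, minimum, and maximum of some effect sizes.

    Descriptive only: no weighting by sample size, no adjustment for the
    overlap of primary studies across the meta-analyses.
    """
    values = list(values)
    return sum(values) / len(values), min(values), max(values)

mean_d, low, high = summarize(effect_sizes.values())
print(f"unweighted mean d = {mean_d:.3f}, range {low:+.2f} to {high:+.2f}")
# prints: unweighted mean d = -0.226, range -0.29 to -0.17
```

The whole spread is about 0.12 standard deviations, which is what “a fairly narrow range” amounts to here.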
Given how closely the effect sizes are grouped, it is bemusing to read the authors’ perspectives on whether the glass is half full or half empty. The authors of three of the studies (Nguyen and Ryan, Picho et al., Doyle and Voyer) treat their effect sizes more or less at face value and think they have practical implications. In contrast, Stoet and Geary, like Flore and Wicherts, are worried about the degree to which there is evidence of publication bias (the tendency for only studies that find stereotype threat to reach publication), a lack of control groups in many studies, and other methodological weaknesses.[32]
The studies of race-based stereotype threat do not have an equivalent body of meta-analytic results, in large part because of the difficulty of assembling large sample sizes. The Nguyen and Ryan meta-analysis reported an effect size for ethnic minorities of –0.32.[33] Psychologists Gregory Walton and Steven Spencer combined three field studies using African American participants to test race-based stereotype threat. Based on the mean level of prior performance, they reported an effect size of –0.27.[34]
In interpreting these results, the overhanging problem is “researcher degrees of freedom”—a phrase that refers to the many decisions researchers have to make in the course of collecting and analyzing data combined with the tendency to make those decisions in ways that favor the hypothesis they are testing.[35] The problem is most acute for topics that have high political and emotional salience. Stereotype threat is a classic example. Researcher degrees of freedom affect both the decisions during the research and the decision whether to publish negative results. There are several indications that such decisions have been a problem with stereotype threat research:
Replications often fail to confirm the earlier results.[36]
The evidence for stereotype threat has dissipated over time.[37]
Publication bias (failure to report negative results) appears to have been a reality.[38]
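Choosing which of several outcome measures to report is one researcher degree of freedom. The hypothetical simulation below is not based on any actual stereotype threat study. It assumes there is no true effect at all, gives each simulated experiment several independent outcome measures, and counts the experiment as a “finding” if any outcome crosses the conventional two-sided p < .05 threshold. The function name, the sample sizes, and the use of a known-variance z-test are simplifying assumptions.

```python
import random
import statistics

def false_positive_rate(n_per_group=30, n_outcomes=3, n_sims=4000, seed=1):
    """Share of null experiments reported as "significant" on some outcome.

    Hypothetical setup: both groups are drawn from the same standard normal
    distribution (so the true effect is zero), each outcome is independent,
    and the population SD of 1.0 is treated as known, so a z-test applies.
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # SE of a difference in means when SD = 1
    hits = 0
    for _ in range(n_sims):
        for _ in range(n_outcomes):
            treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            z = (statistics.fmean(treated) - statistics.fmean(control)) / se
            if abs(z) > 1.96:
                hits += 1
                break  # the researcher reports the first "significant" outcome
    return hits / n_sims

print(false_positive_rate(n_outcomes=1))  # should be near the nominal .05
print(false_positive_rate(n_outcomes=3))  # should be near 1 - .95**3, about .14
```

With a single outcome, the false-positive rate sits near 5 percent; with three outcomes to choose from, it nearly triples. Real studies have correlated outcomes and many other forks in the analysis, so this is only a rough picture of the mechanism.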
In 2019, scholars at the University of Minnesota dealt with these and other issues in the most comprehensive meta-analysis of stereotype threat to date, focusing on the high-stakes test settings in which stereotype threat should theoretically cause the most problems. For the studies relevant to high-stakes settings, the effect size of stereotype threat was –.14 (lowering test scores), a small effect that was further reduced to –.09 after correcting for publication bias. The authors summarized their findings as follows:
Based on the results of the focal analysis, operational and motivational subsets, and publication bias analyses, we conclude that the burden of proof shifts back to those that claim that stereotype threat exerts a substantial effect on standardized test takers. Our best estimate of stereotype threat effects within groups in settings with conditions most similar to operational testing is small and inflated by publication bias.39
Given this assessment from the largest and most rigorous meta-analysis of a quarter century of attempts to demonstrate stereotype threat, it seems unlikely that a significant role for stereotype threat exists.
The Growth Mindset Movement
For many people, including me, the self-esteem movement as it developed in practice was inherently problematic: Parents and teachers were encouraged to praise children independently of their actual accomplishments. Shouldn’t parents and teachers be encouraging earned self-esteem? In 1998, psychologists Carol Dweck and Claudia Mueller put another spin on that concern: When we praise children for an accomplishment, should we praise their intelligence or their effort? They conducted six experiments using items from Standard Progressive Matrices (a widely used test of nonverbal ability that involves no reading) as the task assigned to 5th graders. Subsequently, some of the students were praised for their intelligence, others were praised for the effort that they put into the test, and others received praise that didn’t attribute the achievement to anything (e.g., “That’s a really high score”). The results showed large and consistent effects. Children who had been praised for being intelligent subsequently displayed less task persistence and less task enjoyment. They became more concerned about getting a good score than about learning new things. They became protective of their image as “smart” and reluctant to jeopardize it.40 The article, bluntly titled “Praise for Intelligence Can Undermine Children’s Motivation and Performance,” was especially jarring for a society in which many upper-middle-class parents incessantly tell their children how smart they are.
The last of the reported findings was this one: “Children praised for intelligence described it as a fixed trait more than children praised for hard work, who believed it to be subject to improvement.”41 That finding was the seed of the growth mindset movement, which has had at least as much effect on public education in the United States as the self-esteem movement did. It has given birth to nonprofit organizations such as PERTS (Project for Education Research That Scales) and a for-profit company, MindsetWorks, which sells curricula for teaching growth mindset.42 Advocates of growth mindset have received millions of dollars in research grants from the Department of Education, the Institute of Education Sciences, and the Bill & Melinda Gates Foundation, among others.43 “Growth mindset theory has had a profound impact on the ground,” wrote educational scholar Carl Hendrick. “It is difficult to think of a school today that is not in thrall to the idea that beliefs about one’s ability affect subsequent performance, and that it’s crucial to teach students that failure is merely a stepping stone to success.”44
The essence of the theory is the distinction between fixed mindsets and growth mindsets. Fixed mindsets see attributes such as intelligence as being fixed and are accompanied by the student’s readiness to give up in the face of failure. Growth mindsets see attributes such as intelligence as malleable and are accompanied by a readiness to see failure as an opportunity to try again, try harder, and get better.45
Isn’t this tantamount to saying that g can be significantly increased—something that runs counter to a large body of literature? Advocates of the growth mindset think of it another way. Students’ beliefs can get in the way of realizing their cognitive potential. An unwarranted belief in one’s own incompetence is an example. Removing that belief may not increase cognitive potential, but it can increase achievement. Similarly, growth mindset theory does not seek basic changes in personality, but a reorientation of the way the student construes effort or setbacks in school.46
In 2018, a team of five psychologists (first author, Victoria Sisk) published two meta-analyses of growth mindsets, one for each of two questions: Is there a relationship between a growth mindset and academic achievement? Is there evidence that growth mindset interventions produce improvements in academic achievement?
The relationship of growth mindset to academic achievement. The first meta-analysis analyzed the results of 273 studies with a combined sample of 365,915 participants. The mean correlation between growth mindset and academic achievement was .10. Corrected for measurement unreliability, the estimated correlation was .12. The Sisk study also tested a variety of moderators. Academic risk status and family SES did not affect the relationship. The relationship differed to a statistically significant degree among children, adolescents, and adults, but it remained weak for all subgroups.
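The text does not say which correction Sisk and colleagues applied. Assuming the classical Spearman correction for attenuation (an assumption, not something reported here), the corrected correlation ρ is the observed correlation r_xy divided by the square root of the product of the two measures’ reliabilities, r_xx for the mindset measure and r_yy for the achievement measure. Working backward from the two reported figures gives the implied reliability product:

```latex
\rho_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
\quad\Longrightarrow\quad
\sqrt{r_{xx}\, r_{yy}} = \frac{r_{xy}}{\rho_{xy}} \approx \frac{.10}{.12} \approx .83,
\qquad r_{xx}\, r_{yy} \approx .69
```

Because the reported correlations are rounded, the implied product is only approximate. Squaring the corrected value shows how little variance is involved: .12² ≈ .014, or about 1.4 percent of the variance in academic achievement.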
The effects of growth mindset interventions on academic achievement. The second meta-analysis analyzed 43 studies with a combined sample of 57,155. Thirty-seven of the 43 effect sizes were not significantly different from zero. One was significantly different from zero but in the wrong direction. Only five of the effect sizes were significant and positive. Overall, the effect size was negligible (d = +0.08).
The problem in interpreting the meta-analyses is that so few of the sources provided large-sample direct tests of growth mindset theory or interventions (many were conflated with stereotype threat). Among those that did provide direct tests, many were unpublished master’s theses and doctoral dissertations of uncertain quality. The advocates of growth mindset theory can point to direct tests of the theory in the published literature that do show positive effects, occasionally substantial ones, mixed in with small or zero effect sizes.47 Most recently, a nationwide longitudinal, double-blind, randomized trial with a sample of more than 12,000 found that a short online growth mindset intervention in public high schools increased the grades of lower-achieving students over the academic year and increased enrollment in advanced math courses in the subsequent year. For students at risk for low achievement, the effect size was 0.11 overall and 0.17 for those in schools with positive peer norms. The findings were robust.48 The 0.17 effect size is small by Cohen’s guidelines and potentially consequential by Funder and Ozer’s guidelines.49 It is about the same as the mean for pre-K interventions, few of which were subjected to comparably rigorous evaluations.
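To see how one number can look “small” by one standard and “potentially consequential” by another, the sketch below converts d to a correlation and applies both sets of benchmarks as they are commonly cited: Cohen’s 0.2, 0.5, and 0.8 for d, and Funder and Ozer’s .05, .10, .20, .30, and .40 for r. The conversion r = d/√(d² + 4) assumes two groups of equal size, which is an assumption rather than a detail of the trial described above, and the function names are hypothetical.

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a point-biserial r, assuming two equal-sized groups."""
    return d / math.sqrt(d * d + 4)

def cohen_label(d):
    """Cohen's conventional benchmarks for d: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

def funder_ozer_label(r):
    """Funder and Ozer's benchmarks for r, as commonly cited:
    .05 very small, .10 small, .20 medium, .30 large, .40 very large."""
    size = abs(r)
    for cutoff, label in [(0.40, "very large"), (0.30, "large"), (0.20, "medium"),
                          (0.10, "small"), (0.05, "very small")]:
        if size >= cutoff:
            return label
    return "below very small"

for name, d in [("growth mindset, at-risk students", 0.11),
                ("growth mindset, positive peer norms", 0.17),
                ("pre-K programs since 1980", 0.16)]:
    r = d_to_r(d)
    print(f"{name}: d = {d:.2f} ({cohen_label(d)}), r = {r:.3f} ({funder_ozer_label(r)})")
```

On these benchmarks, 0.17 falls just under Cohen’s 0.2 cutoff and converts to r ≈ .085. That lands in the band Funder and Ozer describe as very small for single events but potentially consequential over time, as does the 0.16 pre-K mean.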
The validation of growth mindset theory is a work in progress. The key task is to disentangle the effects of growth mindset interventions from preexisting traits: personality characteristics (chiefly openness and conscientiousness) and cognitive ability.50 The advent of polygenic scores (see chapter 14) offers rich possibilities for such efforts.
“Some Aspects of the Nonshared Environment Can Be Affected by Outside Interventions”
The nonshared environment explains much of the variance in many traits—sometimes more than genes do. Is it really the case that outside interventions cannot affect the nonshared environment?
Answering that question requires knowing how the nonshared environment functions. In their landmark 1987 article “Why Are Children in the Same Family So Different from One Another?,” Plomin and Daniels acknowledged the obvious: “One gloomy prospect is that the salient environment might be unsystematic, idiosyncratic, or serendipitous events such as accidents, illnesses, and other traumas.… Such capricious events, however, are likely to prove a dead end for research.”51 But researchers did have something to work with in the form of the systematic components of the nonshared environment that I listed in chapter 10: family composition (birth order, gender differences), sibling interactions (differential responses to the same events), differential parental treatment of their children, and extrafamilial networks such as peer groups.
The phrase “gloomy prospect” hit a nerve. Many scholars, including Plomin, spent the 1990s trying to put the study of the nonshared environment on an empirical footing. Much was learned. Parents really do treat their children differently, siblings really do respond differently to the same events (divorce, for example), and siblings really do have different peer groups that seem to have great influence on their lives.
By 2000, Turkheimer and Mary Waldron could conduct a meta-analysis from a literature search that identified 289 studies, of which 43 qualified for the meta-analysis. Their findings were bleak. When it came to explaining variance for outcomes such as adjustment, personality, and cognition, the largest proportion of explained variance was .053 for differential peer/teacher interactions. “Family constellation” (birth order, age, age spacing, gender) explained .011, differential parental behavior explained .023, and differential sibling interactions explained .024.52 These are all extremely small numbers. “We emphasize that these findings should not lead the reader to conclude that the nonshared environment is not as important as had been thought,” the authors wrote. “Rather, we believe that the appropriate conclusion is that the causal mechanisms underlying nonshared environmental variability in outcome remain unknown.”53
Plomin was having an equally frustrating experience with a 10-year longitudinal project he had launched with colleagues in the 1990s, Nonshared Environment in Adolescent Development (NEAD). For example, there was the matter of differential parental treatment of children. The researchers knew that parental negativity had been found to make a difference in the likelihood that children would become depressed. But the NEAD research found that parents’ negativity was largely a response to, not a cause of, the children’s depression and antisocial behavior.54 What was initially interpreted as an example of parental behavior affecting child outcomes was more appropriately described as a child-based genetic cause of parental behavior, an example of evocative rGE.