Human Diversity
14. This is not the same as saying that beyond a certain threshold, higher IQ is unrelated to measures of success, despite a claim to that effect in Malcolm Gladwell’s bestselling book Outliers. Gladwell (2008). Most evidence indicates that more is better across the IQ range for a wide variety of outcomes. Gottfredson (1997a).
15. Strenze (2007): Table 2. The numbers represent correlations corrected for unreliability and dichotomization weighted by sample size.
16. Kuncel and Hezlett (2010): Fig. 1. Correlations are corrected for restriction of range and criterion unreliability.
17. The relationship of IQ to job performance has been so exhaustively studied that by 2012, when Oxford University Press included a chapter titled “Cognitive Abilities” in The Oxford Handbook of Personnel Assessment and Selection, the chapter consisted of a meta-review of meta-analyses. Here are the “operational validities”—equivalent to correlations—of tests of mental abilities with measures of overall job performance for a variety of different job types.
Low complexity jobs: .38
Medium complexity jobs: .56
High complexity jobs: .59
Police: .24
Drivers: .45
Salespeople: .46
Clerical jobs: .54
Engineers: .63
Managers: .67
Computer programmers: .73
Source: Ones, Dilchert, and Viswesvaran (2014): Table 3.
The results are limited to meta-analyses incorporating corrections for restriction of range and criterion unreliability using conservative criterion reliability estimates. When more than one meta-analysis reported operational validities for a given category, I report the mean of those results.
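The two corrections mentioned here follow standard psychometric formulas: disattenuation for criterion unreliability and Thorndike's Case II for direct range restriction. A minimal sketch in Python, with illustrative numbers of my own rather than values from the source:

```python
import math

def correct_for_criterion_unreliability(r_obs, r_yy):
    # Disattenuation: divide the observed validity by the square root
    # of the criterion reliability (r_yy).
    return r_obs / math.sqrt(r_yy)

def correct_for_range_restriction(r_obs, u):
    # Thorndike Case II correction for direct range restriction,
    # where u = (restricted SD) / (unrestricted SD) of the predictor.
    return (r_obs / u) / math.sqrt(1 - r_obs**2 + (r_obs / u)**2)

# Illustrative numbers (not from the source): an observed validity of .30,
# a criterion reliability of .60, and a restriction ratio of .70.
r = correct_for_criterion_unreliability(0.30, 0.60)   # ~ .39
r = correct_for_range_restriction(r, 0.70)            # ~ .51
```

The order in which the corrections are applied, and whether the reliability estimate refers to the restricted or the unrestricted group, varies across meta-analyses; using conservative criterion reliability estimates, as the chapter does, yields smaller corrected validities than more liberal assumptions would.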
The chapter also includes analyses comparing the operational validities of different kinds of information that employers use. In all of those cases, it is not just that the operational validity for the IQ test is higher than for another measure. When IQ and another measure are combined, the other measure adds comparatively little. For example, if an IQ score and the evaluation from a job interview are combined, they jointly have an operational validity of .61, but the increment attributable to the interview (compared to the operational validity of the IQ test alone) is just .07. If an IQ score and biographical data about the job candidate are combined, their joint operational validity is .56—but the increment attributable to the biographical data is just .02. Ones, Dilchert, and Viswesvaran (2014): Table 8. “Cognitive ability tests are generalizably valid predictors of overall job performance across a large number of jobs, organizations, occupations, and even countries,” the authors concluded. “No other individual differences predictor produces as high validities as consistently as cognitive ability tests or has proven its validity in such a variety of settings.” Ones, Dilchert, and Viswesvaran (2014): 186.
For a dissent to the consensus about the relationship of IQ to job performance, see Richardson and Norgate (2015). The article presents a variety of measurement and aggregation problems associated with meta-analyses in general and assessments of job performance in particular.
18. Kuncel, Ones, and Sackett (2010): 333.
19. Multiple intelligences (MI). Gardner himself has never accepted that MI can be judged by psychometric standards, nor has he tried to develop measures of the various intelligences that would permit falsification of his theory, as he has acknowledged. For a technical exchange by critics, Gardner’s response, and a rejoinder by the critics, see Visser, Ashton, and Vernon (2006a); Gardner (2006); and Visser, Ashton, and Vernon (2006b). For a literature review of the evidence for MI, see Waterhouse (2006).
In 2016, Gardner offered this retrospective on MI’s relationship to classic theories of intelligence: “But, in truth, most psychologists, and particularly most psychometricians, have never warmed to the theory. I think that psychologists are wedded to the creation and administration of short-answer tests, and particularly ones that resemble the IQ test. While such tests can probe linguistic and logical capacities, as well as certain spatial abilities, they are deficient in assessing other abilities, such as interpersonal intelligence (social intelligence), intrapersonal intelligence (akin to emotional intelligence), and other nonacademic intelligences. I have not devoted significant effort to creating such tests.” Gardner (2016): 169.
I should emphasize that if you read Frames of Mind mentally substituting the word talent for intelligence and ignoring Gardner’s critique of g, there’s a lot to be learned from him. In that regard, Gardner has an amusing and I think correct observation in a 2018 interview: “I have never been able to reconstruct when I made the fateful decision not to call these abilities, talents, or gifts, but rather to call them ‘intelligences.’ Because if I had called them anything else, I would not be well known in different corners of the world and journalists like you wouldn’t come to interview me. It was picking the word ‘intelligence’ and pluralizing it.” Liz Mineo, “‘The Greatest Gift You Can Have Is a Good Education, One That Isn’t Strictly Professional,’” Harvard Gazette, May 2018.
Emotional intelligence (EI). As mentioned in chapter 3, the most widely used test of EI is the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT). Version 2 has eight subscales measuring four aspects of EI: perceiving emotions, using emotions to facilitate thought, understanding emotions, and managing emotions. For discussions of its psychometric properties, see Mayer, Caruso, and Salovey (1999); Ciarrochi, Chan, and Caputi (2000); Palmer, Gignac, Manocha et al. (2005); and Landy (2005). Matthews, Zeidner, and Roberts (2007) is a book-length critique of EI after the years in which it got the most attention. Van Rooy and Viswesvaran (2004) provide a meta-analysis of the literature. Assessments of the utility of EI independently of IQ and traditional personality factors vary widely, from enthusiastic to dismissive, and characterizing the strengths and weaknesses of the various positions would take us deep into the psychometric weeds. For me, a single source that is both rigorous and judicious is a 2010 integrative meta-analysis by psychologists Dana Joseph and Daniel Newman. They conclude the article with “Practical Advice for Using Emotional Intelligence Measures in Personnel Selection” that seems to do a good job of drawing lessons from a complicated literature of widely varying quality. Quoting directly from their text:
1. Choose your EI measure carefully. There are two, distinct definitions of the term “Emotional Intelligence”: (a) ability to perform emotional tasks, and (b) a grab-bag of everything that is not cognitive ability. It is critical to distinguish these two, because measures based on the two EI definitions do not have the same content, predictive validity, or subgroup differences.
2. Exercise extreme caution when using mixed EI measures. Grab-bag measures of EI (i.e., self-report mixed measures) appear to exhibit some incremental validity over cognitive ability and personality measures on average (based on nine studies), but it is not clear why. As such, use of these measures for personnel decisions may be difficult to defend, without extensive local validation.
3. Know that ability EI measures may add little to the selection system. Ability-based measures of EI (performance-based and self-report) exhibit little incremental validity over cognitive ability and personality, on average.
4. Base the decision to use an EI measure on the job type (i.e., consider the emotional labor content of the job). When dealing with high emotional labor jobs (jobs that require positive emotional displays), all types of EI measures exhibit meaningful validity and incremental validity over cognitive ability and personality. In contrast, for low emotional labor jobs, EI validities are weaker or can even be negative.
5. Be aware of subgroup differences on EI. Although more data are needed, preliminary evidence suggests that performance-based EI measures favor women and Whites, which may produce adverse impact against men and African Americans. (Joseph and Newman (2010): 72).
Grit. Labels can be effective, and “grit” is a great one. It captures a human quality that we recognize in some people and not in others, and it has an obvious and persuasive relationship to success in all sorts of human endeavors. But grit as a psychological construct overlaps substantially with one of the Big Five factors, conscientiousness. There are other psychometric issues as well. Marcus Credé has published an excellent review of the literature that lays out the major issues. He sums up as follows:
For all its intuitive appeal, the grit literature is currently characterized by a number of serious theoretical and empirical challenges ranging from a lack of construct validity, discriminant validity, and predictive validity. At present there is no empirical support for the idea that grit is the combination of perseverance and passion or for the claim that grit adds to our understanding of success and performance. Indeed, the best available evidence strongly suggests that grit is largely a repackaging of conscientiousness—a widely studied personality trait. If grit is to represent a meaningful contribution to our understanding of success, researchers should focus on three broad areas. First, future work will have to pay particularly close attention to whether the combination of perseverance and passion into a single construct can be theoretically or empirically justified or whether the two facets are best studied individually. Second, future work should consider whether grit or grit facets interact with ability to predict success or whether grit facets represent necessary-but-not-sufficient conditions for success. Third, efforts should be made to improve the measurement of grit and grit facets because any empirical investigation of the role of grit requires that grit be measured better than current scales allow. (Credé (2018): 611).
20. For a systematic discussion of the criteria for establishing that a noncognitive factor in academic achievement is involved in gene-environment transactions, see Tucker-Drob and Harden (2017).
21. Gottfredson and Hirschi (1990); Moffitt, Poulton, and Caspi (2013); Moffitt, Arseneault, Belsky et al. (2011); Duckworth and Carlson (2013).
22. von Stumm, Gale, Batty et al. (2009); Lim, Teo, and Loo (2003); Furnham and Cheng (2017).
23. Chamorro-Premuzic, Harlaar, Greven et al. (2010); Spinath, Spinath, Harlaar et al. (2006); Stankov (2013).
24. von Stumm, Hell, and Chamorro-Premuzic (2011).
25. Poropat (2014): Table 2. Shanahan, Bauldry, Roberts et al. (2014) offers an excellent literature review in addition to making its own contribution. Important subsequent studies include Lechner, Danner, and Rammstedt (2017) and Damian, Su, Shanahan et al. (2014).
26. Borghans, ter Weel, and Weinberg (2008).
27. Cheng and Furnham (2012).
28. These facets and the facets for neuroticism are taken from Costa, Terracciano, and McCrae (2001): Table 2.
29. A fourth study, Borghans, Golsteyn, Heckman et al. (2016), found too late to include in the main text, reports the independent and joint effects of personality and IQ on academic outcomes without a measure of childhood SES.
In a sample of 298 from a Dutch high school, conscientiousness explained more of the variance for grades than any other personality trait or IQ. For scores on the Differential Aptitude Test, which the authors classified as an achievement test, IQ was the most important predictor, but openness was also highly significant (supplemental tables 7.1 and 7.2).
A study of a sample of 8,501 from the British Cohort Study had 11 dependent variables representing test scores and grades. Locus of control consistently had a positive effect on test scores while disorganization consistently had a negative effect. The roles of IQ and the combined personality traits were roughly equal on five of the dependent variables; of the remaining six, IQ had the somewhat larger role on three and the combined personality traits on the other three (supplemental tables 7.3–7.7).
An analysis of 638 members of the NLSY79 found a dominant role for IQ over measures of locus of control and self-esteem for scores on the Armed Forces Qualification Test (which the authors treated as an achievement test) and grades. On other dependent variables, neither IQ nor the personality inventories explained more than trivial amounts of the variance (supplemental tables 7.8 and 8.1–8.6).
An analysis of 1,561 members of the National Survey of Midlife Development (US) found that the Big Five personality traits explained more of the variance than IQ on dependent variables measuring adult wages, physical health, mental health, and depression (supplemental tables 8.7–8.11).
30. The Project Talent variables for intelligence, personality traits, and parental SES were standardized prior to the analysis, while the measures of educational attainment, log of earned income, and occupational prestige were not. The tables in the article reported unstandardized regression coefficients. I express them in the table here as quasi effect sizes (the unstandardized regression coefficient divided by the sample standard deviation) to make them more easily interpretable relative to the standardized total effects in a structural equation model for the Aberdeen Birth Cohort and the path coefficients for the British NCDS.
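The conversion described above is simple arithmetic; a sketch with invented numbers for illustration (not values from the analyses):

```python
def quasi_effect_size(b, sd_outcome):
    # Quasi effect size as described in the note: an unstandardized
    # regression coefficient divided by the sample standard deviation of
    # the (unstandardized) outcome variable. Because the predictors were
    # standardized before the analysis, this puts the coefficient on a
    # roughly standardized scale.
    return b / sd_outcome

# Invented illustration: a coefficient of 0.8 years of educational
# attainment per SD of the predictor, with an outcome SD of 2.5 years,
# converts to a quasi effect size of 0.32.
print(quasi_effect_size(0.8, 2.5))  # 0.32
```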
Both the Project Talent and Aberdeen cohort analyses used structural models that were run five times, once for each of the Big Five factors. The total effects for childhood IQ and parental SES given in the table represent the median of the main effects for IQ and parental SES across the five runs (which with few exceptions were within .01 of one another).
31. Krapohl, Rimfeld, Shakeshaft et al. (2014): Table S7.
32. Given the results of the bivariate analyses showing the extent to which the heritability of GCSE scores can be explained by each of the nine predictors without considering any of the other eight, the contributions of well-being, child-reported behavior problems, health, and the home environment after controlling for the other predictors cannot have been more than a percent or so, and those of personality, parent-reported behavior problems, and the school environment cannot have been much more than 5 percent. Self-efficacy was the domain that accounted for perhaps half of the 28 percent explained by eight non-IQ predictors. Based on the implications of Krapohl, Rimfeld, Shakeshaft et al. (2014): Table S6.
33. Krapohl, Rimfeld, Shakeshaft et al. (2014): The distribution of loadings in Table S6 applied to the estimate of the total contribution to phenotypic variance of the eight domains other than IQ of 28 percent in Table S7.
34. Tucker-Drob, Briley, Engelhardt et al. (2016): 800.
35. These statements are drawn from Tucker-Drob, Briley, Engelhardt et al. (2016): Figs. 3 and 5. The figures report decomposed standardized path coefficients that do not lend themselves to intuitive interpretation.
36. The table below is adapted from Sackett, Kuncel, Arneson et al. (2009): Table 4. The figures for the SAT meta-analysis are corrected for national population range restriction.
Meta-analysis of College Board data (N: —)
Correlations: SES–test +.42; SES–grade +.22; Test–grade +.53
Partial correlations: Test–grade controlling for SES +.50; SES–grade controlling for test –.01

Meta-analysis of studies with composite SES (N = 17,235)
Correlations: SES–test +.15; SES–grade +.09; Test–grade +.37
Partial correlations: Test–grade controlling for SES +.36; SES–grade controlling for test +.03

Individual longitudinal studies:

1995 Nat’l Study of Law School Performance (N = 3,375)
Correlations: SES–test +.16; SES–grade +.07; Test–grade +.38
Partial correlations: Test–grade controlling for SES +.38; SES–grade controlling for test +.01

Harvard Study of the Class of 1964–65 (N = 486)
Correlations: SES–test +.07; SES–grade +.05; Test–grade +.30
Partial correlations: Test–grade controlling for SES +.29; SES–grade controlling for test +.03

LSAC Nat’l Longitudinal Bar Passage Study (N = 19,264)
Correlations: SES–test +.13; SES–grade +.05; Test–grade +.35
Partial correlations: Test–grade controlling for SES +.35; SES–grade controlling for test +.01

Nat’l Education Longitudinal Study of 1988 (N = 6,314)
Correlations: SES–test +.40; SES–grade +.10; Test–grade +.24
Partial correlations: Test–grade controlling for SES +.23; SES–grade controlling for test +.02

Nat’l Longitudinal Study of the Class of 1972 (N = 5,735)
Correlations: SES–test +.30; SES–grade +.04; Test–grade +.31
Partial correlations: Test–grade controlling for SES +.31; SES–grade controlling for test –.01

Project Talent (N = 749)
Correlations: SES–test +.18; SES–grade +.05; Test–grade +.30
Partial correlations: Test–grade controlling for SES +.29; SES–grade controlling for test +.01
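The partial correlations in the table follow the standard first-order formula; a sketch in Python (using the table's rounded two-decimal correlations as inputs, so the reconstruction matches the reported partials only up to rounding):

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    # First-order partial correlation of x and y controlling for z.
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Nat'l Education Longitudinal Study of 1988 row:
# test-grade +.24, SES-test +.40, SES-grade +.10.
r = partial_corr(0.24, 0.40, 0.10)  # ~ .22 (the table reports +.23,
                                    # computed from unrounded inputs)
```

The pattern the table illustrates (test–grade correlations barely move when SES is partialed out, while SES–grade correlations controlling for the test hover near zero) falls directly out of this formula whenever the raw SES–grade correlation is small.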
37. I mark the beginning of this literature with Jencks, Smith, Acland et al. (1972), though an argument could be made for Coleman et al. (1966).