Gifted 7th graders. The last 60 years have seen major reductions in the male advantage at the extreme high end for 7th graders. For those in the top two percentiles, a ratio of about 2.0 in 1960 appears to have disappeared. For those in the top percentile, a male ratio of about 7.0 has fallen to around 1.5. At the most stratospheric level, the top 1 percent of the top 1 percent, a male advantage that was measured at about 13 to 1 in the 1970s and the early 1980s has fallen to less than 3 to 1.[29] In short, what was once thought to be an overwhelming male advantage at high levels of math achievement has been greatly reduced during the last six decades. But a second statement is also true: The male advantage for 7th graders at the highest levels of ability shrank mostly during the 1980s and has been relatively stable since the early 1990s.[30]
Gifted 12th graders. For college-bound seniors taking the SAT math, the gap at a broadly defined high end (treating the entire SAT pool as a moderately gifted group) has narrowed since the 1970s. The effect size of the male advantage on the SAT was –0.38 in 1977, its largest since the first published scores in 1972, and stood at –0.25 in 2016, its smallest ever.[31] The shrinkage in the size of the male advantage was relatively steady throughout the period.
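The effect sizes quoted here are standardized mean differences. As a minimal sketch, assuming the measure is Cohen's d with a pooled standard deviation and the sign convention that appears throughout this chapter (negative favors males, positive favors females), the calculation looks like this. The scores are invented for illustration and are not SAT data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(female_scores, male_scores):
    """Standardized mean difference, female minus male.

    Assumed convention: negative values favor males, positive favor females.
    """
    n_f, n_m = len(female_scores), len(male_scores)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_f - 1) * variance(female_scores)
                  + (n_m - 1) * variance(male_scores)) / (n_f + n_m - 2)
    return (mean(female_scores) - mean(male_scores)) / sqrt(pooled_var)

# Hypothetical scores, for illustration only (not real SAT data).
female = [480, 500, 520, 540, 560]
male = [510, 530, 550, 570, 590]
d = cohens_d(female, male)
print(round(d, 2))
```

Because the difference is expressed in standard deviation units, effect sizes from tests with different score scales can be compared directly.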
A WORD ABOUT THE SAT
The letters SAT originally stood for Scholastic Aptitude Test, which signaled the test’s purpose: to identify high-IQ students regardless of their family circumstances or the quality of their schooling. The College Board has ignored that history since IQ became politically incorrect in the 1960s, but the SAT remained a good measure of IQ for high school graduates into the 1990s.[32] Since the College Board does not release the needed psychometric information, there’s no way to be sure, but I surmise that the SAT lost a little of its quality as a measure of IQ in the revision in 1994, more in the revision in 2005, and still more in the revision in 2016. I should add that the SAT is not culturally biased against ethnic minorities or the poor and, at least until the revisions of 2005 and 2016, was far less susceptible to coaching than most parents think.[33]
If we concentrate on students who qualify as gifted by a more demanding definition—those who score 700–800 on the SAT math—a big drop in the male advantage occurred in the 1980s. It continued into the 1990s, but the downward trend flattened after 1995. Since 2010, the ratio of males to females scoring 700–800 in the SAT math has hovered near 1.9.
The male math advantage at the extreme high end for 17- and 18-year-olds remains large. Since 1950, the Mathematical Association of America has sponsored the American Mathematics Competitions (AMC) for high school students. The test used for the competition is far harder than the SAT math test.[34] The following table shows the male-female ratio for students who scored in the top five percentiles for the years 2009 through 2018. Everyone in this group is in the top percentile of the national population of 17- and 18-year-olds. Those in the 99th percentile on the AMC are probably around the top 0.01 percent of the national population or even higher.
I show two ways of computing the ratio for each of the top five percentiles. One is the raw ratio: the number of males scoring in that AMC percentile divided by the number of females. The other is the ratio adjusted for the numbers of males and females taking the test.[35]
AMC percentile    Male-female ratio (raw)    Male-female ratio (adjusted)
95th              4.2                        2.9
96th              4.4                        3.1
97th              4.4                        3.1
98th              5.3                        3.7
99th              7.8                        5.4
For the American Mathematics Competitions, the male-female ratio remains quite high. Which ratio you think is closer to correct depends on your judgment about the population of test-takers. From 2009 to 2018, the population of AMC12 test-takers averaged 59 percent male and 41 percent female. Your opinion about the reason for the sex imbalance in test-takers should push you toward one choice or the other.
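The relationship between the two columns can be sketched in a few lines. This is a sketch under an assumption: that the adjustment simply divides the raw ratio by the male-to-female ratio among test-takers (59 to 41). The note describing the exact procedure is not reproduced here, but this simple version matches the adjusted column to one decimal place.

```python
# Raw male-female ratios from the table, keyed by AMC percentile.
raw_ratios = {95: 4.2, 96: 4.4, 97: 4.4, 98: 5.3, 99: 7.8}

# Average composition of AMC12 test-takers, 2009-2018, as stated in the text.
MALE_SHARE = 0.59
FEMALE_SHARE = 0.41

def adjust_ratio(raw_ratio, male_share=MALE_SHARE, female_share=FEMALE_SHARE):
    """Divide out the sex imbalance among test-takers (assumed adjustment)."""
    return raw_ratio / (male_share / female_share)

adjusted = {p: round(adjust_ratio(r), 1) for p, r in raw_ratios.items()}
for percentile, ratio in adjusted.items():
    print(f"{percentile}th percentile: raw {raw_ratios[percentile]}, adjusted {ratio}")
```

The adjusted figure answers the question of what the ratio would be if equal numbers of males and females had taken the test and each sex kept its observed rate of reaching that percentile.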
One possibility is that students self-select into the AMC testing pool if they think they’re good enough at math to do well on the test and otherwise don’t bother to take it. To the extent that there is a genuine sex imbalance of talent in the top percentiles of ability, then more males than females will self-select into the pool. If you are attracted by this explanation, you should focus on the raw ratios as the correct ones.
Another possibility is that the larger proportion of male test-takers is an artifact having nothing to do with underlying math talent. Taking the AMC is exceptionally nerdy. Perhaps that’s more off-putting to 17-year-old girls than to 17-year-old boys. Perhaps there is a difference (whether biological or socialized doesn’t matter) in how much boys and girls enjoy the kind of competition that the AMC represents. If you are attracted by this explanation, you should focus on the adjusted ratios as the correct ones.
I won’t try to spin out all the many ways in which the meaning of the ratios is clouded by selection factors. Whichever ratio you think is closer to the truth, they point to an empirical reality: The male-female ratios in the top percentiles of the AMC12 are substantial and they grow larger at the 98th and especially the 99th percentile. In the table, I counted perfect scores of 150 as being in the 99th percentile. When they are broken out separately, it turns out that from 2009 to 2018, 97 males and 7 females got perfect scores: a ratio of 13.9.
On Average, Males Have Substantially Better Visuospatial Skills Than Females[36]
Diane Halpern’s review of sex differences in visuospatial skills takes 17 pages. It is so long partly because the concept is complicated (she divides visuospatial skills into five components) and partly because, in her words, “sex differences in spatial tasks are among the largest sex differences.”[37] But another good reason for a lengthy discussion is that a male advantage in visuospatial skills has specific implications for real-world sex differences in vocations. In the Paleolithic period, visuospatial skills were useful for throwing spears at edible mammals and finding one’s way back home after a long hunting trip. Now they are useful because they seem to be an essential component of extraordinary mathematical and programming skills. Other professions that make extensive use of visuospatial abilities include engineering, architecture, chemistry, aviation, and the building trades.
The first category of spatial aptitude is spatial perception. An example is the Piaget water-level task:
[Figure: the Piaget water-level task, showing a tilted bottle. Source: Halpern (2012): Fig. 3.12.]
The test-taker is asked to draw a line to show how the water line would look in the tilted bottle. The correct answer is a horizontal line relative to the earth. Halpern reports that the best estimate, summarizing results over many studies, is that about 40 percent of college women get it wrong.38 Effect sizes favoring males range from –0.44 to –0.66. In Halpern’s words, “It is difficult to understand why this should be such a formidable task for college women.”39 And yet the result has been replicated many times, has been confirmed internationally, and is just about impossible to explain as a product of culture or socialization (if you doubt that, give it a try).40
“Mental rotation” refers to the ability to imagine how objects will look when rotated in two- or three-dimensional space. Twenty-five years of research and several meta-analyses have all confirmed a substantial male advantage throughout the age range, with effect sizes ranging from –0.52 to –1.49.
Spatiotemporal ability is another conceptually distinct form of visuospatial skill that calls for judgments about moving objects. For example, the subject of the test might be asked to press a key when a moving object passes a specified point or asked to make an estimate of “time of arrival” of a moving object at a specified destination. Effect sizes have ranged from –0.37 to –0.93.[41] In a large sample, with a carefully executed experimental design, effect sizes ranged from –0.51 to –0.81.[42]
The fourth type of visuospatial skill calls upon participants to generate a visual image from short-term or long-term memory and then use information in that image to perform a task. The tests usually are scored for both speed and accuracy. In one of the best studies, the effect sizes on speed for four different tasks ranged from –0.63 to –0.77, all favoring males, with no sex differences in accuracy.[43]
The last type of visuospatial skill is called spatial visualization, which calls on people to go through a multistep mental process to understand how an object will be changed if something is done to it. For example, the paper-folding test asks: If you fold a piece of paper in half and punch three holes through it, what will the piece of paper look like when it is unfolded? Males usually show an advantage on spatial visualization, but the effect sizes are generally small.
Halpern describes other types of visuospatial skills, all of which show a male advantage.[44] An important outstanding question is how large the aggregate difference in visuospatial skills might be. Many of the effect sizes for sex differences in visuospatial skills are large even when taken individually. But given the parallel with personality facets—conceptually related but distinct traits—a calculation of Mahalanobis D for large samples of males and females who have taken a comprehensive test battery would be instructive. Perhaps many of the different types of skills are so intercorrelated that aggregating them would not add much to the largest individual effect size. It is a question that I hope will be explored.
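The aggregation question can be made concrete with a small sketch. It assumes the usual definition of Mahalanobis D for standardized differences, D = sqrt(d' R^-1 d), where d is the vector of effect sizes and R is the correlation matrix among the measures. The effect sizes and correlations below are hypothetical, chosen only to show how intercorrelation changes the aggregate.

```python
from math import sqrt

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def mahalanobis_d(effect_sizes, correlations):
    """D = sqrt(d' R^-1 d) for standardized differences d and correlation matrix R."""
    weights = solve(correlations, effect_sizes)
    return sqrt(sum(d * w for d, w in zip(effect_sizes, weights)))

def equicorrelated(n, r):
    """Correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(n)] for i in range(n)]

# Hypothetical effect sizes for three visuospatial measures (illustration only).
d = [-0.6, -0.6, -0.6]
d_uncorrelated = mahalanobis_d(d, equicorrelated(3, 0.0))
d_correlated = mahalanobis_d(d, equicorrelated(3, 0.8))
print(round(d_uncorrelated, 2), round(d_correlated, 2))
```

With uncorrelated measures, three differences of 0.6 combine into a D of about 1.04. With intercorrelations of 0.8, D is about 0.64, barely above any single measure, which is the scenario in which aggregation adds little.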
On Average, Women Have Better Social Cognition Than Men
We take for granted that we can infer what someone else is thinking, but this inference is actually a theory—“theory of mind,” often abbreviated as ToM in the literature. It refers to our belief that other people have minds of their own that operate in ways we can understand. It is properly called a theory because the only mind we have direct access to is our own and because we can make predictions based on our theory.[45]
Children acquire ToM as toddlers. As normal people mature, they employ ToM to navigate the social world in increasingly complex ways. But not everybody has a normal human consciousness. The severely autistic have trouble with ToM—one of the features of autism that inspired Baron-Cohen’s empathizer-systemizer theory. Even within the normal range, people vary widely in their ability to project themselves into another person’s mind and correctly predict how that person will react. These are skills that are encompassed by Howard Gardner’s interpersonal intelligence and that other scholars refer to as cognitive empathy, mentalizing, mindreading, or the label I have chosen to use, social cognition. In terms of Simon Baron-Cohen’s empathizing and systemizing, social cognition is to empathizing as visuospatial skills are to systemizing. In both cases, the topic is neurocognitive abilities that contribute to a broad difference between the sexes.
The study of social cognition originated in one of the most durable sex stereotypes, that women are more intuitive than men. Through the early 1970s, researchers were dismissive of evidence that a sex difference existed. As late as 1974, the most comprehensive review of sex differences yet undertaken concluded that “neither sex has greater ability to judge the reactions and intentions of others in any generalized sense.”[46] Then in 1978, psychologist Judith Hall produced the first comprehensive study of all the quantitative work that had been done. In “Gender Effects in Decoding Nonverbal Cues,” published in Psychological Bulletin, Hall reported mean effect sizes favoring females of +0.32 for visual cues, +0.18 for auditory cues, and a large effect of +1.02 for the seven studies that combined visual and auditory cues.[47] Six years later, Hall extended her meta-analysis to include nine countries around the world. Subsequent work has yielded similar results.[48]
In 2014, psychologists Ashley Thompson and Daniel Voyer undertook a new meta-analysis. Hall’s reviews had included studies of accuracy in interpersonal perception of any kind. Thompson and Voyer focused on the ability to detect specific discrete emotions. As in other studies, the results showed a female advantage, but a smaller one: the effect size had a lower bound of +0.19 and an upper bound of +0.27.[49]
The Thompson and Voyer meta-analysis also corroborated Hall’s finding that effect sizes are substantially larger when the subjects in the studies have access to a combination of visual and audio information, that is, when they can see both face and body language and also hear tone of voice. The lower bound effect sizes favoring women were +0.17 for visual only, +0.16 for audio only, and +0.38 for a combination of the two.[50]
The publication of Daniel Goleman’s bestselling Emotional Intelligence: Why It Can Matter More than IQ in 1995 prompted the construction of tests to measure emotional intelligence (EI). The most psychometrically successful and widely used one has been the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT). Version 2 has eight subscales measuring four aspects of EI: perceiving emotion, assimilating emotion in thought, understanding emotion, and reflectively regulating emotion. Of these, the items that most directly measure social cognition as I have been using the term are in the subtests for perceiving emotion. A 2010 meta-analysis found an effect size favoring females of +0.49. On the overall score for performance EI, the female advantage was +0.47.[51]
I will return to other evidence of sex differences in social cognition in chapter 5, reporting the progress that neuroscientists have made in identifying sex differences in brain function that relate to sex differences in social cognition. In the meantime, two points about differences in social cognition need emphasis:
Social cognition consists of a set of abilities, not something that women do better than men just because they are paying more attention to other people than men do.52 Those abilities often break along the People-Things dimension. For example, it has been found that systemizing skills and empathizing skills are inversely related in men—men who scored high on tests measuring systemizing tended to score low on tests measuring empathizing. Males are rarely good at both systemizing and empathizing. In contrast, these skill sets are largely independent in women. Women can be high in both, low in both, or high in one and low in the other.53 The same study found evidence that men apply systemizing skills to empathizing tasks. Put another way, even when men do well in social cognition tasks, they are not using the cognitive tools most naturally suited to that purpose.
It has also been established that the relationship of IQ to social cognition is different for men and women. Subtests measuring memory are standard in a full-scale IQ test. They wouldn’t be included if they did not correlate with the other subtests seeking to measure g. But the correlation between IQ and certain kinds of memory is different for men and women. In a Swedish study comparing IQ with three episodic memory tasks, women outperformed men in all three: verbal memory, memory for pictures of things, and memory for pictures of faces. The difference was that male performance was substantially correlated with IQ for all three tasks while IQ was substantially less important, especially at the lower levels, for women. Women with IQs of 60–80 had verbal memory as high as men with IQs of 101–120. Women with IQs of 60–80 had substantially higher scores on memory for faces than men with IQs of 101–120.[54] Something’s going on with memory in females that calls on non-IQ skills that men do not tap (or perhaps possess) to the same degree.
The aggregate sex difference in social cognition has yet to be estimated. Four different clusters of sex differences are relevant to assessing the overall magnitude of the sex difference in social cognition. The first consists of the direct measures that I have reviewed in this section. The second consists of the female advantage in memory for faces, which in turn is presumably related to the ability to discern visual clues about emotional states. The third is the cluster of ways in which the female sensory apparatus is more sensitive than the male’s. The fourth cluster has to do with male-female differences in personality that bear on the reasoning aspect of social cognition.[55] In the technical literature, the effect sizes in all four of these categories have been treated separately. The prudent expectation is that if these individual effect sizes, which have usually been in the small to medium range, were aggregated appropriately, they would reveal a much larger overall difference.
Is There a Sex Difference in g?
The most famous cognitive measure is the IQ test. The tests are designed to minimize sex differences,[56] but minor sex differences in test scores do exist, and they have usually, though not always, favored males.[57] The Wechsler Adult Intelligence Scale (WAIS), one of the best-known IQ tests, provides a typical example. The U.S. standardization samples for the first version, released in 1955, showed a 1.0-point difference in full-scale IQ favoring males. WAIS-R, released in 1981, showed a 2.2-point difference. WAIS-III, released in 1997, showed a 2.7-point difference. WAIS-IV, released in 2008, showed a 2.3-point difference.[58]
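To put these point differences on the same footing as the effect sizes reported earlier in the chapter, they can be converted to standard deviation units. The sketch below relies on the fact that Wechsler full-scale IQ is normed to a standard deviation of 15, and it assumes the chapter's sign convention in which negative values indicate a male advantage.

```python
IQ_SD = 15.0  # Wechsler full-scale IQ is normed to a standard deviation of 15

# Full-scale IQ differences favoring males, as reported in the text.
wais_gaps = {
    "WAIS (1955)": 1.0,
    "WAIS-R (1981)": 2.2,
    "WAIS-III (1997)": 2.7,
    "WAIS-IV (2008)": 2.3,
}

def to_effect_size(points_favoring_males, sd=IQ_SD):
    """Convert an IQ-point gap to SD units; negative means a male advantage."""
    return -points_favoring_males / sd

effect_sizes = {name: round(to_effect_size(gap), 2) for name, gap in wais_gaps.items()}
for name, d in effect_sizes.items():
    print(f"{name}: {d}")
```

On this scale every version of the test shows a difference smaller than 0.2 standard deviations.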
But all of this evidence is based on IQ scores, not on the general mental factor g, the thing that IQ tests are imperfectly measuring. The distinction between an IQ score and g is crucial. An IQ score is based on a set of subtests. The simple sum or average of scores depends on which tests have more representation in the test battery; therefore, as Arthur Jensen wrote, “the simple sum or mean of various subtest scores is a datum without scientific interest or generality.”59 The question of scientific interest regarding a sex difference in intelligence is whether there is a sex difference on g. Jensen’s conclusion after assessing g in five major test batteries—the WAIS, the Wechsler Intelligence Scale for Children-Revised (WISC-R), the General Aptitude Test Battery (GATB), the Armed Services Vocational Aptitude Battery (ASVAB), and the British Ability Scales (BAS)—was that “the sex difference in psychometric g is either totally nonexistent or is of uncertain direction and of inconsequential magnitude.”60
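Jensen's point about battery composition can be illustrated with a toy calculation. The subtest names and sex differences below are hypothetical, and the simple mean of subtest differences is only a stand-in for a composite score (it ignores how intercorrelations among subtests affect the composite's standard deviation). The sketch shows only that the same underlying abilities yield a different summary number when the mix of subtests changes.

```python
from statistics import mean

# Hypothetical standardized sex differences by subtest type
# (positive favors females, negative favors males). Illustration only.
SUBTEST_DIFFERENCE = {"verbal": 0.2, "spatial": -0.4}

def mean_subtest_difference(battery):
    """Average the subtest differences for a battery given as a list of subtest types."""
    return mean(SUBTEST_DIFFERENCE[kind] for kind in battery)

# Two hypothetical batteries built from the same two kinds of subtests.
verbal_heavy = ["verbal", "verbal", "verbal", "spatial"]
spatial_heavy = ["verbal", "spatial", "spatial", "spatial"]

verbal_heavy_gap = mean_subtest_difference(verbal_heavy)
spatial_heavy_gap = mean_subtest_difference(spatial_heavy)
print(round(verbal_heavy_gap, 2), round(spatial_heavy_gap, 2))
```

The verbal-heavy battery shows a slight female advantage and the spatial-heavy battery a male advantage, even though nothing about the test-takers has changed. That dependence on composition is why the text turns to g rather than to summed scores.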