The Bell Curve: Intelligence and Class Structure in American Life

by Richard J. Herrnstein

Bearing these basics in mind, let us go back to the sloping line in the figure above. Out of mathematical necessity, we know several things about it. First, it must pass through the intersection of the zeros (which, in standard scores, correspond to the averages) for both height and weight. Second, the line would have had exactly the same slope had height been the vertical axis and weight the horizontal one. Finally, and most significant, the slope of the best-fitting line cannot be steeper than 1.0. The steepest possible best-fitting line, in other words, is one along which one unit of change in height is exactly matched by one unit of change in weight, clearly not the case in these data. Real data in the social sciences never yield a slope that steep.

In the picture, the line goes uphill to the right, but for other pairs of variables, it could go downhill. Consider a scatter diagram for, say, educational level and fertility by the age of 30. Women with more education tend to have fewer babies when they are young, compared to women with less education, as we discuss in Chapters 8 and 15. The cloud of points would decline from left to right, just the reverse of the cloud in the picture above. The downhill slope of the best-fitting line would be expressed as a negative number, but, again, it could be no steeper than −1.0.

We focus on the slope of the best-fitting line because it is the correlation coefficient—in this case, equal to .50, which is quite large by the standards of variables used by social scientists. The closer it gets to ±1.0, the stronger is the linear relationship between the standardized variables (the variables expressed as standard scores). When the two variables are mutually independent, the best-fitting line is horizontal; hence its slope is 0. Anything other than 0 signifies a relationship, albeit possibly a very weak one.
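The claim that the slope of the best-fitting line through standardized variables *is* the correlation coefficient can be checked numerically. The sketch below uses invented heights and weights (hypothetical, not the data behind the figure) and assumes "best-fitting" means ordinary least squares. It fits the line both ways round and computes the Pearson correlation separately from the raw data.

```python
from statistics import mean, pstdev

# Hypothetical heights (inches) and weights (pounds) for nine boys.
heights = [62, 64, 65, 66, 68, 69, 70, 72, 73]
weights = [105, 120, 112, 130, 128, 145, 135, 150, 160]


def standardize(values):
    """Convert raw values to standard scores (mean 0, SD 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


zh, zw = standardize(heights), standardize(weights)

# Least-squares slope of weight on height in standard scores. Both means
# are 0, so the line passes through the origin and the slope is
# sum(x*y) / sum(x*x).
slope = sum(x * y for x, y in zip(zh, zw)) / sum(x * x for x in zh)

# Swapping the axes (height on weight) gives the same slope.
slope_swapped = sum(x * y for x, y in zip(zh, zw)) / sum(y * y for y in zw)

# Pearson correlation computed from the raw data, for comparison.
mh, mw = mean(heights), mean(weights)
cov = mean([(h - mh) * (w - mw) for h, w in zip(heights, weights)])
r = cov / (pstdev(heights) * pstdev(weights))

print(round(slope, 4), round(slope_swapped, 4), round(r, 4))  # all three agree
```

Because each slope reduces to the average product of paired standard scores, neither can exceed 1 in absolute value, which is the ceiling described earlier.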

Whatever the correlation coefficient of a pair of variables is, squaring it yields another notable number. Squaring .50, for example, gives .25. The significance of the squared correlation is that it tells how much the variation in weight would decrease if we could make everyone the same height, or vice versa. If all the boys in the class were the same height, the variation in their weights would decline by 25 percent. Perhaps, if you have been compelled to be around social scientists, you have heard the phrase “explains the variance,” as in, for example, “Education explains 20 percent of the variance in income.” That figure comes from the squared correlation.
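A small sketch of the "explains the variance" arithmetic, using hypothetical heights and weights. It models "making everyone the same height" as removing the part of weight that the best-fitting straight line predicts from height (an assumption: only the linear part of the relationship is removed), then compares the share of weight variance that disappears with the squared correlation.

```python
from statistics import mean, pvariance

# Hypothetical heights (inches) and weights (pounds).
heights = [62, 64, 65, 66, 68, 69, 70, 72, 73]
weights = [105, 120, 112, 130, 128, 145, 135, 150, 160]

mh, mw = mean(heights), mean(weights)
cov = mean([(h - mh) * (w - mw) for h, w in zip(heights, weights)])
r = cov / (pvariance(heights) * pvariance(weights)) ** 0.5

# Least-squares line of weight on height.
b = cov / pvariance(heights)
predicted = [mw + b * (h - mh) for h in heights]

# What is left of weight once its linear dependence on height is removed:
# the spread we would expect if every boy were the same (average) height.
residuals = [w - p for w, p in zip(weights, predicted)]

share_removed = 1 - pvariance(residuals) / pvariance(weights)
print(round(r ** 2, 4), round(share_removed, 4))  # the two shares agree
```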

In general, the squared correlation is a measure of the mutual redundancy in a pair of variables. If they are highly correlated, they are highly redundant in the sense that knowing the value of one of them narrows the range of possible values for the other. If they are uncorrelated or only slightly correlated, knowing the value of one tells us little or nothing about the value of the other.5

Regression Coefficients

Correlation assesses the strength of a relationship between variables. But we may want to know more about a relationship than merely its strength. We may want to know what it is. We may want to know how much of an increase in weight, for example, we should anticipate if we compare 66-inch boys with 73-inch boys. Such questions arise naturally if we are trying to explain a particular variable (e.g., annual income) in terms of the effects of another variable (e.g., educational level). How much income is another year of schooling worth? is just the sort of question that social scientists are always trying to answer.

The standard method for answering it is regression analysis, which has an intimate mathematical association with correlational analysis. If we had left the scatter diagram with its original axes—inches and pounds—instead of standardizing them, the slope of the best-fitting line would have been a regression coefficient, rather than a correlation coefficient. The figure below shows the scatter diagram with nonstandardized axes.
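The link between the two coefficients is a rescaling: the regression coefficient of weight on height equals the correlation multiplied by the ratio of the standard deviation of weight to that of height. The sketch below, on hypothetical data, computes it in pounds per inch and uses it to answer the earlier question about 66-inch and 73-inch boys.

```python
from statistics import mean, pstdev

# Hypothetical heights (inches) and weights (pounds).
heights = [62, 64, 65, 66, 68, 69, 70, 72, 73]
weights = [105, 120, 112, 130, 128, 145, 135, 150, 160]

mh, mw = mean(heights), mean(weights)
sh, sw = pstdev(heights), pstdev(weights)
r = mean([(h - mh) * (w - mw) for h, w in zip(heights, weights)]) / (sh * sw)

# Regression coefficient of weight on height, in pounds per inch: the
# correlation rescaled from standard-score units back to raw units.
b = r * sw / sh
a = mw - b * mh  # intercept, chosen so the line passes through both means

# Anticipated weight difference between 66-inch and 73-inch boys.
diff = b * (73 - 66)
print(f"{b:.2f} lb per inch; 73-inch vs. 66-inch boys: {diff:.1f} lb")
```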

What a regression coefficient is telling you

Why are there two lines? Recall that the best-fitting line is the one that minimizes the aggregated distances between the data points and the line. For standardized measurements, it makes no difference whether the distances are measured along the pounds axis or the inches axis; for unstandardized measurements, it may make a difference. Hence we may get two lines, depending on which axis was used to fit the line. The two lines, which always intersect at the average values for the two variables, answer different questions. One answers the question we first posed: How much of a difference in pounds is associated with a given difference in inches (i.e., the regression of weight on height). The other one tells us how much of a difference in inches is associated with a given difference in pounds (i.e., the regression of height on weight).
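The two regression lines can be sketched on hypothetical data. Each passes through the point of averages, and the product of the two regression coefficients (pounds per inch times inches per pound) equals the squared correlation, so the lines coincide only when the correlation is ±1.

```python
from statistics import mean, pstdev

# Hypothetical heights (inches) and weights (pounds).
heights = [62, 64, 65, 66, 68, 69, 70, 72, 73]
weights = [105, 120, 112, 130, 128, 145, 135, 150, 160]

mh, mw = mean(heights), mean(weights)
sh, sw = pstdev(heights), pstdev(weights)
r = mean([(h - mh) * (w - mw) for h, w in zip(heights, weights)]) / (sh * sw)

b_weight_on_height = r * sw / sh  # pounds per inch
b_height_on_weight = r * sh / sw  # inches per pound

# Each line passes through the point of averages (mh, mw):
#   weight = mw + b_weight_on_height * (height - mh)
#   height = mh + b_height_on_weight * (weight - mw)
# Drawn on the same axes (inches horizontal), the second line has slope
# 1 / b_height_on_weight pounds per inch, steeper than the first unless |r| = 1.
slope_first = b_weight_on_height
slope_second = 1 / b_height_on_weight

print(round(slope_first, 2), round(slope_second, 2))
print(round(b_weight_on_height * b_height_on_weight, 4), round(r ** 2, 4))  # equal
```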

Multiple Regression

Multiple regression analysis is the main way that social science deals with the multiple relationships that are the rule in social science. To get a fix on multiple regression, let us return to the high school gym for the last time. Your classmates are still scattered about the floor. Now imagine a pole, erected at the intersection of 60 inches and 90 pounds, marked in inches from 18 inches to 50 inches. For some inscrutable reason, you would like to know the impact of both height and weight on a boy’s waist size. Since imagination can defy gravity, you ask each boy to levitate until the soles of his shoes are at the elevation that reads on the pole at the waist size of his trousers. In general, the taller and heavier boys must rise the most, the shorter and slighter ones the least, and most boys, middling in height and weight, will have middling waist sizes as well. Multiple regression is a mathematical procedure for finding the plane, slicing through the space in the gym, that minimizes the aggregated distances (in this instance, along the waist size axis) between the bottoms of the boys’ shoes and the plane.

The best-fitting plane will tilt upward toward heavy weights and tall heights. But it may tilt more along the pounds axis than along the inches axis, or vice versa. It may tilt equally for each. The slope of the tilt along each of these axes is again a regression coefficient. With two variables predicting a third, as in this example, there are two coefficients. One of them tells us how much of an increase in trouser waist size is associated with a given increase in weight, holding height constant; the other, how much of an increase in trouser waist size is associated with a given increase in height, holding weight constant.
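For the gym example, the best-fitting plane can be found by least squares from the two "normal equations" written on mean-centered data. The sketch below is a minimal two-predictor version. The heights, weights, and waist sizes are hypothetical, and the function name `fit_plane` is illustrative rather than taken from any package.

```python
from statistics import mean


def fit_plane(x1, x2, y):
    """Least-squares plane y = a + b1*x1 + b2*x2 via centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Hypothetical data: height (in), weight (lb), trouser waist size (in).
height = [60, 62, 63, 65, 66, 68, 70, 71, 73]
weight = [95, 110, 100, 120, 140, 125, 150, 135, 165]
waist = [24, 26, 25, 28, 30, 29, 32, 31, 34]

a, b_height, b_weight = fit_plane(height, weight, waist)
# b_height: waist inches per inch of height, holding weight constant.
# b_weight: waist inches per pound, holding height constant.
print(f"a={a:.2f}, per inch of height={b_height:.3f}, per pound={b_weight:.3f}")
```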

With two variables predicting a third, we reach the limit of visual imagination. But the principle of multiple regression can be extended to any number of variables. Income, for example, may be related not just to education but also to age, family background, IQ, personality, business conditions, region of the country, and so on. The mathematical procedures will yield coefficients for each of them, indicating again how much of a change in income can be anticipated for a given change in any particular variable, with all the others held constant.
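The same idea extends to any number of predictors. One textbook approach, shown here as a sketch, builds the normal equations (X′X)b = X′y and solves them by Gauss-Jordan elimination. Statistical packages typically use more numerically stable methods such as QR decomposition, so this is an illustration, not a description of any particular program. The predictors (years of schooling, age, a 0/1 region indicator) and the income values are hypothetical; the outcome is built without noise from known coefficients so the fit can be checked.

```python
def fit_linear(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] of y on the predictor columns in rows."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 for the intercept
    p = len(X[0])
    # Normal equations (X'X) b = X'y, written as an augmented p x (p+1) matrix.
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [x - factor * c for x, c in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical predictors: years of schooling, age, and a 0/1 region indicator.
rows = [(12, 25, 1), (16, 30, 0), (14, 28, 1), (12, 40, 0),
        (18, 35, 1), (10, 22, 0), (16, 45, 1), (13, 33, 0)]
# Noise-free outcome built from known coefficients (5, 2, -1, 0.5).
income = [5 + 2 * s - 1 * a + 0.5 * d for s, a, d in rows]

coefs = fit_linear(rows, income)
print([round(c, 6) for c in coefs])  # recovers the coefficients used above
```

Each coefficient after the intercept is the anticipated change in the outcome for a one-unit change in that predictor, with the others held constant.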

Logistic Regression

The text frequently resorts to a method of analysis called logistic regression. Here, we need only say what the method is for rather than what it is. Many of the variables we discuss are such things as being unemployed or not, being married or not, being a parent or not, and so on. Because they take only two values, corresponding to yes and no, they are called binary variables. Logistic regression is an adaptation of ordinary regression analysis tailored to the case of binary variables. (It can also be used for variables with larger numbers of discrete values.) It tells us how much change there is in the probability of being unemployed, married, and so forth, given a unit change in any given variable, holding all other variables in the analysis constant.
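For readers who want a glimpse of what the method is, here is a bare-bones sketch with one predictor. It models the probability of a "yes" as 1 / (1 + e^−(a + bx)) and fits a and b by plain gradient ascent on the log-likelihood; real packages usually use Newton-type iterations (iteratively reweighted least squares) instead. The data, learning rate, and step count are hypothetical choices for illustration.

```python
import math


def prob(a, b, x):
    """Logistic model: probability that the binary outcome equals 1 at x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit a and b by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = sum(yi - prob(a, b, xi) for xi, yi in zip(x, y)) / n
        grad_b = sum((yi - prob(a, b, xi)) * xi for xi, yi in zip(x, y)) / n
        a += lr * grad_a
        b += lr * grad_b
    return a, b


# Hypothetical data: a predictor score (0-9) and a yes/no outcome (1 = yes).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(x, y)
# The coefficient b is on the log-odds scale, so the change in probability
# for a one-unit increase depends on where you start.
for start in (1, 4, 7):
    change = prob(a, b, start + 1) - prob(a, b, start)
    print(f"x {start} -> {start + 1}: probability rises by {change:.3f}")
```

This is why results from logistic regression are often easiest to read when translated back into probabilities at specific values of the predictor.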

Appendix 2

Technical Issues Regarding the National Longitudinal Survey of Youth

This appendix provides details about the variables used in the text and about other technical issues associated with the NLSY.1 Colleagues who wish to recreate analyses will need additional information, which may be obtained from the authors.2

SURVEY YEAR, CONSTANT DOLLARS, AND SAMPLE WEIGHTS

Our use of the NLSY extends through the 1990 survey year.3

All dollar figures are expressed in 1990 dollars, using the consumer price index inflators as reported in the 1992 edition of Statistical Abstract of the United States, Table 737.
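Converting to constant dollars multiplies a nominal amount by the ratio of the price index in the base year to the index in the year the amount was earned. The index values below are hypothetical placeholders, not the figures from the Statistical Abstract table.

```python
def to_constant_dollars(amount, year, cpi, base_year=1990):
    """Re-express a nominal dollar amount from `year` in `base_year` dollars."""
    return amount * cpi[base_year] / cpi[year]


# Hypothetical index values for illustration only.
cpi = {1985: 100.0, 1990: 120.0}
print(to_constant_dollars(10_000, 1985, cpi))  # 12000.0
```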

Sample weights were employed in all analyses in the main text. We do not so note in each instance, to simplify the description. In computing scores that were based on the 11,878 subjects who had valid scores on the Armed Forces Qualification Test (AFQT), we used the sampling weights specifically assigned for the AFQT population. For analyses based on the NLSY subjects’ status as of a given year (usually 1990), we used the sampling weights for that survey year. For analyses in which the children of NLSY women were the unit of analysis, the child’s sampling weights were used rather than the mother’s.
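Using sampling weights amounts to counting each subject as many times as his or her weight indicates. A minimal sketch of a weighted mean and standard deviation follows; the scores and weights are hypothetical, and dividing the variance by the total weight (rather than applying a small-sample correction) is an assumption the text does not address.

```python
def weighted_mean_sd(values, weights):
    """Mean and SD treating each sampling weight as a frequency count."""
    total = sum(weights)
    m = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / total
    return m, var ** 0.5


# Hypothetical scores and sampling weights for three subjects.
scores = [1, 2, 4]
weights = [1, 1, 2]
m, sd = weighted_mean_sd(scores, weights)
print(m, round(sd, 4))  # 2.75 1.299
```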

To make interpretation of the statistical significance easier, we replicated all the analyses in Part II using just the unweighted cross-sectional sample of whites, as reported in Appendix 4.

SCORING OF THE ARMED FORCES QUALIFICATION TEST (AFQT)

The AFQT is a combination of highly g-loaded subtests from the Armed Services Vocational Aptitude Battery (ASVAB) that serves as the armed services’ measure of cognitive ability, described in detail in Appendix 3. Until 1989, the AFQT consisted of the summed raw scores of the ASVAB’s arithmetic reasoning, word knowledge, and paragraph comprehension subtests, plus half of the score on the numerical operations subtest. In 1989, the armed forces decided to rescore the AFQT so that it consisted of the word knowledge, paragraph comprehension, arithmetic reasoning, and mathematics knowledge subtests. The reason for the change was to avoid the numerical operations subtest, which was both less highly g-loaded than the mathematics knowledge subtest and sensitive to small discrepancies in the time given to subjects when administering the test (numerical operations is a speeded test in which the subject completes as many arithmetic problems as possible within a time limit).
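The two scoring schemes can be sketched as functions of the raw subtest scores, using the usual ASVAB abbreviations as dictionary keys (AR arithmetic reasoning, WK word knowledge, PC paragraph comprehension, NO numerical operations, MK mathematics knowledge). The pre-1989 formula follows the description above. For the 1989 version the text lists the subtests but not their weights, so the unweighted sum here is an assumption, and the raw scores are hypothetical.

```python
def afqt_pre_1989(s):
    """Pre-1989 AFQT raw score as described: AR + WK + PC + half of NO."""
    return s["AR"] + s["WK"] + s["PC"] + s["NO"] / 2


def afqt_1989(s):
    """1989 rescoring from WK, PC, AR, and MK.

    Assumption: an unweighted sum; the text does not give the weights.
    """
    return s["WK"] + s["PC"] + s["AR"] + s["MK"]


raw = {"AR": 20, "WK": 28, "PC": 12, "NO": 40, "MK": 18}  # hypothetical raw scores
print(afqt_pre_1989(raw), afqt_1989(raw))  # 80.0 78
```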

A draft of The Bell Curve was well underway when we became aware of the 1989 scoring scheme. We completed a full draft using the 1980 scoring system but decided that the revised scoring system was psychometrically superior to the old one and therefore replicated all of the analyses using the 1989 version.

Scholars who wish to replicate our analyses should note that the 1989 AFQT score as reported in the NLSY database is not the one used in the text. The NLSY’s variable is rounded to the nearest whole centile and based on the 18- to 23-year-old subset of the NLSY sample. We recomputed the AFQT from scratch using the raw subtest scores, and the population mean and standard deviation used in producing the across-ages AFQT score were based on all 11,878 subjects, not just those ages 18 to 23.4 This measure is useful for multivariate analyses in which age is also entered as an independent variable but should not be used (and is never used in the text) as a representation of an individual subject’s cognitive ability because of age-related differences in test scores (see discussion below).

Age

AFQT scores in the NLSY sample rose by an average of .07 standard deviations per year. The simplest explanation for this is that the AFQT was designed by the military for a population of recruits who would be taking the test in their late teens, and younger subjects in the NLSY sample got lower scores for the same reason that high school freshmen get lower SAT scores than high school seniors. However, a cohort effect could also be at work, whereby (because of educational or broad environmental reasons) youths born in the first half of the 1960s had lower realized cognitive ability than youths born in the last half of the 1950s. There is no empirical way of telling which reason really explains the age-related differences in the AFQT or what the mix of reasons might be. The age-related increase is not perfectly linear (it levels off in the top two years) but close enough that the age problem is best handled in the multivariate analyses by entering the subject’s birthdate as an independent variable (all the NLSY sample took the AFQT within a few months of each other in late 1980).

For all analyses except the multivariate regression analyses, we use age-equated scores. These were produced by using the sample weight as a frequency, then preparing separate distributions by birth year, expressed in centiles.5 Each subject’s rank in that population (mathematically, the “population” is the sum of the sample weights for that birth year) was divided by the population to obtain the centile where that subject fell within his birth year cohort.6
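A sketch of the age-equating step: group subjects by birth year, treat each sampling weight as a frequency, and express each subject’s position as a share of the weighted cohort population. The text does not say how ties or the endpoints were handled, so this sketch assumes a mid-rank convention (weight of those scoring lower plus half the weight of those tied), which also keeps every centile strictly between 0 and 100. The function name and data are hypothetical.

```python
from collections import defaultdict


def age_equated_centiles(subjects):
    """Weighted centile of each subject within his or her birth-year cohort.

    subjects: list of (birth_year, score, weight) tuples.
    Assumption: mid-rank convention for ties.
    """
    cohorts = defaultdict(list)
    for i, (year, score, weight) in enumerate(subjects):
        cohorts[year].append((score, weight, i))
    centiles = [0.0] * len(subjects)
    for members in cohorts.values():
        population = sum(w for _, w, _ in members)  # summed sample weights
        members.sort()
        below = 0.0
        j = 0
        while j < len(members):
            # Gather everyone tied at this score.
            k = j
            while k < len(members) and members[k][0] == members[j][0]:
                k += 1
            tied_weight = sum(w for _, w, _ in members[j:k])
            for _, _, i in members[j:k]:
                centiles[i] = 100 * (below + tied_weight / 2) / population
            below += tied_weight
            j = k
    return centiles


# Hypothetical subjects: (birth year, raw score, sampling weight).
subjects = [(1962, 40, 1.0), (1962, 60, 1.0), (1962, 50, 2.0),
            (1963, 55, 1.0), (1963, 45, 1.0)]
print(age_equated_centiles(subjects))  # [12.5, 87.5, 50.0, 75.0, 25.0]
```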

That AFQT scores vary according to education raises an additional issue: To what extent is the AFQT a measure of cognitive ability, and not just length and quality of education? We explore this issue at length in Appendix 3.

Skew

The distribution of the AFQT in either of its versions is skewed so that the high scores tend to be more closely bunched than the low scores. To put it roughly, the most intelligent people who take the test have less of an opportunity to get a high score than the least intelligent people have to get a low score. One effect is to limit artificially the maximum size of a standardized score. It is artificial because the AFQT does in fact discriminate reasonably well at the high end of the scale. For example, only 22 youths out of 11,878 in the NLSY with valid AFQT scores earned perfect scores on the subtests, representing 0.253 percent of the national population of their age (using sampling weights). In a test with a normal distribution, those youths would have had a standardized score of 2.80. But given the skew in the NLSY, it is impossible for anyone to have a standardized score higher than 1.66. The standard deviation for a high-scoring group is similarly squeezed.
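The 2.80 figure can be reproduced from the weighted share of perfect scorers: it is the standard score that cuts off the top 0.253 percent of a normal distribution. A short check with Python’s standard library:

```python
from statistics import NormalDist

# Weighted share of the national population with perfect subtest scores (from the text).
top_share = 0.00253  # 0.253 percent

# Standard score that cuts off the top 0.253 percent of a normal distribution.
z = NormalDist().inv_cdf(1 - top_share)
print(round(z, 2))  # roughly 2.80
```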

A certain amount of skew is not a concern for many kinds of analysis. For the analyses in The Bell Curve, however, the difference between two groups is often expressed in terms of standard deviations, and the size of that difference was likely to be affected by skew.

We therefore computed standardized scores corrected for skew, first by computing the centile scores for the NLSY population, using sample weights as always, then assigning to each subject the standardized score corresponding to that centile in a normal distribution. We did this for both the old and new versions of the AFQT. Following the armed forces’ convention, scores more than 3 standard deviations above or below the mean were set at +3 or −3 standard deviations, respectively (this affected only a small number of scores at the low end of the distribution).
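The correction described above is a normal-score transformation: map each centile through the inverse of the normal cumulative distribution, then cap the result at ±3. A minimal sketch, assuming centiles on a 0 to 100 scale that never reach exactly 0 or 100 (the inverse normal is undefined there); the function name is illustrative.

```python
from statistics import NormalDist


def skew_corrected_score(centile, cap=3.0):
    """Standard score matching an (unrounded) centile in a normal distribution, capped at +/-cap.

    Assumes 0 < centile < 100; NormalDist.inv_cdf rejects 0 and 1.
    """
    z = NormalDist().inv_cdf(centile / 100)
    return max(-cap, min(cap, z))


print(skew_corrected_score(50))              # 0.0
print(round(skew_corrected_score(97.5), 2))  # 1.96
print(skew_corrected_score(0.01))            # -3.0 (about -3.72 before the cap)
```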

The effects of correcting for skew were noticeable when expressing differences between groups. For example, for the most sensitive group comparison, between ethnic groups, the results are shown in the following table. As always when full information about means, standard deviations, and sample sizes is available, the group differences are computed using the weighted average of the groups’ standard deviations. The equation is given in note 25 for Chapter 13. The primary effect of the skew was to squeeze the standard deviation of the higher-scoring group (whites) and, in comparison, elongate the standard deviation of the lower-scoring groups. Correcting for skew thus shrank both the black-white and Latino-white differences. The same phenomenon affected all comparisons involving subgroups with markedly different AFQT means. All standardized AFQT scores, for both the regression analyses and the age-equated scores, are therefore corrected for skew. In other words, each represents the standardized score in a normal distribution that corresponds to the (unrounded) centile score of the subject in the observed distribution.

Comparison of Two Versions of the AFQT, Uncorrected and Corrected for Skew

| Version of the AFQT | Corrected for Skew? | Black Mean | Black SD | Latino Mean | Latino SD | White Mean | White SD | Black/White Difference | Latino/White Difference |
|---|---|---|---|---|---|---|---|---|---|
| Pre-1989 | No | −.97 | .91 | −.67 | 1.01 | .24 | .88 | 1.36 | 1.02 |
| Pre-1989 | Yes | −.90 | .81 | −.64 | .93 | .23 | .92 | 1.25 | .94 |
| 1989 revision | No | −.93 | .87 | −.67 | .98 | .23 | .90 | 1.30 | .99 |
| 1989 revision | Yes | −.88 | .83 | −.64 | .94 | .22 | .92 | 1.21 | .93 |
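The group differences in the table are mean differences divided by a weighted average of the two groups’ standard deviations. The exact equation is in note 25 for Chapter 13, which is not reproduced here; this sketch assumes the weights are the group sample sizes. The means and SDs come from the pre-1989, uncorrected row of the table, but the sample sizes are hypothetical placeholders, since the table does not report them.

```python
def group_difference(mean_hi, sd_hi, n_hi, mean_lo, sd_lo, n_lo):
    """Difference in means, in units of the sample-size-weighted average SD.

    Assumption: weights are the group sample sizes.
    """
    pooled_sd = (n_hi * sd_hi + n_lo * sd_lo) / (n_hi + n_lo)
    return (mean_hi - mean_lo) / pooled_sd


# White vs. black, pre-1989 uncorrected row; sample sizes are hypothetical.
d = group_difference(0.24, 0.88, 7000, -0.97, 0.91, 3000)
print(round(d, 2))  # 1.36 with these placeholder sizes
```

Because the larger group dominates the weighted average, squeezing that group’s SD (as the skew does for whites) inflates the standardized difference, which is the effect the correction removes.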

The effects of the different scoring methods on ethnic differences raise a larger question that we should answer directly: How would the results presented in this book differ if we had used the 1980 version of the AFQT instead of the 1989 version, or if we had not corrected for skew? For most analyses, the answer is that the results are unaffected. But it may also be said that whenever differences were found, the scoring procedure we used tended to produce smaller relationships between IQ and the indicators, and smaller ethnic differences, than the alternatives. We did not compute every analysis by each of the four scoring permutations, but we did replicate all of the analyses using the two extremes (the 1980 version uncorrected for skew and the 1989 version corrected for skew). In no instance did the 1989 version corrected for skew, the version reported in the text, yield significant findings that were not also found when using the 1980 uncorrected version. In terms of the relationships explored in this book, the 1989 version corrected for skew is the most conservative of the alternatives.

Why Not Just Use Centiles?

One way of avoiding the skew problem is to leave the AFQT scores in centiles. This was unsatisfactory, however, for we knew from collateral data that much of the important role of IQ occurs at the tails of the distribution. Using centiles throws away information about the tails. (See Appendix 1 on the normal distribution.)
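Why centiles throw away tail information can be seen by converting small centile steps into standard-score distances in a normal distribution. A one-point step in the middle is a sliver of a standard deviation, while a step of less than one point in the upper tail is a large distance.

```python
from statistics import NormalDist

std_normal = NormalDist()


def sd_gap(lo, hi):
    """Distance in standard-score units between two centiles of a normal distribution."""
    return std_normal.inv_cdf(hi / 100) - std_normal.inv_cdf(lo / 100)


# Small steps in centiles near the middle versus in the upper tail.
pairs = [(50, 51), (98, 99), (99, 99.9)]
gaps = [sd_gap(lo, hi) for lo, hi in pairs]
for (lo, hi), gap in zip(pairs, gaps):
    print(f"centile {lo} -> {hi}: {gap:.2f} SD")
```

The three gaps come out at roughly 0.03, 0.27, and 0.76 standard deviations.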

THE SOCIOECONOMIC STATUS INDEX

The SES index was created with the variables that are commonly used in developing measures of socioeconomic status: education, income, and occupation. Since the purpose of the index was to measure the socioeconomic environment in which the NLSY youth was raised, the specific variables employed referred to the parents’ status: total net family income, mother’s education, father’s education, and an index of occupational status of the adults living with the subject at the age of 14. The population for the computation was limited to the 11,878 NLSY subjects with valid AFQT scores. In more detail:
