The Mismeasure of Man


by Stephen Jay Gould


  There are different styles of doing science, all legitimate and partially valid. The beetle taxonomist who delights in noting the peculiarities of each new species may have little interest in reduction, synthesis, or in probing for the essence of “beetleness”—if such exists! At an opposite extreme, occupied by Spearman, the externalities of this world are only superficial guides to a simpler, underlying reality. In a popular image (though some professionals would abjure it), physics is the ultimate science of reduction to basic and quantifiable causes that generate the apparent complexity of our material world. Reductionists like Spearman, who work in the so-called soft sciences of organismic biology, psychology, or sociology, have often suffered from “physics envy.” They have strived to practice their science according to their clouded vision of physics—to search for simplifying laws and basic particles. Spearman described his deepest hopes for a science of cognition (1923, p. 30):

  Deeper than the uniformities of occurrence which are noticeable even without its aid, it [science] discovers others more abstruse, but correspondingly more comprehensive, upon which the name of laws is bestowed.… When we look around for any approach to this ideal, something of the sort can actually be found in the science of physics as based on the three primary laws of motion. Coordinate with this physica corporis [physics of bodies], then, we are today in search of a physica animae [physics of the soul].

  With g as a quantified, fundamental particle, psychology could take its rightful place among the real sciences. “In these principles,” he wrote in 1923 (p. 355), “we must venture to hope that the so long missing genuinely scientific foundation for psychology has at last been supplied, so that it can henceforward take its due place along with the other solidly founded sciences, even physics itself.” Spearman called his work “a Copernican revolution in point of view” (1927, p. 411) and rejoiced that “this Cinderella among the sciences has made a bold bid for the level of triumphant physics itself” (1937, p. 21).

  Spearman’s g and the theoretical justification of IQ

  Spearman, the theorist, the searcher for unity by reduction to underlying causes, often spoke in most unflattering terms about the stated intentions of IQ testers. He referred to IQ (1931) as “the mere average of sub-tests picked up and put together without rhyme or reason.” He decried the dignification of this “gallimaufry of tests” with the name intelligence. In fact, though he had described his g as general intelligence in 1904, he later abandoned the word intelligence because endless arguments and inconsistent procedures of mental testers had plunged it into irremediable ambiguity (1927, p. 412; 1950, p. 67).

  Yet it would be incorrect—indeed it would be precisely contrary to Spearman’s view—to regard him as an opponent of IQ testing. He had contempt for the atheoretical empiricism of the testers, their tendency to construct tests by throwing apparently unrelated items together and then offering no justification for such a curious procedure beyond the claim that it yielded good results. Yet he did not deny that the Binet tests worked, and he rejoiced in the resuscitation of the subject thus produced: “By this one great investigation [the Binet scale] the whole scene was transformed. The recently despised tests were now introduced into every country with enthusiasm. And everywhere their practical application was brilliantly successful” (1914, p. 312).

  What galled Spearman was his conviction that IQ testers were doing the right thing in amalgamating an array of disparate items into a single scale, but that they refused to recognize the theory behind such a procedure and continued to regard their work as rough-and-ready empiricism.

  Spearman argued passionately that the justification for Binet testing lay with his own theory of a single g underlying all cognitive activity. IQ tests worked because, unbeknownst to their makers, they measured g with fair accuracy. Each individual test has a g-loading and its own specific information (or s), but g-loading varies from nearly zero to nearly 100 percent. Ironically, the most accurate measure of g will be the average score for a large collection of individual tests of the most diverse kind. Each measures g to some extent. The variety guarantees that s-factors of the individual tests will vary in all possible directions and cancel each other out. Only g will be left as the factor common to all tests. IQ works because it measures g.

  An explanation is at once supplied for the success of their extraordinary procedure of … pooling together tests of the most miscellaneous description. For if every performance depends on two factors, the one always varying randomly, while the other is constantly the same, it is clear that in the average the random variations will tend to neutralize one another, leaving the other, or constant factor, alone dominant (1914, p. 313; see also, 1923, p. 6, and 1927, p. 77).

  Binet’s “hotchpot of multitudinous measurements” was a correct theoretical decision, not only the intuitive guess of a skilled practitioner: “In such wise this principle of making a hotchpot, which might seem to be the most arbitrary and meaningless procedure imaginable, had really a profound theoretical basis and a supremely practical utility” (Spearman quoted in Tuddenham, 1962, p. 503).
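  Spearman’s averaging argument can be made concrete with a short simulation. What follows is only a minimal sketch of the idea, with arbitrary, assumed g-loadings and test counts (none of these numbers come from Spearman or Binet): each score is modeled as a shared g component plus an independent specific factor, and pooling diverse tests lets the specifics cancel while g accumulates.

```python
# Minimal sketch of Spearman's pooling argument; all parameters are
# illustrative assumptions, not values from any actual test battery.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 1000, 40

g = rng.standard_normal(n_people)                      # hypothetical general factor
loadings = rng.uniform(0.2, 0.8, n_tests)              # assumed g-loadings per test
specifics = rng.standard_normal((n_people, n_tests))   # independent s-factors

# Each test = g-loading * g + the remaining variance from its specific factor.
scores = g[:, None] * loadings + specifics * np.sqrt(1 - loadings**2)

for k in (1, 5, 40):
    composite = scores[:, :k].mean(axis=1)             # pooled "hotchpot" of k tests
    print(k, round(np.corrcoef(composite, g)[0, 1], 2))
```

  As more diverse tests enter the pool, the composite’s correlation with the simulated g rises toward unity, which is all that Spearman’s defense of the Binet “hotchpot” requires.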

  Spearman’s g, and its attendant claim that intelligence is a single, measurable entity, provided the only promising theoretical justification that hereditarian theories of IQ have ever had. As mental testing rose to prominence during the early twentieth century, it developed two traditions of research that Cyril Burt correctly identified in 1914 (p. 36) as correlational methods (factor analysis) and age-scale methods (IQ testing). Hearnshaw has recently made the same point in his biography of Burt (1979, p. 47): “The novelty of the 1900’s was not in the concept of intelligence itself, but in its operational definition in terms of correlational techniques, and in the devising of practicable methods of measurement.”

  No one recognized better than Spearman the intimate connection between his model of factor analysis and hereditarian interpretations of IQ testing. In his 1914 Eugenics Review article, he prophesied the union of these two great traditions in mental testing: “Each of these two lines of investigation furnishes a peculiarly happy and indispensable support to the other.… Great as has been the value of the Simon-Binet tests, even when worked in theoretical darkness, their efficiency will be multiplied a thousand-fold when employed with a full light upon their essential nature and mechanism.” When Spearman’s style of factor analysis came under attack late in his career (see pp. 326–332), he defended g by citing it as the rationale for IQ: “Statistically, this determination is grounded on its extreme simpleness. Psychologically, it is credited with affording the sole base for such useful concepts as those of ‘general ability,’ or ‘IQ’” (1939, p. 79).

  To be sure, the professional testers did not always heed Spearman’s plea for an adoption of g as the rationale for their work. Many testers abjured theory and continued to insist on practical utility as the justification for their efforts. But silence about theory does not connote an absence of theory. The reification of IQ as a biological entity has depended upon the conviction that Spearman’s g measures a single, scalable, fundamental “thing” residing in the human brain. Many of the more theoretically inclined mental testers have taken this view (see Terman et al., 1917, p. 152). C. C. Brigham did not base his famous recantation solely upon a belated recognition that the army mental tests had considered patent measures of culture as inborn properties (pp. 262–263). He also pointed out that no strong, single g could be extracted from the combined tests, which, therefore, could not have been measures of intelligence after all (Brigham, 1930). And I will at least say this for Arthur Jensen: he recognizes that his hereditarian theory of IQ depends upon the validity of g, and he devotes much of his major book (1979) to a defense of Spearman’s argument in its original form, as do Richard Herrnstein and Charles Murray in The Bell Curve (1994)—see essays at end of this book. A proper understanding of the conceptual errors in Spearman’s formulation is a prerequisite for criticizing hereditarian claims about IQ at their fundamental level, not merely in the tangled minutiae of statistical procedures.

  Spearman’s reification of g

  Spearman could not rest content with the idea that he had probed deeply under the empirical results of mental tests and found a single abstract factor underlying all performance. Nor could he achieve adequate satisfaction by identifying that factor with what we call intelligence itself.* Spearman felt compelled to ask more of his g: it must measure some physical property of the brain; it must be a “thing” in the most direct, material sense. Even if neurology had found no substance to identify with g, the brain’s performance on mental tests proved that such a physical substrate must exist. Thus, caught up in physics envy again, Spearman described his own “adventurous step of deserting all actually observable phenomena of the mind and proceeding instead to invent an underlying something which—by analogy with physics—has been called mental energy” (1927, p. 89).

  Spearman looked to the basic property of g—its influence, in varying degree, upon mental operations—and tried to imagine what physical entity best fitted such behavior. What else, he argued, but a form of energy pervading the entire brain and activating a set of specific “engines,” each with a definite locus. The more energy, the more general activation, the more intelligence. Spearman wrote (1923, p. 5):

  This continued tendency to success of the same person throughout all variations of both form and subject matter—that is to say, throughout all conscious aspects of cognition whatever—appears only explicable by some factor lying deeper than the phenomena of consciousness. And thus there emerges the concept of a hypothetical general and purely quantitative factor underlying all cognitive performances of any kind.… The factor was taken, pending further information, to consist in something of the nature of an “energy” or “power” which serves in common the whole cortex (or possibly, even, the whole nervous system).

  If g pervades the entire cortex as a general energy, then the s-factors for each test must have more definite locations. They must represent specific groups of neurons, activated in different ways by the energy identified with g. The s-factors, Spearman wrote (and not merely in metaphor), are engines fueled by a circulating g.

  Each different operation must necessarily be further served by some specific factor peculiar to it. For this factor also, a physiological substrate has been suggested, namely the particular group of neurons specially serving the particular kind of operation. These neural groups would thus function as alternative “engines” into which the common supply of “energy” could be alternatively distributed. Successful action would always depend, partly on the potential of energy developed in the whole cortex, and partly on the efficiency of the specific group of neurons involved. The relative influence of these two factors could vary greatly according to the kind of operation; some kinds would depend more on the potential of the energy, others more on the efficiency of the engine (1923, pp. 5–6).

  The differing g-loadings of tests had been provisionally explained: one mental operation might depend primarily upon the character of its engine (high s and low g-loading), another might owe its status to the amount of general energy involved in activating its engine (high g-loading).

  Spearman felt sure that he had discovered the basis of intelligence, so sure that he proclaimed his concept impervious to disproof. He expected that a physical energy corresponding with g would be found by physiologists: “There seem to be grounds for hoping that a material energy of the kind required by psychologists will some day actually be discovered” (1927, p. 407). In this discovery, Spearman proclaimed, “physiology will achieve the greatest of its triumphs” (1927, p. 408). But should no physical energy be found, still an energy there must be—but of a different sort:

  And should the worst arrive and the required physiological explanation remain to the end undiscoverable, the mental facts will none the less remain facts still. If they are such as to be best explained by the concept of an underlying energy, then this concept will have to undergo that which after all is only what has long been demanded by many of the best psychologists—it will have to be regarded as purely mental (1927, p. 408).

  Spearman, in 1927 at least, never considered the obvious alternative: that his attempt to reify g might be invalid in the first place.

  Throughout his career, Spearman tried to find other regularities of mental functioning that would validate his theory of general energy and specific engines. He enunciated (1927, p. 133) a “law of constant output” proclaiming that the cessation of any mental activity causes others of equal intensity to commence. Thus, he reasoned, general energy remains intact and must always be activating something. He found, on the other hand, that fatigue is “selectively transferred”—that is, tiring in one mental activity entails fatigue in some related areas, but not in others (1927, p. 318). Thus, fatigue cannot be attributed to “decrease in the supply of the general psycho-physiological energy,” but must represent a build-up of toxins that act selectively upon certain kinds of neurons. Fatigue, Spearman proclaimed, “primarily concerns not the energy but the engines” (1927, p. 318).

  Yet, as we find so often in the history of mental testing, Spearman’s doubts began to grow until he finally recanted in his last (posthumously published) book of 1950. He seemed to pass off the theory of energy and engines as a folly of youth (though he had defended it staunchly in middle age). He even abandoned the attempt to reify factors, recognizing belatedly that a mathematical abstraction need not correspond with a physical reality. The great theorist had entered the camp of his enemies and recast himself as a cautious empiricist (1950, p. 25):

  We are under no obligation to answer such questions as: whether “factors” have any “real” existence? do they admit of genuine “measurement”? does the notion of “ability” involve at bottom any kind of cause, or power? Or is it only intended for the purpose of bare description? … At their time and in their place such themes are doubtless well enough. The senior writer himself has indulged in them not a little. Dulce est desipere in loco [it is pleasant to act foolishly from time to time—a line from Horace]. But for the present purposes he has felt himself constrained to keep within the limits of barest empirical science. These he takes to be at bottom nothing but description and prediction.… The rest is mostly illumination by way of metaphor and similes.

  The history of factor analysis is strewn with the wreckage of misguided attempts at reification. I do not deny that patterns of causality may have identifiable and underlying, physical reasons, and I do agree with Eysenck when he states (1953, p. 113): “Under certain circumstances, factors may be regarded as hypothetical causal influences underlying and determining the observed relationships between a set of variables. It is only when regarded in this light that they have interest and significance for psychology.” My complaint lies with the practice of assuming that the mere existence of a factor, in itself, provides a license for causal speculation. Factorists have consistently warned against such an assumption, but our Platonic urges to discover underlying essences continue to prevail over proper caution. We can chuckle, with the beneficence of hindsight, at psychiatrist T. V. Moore who, in 1933, postulated definite genes for catatonic, deluded, manic, cognitive, and constitutional depression because his factor analysis grouped the supposed measures of these syndromes on separate axes (in Wolfle, 1940). Yet in 1972 two authors found an association of dairy production with florid vocalization on the tiny thirteenth axis of a nineteen-axis factor analysis for musical habits of various cultures—and then suggested “that this extra source of protein accounts for many cases of energetic vocalizing” (Lomax and Berkowitz, 1972, p. 232).

  Automatic reification is invalid for two major reasons. First, as I discussed briefly on pp. 282–285 and will treat in full on pp. 326–347, no set of factors has any claim to exclusive concordance with the real world. Any matrix of positive correlation coefficients can be factored, as Spearman did, into g and a set of subsidiary factors or, as Thurstone did, into a set of “simple structure” factors that usually lack a single dominant direction. Since either solution resolves the same amount of information, they are equivalent in mathematical terms. Yet they lead to contrary psychological interpretations. How can we claim that one, or either, is a mirror of reality?
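  The point can be seen numerically in a brief sketch (the correlation matrix and the 45-degree rotation below are hypothetical choices made only for illustration): any orthogonal rotation of a set of factor loadings reproduces exactly the same correlations, so a solution with one dominant, g-like factor and a rotated alternative in the style of simple structure account for the data equally well.

```python
# Sketch: two factorings of the same hypothetical correlation matrix.
import numpy as np

# An invented correlation matrix for four positively correlated tests.
R = np.array([[1.0, 0.6, 0.5, 0.4],
              [0.6, 1.0, 0.5, 0.4],
              [0.5, 0.5, 1.0, 0.4],
              [0.4, 0.4, 0.4, 1.0]])

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
L = vecs[:, order[:2]] * np.sqrt(vals[order[:2]])  # two principal factors
L = L * np.sign(L[0])                              # fix arbitrary eigenvector signs

theta = np.pi / 4                                  # rotate the factor axes 45 degrees
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
L_rot = L @ rot                                    # an equally valid alternative solution

# Both loading matrices imply exactly the same common correlation structure,
# yet one shows a single dominant factor and the other spreads the variance.
print(np.allclose(L @ L.T, L_rot @ L_rot.T))       # True
print(np.round(L, 2))
print(np.round(L_rot, 2))
```

  The first solution has every test loading strongly on one factor, an arrangement that invites talk of g; the rotated solution distributes the same information across two axes. Nothing in the mathematics prefers one over the other.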

  Second, any single set of factors can be interpreted in a variety of ways. Spearman read his strong g as evidence for a single reality underlying all cognitive mental activity, a general energy within the brain. Yet Spearman’s most celebrated English colleague in factor analysis, Sir Godfrey Thomson, accepted Spearman’s mathematical results but consistently chose to interpret them in an opposite manner. Spearman argued that the brain could be divided into a set of specific engines, fueled by a general energy. Thomson, using the same data, inferred that the brain has hardly any specialized structure at all. Nerve cells, he argued, either fire completely or not at all—they are either off or on, with no intermediary state. Every mental test samples a random array of neurons. Tests with high g-loadings catch many neurons in the active state; others, with low g-loadings, have simply sampled a smaller amount of unstructured brain. Thomson concluded (1939): “Far from being divided up into a few ‘unitary factors,’ the mind is a rich, comparatively undifferentiated complex of innumerable influences—on the physiological side an intricate network of possibilities of intercommunication.” If the same mathematical pattern can yield such disparate interpretations, what claim can either have upon reality?
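  Thomson’s reading, too, can be sketched in a few lines of simulation under simple assumptions (the numbers of independent “bonds,” tests, and people below are arbitrary): tests that merely sample overlapping subsets of many independent elements still yield the uniformly positive correlations that Spearman took as the signature of a single g.

```python
# Sketch of Thomson's sampling idea: positive correlations with no unitary g.
# All counts below are arbitrary illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_bonds, n_tests = 2000, 500, 6

bonds = rng.standard_normal((n_people, n_bonds))   # independent elements, no g anywhere

# Each test sums a random sample of bonds; larger samples mimic higher g-loadings.
samples = [rng.choice(n_bonds, size=rng.integers(100, 400), replace=False)
           for _ in range(n_tests)]
scores = np.column_stack([bonds[:, s].sum(axis=1) for s in samples])

# Every pair of tests overlaps in the bonds it samples, so all correlations
# come out positive: a g-like pattern with no single entity behind it.
print(np.round(np.corrcoef(scores, rowvar=False), 2))
```

  The simulated matrix shows the same positive manifold that Spearman factored into g, even though the model contains nothing but a heap of independent elements.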

 
