According to McNeill (2012, 69), Mead’s loop entails that “speech and gesture had to evolve together. . . . There could not have been gesture-first or speech-first12” [emphasis in the original]. This follows, he claims, because Mead’s loop creates a “dual semiotic”: “To create the dual semiotic of Mead’s Loop, they [speech and gesture] had to be equiprimordial.” Mead’s loop made possible the dynamic aspects of speech as well as the analysis of otherwise holophrastic constructions into parts, such as words, phrases, sentences, morphemes, phonetic segments, and so on. McNeill explains this by claiming that
semiotically, it [Mead’s loop] brought the gesture’s meaning into the mirror neuron area. Mirror neurons no longer were confined to the semiosis of actions. One’s own gestures . . . entered, as if it were liberating action from action and opening it to imagery in gesture. Extended by metaphoricity, the significance of imagery is unlimited. So from this one change, the meaning potential of language moved away from only action and expanded vastly. (2012, 67)
I notice a couple of things in this quote relevant to our present concerns. First, the language focuses on action and meaning rather than structure, which sets it off from—while still complementing—a great deal of linguistic analysis. Second, Mead’s loop and the growth point place compositionality in a somewhat different light in the evolution of human language. Most linguists (myself included), when asked what the great quantum leap in the evolution of language was, would likely have answered, “Compositionality.” But if McNeill is partially right here, the growth point’s evolution from Mead’s loop is more important than compositionality. In fact, in Everett (2012a) I allude to the possibility that compositionality relies on nonlanguage-specific cognitive abilities. Interestingly and unfortunately, this possibility is completely ignored in most recent works on the evolution of language (e.g., Fitch [2010]; but see D. Everett [forthcoming] for integration of the growth point into the understanding of language evolution, including syntax, phonology, pragmatics, etc.). But what makes this important for our current concerns is the historicity of dark matter; connections are learned and passed down transgenerationally—in this case, most likely by example.
Once we get past this initial hurdle of how gestures become meaningful for humans, other notions arise to fine-tune the evolutionary story of the gesture-speech nexus. McNeill’s theory (e.g., 1992, 311ff) takes a perspective similar to construction grammar (Goldberg 1995) in claiming that utterances—gesture/speech wholes—are initially “holophrastic.” That is, they are used as single words or unanalyzable wholes. Through reuse and the aid of gestures to focus on specific components of the construction, they are later analyzed in more detail. This leads to the generation of syntactic constituents and rules, reminiscent of the discovery methods of Z. Harris (1951), Longacre (1964), and others (i.e., distributional isolability and recombination).
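To make the distributional idea concrete, here is a minimal sketch in Python. It is my toy illustration of Harris-style isolability, not anything drawn from McNeill, Harris, or Longacre; the corpus, the function name, and the two-environment threshold are all invented for the example. The idea is simply that a chunk counts as a candidate “part” of an utterance if it recurs across utterances in distinct environments.

```python
# Toy illustration (not from the sources cited) of distributional
# isolability: a chunk is a candidate "part" if it recurs across
# utterances in at least `min_contexts` distinct left/right environments.
from collections import defaultdict

def candidate_parts(utterances, min_contexts=2):
    """Return chunks attested in at least `min_contexts` environments."""
    contexts = defaultdict(set)
    for utterance in utterances:
        words = utterance.split()
        for i, w in enumerate(words):
            left = words[i - 1] if i > 0 else "#"   # "#" marks a boundary
            right = words[i + 1] if i < len(words) - 1 else "#"
            contexts[w].add((left, right))
    return {w for w, envs in contexts.items() if len(envs) >= min_contexts}

# "see" and "bird" each recur in two environments, so both are isolable.
print(candidate_parts(["see bird fly", "big bird here", "see cat run"]))
```

Recombination would then be the complementary check: that chunks isolated this way can appear in novel combinations.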
As gestures and speech become signs in the social space, gestures take on one of two perspectives (McNeill 2005, 34). They represent either the viewpoint of the observer/speaker (OVPT) or that of the person being talked about, the character viewpoint (CVPT). Thus as we practice language and culture we learn these things—different viewpoints, different ways of highlighting content and attributing ownership of content.
For example, McNeill describes a person retelling what they saw in a cartoon featuring Sylvester the Cat and Tweety Bird. When their hand movements are meant to duplicate or stand in for Sylvester’s movements, their perspective is CVPT. But when their hand movements trace the scene from their own vantage point as an observer, their perspective is OVPT.
Many researchers have speculated that gestures might have preceded speech in the evolution of human language. McNeill does not disagree entirely with this position. His reasons are similar to those suggested in Everett (2012a). Intentionality is a necessary prerequisite to language. And intentionality is shown not only in speech but also in gestures and other actions and states (e.g., anxiety, tail pointing in canines, focused attention in all species; see also D. Everett, forthcoming). One reason that gestures are used is that intentionality manifests itself as focus, often involuntarily. The orientation of our eyes, body, hands, and so on varies according to the direction of our attention. This much does seem to be a very low-level biological fact, exploited by communication. The implications of McNeill’s analysis of Mead’s loop and the growth point are enormous. For one thing, if he is correct, then gesture could not have been the initial form of language. This is not to say that pre-linguistic creatures cannot express intentionality by pointing or gesturing in some way. It does mean, however, that real linguistic communication must always have included both gestures and speech.
IMPLICATIONS FOR LANGUAGE EVOLUTION FROM GESTURES
Another interesting component of McNeill’s theory of language evolution concerns his own take on recursion. Recursion, on this view (see D. Everett 2010b), is a tool for packing information more tightly into single utterances.13 Thus he independently arrives at an important conclusion in recent debates on recursion by providing a model of language evolution and use in which recursion is useful but not essential, a point very similar to that of D. Everett (2005a, 2005b, 2008, 2009a, 2009b, 2010a, 2010b, 2012a, 2012b, and many others).
Language evolution is of course vital to a theory of dark matter because a careful study of evolution helps us trace the origin of human knowledge of language. For example, does this knowledge wind back to Plato’s or to Aristotle’s conception of knowledge—that is, remembered or learned? The answer, unsurprisingly, is that it goes to Aristotle. To see this, consider the role of gestures in language evolution as expounded by McNeill. Although many, from Hewes (1973) to Tomasello (1999, 2008), Corballis (2002), and Arbib (2005), have argued that “language evolved, not from the vocal calls of our primate ancestors, but rather from their manual and facial gestures” (Corballis 2002, ix), McNeill argues that there are two theory-busting problems with the “gesture-first” theory of language evolution. First, speech did not supplant gesture. Rather, as all the work of McNeill, his students, and many others shows, the two form an integrated system. The gesture-first origin of language predicts asynchrony between gesture and speech, since they would be separate systems. But they are synchronous and parts of a single whole. Further, code switching between gestures and speech is common. Why, if speech evolved from gestures, would the two still have this give-and-take relationship? Moreover, if the gesture-first hypothesis is correct, then why, aside from the languages of the deaf, is gesture never the primary “channel” for any language in the world?
The second major problem with the gesture-first theory is that gestures of the kind that speech could later supplant would not be of the right type to form a language. This follows because, in the absence of language, the available communicative gestures would have to be pantomimes. But, as McNeill makes clear throughout his trilogy, pantomime repels speech. Pantomime does not accompany speech—it fills in missing values or gaps in speech. It is used in lieu of speech.
Also, as McNeill makes clear throughout his trilogy, speech is built on a stable grammar. The only gestures that provide such stability are the conventionalized and grammaticized gestures of sign languages. In this case again, however, gestures are used instead of speech, supplanting rather than accompanying it. Summing up, had sign language or other gestures—for example, pantomimes or language-slotted gestures—preceded speech, then there would have been no functional need for speech to develop. As McNeill (2012, 60ff) puts it: “First, gesture-first must claim that speech, when it emerged, supplanted gesture; second, the gestures of gesture-first would [initially] be pantomimes, that is, gestures that simulate actions and events; such gestures do not combine with co-expressive speech as gesticulations but rather fall into other slots on the Gesture Continuum, the language-slotted and pantomime.”
One might attempt to reply that Pike’s example shows that gestures can substitute for speech. But the gestures Pike discusses are language-slotted gestures, parasitic on speech, not the type of gesture to function in place of speech. On the other hand, Pike’s example suggests another question, namely, whether there could be “gesture-slotted speech” corresponding to language-slotted gestures (i.e., an output in which speech substitutes for what would normally be expressed by gestures). If speech evolved from gestures, after all, this is how it would have come about. And gesture-slotted speech is not hard to imagine. For example, consider someone bilingual in American Sign Language and English substituting a spoken word for each sign, one by one, in front of an audience. Yet such an event would not really exemplify gesture-slotted language, since it would be a translation between two independent languages, not speech taking the place of gestures within a single language. This is important for our thesis here for a couple of reasons: (i) the utilitarian nature of gestures offers us a clear route to understanding their genesis and spread; and (ii) precisely because gestures are so useful, their universality supports the Aristotelian view of knowledge as learned over the Platonic conception of knowledge as always present. Indeed, it is all the more likely that gestures have been learned: being so useful, they would be rediscovered time and time again, and to propose a Platonic view in light of their learnability is to complicate the story and is hence less parsimonious.
And as they stabilize through conventionalization, such gestures become sign languages. But these are all gestures replacing speech functions, and thus it would make little sense, either functionally or logically, for speech to have developed from them.
However, in spite of my overall positive view of McNeill’s reasoning about the absence of gesture-first languages, there seems to be something missing. If he were correct in his additional assertion or speculation that two now-extinct species of hominin had used either a gesture-first or a gesture-only language, and that this was the first stage in the evolutionary development of modern language, then why would it be so surprising to think that Homo sapiens had also used gesture first initially? I see no reason to believe that the path to language would have differed between these hominin species and our own. In fact, I doubt seriously that pre-sapiens species of Homo would have followed a different path, since, as D. Everett (2012a) argues, there are significant advantages to vocal over gestural communication.
Another question arising in connection with the equiprimordial relationship between speech and gesture is this: Is there a common, specific, innate cerebral basis for language and gesture or gesticulation? McNeill (1992, 333ff) seems to think so, if I read him correctly. My own opinion is that no such evidence exists. In fact, the opposite seems to hold. For example, McNeill reviews evidence showing that the cortical proximity of speech and gesture is directly proportional to how far leftward the gesture falls in Kendon’s gesture hierarchy. That is, the more closely a type of gesture is tied to speech, the closer that type lies to speech in the brain. Yet this is not support for any innate pathway. There is nothing in cortical proximity that could not be accounted for better by hypothesizing merely that the two are learned together and initially experienced together. On the other hand, superficially stronger evidence for the neurological connectedness of speech and gesture emerges from studies of aphasia. In people with Broca’s aphasia, meaningful gestures are produced in a choppy fashion. In Wernicke’s aphasia, on the other hand, the gestures are meaningless but fluent—at least, according to Hewes (1973). Yet this connection again does not support a nativist “bioprogram” (a word McNeill occasionally uses) of any sort. Setting aside my discussion in D. Everett (2012a), where I argue against the very existence of these two types of aphasia (based on the simple fact that they correspond to no language-specific parts of the brain), McNeill’s data seem open to an explanation via the general principle of “adjacency” (sometimes also called “iconicity”) noted across various studies in linguistics and other disciplines, namely, that the more two things affect one another, the closer they will be to one another. This idea has applications in the understanding of morphosyntactic constituents, vowel harmony, and neurological coordinates.
McNeill discusses a variety of different types of gestures. We have discussed catchments earlier, so now the other three—iconic gestures, metaphoric gestures, and beats—are introduced. Each reveals a distinct facet of the gesture-speech relationship and its connection to cognition and culture. And each, like gestures and speech more generally, reveals a great deal of dark matter.
McNeill (1992, 12ff) describes iconic gestures as bearing “a close formal relationship to the semantic content of speech.” Iconic gestures show that “what is depicted through gesture should be incorporated into a complete picture of a person’s thought processes.” These gestures depict or represent concrete objects to flesh out the imagery and meaning of speech—for example, making the motion and appropriate hand shapes of pulling back a bow when discussing shooting an arrow, a common occurrence in Amazonian communication.
Alongside iconic gestures, speakers also use metaphoric gestures. Metaphoric gestures are simultaneously metalinguistic (representing discourses or discourse genres, etc.) and cultural (based on what counts as a metaphor). These gestures are abstractions. For example, McNeill (1992, 14) illustrates a speaker holding up both hands, palms facing each other, to represent a span of speech—in McNeill’s first example, a story told via a cartoon.
A third form of gesture is the beat: “Beats mark information that does not advance the plot line but provides the structure within which the plot line unfolds” (McNeill 1992, 15ff). These can signal departures from the main event line, such as a hand movement accompanying a “summing up” of what has been said in a discourse to that point. Beats can also be used to segment discourses or to accompany phonological emphasis in the speech stream. In D. Everett (1988) I discussed the case of my language teacher, Kaioá, using gestures to indicate stressed syllables.14 Ladefoged, Ladefoged, and Everett (1997) also discuss training speakers to mark phonological beats in three different Amazonian languages. It turns out that this is not as unusual as I thought at the time, since it is not an uncommon function of beat gestures, according to McNeill’s lab, among others.
McNeill (2012, 77) further suggests that “an area of life where a syntactic ability could evolve is the cultural and social encounter.” Here he cites the work of Freyd (1983) on “shareability”—the idea that structures and meanings must come to be shared among individuals if we are to say that they speak the same language (i.e., are utilizing the same outputs of conventionalization—another instance of the actuation problem). In particular, McNeill appeals to Freyd’s “discreteness filter,” an idea akin to the generative notion of discreteness in the phrase “discrete infinity” (for criticisms of the latter, see D. Everett 2010a). The idea is that our utterances were initially holophrastic, noncompositional. Then, as humans began to learn a repertory of such utterances, these utterances changed via the growth point (GP), such that gestures would highlight some portions of the previously unanalyzable whole, leading to an analysis of the holophrase into component parts—top-down parsing that eventually results in compositionality. This fascinates me because it presents a picture of how learning and emicization of the relationship between gestures and grammar (dark matter) can drive the very evolution of language.
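As a rough sketch of this top-down splitting, consider the following toy Python fragment. It is my illustration, not McNeill’s or Freyd’s model: the holophrases, the function name, and the use of longest-common-substring matching are all invented for the example. The point is only that material recurring across two unanalyzed wholes can be isolated as a candidate discrete part.

```python
# Toy illustration (not McNeill's or Freyd's model) of splitting
# holophrastic wholes: a stretch shared by two unanalyzed utterances
# is isolated as a candidate discrete part.
from difflib import SequenceMatcher

def shared_chunk(holophrase_a, holophrase_b, min_len=3):
    """Return the longest stretch the two holophrases share; a stand-in
    for the portion a gesture might highlight during analysis."""
    matcher = SequenceMatcher(None, holophrase_a, holophrase_b)
    match = matcher.find_longest_match(
        0, len(holophrase_a), 0, len(holophrase_b))
    chunk = holophrase_a[match.a:match.a + match.size]
    return chunk if len(chunk) >= min_len else None

# Both wholes contain "wabbit", so it emerges as a recurring part.
print(shared_chunk("huntwabbitnow", "seewabbitrun"))  # -> "wabbit"
```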
This points to a stark difference between McNeill’s theory and other theories of language evolution (such as Hauser et al. 2002). In McNeill’s theory, the compositionality of syntax arises from actual language use via GPs, not from a sudden mysterious appearance of compositionality via recursion. In fact, in McNeill’s theory (and in Kinsella’s [2009] and D. Everett’s [2012a], among many others), compositionality precedes recursion. And this is just as the dark matter model of language predicts—language emerges from a process of emicization, reanalysis, and re-emicization in order to better satisfy communication needs (Hopper 1988; MacWhinney 2006; Steels 2005; Rosenbaum 2014; etc.). The following quote expresses this well: “Contrary to traditions both philological and Biblical, language did not begin with a ‘first word.’ Words emerged from GPs. There was an emerging ability to differentiate newsworthy points in contexts; a first ‘psychological predicate’ perhaps but not a first word” (McNeill 2012, 78). Ironically, by demonstrating how compositionality could have come about through use and thus entered all human languages from early human interactions, McNeill undermines the need to appeal to genetics or biology to account for it, instead supporting our account here. In the context of his discussion of compositionality, moreover, McNeill (2012, 78, 223) offers an extremely interesting discussion of how recursion itself might have entered grammar, one quite compatible with my own (D. Everett 2010a, 2010b, 2012a, 2012b). The story begins with the analyzability of holophrastic utterances via growth points.
Recursion would not have begun with gestures themselves. This is because gestures, unlike the eventually compositional static outputs of grammar, are gestalt units (though not all gesture researchers accept this). This is a fundamental difference between these dynamic units and static syntax. Gestures are wholes without meaningful parts; the meaning of the whole is not derived from the meaning of any parts. Thus, although we can observe several submovements in a larger gesture, none of these smaller acts has any meaning apart from the gesture as a whole. Gestures are in this sense anticompositional.