Language-slotted gestures are those that can actually replace a word—as, for example, in the substitution song mentioned above with which Pike begins his 1967 book. Or imagine that you tell someone, “He [use of foot in a kicking motion] the ball,” where the gesture replaces the verb kicked; or “She [use of open hand across your face] me” (for “She slapped me”). These gestures occupy grammatical slots in an utterance and replace the grammatical units, usually words, that we otherwise expect there. They are improvised and used for particular effects in particular circumstances. They reveal the speakers’ understanding of the positions, words, and structures of their syntax. As Pike said, they show that language is a form of individual behavior, heavily influenced by culture.
Pantomime is gesture that simulates an object or an action without speech. Like gesticulations and language-slotted gestures, pantomime is also not conventionalized, meaning that its forms may vary widely.
Emblems are conventionalized gestures that function as isolated “signs,” such as the forefinger and the thumb rounding and touching at their tips to form the OK sign, or “the bird,” the upraised, solitary middle finger.
Sign languages are gesture-based languages that use gesture in a static, rather than dynamic, way. That is, there are gestural morphemes, sentences, and other grammatical units that are tightly conventionalized. Sign languages replace spoken language. They do not enhance it or interact with it—one reason that McNeill argues that spoken languages did not begin, and could not have begun, as sign languages. On the other hand, sign languages themselves make use of gesticulation, in addition to the conventional signs that form the lexical items of the grammar.
Part of what McNeill intends by the term dynamic is action-based, imagery-creating performances that are not conventionalized. Using the Vygotskian terminology that influenced him (and me, among many others), he argues that language and gesture participate in a “dialectic interaction between unlike modes of thinking” (2005, 4). This bimodal language-forming dialectic (but let us not forget other modes of language, such as affective prosody, register alternations, facial expressions, body language, and so on) is not an add-on to language. It is as much part of the multimodal whole of language as syntax.
DYNAMIC VS. STATIC COGNITION
McNeill (2005, 162) claims that “dynamic cognition is environment-sensitive in its structure. Static cognition is synchronically insensitive to environment.” I believe that he thus bridges a conceptual gap between different uses of the term cognition in modern cognitive theories. I myself have thought about this same distinction for some time. Before reading McNeill (1992), I had thought that D. Everett (1994) was perhaps the first paper to draw an explicit distinction between “dynamic” and “static” cognition. In that paper I argued that sentential grammars are “static” (learned and fixed) objects whereas the use of language in conversation and discourse was “dynamic” (131): “Discourse and sentence structures illustrate two types of cognition, dynamic vs. static and that . . . necessarily involve different theoretical constructs for their explanation. That is, they constitute distinct epistemological domains.”6 In that paper, I further defined—directly connected to what here we are calling dark matter—cognition as referring “to structures and processes which underlie reasoning, knowledge, and other higher-level, brain-caused behavior that cannot be accounted for in terms of neurons or arrangements of neurons in our present state of neurophysiological knowledge” (134).
I am pleased to learn that McNeill and I have been in agreement about the importance of these parameters for twenty years at least (though both of us could profitably have paid more attention to the dynamic aspect of even grammar and syntax that sociolinguists have discussed for so long). These parameters seem quite an important distinction to draw. Much of the debate about what is “cognitive” in the cognitive sciences could be elucidated or eliminated if researchers were to recognize this distinction. Formal linguistics and functional linguistics, for example, both study cognition but from different perspectives, if this distinction is correct—the static and the dynamic, respectively. The sociological and scientific consequences of the failure to recognize this distinction have long been with us. One wonders therefore how the history of linguistics and the cognitive sciences more generally might have been altered had gesture work impacted linguistic theories from the outset.
West Coast cognitive scientists (such as Lakoff and Fillmore at Berkeley and Langacker at UCSD) often focused on discourse-construction as an active process that engaged concepts that included, inter alia, “live vs. dead metaphors,” “framing,” “newsworthy,” “foregrounding,” and “backgrounding.” They emphasized those aspects of language that varied by specific speakers’ perceptions of context and communicative goals in real time. East Coast cognitive scientists (I am oversimplifying) at the time were concerned with greater knowledge of structure, formal semantic relationships, lexical meanings and structures, and so on. McNeill (2005, 17) sums up this approach, saying that “in this tradition, language is regarded as an object, not a process.” He rightly links the concerns of the object-oriented vs. process-oriented perspectives to the early distinction between diachronic (process) vs. synchronic (static) studies introduced by Saussure.
Another way of conceptualizing the static vs. the dynamic perspectives on cognition is in the importance of variation in each. Static, object-focused linguistics traditions think of idiolects or dialects as (at least for a moment) stable or discrete entities. It is possible in such a tradition to talk, for example, of someone’s “I(nternalized/Intensional)-language” as more than a convenient fiction, because a state of knowledge of grammar is seen as steady in some sense. But other linguists and cognitive scientists believe that the static view is a pernicious idealization. For these researchers—concerned with language as a process—variation is where the action is. From Pike’s perspective, these two forms of cognition relate to particle vs. wave manifestations of behavior, yet both are equally emic, equally governed by internalized parameters that help shape our dark matter.7 This distinction is all the more interesting to our discussion of dark matter because sentence grammars (from a Chomskyan perspective, at least) are formed from static cognitive capacities (e.g., learned constructions, rules, constraints, templates), whereas discourse principles (e.g., emphasizing the newsworthy, emphasis, intonation, use of idioms, ideophones) present dynamic cognition.8
As we move to consider gesture theory in more detail, the bedrock concept is the “growth point.” This is the core unit of language in the theory of McNeill, and it must be understood if we are to understand his theory. The growth point is the moment of synchronization where gesture and speech coincide. In McNeill (2012, 24), a growth point is described as that point where “speech and gesture are (a) synchronized, (b) co-expressive, (c) jointly form a ‘psychological predicate,’ and (d) present the same idea in opposite semiotic modes.” This description of the growth point contains several technical terms. By “co-expressive” McNeill means that for the symbols used simultaneously in gesture and speech “each symbol, in its own way, expresses the same idea” (1). By “psychological predicate” (terminology deriving from the work of Lev Vygotsky [1978]), McNeill intends the moment of the expression when “newsworthy content is differentiated from a context” (33). By “semiotic opposites” McNeill means that gestures are dynamic, created on the spur of the moment and, though influenced by culture and convention, are not themselves lexical or conventionalized units. Speech, on the other hand, contains grammar, which is highly conventionalized (grammatical rules, lexical items, etc.) and is thus “static communication.”
In short, gesture studies leave us no alternative but to see language in dynamic, process-oriented terms. It is manufactured by speakers in real time, following a culturally articulated unconscious; it is dynamic; it is not merely application of uni-modal rules, but is a multimodal holistic event. Gestures are actions and processes par excellence. Nevertheless, they do have object properties too, showing the ultimate weakness of dichotomies. McNeill provides some of the best analysis of the object properties of gestures. Thus he defines a gesture unit as “the interval between successive rests of the limbs” (2005, 31). Like most units of human activity (Pike 1967, 82ff, 315ff), it is useful to break down gestures into margins (onset and coda) and nucleus. Thus, McNeill (and others) argues that gestures should likewise be analyzed in terms of prestroke, stroke, and poststroke. And just as onsets, codas, and nuclei in syllables may be long or short, gestures may be lengthened in their different constituents—what McNeill calls “holds.” In the prestroke—which, like the other constituents of a gesture, may be “held” to better synchronize timing with the spoken speech—the hands move from their rest position in anticipation of the gesture. The stroke is the meaningful core movement of the hands. The poststroke is the beginning of the retraction of the gesture. The work on gesture is full of rich illustrations of these gestural constituents and how they at once synchronize with and add dynamically to speech.
These constituents and holds are strong indications that utterances are tacitly designed. As Kendon (2004, 5) says, “Gestures are part of the ‘design’ of an utterance.” One of the clearest ways in which gestures show design is in the constituents of gestures (prestroke, stroke, and poststroke) and how they are often held to synchronize precisely with spoken speech. The question of how gestures are learned and controlled is every bit as interesting as those few aspects of language that linguistics as a discipline currently addresses.
Another crucial component of the dynamic theory of language and gestures that McNeill develops is the catchment (also referred to in places as a cohesive), which indicates that two temporally discontinuous portions of a discourse go together—repeating the same gesture indicates that the points with such gestures form a unit. In essence, a catchment is a way to mark continuity in the discourse through gestures (Givón 1983).9 McNeill says that:
[a] catchment is recognized when one or more gesture features occur in at least two (not necessarily consecutive) gestures. The logic is that recurrent images suggest a common discourse theme, and a discourse theme will produce gestures with recurring features . . . A catchment is a kind of thread of visuospatial imagery that runs through a discourse to reveal the larger discourse units that encompass the otherwise separate parts. (2005, 117)
In the notion of catchment, gesture theory makes one of its most important contributions to the understanding of dark matter: it underscores that speakers put the constituents of sentences to use in larger discourse functions that cannot be captured by focusing on static knowledge of sentence grammars alone. Thus catchments function simultaneously (each individual occurrence) at the level of the sentence and at the level of the discourse (the shared features of the catchment gestures), illustrating that sentences and their constituents are themselves constituents of discourses, once again reinforcing the Pikean ideas on behavior, language, and “hierarchy” (where the apex of the grammatical hierarchy is not the sentence, but rather conversations, which may contain monologic discourses as constituents10).
On the other hand, gestures are not linked to sentences and discourses merely by timing and visual features. They are also connected semantically via “lexical affiliates.” The lexical affiliate concept was first introduced by Schegloff (1984; McNeill 2005, 305) and it refers to “the word or words deemed to correspond most closely to a gesture in meaning.” Gestures generally precede the words that lexically correspond to them, thus marking the introduction of new meaning into the discourse. (An example might be a downward motion functionally preceding a word such as downward.) The dark matter control involved in linking gestures to their lexical affiliates is astounding in its subtlety and complexity. But this control reflects a knowledge of rhythm, highlighting, newsworthiness, and other ways of recognizing and attempting to communicate what is important.
To better understand the tacit relationship, or “unbreakable bond,” between speech and gestures, there are numerous experiments that look at effects that result from real or imposed sensory deficits: delayed auditory feedback (DAF), blindness, and afferent disorder (proprioceptive deficits). In DAF experiments, the subject wears headphones and hears parts of their own speech on roughly a 0.2-second delay, close to the length of a standard syllable in English. This produces an auditory stuttering effect. The speaker tries to adjust by slowing down (though this doesn’t help because the feedback also slows down) and by simplifying their grammar. Interestingly, the gestures produced by the speaker become more robust, more frequent, in effect taking more of the communication task upon themselves in these situations. And yet the gestures “do not lose synchrony with speech” (McNeill 1992, 273ff). In other words, gestures are tied to speech not by their own timing but by intent and meaning of the speaker—dark matter—and by their inextricable link to the content of what is being expressed.
And this inextricability is quite special. For example, in McNeill (2005, 234ff), the case of the subject “IW” is discussed. At age nineteen, IW suddenly lost all sense of touch and proprioception below the neck due to an infection. The experiments conducted by McNeill and his colleagues show that IW is unable to control instrumental movements when he cannot see his hands (though when he can see his hands, he has learned how to use this visual information to control them in a natural-appearing manner). What is fascinating is that IW, when speaking, uses a number of (what IW refers to as) “throwaway gestures” that are well coordinated, unplanned, nonvisually reliant, speech-connected gestures. McNeill concludes that at a minimum, this case provides evidence that speech gestures are different from other uses of the hands—even other gesturing uses of the hands. However, I am nonetheless unconvinced by McNeill’s further conclusion that there is some innate thought-language-hand neural pathway in the brain. On the other hand, I am convinced that such a pathway arises developmentally, and that it is different from other pathways involving gestures and movement.
Finally, with regard to the special relationship between gestures and speech, McNeill (2012, 13) observes that not only do sighted people employ gestures when talking on the phone—showing that gestures are not simply something added for the benefit of an interlocutor—but also that the blind use gestures when speaking, indicating that we use gestures even when we cannot see them and thus that they are a vital constituent of normal speech.11 Since the blind cannot have observed gestures in their speech community, their gestures will not match up exactly to those of the local sighted culture. But the use of gestures by the blind shows us that communication, as we stated earlier, is holistic, and that we use as much of our bodies as we are able when we are communicatively engaged. We need studies of how the blind first begin to use gestures. But to my mind the gestures of the blind simply follow from the use of our entire bodies in communication. We “feel” what we are saying in our limbs and faces, and so on. Thus DAF, the use of gestures by the blind, and the “throwaway” gestures emerging even in the context of IW’s proprioceptive disorder suggest that the relationship between gestures and speech is, in McNeill’s words, an “unbreakable bond.”
Fascinatingly, however, though the bond may be unbreakable, it is culturally malleable. David Efron’s ([1942] 1972) work may have provided the first modern study to examine the link between culture and gesture. But it is not the only one; there is now a sizeable literature on such effects. To take one example, de Ruiter and Wilkins (1998) and Wilkins (1999) discuss the case of Arrernte, in which the connection, or “binding,” of speech and gesture is overridden by culture and dark matter. According to de Ruiter and Wilkins, the Arrernte regularly perform gestures after the co-expressive speech. The cultural reason for this, the authors suggest, is that the Arrernte make much larger gestures physically than are found in many other cultures, using movements of the entire arm in gesturing. Thus, as the authors interpret the phenomena, the larger gestures and space required by the Arrernte demand more planning time, favoring the performance of gestures following the relevant speech. A simpler analysis is suggested by McNeill (2005, 28ff), however; namely, that the Arrernte simply prefer the gestures to follow the speech. The lack of binding and different timing would simply be a cultural choice, a cultural value. Gestures for the Arrernte could then be interpreted similarly to the Turkana people of Kenya, also discussed by McNeill, in which gestures function in part to echo and reinforce speech, other potential cultural values, and functional enhancements of language communication. Whatever the analysis, one must appreciate the relevance and significance accorded to culture in McNeill’s and other researchers’ modern analyses of gesture, following on in the Boasian tradition inaugurated by Efron.
EQUIPRIMORDIALITY
In his work McNeill introduces the vital term equiprimordiality into the discussion of gestures and their relationship to the evolution of human speech. By this he intends that gestures and speech were equally and simultaneously implicated in the evolution of language. To understand this, we must ask how the growth point and the imagery-language “dialectic” evolved. Here McNeill (2012, 65ff) relies on George Herbert Mead’s (1974) seminal work on the evolution of the mind as a social entity, with special attention to language and gestures. Mead’s (1974, 47ff) claim on gestures is that they “become significant symbols when they implicitly arouse in an individual making them the same response which they explicitly arouse in other individuals” (this was probably written in the 1920s). McNeill’s insight is to take Mead’s conjecture and tie it in with Rizzolatti and Arbib’s (1998) discussion of the involvement of mirror neurons in language. What McNeill claims is that Rizzolatti and Arbib missed a crucial step, which he refers to as “Mead’s loop,” wherein one’s own gestures are responded to by one’s own mirror neurons in the same way that these neurons respond to the actions of others, thus bringing one’s own actions into the realm of the social and contributing crucially to the development of a theory of mind—being able to interpret the actions of others under the assumption that others have minds like we do and think according to similar processes. Thus McNeill at once links his research program and the evolution of language more generally to the brain and society in an interesting and unique way, also highlighting the ineffable cerebral, as well as social, connections in the formation of language, culture, and dark matter.