So the cross-modal priming studies show that we activate the entries for all possible words compatible with the acoustic input. But does this not mean that there is a real danger of system overload? How do we prevent an explosion of lexical possibilities? How do we choose which possibilities are the right ones? And how is it possible that we can activate the meanings of all these words without recognizing that we have done so? Apparently, activation does not imply recognition. But if it does not, what exactly is recognition? What does it mean to say that we recognize (or even hear) a word? And when does this recognition happen? To return to the OED metaphor: it looks as if access is what goes on when we activate the possibilities, and recognition is what happens when we (somehow) determine which of these possibilities is the right one. But how do we do that?
The effects of acoustic mismatch
According to Marslen-Wilson's theory, lexical access is a kind of race; different lexical entries compete in the race, but there can be only one winner, and we recognize a word when it has been identified as the winner. But for there to be a winner, there have to be losers. So what determines whether, and when, a competitor falls by the wayside?
The most obvious factor is compatibility with the acoustic input. There is extensive evidence showing that acoustic mismatch leads to a rapid decline in the activation of a lexical entry. Whereas a word like `book' might prime `page', the nonword `boog' (pronounced to rhyme with `book') would not: changing the voice onset time (see Chapter 3) of the final phoneme from a /k/ to a /g/ would be enough to cause rapid deactivation of the lexical entry for `book'. But if the smallest deviation can lead to a decline in activation (and see Chapter 5 for further examples), what is going to happen each time we hear a word pronounced slightly differently, or each time a bit (or worse still, a lot) of background noise changes the acoustic signal? There has to be some tolerance in the system.
In fact, it turns out that there is: a slight deviation does not cause a lexical entry to self-destruct; it merely causes a decline in the activation, which means that the activation can pick up again if subsequent input is still compatible with that entry. Of course, if that deviation occurs at the start of a word, it may prevent the intended word from being activated in the first place. But it is not just any small deviation that leads to this; it is the smallest acoustic deviation that could in principle distinguish between one word in the language and another (in other words, the smallest detail that would cause one phoneme to be perceived as another). Indeed, the categorical perception of phonemes discussed in Chapter 3 is an example of how variation in the acoustic signal associated with a particular phoneme is tolerated up to a certain degree, beyond which any further variation causes the sound to be perceived quite differently.
In general, then, a word can be recognized when there has been sufficient mismatch between the acoustic input and that word's competitors. Often this will be before the word's acoustic offset, but sometimes it may be after. `Ram' could continue as `ramp' or `rampart'. But if the sequence being heard was something like `The ram roamed around', the lexical entries for `ramp' and `rampart' would become deactivated when `roamed' was encountered, resulting in the eventual recognition of `ram'.
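The idea that recognition waits until the competitors have been mismatched can be made concrete with a small sketch in Python. It is an illustration only, not Marslen-Wilson's model: the toy lexicon, the spelling of each word as one letter per sound, and the values of `decay` and `threshold` are all assumptions made for the example.

```python
# Toy lexicon (an assumption): each word is a list of letter-sized "sounds".
LEXICON = {
    "ram": ["r", "a", "m"],
    "ramp": ["r", "a", "m", "p"],
    "rampart": ["r", "a", "m", "p", "a", "t"],
    "rat": ["r", "a", "t"],
}


def recognize(segments, lexicon, decay=0.3, threshold=0.5):
    """Return (word, number_of_segments_heard) once a single candidate
    remains above threshold, or (None, len(segments)) if none ever does."""
    activation = {word: 1.0 for word in lexicon}
    for i, segment in enumerate(segments):
        for word, sounds in lexicon.items():
            if i >= len(sounds):
                # The word is already complete; later input is treated as
                # belonging to whatever word comes next.
                continue
            if sounds[i] != segment:
                # Mismatch lowers activation; it does not delete the entry.
                activation[word] *= decay
        still_active = [w for w, a in activation.items() if a >= threshold]
        if len(still_active) == 1:
            return still_active[0], i + 1
    return None, len(segments)


# `ram' followed by the first sound of `roamed'.
print(recognize(["r", "a", "m", "r"], LEXICON))
```

With these settings, `rat' wins at its own offset (three segments), whereas `ram' wins only on the fourth segment, when the /r/ of `roamed' mismatches `ramp' and `rampart'. Raising `decay` towards 1 makes the sketch more tolerant, because a single mismatch then leaves a candidate above threshold.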
So far so good. But there is one further detail that needs to be considered. Words are rarely spoken in isolation, but are spoken in the (seamless) context of other words coming before and after. And this is important for a number of reasons, not least because people are generally quite lazy in their articulation, and the position and shape of the articulators at any one moment reflects not simply the sound to be produced at that moment, but also the sound that will be produced next. We encountered a version of this phenomenon in Chapter 5 under the guise of co-articulation. Generally, the term is used to describe how a vowel, for instance, can be `coloured' by the consonants that precede and follow it. The fact that vowels are not perceived categorically allows this colouring to be used in anticipating the identity of the following segment. But something very similar can occur when one consonant is followed by another. And this is where the problems start: if the consonant were to actually change as a result of this process, a mismatch would occur. And this would mean that we would then fail to activate the intended meaning. Just how bad is the problem?
The answer is that it is as bad as having to recognize `Hameethathimboo' as meaning `Hand me that thin book'. Word-final consonants such as the /d/ in `hand', the /t/ in `that' and the /k/ in `book' are often dropped completely. And instead of articulating the /n/ in `thin' by closing off the mouth with the tip of the tongue against the back of the upper teeth (and allowing air through the nasal passage), the speaker might anticipate the following /b/ and instead close off the mouth at the lips (still allowing air through the nasal passage). This would result in `thin book' being articulated as `thim book'. And because the /d/ had been dropped from `hand me', the preceding /n/ may combine with the /m/ to produce `hamee'. These kinds of changes, generally at the ends of words, are surprisingly common, although the extent to which they occur, and how they occur, can depend on the language being spoken. But if acoustic mismatch leads to the deactivation of lexical candidates, what hope is there of recognizing the intended words after these changes have occurred? If these kinds of effects are more common than not, how could we ever recognize a sentence in its entirety?
The answer, once again, is tolerance. In this case, the tolerance is context-sensitive. The nonword `thim' will activate the meaning associated with `thin', but only in the context of a following word. But it cannot be just any old word; it has to be a word in the context of which it would have made sense for what was originally an /n/ to become an /m/. Whereas the `thim' in `thim book' would activate the lexical entry for `thin', the `thim' in `thim slice' would not. This was demonstrated by another student of William Marslen-Wilson's, Gareth Gaskell, in a series of experiments using variations on the priming theme. This naturally raises the question of how the system `knows' to do this.
Linguists have produced a whole range of rules which describe the range of circumstances in which these different kinds of word-final changes can occur. The rules are complex: The Cambridge Encyclopedia of Language writes one such rule as `an alveolar nasal becomes bilabial before a following bilabial consonant'. Yet despite their complexity, there has been a temptation to believe that (or at least to talk as if) the human mind runs these rules in reverse in order to recover what was originally meant. Do we really do this?
The simplest answer is `not necessarily'. And one way to imagine what we might do instead is to recall that the task of the infant is to associate sounds with meaning. The infant must therefore associate not just /thin/ with the meaning of `thin', but also /thim/ with `thin', and even /thing/ with `thin' (as in `The thin carpet was worn through', where `thin' would be pronounced as /thing/). But what is actually being associated with the meaning of `thin' is not just the sound that has been heard, but rather the sound that has been heard within a particular context. This context necessarily includes the surrounding sounds. The infant might therefore associate with the meaning of `thin' all the following: /thin/ in combination with a following /t/ (e.g. `thin tree'), /thim/ in combination with a following /b/ (`thin book'), or /thing/ in combination with a following /k/ (`thin carpet', where /k/ is the first phoneme of `carpet'). As an adult, it is then just a matter of recovering whatever meaning was associated with a particular combination of sounds.
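The storage-based alternative can be sketched as a simple memory of sound-in-context pairings. The following Python is a minimal illustration under stated assumptions: the class name `ContextMemory`, the use of a single following sound as the context, and the spelling of sounds as ordinary letters are all invented for the example, and nothing here is a claim about how the infant actually does it.

```python
from collections import defaultdict


class ContextMemory:
    """Hypothetical store of (heard form, following sound) -> meaning counts."""

    def __init__(self):
        # counts[(form, following)][meaning] = number of times encountered
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, form, following, meaning):
        """Record one encounter of a heard form in a given context."""
        self.counts[(form, following)][meaning] += 1

    def recall(self, form, following):
        """Return the meaning most often paired with this form in this
        context, or None if the combination has never been encountered."""
        seen = self.counts.get((form, following))
        if not seen:
            return None
        return max(seen, key=seen.get)


memory = ContextMemory()
memory.learn("thin", "t", "THIN")    # `thin tree'
memory.learn("thim", "b", "THIN")    # `thin book'
memory.learn("thing", "k", "THIN")   # `thin carpet'

print(memory.recall("thim", "b"))    # a combination that was learned
print(memory.recall("thim", "s"))    # `thim slice' was never encountered
```

In this sketch `thim' before /b/ recovers the meaning of `thin', while `thim' before /s/ recovers nothing, simply because that combination was never stored. No rule is consulted at any point.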
Not surprisingly, many linguists have objected to this last possibility: it would require the infant/adult to store all possible pronunciations of each word in all possible contexts. There would be enormous numbers of combinations possible, and it would surely be much easier to simply acquire a very much smaller number of rules to do the same job, each rule applying across a whole range of words (for instance, one rule could apply to all words ending in /n/ when followed by a /b/, /p/, or /m/). Of course, in order to learn the rule, the infant would still have to be exposed to all the different pronunciations in all the different contexts, but at least it would not have to remember each combination of pronunciation and context. And if it did remember each such combination, how much context would be required? The following phoneme? The following word? The entire utterance?
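For comparison, here is what running a single rule in reverse might look like. This Python sketch is hypothetical: the function names, the tiny lexicon, and the use of ordinary spelling in place of phonemes are assumptions, and the one rule it encodes is the /n/ to /m/ change before /b/, /p/, or /m/ mentioned above.

```python
# Sounds made at the lips, before which a word-final /n/ may surface as /m/.
BILABIALS = {"b", "p", "m"}


def undo_assimilation(heard, following):
    """Return the forms the speaker might have intended: the form as heard,
    plus the /n/-final form if the rule could have applied in this context."""
    candidates = [heard]
    if heard.endswith("m") and following in BILABIALS:
        candidates.append(heard[:-1] + "n")
    return candidates


def lookup(heard, following, lexicon):
    """Keep only those candidate forms that are real words in the lexicon."""
    return [form for form in undo_assimilation(heard, following) if form in lexicon]


WORDS = {"thin", "book", "slice", "ram"}

print(lookup("thim", "b", WORDS))   # `thim book'
print(lookup("thim", "s", WORDS))   # `thim slice'
print(lookup("ram", "b", WORDS))    # a genuine /m/ is left alone
```

The appeal of the rule is plain from the sketch: one short function covers every word ending in /n/, whereas a store of pronunciations in context needs a separate entry for each combination.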
On what basis can anyone reasonably claim that rules are not run in reverse, but that the infant/adult has knowledge of all possible pronunciations in all possible contexts? Surely this would require the most enormous memory space. The fact that the entire OED, which we would also expect to take up a lot of space, fits into something that is less than a millimetre thick and just a few centimetres across does suggest that unimaginably huge volumes of information are none the less manageable. But so what? The brain is hardly a CD-ROM, and its own memory capacity may well be limited, especially given the huge amounts of memory that would be required to store information about all the different pronunciations of all the different words in all their different contexts. But even if all this information could be stored, could it feasibly be learned and feasibly be deployed? And is acquiring, storing, and deploying this information more feasible than acquiring, storing, and deploying what by comparison would be a very small number of rules?
Currently, there is no empirical way to establish conclusively which of these two possibilities we actually use. A rule-based approach appears to have all the advantages stacked up in its favour: it is low on memory, and easily deployed. The alternative approach, based on some representation of the individual pronunciations and their associated contexts, would place an excessive burden on memory, and for all sorts of reasons it has a somewhat implausible ring to it. But then again, so did the idea that we activate the meanings of all the neighbouring words we encounter during the search for the actual word that was spoken. So plausibility is not necessarily a good criterion by which to choose between the possibilities. A more useful criterion concerns the feasibility of the alternatives. This is where, in the absence of data on what we actually do, computational modelling can shed some light on the puzzle. Unfortunately, this means waiting until Chapter 13, and in the meantime we must accept that, one way or another, we can overcome the problems associated with the mispronunciation of words uttered in the context of continuous speech.
Getting at the meaning
We know something about how, and when, lexical entries are activated, and how, and when, they may become deactivated. But what information is contained within a lexical entry? How do we square a question like this with the idea that a lexical entry is simply a kind of neural circuit? Returning to the analogy of a combination lock, we can ask the same kind of question: given the arrangement of its tumblers, what information does a combination lock contain? On the one hand, there is a sense in which a combination lock itself contains no information at all. It is simply a physical arrangement of potentially moveable objects. On the other hand, the precise arrangement of the tumblers determines which exact sequence will open the lock: the appropriate sequence has meaning by virtue of causing an effect to occur that is specific to that sequence, and to no other. In this sense, the combination lock does contain information, and a skilled locksmith would be able to examine the arrangement of the tumblers, and figure out, on the basis of this information, the sequence required to open the lock. Similarly, even if a lexical entry is nothing more than the neural equivalent of a combination lock, it contains information by virtue of the effect that an input sequence can have (and in Chapter 9 we shall discuss further the nature of meaning, and the nature of the effects that a word may cause). And just as we can still refer to lexical entries when what we are really talking about is some complex neural circuitry, so we can refer to meaning when what we are really talking about is the result of this circuitry becoming activated.
So lexical entries are where the meaning of a word resides. But one of the first things one notices when opening up a large written dictionary is that most words have more than one meaning. The word `pitch', for example, was introduced in Chapter 1 without any explicit definition. And yet it would be almost impossible to look up the word and not discover that it has several distinct senses or meanings: to pitch a ball; to pitch a tent; the pitch of a roof; the pitch of a musical sound; the pitch you get from distilling tar; the sales pitch; the football pitch, and so on. Presumably the mental lexicon must also reflect this multiplicity of meaning. But what are the implications for how we retrieve a single meaning? Do we activate all possible meanings of a word that is ambiguous and has more than one meaning? Do we somehow scan all the meanings (to return, momentarily, to the dictionary metaphor) until we get to the first one that is appropriate given the context in which the word occurs, ignoring any others that we have yet to reach? Or do we somehow activate only the contextually appropriate meaning, so avoiding a cluttering of our minds with all those other, inappropriate, meanings?
In the late 1970s, David Swinney, then at Tufts University, published a paper that was to prove extremely influential. Not only did it demonstrate (after many years of bitter argument and counter-argument) that the alternative meanings of ambiguous words are activated, but it was also the first demonstration of cross-modal priming, which we encountered earlier. The specific question that Swinney considered was whether or not we activate the alternative meanings of words even when those words are heard in the context of sentences which are compatible with only one of the meanings of the word. For instance, in `He swam across to the far side of the river and scrambled up the bank before running off', it would hardly be appropriate to interpret `bank' as a financial institution. Similarly in `He walked across to the far side of the street and held up the bank before running off', it would hardly be appropriate to interpret `bank' as a river bank (or to interpret `hold up' as `support'). In order to explore what actually happens when we hear sentences such as these, Swinney played people sentences similar in principle to these ones and immediately after they heard the word `bank', he flashed up on a screen either `money' or `river'. The people knew to make a lexical decision as soon as they saw a word appear on the screen. Swinney found that, irrespective of which of the two sentences had been used, both `money' and `river' were primed. This showed that both meanings of `bank' must have been activated.
Of course, at some stage, the inappropriate meaning of `bank' must be suppressed, and, sure enough, Swinney found that if he presented the target words two or three syllables later (that is, downstream from the ambiguous word), only the target related to the contextually appropriate sense of the word was primed.
These findings aroused an enormous amount of interest, not least because some subsequent studies failed to show the same results. These studies found that in context, only the appropriate meaning was activated. Of course, certain meanings of a word will be produced more (or less) often than certain others, and the evidence suggests that the more frequent the meaning, the greater its activation. This is entirely consistent with the idea, discussed earlier in connection with Zwitserlood's experiment, that the more frequent a word, the greater the activation of its lexical entry. If the institution meaning of `bank' is more frequent than the river meaning of `bank', the institution meaning will be the more active. And if, in these studies, the inappropriate meaning is sufficiently infrequent, it might look as if the contextually inappropriate meaning has not been activated, but what has in fact happened is that its activation was so low that it was very quickly deactivated by the context. So despite some initial controversy surrounding Swinney's results, the general consensus is that they are right: we do activate all meanings of an ambiguous word.
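The frequency account can be illustrated with a toy simulation in Python. Everything numerical in it is an assumption made for the sake of the example: the relative frequencies of the two meanings of `bank', the rate at which context suppresses the inappropriate meaning, and the threshold above which priming would be detectable. It is a sketch of the reasoning, not a model fitted to Swinney's data.

```python
def activation_over_time(frequencies, appropriate, steps, decay=0.5):
    """Start each meaning at its relative frequency, then let context
    suppress every meaning except the appropriate one at each step.
    Returns a list of activation snapshots, one per step (step 0 first)."""
    total = sum(frequencies.values())
    activation = {m: f / total for m, f in frequencies.items()}
    history = [dict(activation)]
    for _ in range(steps):
        for meaning in activation:
            if meaning != appropriate:
                activation[meaning] *= decay
        history.append(dict(activation))
    return history


def primed(history, step, threshold=0.1):
    """Meanings active enough at a given step to prime a related target."""
    return {m for m, a in history[step].items() if a >= threshold}


# Assumed frequencies: the institution meaning is four times as common.
BANK = {"institution": 80, "river": 20}

river_context = activation_over_time(BANK, appropriate="river", steps=4)
money_context = activation_over_time(BANK, appropriate="institution", steps=4)

print(primed(river_context, 0))   # probe immediately after `bank'
print(primed(river_context, 4))   # probe a few steps downstream
print(primed(money_context, 2))   # infrequent inappropriate meaning
```

Under these assumptions both meanings are detectable immediately after `bank', and only the appropriate one survives downstream. The infrequent river meaning, when it is the inappropriate one, has already dropped below threshold by the second step, which is how a probe that came only slightly late could make it look as if that meaning had never been activated at all.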
At around the same time that Swinney performed his cross-modal priming experiments, Michael Tanenhaus and colleagues at Wayne State University in Detroit performed a similar experiment, using words that were ambiguous between a noun (e.g. `watch' as a time-piece) and a verb (e.g. `watch' as a kind of looking). In a sentence like `John began to watch the game', only a verb can follow the fragment `John began to ...'. Armed with this information, we could scan a written dictionary and look only at the entry for `watch'-as-verb, ignoring the entry for `watch'-as-noun, and hence ignoring the time-piece meaning of `watch'. But does the same thing happen when we search the mental lexicon? Can we eliminate from the lexical search all the words whose syntactic categories are inappropriate given the preceding words in the sentence? Apparently not. Tanenhaus found that the alternative meanings of `watch', related to time-piece and looking, were both activated when people listened to sequences such as `John began to watch'. So knowledge of the type of word that must follow (that is, knowledge of its syntactic category) is not used to help constrain the possibilities. But why not? Is this some arbitrary property of the workings of the mental lexicon? Or is there some reason behind this?
The Ascent of Babel: An Exploration of Language, Mind, and Understanding Page 10