We might at first be tempted to assume that we retrieve words according to the phonemes they contain, if only because this fits most readily with our experience of letter-by-letter look-up in a written dictionary. But the fact that infants apparently organize what they hear in terms of syllables (see Chapter 2) offers an alternative account of the access code; perhaps as adults we break down what we hear into syllable-sized chunks, and perhaps the lexicon is organized in terms of these chunks.
The lexicon must be organized, as it develops through infancy, according to those aspects of the speech input to which the infant is sensitive. But infant sensitivities may not be the only thing to determine lexical organization. If properties of the written form of the language can determine the nature of the access code used with written dictionaries (different codes for English and Chinese), then properties of the spoken form of the language should determine the nature of the access code used with mental lexicons. But if different languages exhibit different properties with respect to their spoken structure (e.g. rhythmic structure, syllabic structure, and melodic structure), they may cause infants brought up in those different languages to develop different access procedures. And one does not have to look too far afield to find such differences. English and French are a good example.
In French, the syllable has a rather more distinctive role in terms of defining the rhythmic properties of the language than it has in English. In English, the beat coincides with the stressed syllables. A good test of which is the stressed syllable in a word with more than two syllables is to attempt to insert an infix into the word; the word `bloody' is an infix in `fan-bloody-tastic'. Infixes of this type tend to occur immediately before the stressed syllable, which is why `fantas-bloody-tic' is pretty bad! French is quite different from English; the beat tends to coincide with every syllable. So in French, the syllable is a rhythmic unit. The syllable is the smallest thing that is repeated regularly, rhythmically, with each beat. In English, the rhythmic unit is a sequence of syllables, with just one stressed syllable occurring in each such sequence. This rhythm is especially obvious in limericks. The thing that repeats on each beat is not the syllable in this case. This means that the syllable is generally more salient in French than it is in English. So perhaps French babies latch on to syllables in a way that English babies do not.
There is another important difference between English and French which also conspires to make the syllable more salient in French than it is in English. In French, the words `balcon' and `balance' (meaning `balcony' and `balance', respectively) start with different syllables. In `balcon' the first syllable is /bal/, whereas in `balance' it is /ba/. We could write these words, in order to highlight this difference, as `bal-con' and `ba-lance'. But the English word `balance' is not quite the same as the French word `balance'. Whereas the /l/ clearly belongs to the second syllable in the French word, it is less clear which syllable it belongs to in the corresponding English word. If people are asked to judge which syllable it belongs to, they cannot do so reliably; in effect, the /l/ belongs to both syllables. This phenomenon is particularly prevalent when the first syllable is stressed and the second is unstressed (a very common pattern in English). This means that the break-down into distinct syllables is much harder in English than it is in French.
So do adult speakers of French make use of their syllables in a way that adult speakers of English do not?
A variety of experimental studies have explored this question, and not just with English and French; other language pairs also differ in theoretically interesting ways and these have also been studied. The research has largely been conducted by Jacques Mehler and colleagues, as part of their research into the acquisition and nature of the mental lexicon. The original motivation for the research was to discover whether, in French at least, the syllable functions as a perceptual unit, that is, as the chunk that is used to both access and organize the mental lexicon.
Mehler investigated this issue by devising a syllable-monitoring task, in which French speakers were asked to press a response button as soon as they heard a word containing a particular target syllable, for instance /ba/. He would then play them either a word like `ba-lance' or a word like `bal-con' (the hyphen is used here, again, simply to indicate the syllabic structure of the word). If the syllable is indeed a basic unit of perception, then this task should be relatively easy when the first syllable of the word matches the target syllable they have to listen out for. It should be easy to match the target /ba/ against `ba-lance', or the target /bal/ against `bal-con'. On the other hand, it should be harder, and should therefore take longer, if the first syllable of the word is not the target syllable (so it should be harder to match /ba/ against `bal-con' and /bal/ against `ba-lance'). In fact, this is exactly what was found, supporting the hypothesis that adult French speakers organize what they hear into syllable-sized chunks. If they simply broke down what they heard into sequences of individual phonemes, for instance, one would not be able to explain this effect: the phoneme sequence in /ba/ exists in both words, so no difference would have been found in the time taken to match it against these two words.
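The contrast between the two accounts can be made concrete with a small sketch. It is only an illustration of the reasoning above, not a model of the experiment: the two-word lexicon, the use of spelling as a stand-in for phonemes, and the function names `syllable_match` and `prefix_match` are all assumptions introduced here.

```python
# Toy lexicon (an assumption for illustration): each French word is stored
# with the syllabification given in the text, using spelling in place of
# a phonemic transcription.
SYLLABLES = {
    "balance": ["ba", "lance"],
    "balcon": ["bal", "con"],
}


def syllable_match(target, word):
    """Syllable account: the target must be exactly the word's first syllable."""
    return SYLLABLES[word][0] == target


def prefix_match(target, word):
    """Phoneme account: the target's segments need only begin the word."""
    return "".join(SYLLABLES[word]).startswith(target)


if __name__ == "__main__":
    for target in ("ba", "bal"):
        for word in ("balance", "balcon"):
            print(target, word,
                  "syllable:", syllable_match(target, word),
                  "prefix:", prefix_match(target, word))
```

Under the syllable account, a match is expected to be fast and a mismatch slow, so /ba/ should be detected more quickly in `ba-lance' than in `bal-con', and /bal/ the other way round. Under the phoneme account every pairing matches, so no difference in response time would be predicted.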
So much for French. What about English? When equivalent experiments are performed with English speakers the pattern is very different; the time it takes them to respond in the syllable-monitoring task is not dependent on the relationship between the target syllable and the syllable structure of the word they hear. English speakers are not sensitive to syllabic structure in the same way that French speakers are. Apparently, French speakers break down what they hear into syllable-sized chunks, whereas English speakers do not. On the face of it, then, it looks as if English and French adults do quite fundamentally different things when processing their respective languages.
Much of the evidence on infant speech perception leads to the view that babies and infants will organize what they hear according to the rhythms of their language (see Chapter 2 for more discussion of this theme). And if different languages have different rhythms, and so emphasise different aspects of the speech that the baby or infant hears, it follows quite naturally that adult speakers of these languages may organize what they hear according to these different aspects. They may process the speech they hear in fundamentally different ways. In other words, their minds will work (with respect to this aspect of speech processing) in fundamentally different ways. But do they? There is another possibility, and it concerns the ways in which sounds are spoken.
Exploiting the smallest details
Different sounds are produced by changing the shape of the vocal tract (changing the position of the tongue, changing its shape, changing the position and shape of the lips, and so on). The vocal tract changes shape continuously during the utterance of a word: it does not change, stop, change again, but changes shape in a more `fluid' manner; the shape at any one time is a function of what the shape was beforehand, what the intended shape is (given the sound to be produced) and what the shape will become in order to produce the next sound. Consequently, at any one moment, the sound that is produced reflects not simply the intended sound, but also the previous sound, and the sound that is to follow. To give an example: the vowel sound in the word `worm' is different from the vowel sound in `word', because in the former, the sound is influenced by the following /m/ (the `r' is silent, and appears only in the spelling), whereas in the latter, it is influenced by the /d/. This phenomenon is termed co-articulation. So the speech that we hear does not consist of sequences of simple, individually identifiable, phonemes. In this respect it is rather different from the way in which a written word consists of a sequence of letters.
One can think of the process by which we produce speech (discussed in more detail in Chapter 10) as the stringing together of articulatory movements, or gestures. The position of the articulators (the tongue, lips, and other parts of the vocal apparatus) is determined by where they are heading within that particular articulatory gesture. Each of these gestures corresponds, more or less, to a syllable. So co-articulation is largely restricted to the speech contained within a single syllable. In a word like `balderdash', there will be significant co-articulation of the /l/ on the preceding vowel, whereas in a word like `balloon', there will be much less (if any at all), because the /l/ belongs to the following, stressed syllable. But what has this to do with the differences between English and French?
The fact that the syllabic structure in French is very clear means that in a French word like `ba-lance' there will be little co-articulation of the /l/ on the preceding vowel, because the /l/ falls in a different syllable. In `bal-con' it falls in the same syllable and co-articulation on the vowel will occur. In English, on the other hand, the fact that in `balance' the /l/ belongs to both syllables means that the vowels in both syllables will be co-articulated with the /l/: the first one because the /l/ is where the articulators are heading, and the second one because that is where the articulators have just been. So the differences in syllabic structure between the two languages are reflected in differences in co-articulation. But again, so what?
Imagine that both English speakers and French speakers (and speakers of every other language) are able to use the smallest possible details in the speech input to help distinguish between the different words in their lexicons. When French speakers hear the sequence /bal/ from `bal-con', they hear (even if they are unaware of it) the co-articulation on the vowel of the following consonant. They, or more correctly their perceptual systems, should therefore be able to eliminate from the search any words like `ba-lance' which would require the vowel to be free of co-articulation (recall from Chapter 3 that vowels are not perceived categorically, so small differences between vowels can be detected). They would in fact be able to eliminate any words whose first syllable was not /bal/. English speakers hearing the /bal/ of `bal-cony' would also be able to eliminate any words whose first syllable was not /bal/. But they would not be able to eliminate words like `balance' because both `balcony' and `balance' are compatible with co-articulation of the /l/ on the preceding vowel.
In the early 1990s, William Marslen-Wilson and Paul Warren, working in Cambridge, devised an ingenious experiment to test whether co-articulated information is used in this way. Imagine taking off the final /d/ from `word' and replacing it with a /g/. When the perceptual system hears the first consonant and vowel it will think that it is hearing a word in which the next consonant is a /d/, because that is what the co-articulation on the vowel indicates. It will therefore rule out words like `worm', `work', and so on, leaving just `word'. When it then encounters the final /g/, it will have to rule out `word'. Imagine now that the /g/ had replaced the last /b/ of the made-up word `worb'. In this case, the perceptual system, on hearing that first consonant and vowel, would rule out all words, as none is compatible with a following /b/. When it then encounters the /g/, there will be nothing left to rule out. Marslen-Wilson and Warren recorded someone saying `word', `worb', and `worg' (in fact, they used other examples but the principle is the same), and then replaced the final phonemes in `word' and `worb' with the final phoneme spliced from `worg'. They then asked people to listen to these composite words and to decide as quickly as possible whether or not what they were hearing was a real word. In both these cases, of course, they were not. But in the case created from `word', as opposed to the case created from `worb', as the nonword unfolded, co-articulation would lead them to think that they were hearing a real word, so they should take longer to say, subsequently, that it was not a word after all. This is exactly what was found.
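The logic of the splicing experiment can be laid out as a short sketch. It is a toy under stated assumptions, not the procedure Marslen-Wilson and Warren used: the three-word lexicon, the spelling-based representation of each word as an opening stretch plus a final consonant, and the names `candidates_after_vowel`, `candidates_after_final`, and `predicts_slow_rejection` are all hypothetical.

```python
# Toy lexicon (an assumption for illustration): each word is stored as
# (opening consonant and vowel, final consonant), using spelling as a
# stand-in for the sounds.
LEXICON = {
    "word": ("wor", "d"),
    "worm": ("wor", "m"),
    "work": ("wor", "k"),
}


def candidates_after_vowel(opening, cued_consonant):
    """Words still in the search once the vowel has been heard.

    The co-articulation on the vowel is treated as a cue to the
    consonant that originally followed it in the recording.
    """
    return {w for w, (start, final) in LEXICON.items()
            if start == opening and final == cued_consonant}


def candidates_after_final(candidates, heard_final):
    """Words that survive once the (spliced) final consonant arrives."""
    return {w for w in candidates if LEXICON[w][1] == heard_final}


def predicts_slow_rejection(source_final, spliced_final="g"):
    """True if a real word is still a candidate when the final consonant
    arrives and must then be ruled out, which should delay the decision
    that the item is not a word."""
    early = candidates_after_vowel("wor", source_final)
    late = candidates_after_final(early, spliced_final)
    return bool(early) and not late


if __name__ == "__main__":
    print("spliced from `word':", predicts_slow_rejection("d"))
    print("spliced from `worb':", predicts_slow_rejection("b"))
```

In this sketch the item built from `word' leaves a real word in the search until the /g/ arrives, so a slower rejection is predicted; the item built from `worb' has already emptied the search by the end of the vowel, so there is nothing left to rule out.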
So it looks as if we do use the smallest detail possible to distinguish between alternative words in the mental lexicon. And because, in French, words like `bal-con' and `ba-lance' can be distinguished on the basis of these (co-articulation) details, whereas in English words like `bal-cony' and `balance' cannot, it might look as if French speakers are doing something quite different from what English speakers are doing. In fact, they might all be doing just the same: they might both be using co-articulated information if it helps. In French it does, a lot. In English it does, but not so much.
Interpreting the dip-stick
As with any scientific endeavour, controversies exist not only in respect of the alternative theories concerned with the processes underlying language understanding, but also in respect of the utility of the different tasks that have been used in order to uncover these processes. No task is totally immune to controversy, and the syllable-monitoring task is no exception. Part of the problem concerns our assumptions about what the task taps into. Cynics say that if the human mind is like an engine, then any experimental tool used to investigate that engine is a little like a dip-stick, except that we can never be sure that we are sticking it in the right place, and we can therefore never be sure about how to interpret the sticky mess that we subsequently find.
The original motivation for the syllable-monitoring task was that it would be sensitive to the processes that convert the acoustic input into a form that can then be matched against the mental lexicon (the equivalent of analysing a printed word in terms of its individual letters for the purpose of matching those letters against a written dictionary). But perhaps the syllable-monitoring task is instead sensitive to information that is in fact contained within the lexicon itself. After all, when we access a word in the lexicon, we must retrieve all sorts of information about that word, including some representation of the articulatory gestures required to utter that word. And perhaps people's responses in the syllable-monitoring task are determined by the availability of that information. If the auditory signal is compatible with many words whose first-syllable-gestures are the same as the target syllable-gesture (and few whose first-syllable-gestures are different), people may be able to respond quickly. If the auditory signal is compatible with many words whose first-syllable-gestures are not the same as the target syllable-gesture, people may respond more slowly.
There is one further piece in the jigsaw. If the French sensitivity to syllabic structure is no more than a consequence of the presence or absence of co-articulation, what happens if people respond so fast that they could not yet have encountered the co-articulated information on the vowel before initiating their response? In this case, they might hear the consonant-vowel sequence /ba/ but not so much of the vowel that they could hear any co-articulation from a subsequent /l/. At this very early stage, it would not be possible, if listening to `bal-con', to eliminate from the search `ba-lance'. In effect, they would find the relevant page in the dictionary, but would not have yet reached the information that would allow any finer discriminations within that page. So both words would be compatible with the input so far. Consequently, this input would be compatible both with the target syllable /ba/ and with the target syllable /bal/. In fact, subsequent analyses of the French data found exactly this: the faster responses did not vary as a function of syllabic match or mismatch.
What should we now make of the syllable's role in French? Is the syllable represented mentally only for the purposes of speech production (see Chapter 10)? Does it have no direct role in the processes that match the acoustic input against the mental lexicon? Currently, it is impossible to tell. Whether we say that French speakers can exploit syllables whereas English speakers cannot, or that French speakers can exploit co-articulation in ways that English speakers cannot, is really the same. Co-articulation reflects syllabic structure. The two are, in many respects, inseparable.
So what should we now conclude? What progress has been made in our attempts to unlock the access code? The evidence is far from clear, and mainly circumstantial. But what would count as progress? We could have unambiguously discovered, for instance, that the speech input is broken down into a sequence of syllables which are then matched against the lexicon. Or we could have discovered that the input is broken down into sequences of phonemes, and that these phonemes are matched against the lexicon. Instead, we have discovered that even more subtle information than that can influence which words stay in, or are eliminated from, the lexical search. Of course, this does not tell us whether this (co-articulated) information is used directly in the lexical search, or whether it is used simply to anticipate the next phoneme, or the end of the syllable. These other units may none the less form the basis for the lexical search. But crucially, the system acts as if the smallest detail is mapped directly onto the lexicon. We know the currency that we are dealing in, we just cannot guarantee what denomination the notes come in.
There is nothing incompatible between the finding that infants are sensitive to syllabic structure, and the supposition that adults (even French adults) do not break down what they hear into syllable-sized chunks before matching these against the mental lexicon. If infants develop gradually more refined sensitivities to the structure of the speech that they hear, perhaps being able to make finer and finer discriminations, and learning which discriminations are useful and which are not (see Chapter 3), it follows that the access code used in the construction of the lexicon, and in the retrieval of information from that lexicon, may itself develop and, in effect, evolve. We can in fact only guess at what the initial organization of the lexicon might be, and how it may change as infant sensitivities themselves change. The goal of much current research into the acquisition and organization of the early lexicon is to map out these changes.
The Ascent of Babel: An Exploration of Language, Mind, and Understanding