So (at least) two things drive a conversation: the turn-taking cues, which signal when to say something; and the discrepancies between what the speaker knows and what the speaker presumes the listener to know, which signal what to say. Sometimes the two work independently, as when the speaker comes to an end and there is an awkward silence whilst the listener, sensitive to the change of turn, desperately tries to think of something pertinent to say. At the opposite extreme, the discrepancy between the participants in a conversation can be so great that turn-taking is abandoned and each participant is driven simply by the discrepancy between their own thoughts and the thoughts they attribute to the other. Over-heated arguments are a good example of this. They often consist of little more than one interruption after another, with much simultaneous speaking, and both speakers raising their voices, and pitch, to signal that neither is ready, or willing, to relinquish their turn.
This all points to the obvious fact that the desire to say something, the will to speak, is a function of having something to say, having someone to say it to, and having an opportunity to say it. But what about the content of what is said? What drives that?
Why we say what we say
In some cases, it is pretty clear what drives the content of an utterance. Ask someone the time, or the way to the cathedral, and they will say something in answer to whatever you have asked. Invite them to give a lecture or tell a story or say what they think about the government and they will say something that is a reflection of their knowledge, or their views. Put someone in a situation where they require help and the chances are they will say something intended to elicit your cooperation. Informal conversation can be a mixture of all of these. Often, though, things are said in conversation for no apparent reason: things like `uh-huh', `I know', `mmm', and `yes', which are said simply to show a kind of solidarity with the speaker. Another complication is that once a topic has been sufficiently covered, and the discrepancy between speaker and hearer minimized, one of the participants in the conversation may decide to switch to a new topic. Sometimes this happens for no other reason than boredom with the earlier one. But what determines the chosen topic is harder to explain, and psycholinguistics has relatively little to say about this.
Once a topic has been chosen, it is a little easier to see what drives the content of the individual utterances that make up the conversation. Listeners, like all model builders, need more than just a list of mental pieces. They need to know what to do with the pieces. This means they need to know where, within the model, these pieces should go, and to which pre-existing pieces they should be attached. So before identifying what specific modifications need to be made to the model, the speaker has to identify which parts of the model need modifying. Paradoxically, therefore, much of the information that a speaker provides is already known to the listener, because it refers to information that is already in the listener's model. We tend to structure what we say (and write) to reflect this two-step process, presenting what is mutually known first and what is new second:
The prime minister who was renowned for consistently giving out misleading signals about when she had finished speaking had even been coached on how to speak.
Only the final part, `had even been coached on how to speak', provides information that is not already known (except to those readers who are already acquainted with the story behind Margaret Thatcher's distinctive speaking style).
Occasionally, and most often in response to questions, the speaker does not have to provide any information that is already in the listener's model. When someone asks a question, they often do so because they know what to do within the model, but not to which pieces, or because they know some of the pieces which are involved, but not what to do with them. These different situations give rise to different kinds of question, and consequently, different kinds of answer:
Who was always interrupted when interviewed on TV? Margaret Thatcher.
Why was Mrs Thatcher always so irritated when interviewed on TV? Because she was always interrupted.
Depending on the question, the speaker answering it does not need to identify all of the mental pieces, or all of what to do with them. Sometimes, the speaker does not need to identify anything at all:
Is it true that Mrs Thatcher was always interrupted by her interviewers? Uh-huh.
How much information the speaker provides that is already known to the listener is one factor that determines the form of the utterance. But actually providing that information, in a way that is recognizable to the listener, is another.
The speaker has to ensure, when referring to something or someone, that the listener will be able to work out, from their own perspective, who or what is being referred to. It is no good saying `she' if the listener will not know who this refers to, or `the politician' if the listener will not know which politician. The speaker must therefore choose appropriate ways for referring to things, where `appropriate' is defined in terms of what the speaker believes the listener knows. Often, this is based on what has gone on in the conversation so far. For instance, if `The politician who was so often interrupted' had been uttered in a recent conversational turn, the speaker could use the expression `she', but only if the listener could be assumed to know who `she' was. And if the politician had last been mentioned some while back, then the speaker would need to use a different kind of referring expression, one that would enable the listener to bring the politician in question back into the attentional spotlight (such as `that politician', `the politician who. . .', `Margaret Thatcher'). Some of this was discussed in Chapter 9.
Much of what a speaker says is determined, therefore, by what the listener needs to know in order to add something new to his or her body of knowledge, and what the listener is presumed to know. Of course, for the listener to be able to do anything at all with the information supplied by the speaker, the words used to convey that information must be organized in the right way given the grammatical conventions of the language. Without these conventions, speaker and hearer might just as well give up. But then again, without any words, there would be no use for any conventions. At what stage in the whole process do words, and the way they are organized, get chosen?
Paradoxically, the best way to address these questions is to consider what happens when the speaker chooses the wrong words.
Goings-on and goings-wrong in speech production
Some of the most famous, and most often quoted, speech errors were made by the Reverend William Spooner (1844-1930). He once told his congregation, for instance, that `The Lord is a shoving leopard', and his students that they had `tasted the whole worm'. Whether these were genuine errors is unclear; he is rumoured to have produced at least some of them intentionally, to amuse his students at New College, Oxford. But errors of this kind do occur in spontaneous everyday speech. They are interesting because they say some quite surprising things about the production process. For instance, some errors tell us that, contrary to intuition, we do not first choose the words we shall utter and then choose what order to put them in. We do it the other way around: we choose the order we will put the words in before choosing the words themselves.
The following were all genuine errors, and are borrowed from various collections of such errors that have been published in the psycholinguistic literature. In each case, the speakers blended two words with similar meanings:
It's a lot of brothel [from bother/trouble]
The competition is a little stougher [from stiffer/tougher]
It's difficult to valify [from validate/verify]
When it came to saying the critical word, each speaker obviously had more than one word in mind, and ended up saying a mixture of the two. But the speaker put this mixture in the right place given the grammatical conventions that the sentence required. The speaker must have selected the appropriate grammatical conventions before mis-selecting the specific words to which those conventions applied. But why does it work this way, and how?
On the application of grammatical convention
The answer to this quite paradoxical state of affairs lies in the nature of the grammatical conventions we employ during speech production. Chapter 9 suggested that our knowledge of grammar is not stored as a large list of rules nestling somewhere within the folds of our brains. Instead, it suggested that `a balding linguist ate' and `a very large fish ate' would evoke patterns of neural activity which, although different, would reflect the accumulated experience of things doing eating. These patterns would reflect the convention in English for placing the subject (the thing doing the eating) before the verb. If sequences of the form thing-action-thing were associated instead with the second thing doing the action to the first thing, then the neural activity associated with these sequences would reflect a grammatical convention which placed the object of the verb (the thing being eaten, in this case) before the verb. So grammatical conventions are embodied in the neural response to language input. But if these conventions are embodied in the response to language, how can they be applied during the production of language?
There is, currently, no complete theory that links language understanding with language production. Similarly, there is as yet no complete single theory of how the embodiment of grammatical conventions used to decode language is tied to the embodiment of those conventions when used to encode language. But there are hints. What follows should give a feel for what such a theory might look like.
During language learning, similar input sequences become associated with similar patterns of neural activity. A particular sentence structure, and hence a particular grammatical convention, becomes associated with a particular pattern of neural activity. In effect, it becomes associated with a particular kind of meaning. This process of association involves a gradual changing of the connections between the different neurons (that is how the patterns of neural activity come to change). So those conventions are actually embodied in the connections between the neurons. This means that the associations could in principle work both ways: sequences of certain kinds would become associated with (that is, would give rise to) meanings of certain kinds, and, working the other way, those kinds of meaning could give rise to those kinds of sequences. Given that the appropriate associations develop when we learn to extract meaning from what we hear, it is possible that the same associations (but effectively in reverse) ensure that the appropriate sequencing happens when we, in turn, wish to convey meaning with what we speak.
This is necessarily an oversimplification. In Chapter 12, on language disorders, we shall discover that the decoding of grammatical knowledge during understanding, and the encoding of grammatical knowledge during production, rely on associations that, although a little like mirror-images of one another, are actually separate. How two distinct sets of associations can, in effect, do the opposite of one another will be discussed in that chapter. For present purposes, though, whether there is one set of associations that can work in both directions, or two sets of associations that happen to work in opposite directions, is immaterial.
If it is true that decoding and encoding grammatical structure are linked in this way, why should it look as if the ordering of the words is decided before there are any words to order?
In language understanding, sequences of certain kinds evoke meanings of certain kinds. But these sequences are more than just sequences of words; they are actually sequences of concepts. Each word we hear evokes a neural response corresponding to the concept to which that word refers, and it is the changing pattern of these neural responses that corresponds to the meaning of the sequence as a whole. In production, when the associations are run in reverse, particular kinds of meaning are associated with particular orderings of the concepts that make up those meanings. In effect, you have the order (the grammatical convention) before you select the words that will express those concepts.
Of course, much depends on what you call `a word'. If a word is its physical form then, yes, those conventions are chosen beforehand. But if a word is its meaning (that is, is the concept to which it refers) then, no, those conventions are not really chosen before the words to which they will apply. In fact, and as we shall shortly see, words are both of these things-each word can be described in terms of a conceptual component and a physical component.
One way or another, words do eventually get uttered. How? Once again, it is informative to look at what happens when things go wrong.
Getting it wrong, again ... and again ... and again
Perhaps the most basic thing that speech errors tell us is that when uttering one word, we already have in mind words that are waiting to be uttered, as in the following examples:
Joke, have you heard the Mike about . . . ?
I got into this guy with a discussion . . .
In each case, the words affected by the error must have been simultaneously available for the exchange to have happened. It should not be too surprising that words (and other things) are available simultaneously. After all, we can have in mind a word like `lips', but we cannot physically say the individual phonemes /l/, /i/, /p/, and /s/ simultaneously. They have to be placed in the mental equivalent of a queue, or buffer. Similarly, when we want to convey information about who-did-what-to-whom, we necessarily have in mind the who, what, and whom. But again, it is impossible to express them all at once, so they also must be stored in a queue. And each one of these may need to be expressed with more than one word (`each one of these' corresponds to a single `who'), and because those words cannot all be articulated simultaneously either, they too have to be put in a queue (more of which later).
But not all accidental exchanges involve whole words:
I'd hear one if I knew it . . .
But she writes her slanting . . .
Again, two words have exchanged position. But it is a little different in these cases, because if it were a strict exchange (that is, of the whole word, including its physical form), the errors would have ended up as `I'd heard one if I know it' and `But she writing her slants'. What actually happened is that the information regarding the intended tense of each verb stayed in the right position (as did the affix `-ing'), but the vocabulary items to which this information should have applied exchanged position. Although the tense information was applied to the wrong word in these cases, it was applied correctly given the peculiarities of each word (the error created `knew', not `knowed').
These errors suggest that it was not actual (physical) words that changed position, but rather some more abstract representation of those words, perhaps something similar to the concept corresponding to each word's meaning. In effect, then, there is a conceptual version of the word that appears to be separable from the physical version. Of course, if the conceptual equivalent of a word can move to the wrong place, leaving the affix in the right place, we should also find the converse error, in which the conceptual equivalents of affixes move, leaving behind the words to which they should have applied. And we do:
I disregard this as precise [from I regard this as imprecise]
What appears to have happened is that the affix meaning `not' attached itself to the wrong word, to `regard' instead of `precise'. But, interestingly, it was not the actual affix that moved, as otherwise the sentence would have been `I imregard this . . .'. Instead, what moved was the conceptual affix, that is, the information that an affix meaning `not' was required. Although this information was applied to the wrong word, it was applied correctly, as in the previous examples, given the peculiarities of the affected word: when the physical form of the word + affix combination was chosen, the right form was chosen for that particular word.
The idiosyncratic properties of individual words are not always respected when things get out of order:
I randomed some samply . . .
I hate raining on a hitchy day . . .
These exchanges are quite unlike the `I'd hear one if I knew it' and `disregard' cases. One cannot create a verb by attaching a verb-like inflection (`-ed') to `random', or an adverb by attaching an adverbial inflection (`-ly') to `sample', or an adjective by attaching an adjectival inflection (`-y') to `hitch'. It is as if the stems of each word (`random' in `randomly', `sample' in `sampled', `rain' in `rainy', and `hitch' in `hitching') have exchanged position, leaving their affixes behind. But whereas the previous errors showed due regard for the properties of the words that were affected by the error, these errors showed no such regard. Why not? Perhaps because these errors happened at a different stage from the others, that is, once the actual stems, affixes, and inflections had been chosen.
So the production of an utterance can be thought of as taking place in stages, with certain things happening before certain others. Further evidence for these distinct stages comes not from errors involving words or inflections, but from errors involving individual phonemes (these are the most frequently made errors):
The Ascent of Babel: An Exploration of Language, Mind, and Understanding Page 18