The Ascent of Babel: An Exploration of Language, Mind, and Understanding

by Gerry T. M. Altmann

Reading the stonemason text, there were many different levels at which what you were really doing was trying to predict, subconsciously, and from one moment to the next, what was going on. In effect, these predictions constitute a kind of expectation (although as we shall see in Chapter 13, they probably are predictions). For instance, you may have conjured up an image of a sunny day, with clear-blue skies and Mediterranean scenery. None of these was explicitly mentioned, but they were each predictable, or expected, from the text. Some of these expectations may even have arisen against your better judgement:

Envied by his colleagues, the stonemason had picked his apprentice carefully. A more skilled pair of hands he had never seen. They were the colour of light marble, as was her face.

Most people reading this passage might have expected the apprentice to be male; indeed, might have predicted the apprentice to be male.

The image that is left behind in the mind's eye after reading the stonemason's story is far richer than the literal content of the story. But expectation and prediction do more than just influence what is left after a text has been read, or a conversation listened to.

When the apprentice and the stonemason decided `to eat it later', the `it' referred not to the knife that had been used unsuccessfully to cut the nougat, but to the nougat itself. Figuring this out involved nothing more than the making of a prediction: of all the things mentioned (the block of stone, the olive tree, the lunch, the beer, the nougat, the knife), which could one predict would be eaten? It has to be either the lunch or the nougat, with the nougat being more likely (it was the more recently mentioned of the two). And when `the knife' was mentioned, its existence could be predicted from the meaning of `cut'. Similarly, the beer could be predicted (perhaps not with any great certainty, but we shall come to that in a moment) on the basis of the lunch. And the fact that the beer was cold could be predicted on the basis of both the hot day and the sequence `but fortunately the beer was . . .' (what else could plausibly be predicted at this point?). The lunch itself could have been predicted, from `they were hungry'. Even the block of stone could be predicted, given the stonemason's trade. It is surprising, really, just how little in the text could not be anticipated.

Experiments confirm the extent to which predictions are continuously being made, by showing that the time it takes to recognize a word is determined by how predictable that word is from the context in which it appears. The word `stone' in `A stonemason and his apprentice set down a block of stone . . .' would be recognized faster than the equivalent word in `A stonemason and his apprentice set down a block of wood . . .'. But does this mean that at each word we evaluate how predictable that word is from the context and from general knowledge about the ways of the world? That would surely be too much effort. What would the advantage be?

Prediction and meaning

The neural activity evoked in response to something, whether it is a word, a thing, an event, or whatever, comes to reflect not just the thing itself, but also the contexts in which the thing tends to occur. But it does not come to reflect just any aspect of those contexts. Instead, it reflects just those bits that could be predicted by the occurrence of that something. Why? Because that is the way this kind of learning works. Imagine being taught Mandarin Chinese and being shown a glass of beer that you are told is an example of `boli'. Imagine that you are then shown a bottle of beer and told the same thing. Presumably `boli' means `beer'. Now you are shown a glass of lager (a light-coloured beer, for readers who do not distinguish one beer from another), and you are again told that it is `boli'. You probably assume, now, that the Chinese, like Americans, do not distinguish between beer and lager. But now imagine that you are shown a glass of water, a bottle of Coke, a windowpane, and that each time you are told it is `boli'. The chances are that you do not assume that the word `boli' has several meanings, one for each of the things you were shown. Instead, you assume that `boli' means the one thing that was common to, and could be predicted by, the occurrence of `boli': glass. This same principle will be revisited in Chapter 13 on artificial brains and what they can learn.
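The `boli' example can be caricatured in a few lines of Python. This is only a sketch under loud assumptions: each encounter is reduced to a hand-picked set of feature labels (the labels, the `scenes` list, and the function name `common_features` are all invented here for illustration), and learning is reduced to keeping whatever survives in every encounter. Nothing this crisp is claimed of real neural learning.

```python
# Hypothetical encounters: the features present each time `boli' was heard.
# The feature labels are illustrative assumptions, not data from the text.
scenes = [
    {"glass", "beer"},            # a glass of beer
    {"glass", "beer"},            # a bottle of beer (the bottle is glass)
    {"glass", "beer", "lager"},   # a glass of lager (a light-coloured beer)
    {"glass", "water"},           # a glass of water
    {"glass", "coke"},            # a bottle of Coke
    {"glass", "window"},          # a windowpane
]


def common_features(encounters):
    """Return the features that were present in every encounter."""
    candidates = set(encounters[0])
    for scene in encounters[1:]:
        candidates &= scene  # drop anything missing from this encounter
    return candidates


# After the three beer encounters, both `glass' and `beer' are still candidates.
early_guess = common_features(scenes[:3])

# After all six encounters, only one candidate remains.
final_guess = common_features(scenes)
```

After the first three encounters the learner cannot yet tell `beer' from `glass'; only the later encounters, in which the beer is absent, leave `glass' as the sole feature that the word reliably predicts.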

In effect, then, the neural activity that is evoked after many experiences of `glass' will constitute a prediction of what will correlate, in the environment, with the occurrence of that word. Generally, it is that transparent hard stuff around your beer, or in your window. But not always. It could be molten glass. In the case of `lunch', things are more complicated, because the neural activity associated with `lunch' will reflect the myriad of possibilities (or predictions) associated with lunch. And because repeated experience leads to the progressive strengthening of associations (as in the case of the bell-dinner pairing for Pavlov's dogs), the more commonly associated things (e.g. glasses, beer) will be reflected more strongly in the pattern of neural activity than the less commonly associated things (e.g. roast boar).
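The idea that more commonly associated things come to be reflected more strongly can be illustrated with a simple co-occurrence count. The sketch below is an assumption-laden stand-in, not a model of neural activity: the `episodes` list is made up, the function name `association_strengths` is hypothetical, and `strength' is taken to be nothing more than the proportion of a word's episodes in which something else also appeared.

```python
from collections import Counter

# Hypothetical episodes: sets of things experienced together.
# These are invented for illustration only.
episodes = [
    {"lunch", "sandwich", "beer"},
    {"lunch", "sandwich"},
    {"lunch", "beer", "roast boar"},
    {"lunch", "sandwich", "beer"},
    {"dinner", "roast boar"},
]


def association_strengths(experiences, word):
    """Map each co-occurring item to the fraction of the word's episodes
    in which that item also appeared."""
    counts = Counter()
    total = 0
    for episode in experiences:
        if word in episode:
            total += 1
            counts.update(item for item in episode if item != word)
    if total == 0:
        return {}
    return {item: n / total for item, n in counts.items()}


strengths = association_strengths(episodes, "lunch")
```

With these made-up episodes, `beer' and `sandwich' each accompany `lunch' in three episodes out of four, and `roast boar' in only one, so the commoner companions end up with the stronger association.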

So the predictions that are made at each point in a text or conversation are really nothing more than the patterns of neural activity that have been evoked by the sequence of words up to that point. And all that business of wanting things to be coherent, and wanting to link everything to everything else, to the point of inferring links where necessary, is nothing to do with `wanting' at all. It is just an inevitable consequence of the way we acquire meaning. It is just an inevitable consequence of what meaning is.

Taking care of the loose ends

So is that it? Basically, yes. The same principles that apply to the meanings of individual words in fact apply to the meanings of combinations of words. And just as `linguist' and `fish' evoke different patterns of neural activity that reflect different experiences (and consequently different predictions), so do `a balding linguist ate' and `a very large fish ate'. In this case, the different patterns reflect not just our experiences of fish, linguists, and eating, but also our experiences of the common linguistic context: `an X ate', reflecting the fact that something, X, was doing the eating.

Eating fish, or just eating, or just fish, are things we can experience directly, and we can suppose that the neural activity evoked by hearing these words does have something in common with the activity that would be evoked by experiencing the corresponding things directly. But not all words share this property.

In the mid-1990s, it was believed that a change of government would do the country some good.

Not a single word in this sentence refers to anything that can be seen, or even, come to that, experienced. So how do we learn their meaning? In fact, no differently from any other word. With sufficient experience of the circumstances in which people use a word like `government', for instance, the neural activity evoked in response to this word will come to reflect whatever it is about those circumstances that is predictive of, and can be predicted by, the word `government'. In effect, this is just knowledge of the circumstances in which it is appropriate to use the word. And that, basically, is nothing more than what the word means. Of course, its meaning could also be learned by definition: the body of persons charged, collectively, with the duty of ruling and administration. But that is just another circumstance, which soon blends in with all the others we experience, which is why we rarely remember the definitions we have been told. Once again, it just boils down to patterns of neural activity that reflect our experience of which aspects of the contexts we have encountered, which words we have encountered, and which are predictive of which other.

But what about all that earlier talk of mental models, pieces in the model, being centre-stage, and so on? It all seems so far removed from where we have ended up (and indeed, from where we started off when thinking about the meanings of individual words). But that was all just metaphor-speak. The neural equivalent of the mental model reflects what would be common to all the experiences of all the situations in which, for instance, things get eaten, things do the eating, linguists do things, and fish have things done to them. The pieces in the model (the linguist and the fish) are again just patterns of neural activity reflecting our common experiences of the things they are associated with. And whether they are centre-stage or not is really a reflection of their predictability given the accumulated experience of the ways in which texts and conversations develop.

But if mental models seemed far removed from patterns of neural activity, what about that earlier talk of different kinds of meaning: of gists, of word inflections, of intonation, of shrugs? The quick-and-easy answer is that the same principles apply to them too. The neural consequences of seeing a shrug or hearing a particular intonation may simply reflect what is common across each experience of that kind of shrug or that kind of intonation.

It all sounds too good to be true. After all, a video tape is also a record of accumulated experience: instead of neural patterns, it has magnetic patterns, and these also change in response to experience. So does this mean that the video tape can also understand? No. Because the magnetic patterns, once laid down on the tape, are never changed; new patterns do not modify earlier patterns; they replace them. But imagine an alien life-form. It may be intelligent, or it may be the alien equivalent of a vegetable. How would you decide? One way would be to see how it reacted to its environment. Imagine that it started off by making random noises, but that after a while, it tailored those noises so that it only made a particular noise in the presence of something specific in its environment (perhaps an object, perhaps something happening to that object). The noises might even sound a little like the ones it heard from the humans around it. And imagine further that whenever it made something a bit like a crying noise, you gave it a banana, which it then ate. Finally, imagine that one day, instead of crying, it made the noise for banana even though there were none around. What would you deduce? That it was hungry? That it wanted a banana? That it had understood enough of our world to be able to ask for something?

What might actually have happened, in this last example, is that the associations between the noises and the things the alien encountered in its environment gradually strengthened so that each noise it heard, and subsequently produced, became more specifically `tuned to' a particular thing. When it got hungry (just another kind of neural activity), it made a noise, was given a banana, and gradually associated that hunger signal with bananas. And one day, this association had become so strong that when it got hungry, it produced the noise associated with the thing that was in turn associated with that hunger. So where is the understanding? And that is the point: what we call understanding need be nothing more than complex neural associations.

Whether all that talk of neural activity has seemed far-fetched or incomprehensible to you does not really matter, so long as the difference between the video tape and the alien banana-eater is clear: there is nothing about the video that could be mistaken for understanding, but there is something very compelling about the behaviour of the alien that allows us to talk of it as if it understood. That is the most we can ever do. We cannot see into the brains of our loved ones and point to something and say `Look: there's the understanding'. All we can say is that they behave as if they understand.

This has, by necessity, been a rather simplified account of the nature of meaning, and things are not quite as simple as they have been made out to be. Much has been skipped in order to make the ascent as smooth as possible. Sadly, the summit we have arrived at is not the true summit: recall that the Tower of Babel was never completed, and the true summit never reached. But at least the view from where we have got to is a little like the view from the actual summit (were it to exist).

There is still some way to go. We have said nothing yet about how we produce language, or how we learn the correspondence between the spoken language and the written language. A plateau may have been reached, but our ascent is not over yet.

Exercising the vocal organs

Our ability to use language at all is remarkable enough. Even more remarkable is the fact that we make sense of what we hear so effortlessly. Indeed, so effortless and automatic is it that we cannot help but make sense of what we hear: we cannot choose not to extract the meanings of the words we encounter; only if our attention wanders, or the words are from a language that we do not know, do we not automatically extract their meaning. In this respect, our ability to understand language is very different from our ability to produce it: we can exercise control over what we say and how we say it. Just as we can choose to wiggle one finger or another, or shut one eye or the other, so we can choose to open our mouth, or keep it shut.

So production is voluntary, but what is it that we voluntarily do when we speak? Basically, it seems as if we just get on with it, without much awareness of what we shall say until we actually say it. Yet, for all this lack of awareness, there is an awful lot that goes on whilst we speak. At any one moment we seem to be articulating one word, selecting the next, thinking more broadly about what we are about to say, working out the grammatical conventions we shall need, and so on. If there is any feat of language processing that seems to require the mental equivalent of juggling, this is it.

It is hard enough learning how to juggle when you can watch what a juggler is doing, see what the things being juggled are doing, and see how the two depend on one another. Imagine trying to figure it all out when all you can do is hear the juggling. Yet that is exactly what it is like to try and work out what happens during speech production. We can hear the result of the production process, but no more. We cannot see the mental input to that process.

Presumably, that input must have something in common with the end result of language understanding: in one case, meanings are generated from sounds, and in the other, sounds are generated from meanings. Of course, the route from meaning to sound cannot simply be the reverse of that from sound to meaning; after all, we hear with our ears but we do not talk with them as well. Still, one way or another, it is a well-travelled route. But what makes us take that route? What causes us to speak?

The will to speak

Asking why anyone should suddenly, out of the blue, start speaking is a little like asking why anyone should, again out of the blue, offer to help someone cross the road, or offer to carry a heavy load for an expectant mother. In each case, one of the people has a need, and the other can fulfil that need. Language is no different: the person doing the talking will generally believe that the person doing the listening either lacks the information being communicated, or has the information that is desired. The crucial element here is that the speaker must have an idea of what is (or is not) in someone else's mind. What drives the production process is the difference between what speakers have in their own minds, and what they believe to be in their listeners' minds.

Having an idea of the contents of someone else's mind is a little like having a model of whatever it is one thinks is in there: in fact, a mental model of the kind discussed in the previous chapter. The patterns of neural activity that correspond to this model may simply be based on the accumulated experience of the ways in which people (including the experiencer) have acted in many different situations. These patterns, which in effect reflect what might be in someone else's mind, may then be triggered by particular properties of the current situation. For present purposes, however, we need simply think of these models as information regarding what is presumed to be in someone else's mind. We are still a long way from really understanding those neural events and the manner in which they come about. It is enough to know that, somehow, we do generally manage to anticipate what is in someone else's mind. Exactly how we manage this extends far beyond psycholinguistic theory.

So, somehow, speakers have a model of what is in their listeners' minds. But what drives the production process is not just the discrepancy between what the speaker has in mind and what the speaker believes the listener has in mind. If it were, we would be forever shooting our mouths off with no regard for whether anyone was at all interested. Something else needs to be added to the equation. And just as the person we offer help to has to both need and want that help, so the listener must also want to listen. So speaker and listener cooperate, and signal their willingness to do so when they initiate, and maintain, their spoken interaction.

In many ways, conversation is just like a game of chess: we anticipate the other's intentions, combine this with our own intentions, and then make our move. After we have done that, it becomes the other player's turn. But unlike chess, it is less obvious in conversation when it is time to take that turn, and make that move. How do we know that it is our turn? Perhaps surprisingly, the clearest signal that it is time for the listener to become the speaker has almost nothing at all to do with the content of what is said, even though it is the content that determines when the speaker is ready to give up his or her turn. In chess, the change of play occurs as soon as one player sees the other physically move his or her piece, irrespective of what that move is. Similarly, in spoken conversation what determines when the speaker has finished is, more often than not, a physical signal. Sometimes that signal is given by a shift in where the speaker is looking, towards the listener. Other times, it is a property of the speech itself: its prosody.

In English at least, there are quite distinct cues to the end of turn. Speakers tend to slow down, lengthen their syllables, and drop both pitch and loudness. Sometimes, though, the listener might think that the speaker has reached the end of their turn, even when the speaker believes otherwise. The 1982 Christmas issue of the scientific journal Nature carried an article which analysed the end-of-turn cues used by the then UK prime minister, Margaret Thatcher. She was renowned for her irritation at being frequently interrupted during interviews; her interviewers were evidently insensitive to the I-have-not-finished-yet cues she believed she was providing. Unlike Mrs Thatcher, many speakers often speed up at the ends of their sentences if they have not reached the end of their turn, to prevent the listener from trying to butt in at an otherwise inopportune moment.
