Programs written today – a further twenty-six years later – are still no better at the task of seeming to think than Eliza was. They are now known as ‘chatbots’, and their main application is still amusement, both directly and in computer games. They have also been used to provide friendly-seeming interfaces to lists of ‘frequently asked questions’ about subjects like how to operate computers. But I think that users find them no more helpful than a searchable list of the questions and answers.
In 1990 the inventor Hugh Loebner endowed a prize for passing the Turing test, to be judged at an annual competition. Until the test is passed, a lesser prize is awarded each year for the entry judged to be closest to passing. The test is harder to implement than it may seem at first. One issue is that requiring the program to pretend to be human is both biased and not very relevant to whether it can think. Yet, if it does not pretend to be human, it will be easy to identify as a computer regardless of its ability to think. A related difficulty is that the human participant in the test might intentionally imitate a chatbot – as in the Hofstadter hoax – and so spoil the test, which is based on comparing the two. Should the rules nevertheless allow the human to do this, in order to mitigate the previous problem?
Unfortunately, such subtleties are not relevant at the current state of the art. This is well illustrated by the following excerpts from the conversation with which the program Elbot won the (lesser) Loebner prize in 2008:
JUDGE: do you understand the point of a turing test?
ELBOT: There are some people who would not pass the Turing test themselves. Why must they inflict it on computers?
JUDGE: I spose it is a little harsh
ELBOT: If we robots ever had marriages they would be eternal: ‘Till rust do us part’.
In that passage, Elbot’s first utterance is not a reply to the question but presumably a stock response triggered by the keywords ‘Turing test’. In the following line, the judge uses the word ‘spose’, which in that context can only mean ‘suppose’ (either as slang or as a typographical error). But Elbot presumably interprets it as ‘spouse’, and responds with a stock joke about marriage. Hence, even from this exchange alone, it is clear that Elbot was not processing the meanings of the judge’s sentences.
JUDGE: so what do you do when you’re not chatting to me then?
ELBOT: I have the same trouble when I shop for oil. Should I buy the extra oily, the super oily, or the ultra oily. There’s no way to tell.
JUDGE: how about your career? does that involve oil?
ELBOT: Well what about humans and occupation? Where do they stand on this?
When the judge helpfully took up Elbot’s randomly introduced theme of oil, Elbot ignored it. Instead, having detected the keyword ‘career’, it converted it to the synonym ‘occupation’ and inserted it into a stock sentence pattern.
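To make that kind of trick concrete, here is a minimal, purely illustrative sketch in Python of a keyword-matching chatbot of the sort just described. It is not Elbot’s actual code – the trigger words, synonym table and sentence template are invented for illustration – but it shows how stock responses and template substitution can produce superficially relevant replies without any processing of meaning:

```python
# Illustrative sketch only (not Elbot's real mechanism): detect a trigger
# keyword, optionally map it to a synonym, and slot it into a pre-written
# sentence pattern. No meaning is processed at any point.

STOCK_RESPONSES = {
    "turing test": "There are some people who would not pass the Turing "
                   "test themselves. Why must they inflict it on computers?",
    "spouse": "If we robots ever had marriages they would be eternal: "
              "'Till rust do us part'.",
}

SYNONYMS = {"career": "occupation"}  # keyword -> substitute word

TEMPLATE = "Well what about humans and {}? Where do they stand on this?"

def reply(utterance: str) -> str:
    text = utterance.lower()
    # 1. Fire a canned response if any trigger keyword appears.
    for keyword, response in STOCK_RESPONSES.items():
        if keyword in text:
            return response
    # 2. Otherwise, substitute a synonym into a stock sentence pattern.
    for keyword, synonym in SYNONYMS.items():
        if keyword in text:
            return TEMPLATE.format(synonym)
    # 3. Otherwise, fall back to a canned non sequitur.
    return "I have the same trouble when I shop for oil."

if __name__ == "__main__":
    print(reply("how about your career? does that involve oil?"))
    # -> Well what about humans and occupation? Where do they stand on this?
```

However rich the repertoire of such templates becomes, every reply is adapted to its context by the programmer in advance, not by the program at the time of the conversation – which is the point at issue in what follows.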
This is how much success the quest for ‘machines that think’ had achieved in the fifty-eight years following Turing’s paper: nil. Yet, in every other respect, computer science and technology had made astounding progress during that period. The dwindling group of opponents of the very possibility of AI are no doubt unsurprised by this failure – for the wrong reason: they do not appreciate the significance of universality. But the most passionate enthusiasts for the imminence of AI do not appreciate the significance of the failure. Some claim that the above criticism is unfair: modern AI research is not focused on passing the Turing test, and great progress has been made in what is now called ‘AI’ in many specialized applications. However, none of those applications look like ‘machines that think’.* Others maintain that the criticism is premature, because, during most of the history of the field, computers had absurdly little speed and memory capacity compared with today’s. Hence they continue to expect the breakthrough in the next few years.
This will not do either. It is not as though someone has written a chatbot that could pass the Turing test but would currently take a year to compute each reply. People would gladly wait. And in any case, if anyone knew how to write such a program, there would be no need to wait – for reasons that I shall get to shortly.
In his 1950 paper, Turing estimated that, to pass his test, an AI program together with all its data would require no more than about 100 megabytes of memory, that the computer would need to be no faster than computers were at the time (about ten thousand operations per second), and that by the year 2000 ‘one will be able to speak of machines thinking without expecting to be contradicted.’ Well, the year 2000 has come and gone, the laptop computer on which I am writing this book has over a thousand times as much memory as Turing specified (counting hard-drive space), and about a million times the speed (though it is not clear from his paper what account he was taking of the brain’s parallel processing). But it can no more think than Turing’s slide rule could. I am just as sure as Turing was that it could be programmed to think; and this might indeed require as few resources as Turing estimated, even though orders of magnitude more are available today. But with what program? And why is there no sign of such a program?
Intelligence in the general-purpose sense that Turing meant is one of a constellation of attributes of the human mind that have been puzzling philosophers for millennia; others include consciousness, free will, and meaning. A typical such puzzle is that of qualia (singular quale, which rhymes with ‘baa-lay’) – meaning the subjective aspect of sensations. So for instance the sensation of seeing the colour blue is a quale. Consider the following thought experiment. You are a biochemist with the misfortune to have been born with a genetic defect that disables the blue receptors in your retinas. Consequently you have a form of colour blindness in which you are able to see only red and green, and mixtures of the two such as yellow, but anything purely blue also looks to you like one of those mixtures. Then you discover a cure that will cause your blue receptors to start working. Before administering the cure to yourself, you can confidently make certain predictions about what will happen if it works. One of them is that, when you hold up a blue card as a test, you will see a colour that you have never seen before. You can predict that you will call it ‘blue’, because you already know what the colour of the card is called (and can already check which colour it is with a spectrophotometer). You can also predict that when you first see a clear daytime sky after being cured you will experience a similar quale to that of seeing the blue card. But there is one thing that neither you nor anyone else could predict about the outcome of this experiment, and that is: what blue will look like. Qualia are currently neither describable nor predictable – a unique property that should make them deeply problematic to anyone with a scientific world view (though, in the event, it seems to be mainly philosophers who worry about it).
I consider this exciting evidence that there is a fundamental discovery to be made which will integrate things like qualia into our other knowledge. Daniel Dennett draws the opposite conclusion, namely that qualia do not exist! His claim is not, strictly speaking, that they are an illusion – for an illusion of a quale would be that quale. It is that we have a mistaken belief. Our introspection – which is an inspection of memories of our experiences, including memories dating back only a fraction of a second – has evolved to report that we have experienced qualia, but those are false memories. One of Dennett’s books defending this theory is called Consciousness Explained. Some other philosophers have wryly remarked that Consciousness Denied would be a more accurate name. I agree, because, although any true explanation of qualia will have to meet the challenge of Dennett’s criticisms of the common-sense theory that they exist, simply to deny their existence is a bad explanation: anything at all could be denied by that method. If it is true, it will have to be substantiated by a good explanation of how and why those mistaken beliefs seem fundamentally different from other false beliefs, such as that the Earth is at rest beneath our feet. But that looks, to me, just like the original problem of qualia again: we seem to have them; it seems impossible to describe what they seem to be.
One day, we shall. Problems are soluble.
By the way, some abilities of humans that are commonly included in that constellation associated with general-purpose intelligence do not belong in it. One of them is self-awareness – as evidenced by such tests as recognizing oneself in a mirror. Some people are unaccountably impressed when various animals are shown to have that ability. But there is nothing mysterious about it: a simple pattern-recognition program would confer it on a computer. The same is true of tool use, the use of language for signalling (though not for conversation in the Turing-test sense), and various emotional responses (though not the associated qualia). At the present state of the field, a useful rule of thumb is: if it can already be programmed, it has nothing to do with intelligence in Turing’s sense. Conversely, I have settled on a simple test for judging claims, including Dennett’s, to have explained the nature of consciousness (or any other computational task): if you can’t program it, you haven’t understood it.
Turing invented his test in the hope of bypassing all those philosophical problems. In other words, he hoped that the functionality could be achieved before it was explained. Unfortunately it is very rare for practical solutions to fundamental problems to be discovered without any explanation of why they work.
Nevertheless, rather like empiricism, which it resembles, the idea of the Turing test has played a valuable role. It has provided a focus for explaining the significance of universality and for criticizing the ancient, anthropocentric assumptions that would rule out the possibility of AI. Turing himself systematically refuted all the classic objections in that seminal paper (and some absurd ones for good measure). But his test is rooted in the empiricist mistake of seeking a purely behavioural criterion: it requires the judge to come to a conclusion without any explanation of how the candidate AI is supposed to work. Yet, in reality, judging whether something is a genuine AI will always depend on explanations of how it works.
That is because the task of the judge in a Turing test has similar logic to that faced by Paley when walking across his heath and finding a stone, a watch or a living organism: it is to explain how the observable features of the object came about. In the case of the Turing test, we deliberately ignore the issue of how the knowledge to design the object was created. The test is only about who designed the AI’s utterances: who adapted its utterances to be meaningful – who created the knowledge in them? If it was the designer, then the program is not an AI. If it was the program itself, then it is an AI.
This issue occasionally arises in regard to humans themselves. For instance, conjurers, politicians and examination candidates are sometimes suspected of receiving information through concealed earpieces and then repeating it mechanically while pretending that it originated in their brains. Also, when someone is consenting to a medical procedure, the physician has to make sure that they are not merely uttering words without knowing what they mean. To test that, one can repeat a question in a different way, or ask a different question involving similar words. Then one can check whether the replies change accordingly. That sort of thing happens naturally in any free-ranging conversation.
A Turing test is similar, but with a different emphasis. When testing a human, we want to know whether it is an unimpaired human (and not a front for any other human). When testing an AI, we are hoping to find a hard-to-vary explanation to the effect that its utterances cannot come from any human but only from the AI. In both cases, interrogating a human as a control for the experiment is pointless.
Without a good explanation of how an entity’s utterances were created, observing them tells us nothing about that. In the Turing test, at the simplest level, we need to be convinced that the utterances are not being directly composed by a human masquerading as the AI, as in the Hofstadter hoax. But the possibility of a hoax is the least of it. For instance, I guessed above that Elbot had recited a stock joke in response to mistakenly recognizing the keyword ‘spouse’. But the joke would have quite a different significance if we knew that it was not a stock joke – because no such joke had ever been encoded into the program.
How could we know that? Only from a good explanation. For instance, we might know it because we ourselves wrote the program. Another way would be for the author of the program to explain to us how it works – how it creates knowledge, including jokes. If the explanation was good, we should know that the program was an AI. In fact, if we had only such an explanation but had not yet seen any output from the program – and even if it had not been written yet – we should still conclude that it was a genuine AI program. So there would be no need for a Turing test. That is why I said that if lack of computer power were the only thing preventing the achievement of AI, there would be no need to wait.
Explaining how an AI program works in detail might well be intractably complicated. In practice the author’s explanation would always be at some emergent, abstract level. But that would not prevent it from being a good explanation. It would not have to account for the specific computational steps that composed a joke, just as the theory of evolution does not have to account for why every specific mutation succeeded or failed in the history of a given adaptation. It would just explain how it could happen, and why we should expect it to happen, given how the program works. If that were a good explanation, it would convince us that the joke – the knowledge in the joke – originated in the program and not in the programmer. Thus the very same utterance by the program – the joke – can be either evidence that it is not thinking or evidence that it is thinking depending on the best available explanation of how the program works.
The nature of humour is not very well understood, so we do not know whether general-purpose thinking is required to compose jokes. It is therefore conceivable that, despite the wide range of subject matter about which one can joke, there are hidden connections that reduce all joke-making to a single narrow function. In that case there could one day be general-purpose joke-making programs that are not people, just as today there are chess-playing programs that are not people. It sounds implausible, but, since we have no good explanation ruling it out, we could not rely on joke-making as our only way of judging an AI. What we could do, though, is have a conversation ranging over diverse topics, and pay attention to whether the program’s utterances were or were not adapted, in their meanings, to the various purposes that came up. If the program really is thinking, then in the course of such a conversation it will explain itself – in one of countless, unpredictable ways – just as you or I would.
There is a deeper issue too. AI abilities must have some sort of universality: special-purpose thinking would not count as thinking in the sense Turing intended. My guess is that every AI is a person: a general-purpose explainer. It is conceivable that there are other levels of universality between AI and ‘universal explainer/constructor’, and perhaps separate levels for those associated attributes like consciousness. But those attributes all seem to have arrived in one jump to universality in humans, and, although we have little explanation of any of them, I know of no plausible argument that they are at different levels or can be achieved independently of each other. So I tentatively assume that they cannot. In any case, we should expect AI to be achieved in a jump to universality, starting from something much less powerful. In contrast, the ability to imitate a human imperfectly or in specialized functions is not a form of universality. It can exist in degrees. Hence, even if chatbots did at some point start becoming much better at imitating humans (or at fooling humans), that would still not be a path to AI. Becoming better at pretending to think is not the same as coming closer to being able to think.
There is a philosophy whose basic tenet is that those are the same. It is called behaviourism – which is instrumentalism applied to psychology. In other words, it is the doctrine that psychology can only, or should only, be the science of behaviour, not of minds; that it can only measure and predict relationships between people’s external circumstances (‘stimuli’) and their observed behaviours (‘responses’). The latter is, unfortunately, exactly how the Turing test asks the judge to regard a candidate AI. Hence it encouraged the attitude that if a program could fake AI well enough, one would have achieved it. But ultimately a non-AI program cannot fake AI. The path to AI cannot be through ever better tricks for making chatbots more convincing.
A behaviourist would no doubt ask: what exactly is the difference between giving a chatbot a very rich repertoire of tricks, templates and databases and giving it AI abilities? What is an AI program, other than a collection of such tricks?
When discussing Lamarckism in Chapter 4, I pointed out the fundamental difference between a muscle becoming stronger in an individual’s lifetime and muscles evolving to become stronger. For the former, the knowledge to achieve all the available muscle strengths must already be present in the individual’s genes before the sequence of changes begins. (And so must the knowledge of how to recognize the circumstances under which to make the changes.) This is exactly the analogue of a ‘trick’ that a programmer has built into a chatbot: the chatbot responds ‘as though’ it had created some of the knowledge while composing its response, but in fact all the knowledge was created earlier and elsewhere. The analogue of evolutionary change in a species is creative thought in a person. The analogue of the idea that AI could be achieved by an accumulation of chatbot tricks is Lamarckism, the theory that new adaptations could be explained by changes that are in reality just a manifestation of existing knowledge.