But these finite tables could now be put into something like alphabetical order, beginning at the most simple, and working through larger and larger ones. They could be put in a list, or counted; and this meant that all the computable numbers could be put in a list. It was not a practical proposition actually to do it, but in principle the idea was perfectly definite, and would result in the square root of three being say 678th in order, or the logarithm of π being 9369th. It was a staggering thought, since this list would include every number that could be arrived at through arithmetical operations, finding roots of equations, and using mathematical functions like sines and logarithms – every number that could possibly arise in computational mathematics. And once he had seen this, he knew the answer to Hilbert’s question. Probably it was this that he suddenly saw on the Grantchester meadow. He would have seen the answer because there was a beautiful mathematical device, ready to be taken off the shelf.
Fifty years earlier, Cantor had realised that he could put all the fractions – all the ratios or rational numbers – into a list. Naively it might be thought that there were many more fractions than integers. But Cantor showed that, in a precise sense, this was not so, for they could be counted off, and put into a sort of alphabetical order. Omitting fractions with cancelling factors, this list of all the rational numbers between 0 and 1 would begin:
1/2 1/3 1/4 2/3 1/5 1/6 2/5 3/4 1/7 3/5 1/8 2/7 4/5 1/9 3/7 1/10…
Cantor went on to invent a certain trick, called the Cantor diagonal argument, which could be used as a proof that there existed irrational numbers. For this, the rational numbers would be expressed as infinite decimals, and the list of all such numbers between 0 and 1 would then begin:
1 .5000000000000000000….
2 .3333333333333333333….
3 .2500000000000000000….
4 .6666666666666666666….
5 .2000000000000000000….
6 .1666666666666666666….
7 .4000000000000000000….
8 .7500000000000000000….
9 .1428571428571428571….
10 .6000000000000000000….
11 .1250000000000000000….
12 .2857142857142857142….
13 .8000000000000000000….
14 .1111111111111111111….
15 .4285714285714285714….
16 .1000000000000000000….
. ….
. ….
The trick was to consider the diagonal number, beginning
.5306060020040180….
and then to change each digit, as for instance by increasing each by 1 except by changing a 9 to a 0. This would give an infinite decimal beginning
.6417171131151291 ….
a number which could not possibly be rational, since it would differ from the first listed rational number in the first decimal place, from the 694th rational number in the 694th decimal place, and so forth. Therefore it could not be in the list; but the list held all the rational numbers, so the diagonal number could not be rational.
It was already well-known – it was known to Pythagoras – that there were irrational numbers. The point of Cantor’s construction was actually rather different from this. It was to show that no list could possibly contain all the ‘real numbers’, that is, all infinite decimals. For any proposed list would serve to define another infinite decimal which had been left out. Cantor’s argument showed that in a quite precise sense there were more real numbers than integers. It opened up a precise theory of what was meant by ‘infinite’.
However, the point relevant to Alan Turing’s problem was that it showed how the rational could give rise to the irrational. In exactly the same way, therefore, the computable could give rise to the uncomputable, by means of a diagonal argument. As soon as he had made that observation, Alan could see that the answer to Hilbert’s question was ‘no’. There could exist no ‘definite method’ for solving all mathematical questions. For an uncomputable number would be an example of an unsolvable problem.
There was still much work to do before his result was clear. For one thing, there was something paradoxical about the argument. The Cantor trick itself would seem to be a ‘definite method’. The diagonal number was defined clearly enough, it appeared – so why could it not be computed? How could something that was constructed in this mechanical way be uncomputable? What would go wrong, if it were attempted?
Suppose one tried to design a ‘Cantor machine’ to produce this diagonal uncomputable number. Roughly speaking, it would start with a blank tape, and write the number 1. It would then have to produce the first table, and then execute it, stopping at the first digit that it wrote, and adding on one. Then it would start again, with the number 2, produce the second table, executing it as far as the second digit, and writing this down, adding on one. It would have to continue doing this for ever, so that when its counter read ‘1000’, it would produce the thousandth table, execute it as far as the thousandth digit, add on one to this and write it down.
One part of this process could certainly be done by one of his machines. For the process of ‘looking up the entries’ in a given table, and working out what the corresponding machine would do, was itself a ‘mechanical process.’ A machine could do it. There was a difficulty in that the tables were naturally thought of in two-dimensional form, but then it was only a technical matter to encode them in a form in which they could be put on a ‘tape’. In fact, they could be encoded as integers, rather as Gödel had represented formulae and proofs as integers. Alan called them ‘description numbers’, so that there was a description number corresponding to each table. In one way this was just a technicality, a means of putting tables on to the tape, and arranging them in an ‘alphabetical order’. But underneath there lay the same powerful idea that Gödel had used, that there was no essential distinction between ‘numbers’ and operations on numbers. From a modern mathematical point of view, they were all alike symbols.
With this done, it followed that one particular machine could simulate the work done by any machine. He called it the universal machine. It would be designed to read description numbers, decode them into tables, and execute them. It could do what any other machine would have done, if it were provided with the description number of that machine on its tape. It would be a machine to do everything, which was enough to give anyone pause for thought. It was, furthermore, a machine of perfectly definite form. Alan worked out an exact table for the universal machine.
This was not the trouble with mechanising the Cantor process. The difficulty lay in the other requirement, that of producing the tables, in their ‘alphabetical order’, for the computable numbers. Suppose that the tables were encoded as description numbers. In practice, they would not use up all the integers; in fact, the system Alan devised would encode even the simplest tables into enormous numbers. But that would not matter. It would be essentially a ‘mechanical’ matter to work through the integers in turn, and to pass over those which did not correspond to proper tables. That was a technicality, almost a matter of notation. The real problem was more subtle. The question was this: given (say) the 4589th properly defined table, how could one tell that it would produce a 4589th digit? Or indeed, that it would produce any digits at all? It might trundle back and forth in a repeated cycle of operations for ever, without producing more figures. If this were the case, the Cantor machine would be stuck, and could never finish its job.
The answer was that one could not tell. There was no way of checking in advance that a table would produce an infinite sequence. There might be a method for some particular table. But there was no mechanical process – no machine – that could work on all instruction tables. There was nothing better than the prescription: ‘take the table and try it out’. But this procedure would take an infinite time to find out whether infinitely many digits emerged. There was no rule that could be applied to any table, and be guaranteed to produce the answer in a finite time, as was required for the printing of the diagonal number. The Cantor process,
therefore, could not be mechanised, and the uncomputable diagonal number could not be computed. There was no paradox after all.
Alan called the description numbers which gave rise to infinite decimals the ‘satisfactory numbers’. So he had shown that there was no definite method of identifying an ‘unsatisfactory number’. He had pinned down a clearly specified example of something Hilbert said did not exist – an unsolvable problem.
There were other ways of demonstrating that no ‘mechanical process’ could eliminate the unsatisfactory numbers. The one he himself favoured was one which brought out the connection with self-reference in the question. For supposing that such a ‘checking’ machine did exist, able to locate the unsatisfactory numbers, it could be applied to itself. But this, he showed, led to a flat contradiction. So no such checking machine could exist.
Either way, he had found an unsolvable problem, and it required only a technical step to show that this settled Hilbert’s question about mathematics, in the exact form in which it had been posed. Alan Turing had dealt the death-blow to the Hilbert programme. He had shown that mathematics could never be exhausted by any finite set of procedures. He had gone to the heart of the problem, and settled it with one simple, elegant observation.
But there was more to what he had done than a mathematical trick, or logical ingenuity. He had created something new – the idea of his machines. And correspondingly, there remained a question as to whether his definition of the machine really did include everything that could possibly be counted as a ‘definite method’. Was this repertoire of reading, writing, erasing, moving and stopping enough? It was crucial that it was so, for otherwise the suspicion would always lurk that some extension of the machine’s faculties would allow it to solve a greater range of problems. One approach to this question led him to demonstrate that his machines could certainly compute any number normally encountered in mathematics. He also showed that a machine could be set up to churn out every provable assertion within Hilbert’s formulation of mathematics. But he also gave some pages of discussion37 that were among the most unusual ever offered in a mathematical paper, in which he justified the definition by considering what people could possibly be doing when they ‘computed’ a number by thinking and writing down notes on paper:
Computing is normally done by writing certain symbols on paper. We may suppose this paper is divided into squares like a child’s arithmetic book. In elementary arithmetic the two-dimensional character of the paper is sometimes used. But such a use is always avoidable, and I think that it will be agreed that the two-dimensional character of paper is no essential of computation. I assume then that the computation is carried out on one-dimensional paper, i.e. on a tape divided into squares. I shall also suppose that the number of symbols which may be printed is finite. If we were to allow an infinity of symbols, then there would be symbols differing to an arbitrarily small extent.
An ‘infinity of symbols’, he wished to argue, did not correspond to anything in reality. It might be argued that there was an infinity of symbols, in that
an Arabic numeral such as 17 or 999999999999999 is normally treated as a single symbol. Similarly in any European language words are treated as single symbols (Chinese, however, attempts to have an enumerable infinity of symbols).
But he disposed of this objection with the observation that
The differences from our point of view between the single and compound symbols is that the compound symbols, if they are too lengthy, cannot be observed at one glance. This is in accordance with experience. We cannot tell at a glance whether 999999999999999 and 999999999999999 are the same.
Accordingly, he felt justified in restricting a machine to a finite repertoire of symbols. Next came a most important idea:
The behaviour of the computer at any moment is determined by the symbols which he is observing, and his ‘state of mind’ at that moment. We may suppose that there is a bound B to the number of symbols or squares which the computer can observe at one moment. If he wishes to observe more, he must use successive observations. We will also suppose that the number of states of mind which need to be taken into account is finite. The reasons for this are the same character as those which restrict the number of symbols. If we admitted an infinity of states of mind, some of them will be ‘arbitrarily close’ and will be confused. Again, the restriction is not one which seriously affects computation, since the use of more complicated states of mind can be avoided by writing more symbols on the tape.
The word ‘computer’ here meant only what that word meant in 1936: a person doing calculations. Elsewhere in the paper he appealed to the idea that ‘the human memory is necessarily limited,’ but this was as far as he went in a discussion of the nature of the human mind. It was a bold act of imagination, a brave suggestion that ‘states of mind’ could be counted, on which to base his argument. It was especially noteworthy because in quantum mechanics, physical states could be ‘arbitrarily close’. He continued with his discussion of the human computer:
Let us imagine the operations performed by the computer to be split up into ‘simple operations’ which are so elementary that it is not easy to imagine them further divided. Every such operation consists of some change in the physical system consisting of the computer and his tape. We know the state of the system if we know the sequence of symbols on the tape, which of these are observed by the computer (possibly with a special order) and the state of mind of the computer. We may suppose that in a simple operation not more than one symbol is altered. Any other changes can be split up into simple changes of this kind. The situation in regard to the squares whose symbols may be altered in this way is the same as in regard to the observed squares. We may, therefore, without loss of generality, assume that the squares whose symbols are changed are always ‘observed’ squares.
Besides these changes of symbols, the simple operations must include changes of distribution of observed squares. The new observed squares must be immediately recognisable by the computer. I think it is reasonable to suppose that they can only be squares whose distance from the closest of the immediately previously observed squares does not exceed a certain fixed amount. Let us say that each of the new observed squares is within L squares of an immediately previously observed square.
In connection with ‘immediate recognisability’, it may be thought that there are other kinds of square which are immediately recognisable. In particular, squares marked by special symbols might be taken as immediately recognisable. Now if these squares are marked only by single symbols there can be only a finite number of them, and we should not upset our theory by adjoining these marked squares to the observed squares. If, on the other hand, they are marked by a sequence of symbols, we cannot regard the process of recognition as a simple process. This is a fundamental point and should be illustrated. In most mathematical papers the equations and theorems are numbered. Normally the numbers do not go beyond (say) 1000. It is, therefore, possible to recognise a theorem at a glance by its number. But if the paper was very long, we might reach Theorem 157767733443477; then, further on in the paper, we might find ‘… hence (applying Theorem 157767734443477) we have …’. In order to make sure which was the relevant theorem we should have to compare the two numbers figure by figure, possibly ticking the figures off in pencil to make sure of their not being counted twice. If in spite of this it is still thought that there are other ‘immediately recognisable’ squares, it does not upset my contention so long as these squares can be found by some process of which my type of machine is capable ….
The simple operations must therefore include:
(a) Changes of the symbol on one of the observed squares
(b) Changes of one of the squares observed to another square within L squares of one of the previously observed squares.
It may be that some of these changes necessarily involve a change of state of mind. The most general single operation must therefore be taken to be one of the following:
(A) A possible c
hange (a) of symbol together with a possible change of state of mind;
(B) A possible change (b) of observed squares, together with a possible change of state of mind.
The operation actually performed is determined, as has been suggested [above] by the state of mind of the computer and the observed symbols. In particular, they determine the state of mind of the computer after the operation is carried out.
‘We may now construct a machine,’ Alan wrote, ‘to do the work of this computer.’ The drift of his argument, indeed, was obvious, with each ‘state of mind’ of the human computer being represented by a configuration of the corresponding machine.
These ‘states of mind’ being a weak point of the argument, he added an alternative justification of the idea that his machines could perform any ‘definite method’ which did not need them:
We suppose [still] that the computation is carried out on a tape; but we avoid introducing the ‘state of mind’ by considering a more physical and definite counterpart of it. It is always possible for the computer to break off from his work, to go away and forget all about it, and later to come back and go on with it. If he does this he must leave a note of instructions (written in some standard form) explaining how the work is to be continued. This note is the counterpart of the ‘state of mind’. We will suppose that the computer works in such a desultory manner that he never does more than one step at a sitting. The note of instructions must enable him to carry out one step and write the next note. Thus the state of progress of the computation at any stage is completely determined by the note of instructions and the symbols on the tape …
But these arguments were quite different. Indeed, they were complementary. The first put the spotlight upon the range of thought within the individual – the number of ‘states of mind’. The second conceived of the individual as a mindless executor of given directives. Both approached the contradiction of free will and determinism, but one from the side of internal will, the other from that of external constraints. These approaches were not explored in the paper, but left as seeds for future growth.*
Alan Turing: The Enigma: The Book That Inspired the Film The Imitation Game Page 19