by James Gleick
In the Soviet Union, still moderately isolated from the rest of the world’s science, Kolmogorov was well placed to carry the banner of information. He was in charge of all mathematics in the Great Soviet Encyclopedia, choosing the authors, editing the articles, and writing much of it himself. In 1956 he delivered a long plenary report on the theory of information transmission to the Soviet Academy of Sciences. His colleagues thought this was a bit “addled”—that Shannon’s work was “more technology than mathematics,”♦ as Kolmogorov recalled it afterward. “It is true,” he said, “that Shannon left to his successors the rigorous ‘justification’ of his ideas in some difficult cases. However, his mathematical intuition was amazingly precise.” Kolmogorov was not as enthusiastic about cybernetics. Norbert Wiener felt a kinship with him—they had both done early work on stochastic processes and Brownian motion. On a visit to Moscow, Wiener said, “When I read the works of Academician Kolmogorov, I feel that these are my thoughts as well, this is what I wanted to say. And I know that Academician Kolmogorov has the same feeling when reading my works.”♦ But the feeling was evidently not shared. Kolmogorov steered his colleagues toward Shannon instead. “It is easy to understand that as a mathematical discipline cybernetics in Wiener’s understanding lacks unity,” he said, “and it is difficult to imagine productive work in training a specialist, say a postgraduate student, in cybernetics in this sense.”♦ He already had real results to back up his instincts: a useful generalized formulation of Shannon entropy, and an extension of his information measure to processes in both discrete and continuous time.
Prestige in Russia was finally beginning to flow toward any work that promised to aid electronic communication and computing. Such work began almost in a void. Pragmatic electrical engineering barely existed; Soviet telephony was notoriously dismal, a subject for eternally bitter Russian humor. As of 1965, there was still no such thing as direct long-distance dialing. The number of toll calls nationally had yet to surpass the number of telegrams, a milestone that had been reached in the United States before the end of the previous century. Moscow had fewer telephones per capita than any major world city. Nonetheless, Kolmogorov and his students generated enough activity to justify a new quarterly journal, Problems of Information Transmission, devoted to information theory, coding theory, theory of networks, and even information in living organisms. The inaugural issue opened with Kolmogorov’s “Three Approaches to the Definition of the Concept ‘Amount of Information’”—almost a manifesto—which then began its slow journey toward the awareness of mathematicians in the West.
“At each given moment there is only a fine layer between the ‘trivial’ and the impossible,”♦ Kolmogorov mused in his diary. “Mathematical discoveries are made in this layer.” In the new, quantitative view of information he saw a way to attack a problem that had eluded probability theory, the problem of randomness. How much information is contained in a given “finite object”? An object could be a number (a series of digits) or a message or a set of data.
He described three approaches: the combinatorial, the probabilistic, and the algorithmic. The first and second were Shannon’s, with refinements. They focused on the probability of one object among an ensemble of objects—one particular message, say, chosen from a set of possible messages. How would this work, Kolmogorov wondered, when the object was not just a symbol in an alphabet or a lantern in a church window but something big and complicated—a genetic organism, or a work of art? How would one measure the amount of information in Tolstoy’s War and Peace? “Is it possible to include this novel in a reasonable way in the set of ‘all possible novels’ and further to postulate the existence of a certain probability distribution in this set?”♦ he asked. Or could one measure the amount of genetic information in, say, the cuckoo bird by considering a probability distribution in the set of all possible species?
His third approach to measuring information—the algorithmic—avoided the difficulties of starting with ensembles of possible objects. It focused on the object itself.♦♦ Kolmogorov introduced a new word for the thing he was trying to measure: complexity. As he defined this term, the complexity of a number, or message, or set of data is the inverse of simplicity and order and, once again, it corresponds to information. The simpler an object is, the less information it conveys. The more complexity, the more information. And, just as Gregory Chaitin did, Kolmogorov put this idea on a solid mathematical footing by calculating complexity in terms of algorithms. The complexity of an object is the size of the smallest computer program needed to generate it. An object that can be produced by a short algorithm has little complexity. On the other hand, an object needing an algorithm every bit as long as the object itself has maximal complexity.
A simple object can be generated—or computed, or described—with just a few bits. A complex object requires an algorithm of many bits. Put this way, it seemed obvious. But until now it had not been understood mathematically. Kolmogorov put it this way:
The intuitive difference between “simple” and “complicated” objects has apparently been perceived a long time ago. On the way to its formalization, an obvious difficulty arises: something that can be described simply in one language may not have a simple description in another and it is not clear what method of description should be chosen.♦
That difficulty is solved by using computer language. It does not matter which computer language, because they are all equivalent, reducible to the language of a universal Turing machine. The Kolmogorov complexity of an object is the size, in bits, of the shortest algorithm needed to generate it. This is also the amount of information. And it is also the degree of randomness—Kolmogorov declared “a new conception of the notion ‘random’ corresponding to the natural assumption that randomness is the absence of regularity.”♦ The three are fundamentally equivalent: information, randomness, and complexity—three powerful abstractions, bound all along like secret lovers.
For Kolmogorov, these ideas belonged not only to probability theory but also to physics. To measure the complexity of an orderly crystal or a helter-skelter box of gas, one could measure the shortest algorithm needed to describe the state of the crystal or gas. Once again entropy was the key. Kolmogorov had a useful background in difficult physical problems to which these new methods could be applied. In 1941 he had produced the first useful, though flawed, understanding of the local structure of turbulent flows—equations to predict the distribution of whorls and eddies. He had also worked on perturbations in planetary orbits, another problem surprisingly intractable for classical Newtonian physics. Now he began laying the groundwork for the renaissance in chaos theory to come in the 1970s: analyzing dynamical systems in terms of entropy and information dimension. It made sense now to say that a dynamical system produces information. If it is unpredictable, it produces a great deal of information.
Kolmogorov knew nothing of Gregory Chaitin, nor did either man know of an American probability theorist named Ray Solomonoff, who had developed some of the same ideas. The world was changing. Time, distance, and language still divided mathematicians in Russia from their Western counterparts, but the gulf narrowed every year. Kolmogorov often said that no one should do mathematics after the age of sixty. He dreamed of spending his last years as a buoy keeper on the Volga, making a watery circuit in a boat with oars and a small sail.♦ When the time came, buoy keepers had switched to motorboats, and for Kolmogorov, this ruined the dream.
Now the paradoxes returned.
Zero is an interesting number. Books have been written about it. One is certainly an interesting number—it is the first and the foremost (not counting zero), the singular and unique. Two is interesting in all kinds of ways: the smallest prime, the definitive even number, the number needed for a successful marriage, the atomic number of helium, the number of candles to light on Finnish Independence Day. Interesting is an everyday word, not mathematicians’ jargon. It seems safe to say that any small number is interesting. All the two-digit numbers a
nd many of the three-digit numbers have their own Wikipedia entries.
Number theorists name entire classes of interesting numbers: prime numbers, perfect numbers, squares and cubes, Fibonacci numbers, factorials. The number 593 is more interesting than it looks; it happens to be the sum of nine squared and two to the ninth—thus a “Leyland number” (any number that can be expressed as xy + yx). Wikipedia also devotes an article to the number 9,814,072,356. It is the largest holodigital square—which is to say, the largest square number containing each decimal digit exactly once.
What would be an uninteresting number? Presumably a random number. The English number theorist G. H. Hardy randomly rode in taxi No. 1729 on his way to visit the ailing Srinivasa Ramanujan in 1917 and remarked to his colleague that, as numbers go, 1,729 was “rather a dull one.” On the contrary, replied Ramanujan (according to a standard anecdote of mathematicians), it is the smallest number expressible as the sum of two cubes in two different ways.♦ “Every positive integer is one of Ramanujan’s personal friends,” remarked J. E. Littlewood. Due to the anecdote, 1,729 is known nowadays as the Hardy-Ramanujan number. Nor is that all; 1,729 also happens to be a Carmichael number, an Euler pseudoprime, and a Zeisel number.
But even the mind of Ramanujan was finite, as is Wikipedia, as is the aggregate sum of human knowledge, so the list of interesting numbers must end somewhere. Surely there must be a number about which there is nothing special to say. Wherever it is, there stands a paradox: the number we may describe, interestingly, as “the smallest uninteresting number.”
This is none other than Berry’s paradox reborn, the one described by Bertrand Russell in Principia Mathematica. Berry and Russell had devilishly asked, What is the least integer not nameable in fewer than nineteen syllables? Whatever this number is, it can be named in eighteen syllables: the least integer not nameable in fewer than nineteen syllables. Explanations for why a number is interesting are ways of naming the number: “the square of eleven,” for example, or “the number of stars in the American flag.” Some of these names do not seem particularly helpful, and some are rather fuzzy. Some are pure mathematical facts: whether, for example, a number is expressible as the sum of two cubes in two different ways. But some are facts about the world, or about language, or about human beings, and they may be accidental and ephemeral—for example, whether a number corresponds to a subway stop or a date in history.
Chaitin and Kolmogorov revived Berry’s paradox in inventing algorithmic information theory. An algorithm names a number. “The paradox originally talks about English, but that’s much too vague,”♦ Chaitin says. “I pick a computer-programming language instead.” Naturally he picks the language of a universal Turing machine.
And then what does it mean, how do you name an integer? Well, you name an integer by giving a way to calculate it. A program names an integer if its output is that integer—you know, it outputs that integer, just one, and then it stops.
Asking whether a number is interesting is the inverse of asking whether it is random. If the number n can be computed by an algorithm that is relatively short, then n is interesting. If not, it is random. The algorithm PRINT 1 AND THEN PRINT 100 ZEROES generates an interesting number (a googol). Similarly, FIND THE FIRST PRIME NUMBER, ADD THE NEXT PRIME NUMBER, AND REPEAT A MILLION TIMES generates a number that is interesting: the sum of the first million primes. It would take a Turing machine a long time to compute that particular number, but a finite time nonetheless. The number is computable.
But if the most concise algorithm for n is “PRINT [n]”—an algorithm incorporating the entire number, with no shorthand—then we may say that there is nothing interesting about n. In Kolmogorov’s terms, this number is random—maximally complex. It will have to be patternless, because any pattern would provide a way to devise a shorthand algorithm. “If there is a small, concise computer program that calculates the number, that means it has some quality or characteristic that enables you to pick it out and to compress it into a smaller algorithmic description,” Chaitin says. “So that’s unusual; that’s an interesting number.”
But is it unusual? Looking generally at all the numbers, how can a mathematician know whether the interesting ones are rare or common? For that matter, looking at any one number, can a mathematician ever know for sure whether a smaller algorithm might be found? For Chaitin, these were the critical questions.
He answered the first with a counting argument. The vast majority of numbers have to be uninteresting because there cannot possibly be enough concise computer programs to go around. Count them. Given 1,000 bits (say), one has 21000 numbers; but not nearly that many useful computer programs can be written in 1,000 bits. “There are a lot of positive integers,” Chaitin says. “If the programs have to be smaller, then there just aren’t enough of them to name all those different positive integers.” So most n’s of any given length are random.
The next question was far more troubling. Knowing that most numbers are random, and given any particular number n, can mathematicians prove it to be random? They cannot tell by looking at it. They can often prove the opposite, that n is interesting: in that case they just have to find a short algorithm that generates n. (Technically, it must be shorter than log2n bits, the number needed to write n in binary.) Proving the negative is a different story. “Even though most positive integers are uninteresting,” Chaitin declared, “you can never be sure.… You can only prove it in a small number of cases.” One could imagine trying to do it by brute force, writing down every possible algorithm and testing them one by one. But a computer will have to perform the tests—an algorithm testing other algorithms—and soon, Chaitin demonstrated, a new version of Berry’s paradox appears. Instead of “the smallest uninteresting number,” one inevitably encounters a statement in the form of “the smallest number that we can prove cannot be named in fewer than n syllables.” (We are not really talking about syllables any more, of course, but Turing-machine states.)♦ It is another recursive, self-looping twist. This was Chaitin’s version of Gödel’s incompleteness. Complexity, defined in terms of program size, is generally uncomputable. Given an arbitrary string of a million digits, a mathematician knows that it is almost certainly random, complex, and patternless—but cannot be absolutely sure.
Chaitin did this work in Buenos Aires. When he was still a teenager, before he could graduate from City College, his parents moved back to their home in Argentina, and he got a job there with IBM World Trade. He continued to nurse his obsession with Gödel and incompleteness and to send papers to the American Mathematical Society and the Association for Computing Machinery. Eight years later, Chaitin returned to the United States to visit IBM’s research center in Yorktown Heights, New York, and placed a telephone call to his hero, then nearing seventy at the Institute for Advanced Study in Princeton. Gödel answered, and Chaitin introduced himself and said he had a new approach to incompleteness, based on Berry’s paradox instead of the liar paradox.
“It doesn’t make any difference which paradox you use,”♦ said Gödel.
“Yes, but …” Chaitin said he was on the trail of a new “information-theoretic” view of incompleteness and asked if he could call on Gödel in Princeton. He was staying in the YMCA in White Plains and would take the train, changing in New York City. Gödel agreed, but when the day came, he canceled. It was snowing, and he was fearful for his health. Chaitin never did meet him. Gödel, increasingly unstable, afraid of poisoning, died in the winter of 1978 of self-starvation.
Chaitin spent the rest of his career at the IBM Watson Research Center, one of the last great scientists to be so well supported in work of no plausible use to his corporate patron. He sometimes said he was “hiding” in a physics department; he felt that more conventional mathematicians dismissed him as “a closet physicist” anyway. His work treated mathematics as a sort of empirical science—not a Platonic pipeline to absolute truth, but a research program subject to the world’s contingencies and uncertainties. “In spite of i
ncompleteness and uncomputability and even algorithmic randomness,” he said, “mathematicians don’t want to give up absolute certainty. Why? Well, absolute certainty is like God.”♦
In quantum physics and later in chaos, scientists found the limits to their knowledge. They explored the fruitful uncertainty that at first so vexed Einstein, who did not want to believe that God plays dice with the universe. Algorithmic information theory applies the same limitations to the universe of whole numbers—an ideal, mental universe. As Chaitin put it, “God not only plays dice in quantum mechanics and nonlinear dynamics, but even in elementary number theory.”♦
Among its lessons were these:
Most numbers are random. Yet very few of them can be proved random.
A chaotic stream of information may yet hide a simple algorithm. Working backward from the chaos to the algorithm may be impossible.
Kolmogorov-Chaitin (KC) complexity is to mathematics what entropy is to thermodynamics: the antidote to perfection. Just as we can have no perpetual-motion machines, there can be no complete formal axiomatic systems.
Some mathematical facts are true for no reason. They are accidental, lacking a cause or deeper meaning.
Joseph Ford, a physicist studying the behavior of unpredictable dynamical systems in the 1980s, said that Chaitin had “charmingly captured the essence of the matter”♦ by showing the path from Gödel’s incompleteness to chaos. This was the “deeper meaning of chaos,” Ford declared:
Chaotic orbits exist but they are Gödel’s children, so complex, so overladen with information that humans can never comprehend them. But chaos is ubiquitous in nature; therefore the universe is filled with countless mysteries that man can never understand.