by DAVID KAHN
Various hypotheses have been proposed; none is entirely satisfactory. The theory that the most common sounds are those learned earliest in babyhood has been proven false. The suggestion that people inhabiting a cold, damp area might select a set of sounds requiring minimal lip-opening is not supported by the facts. Nor is the opposing one that a community of fishermen will evolve a language rich in vowels to better communicate over great distances and the noises of the sea. The tendency toward economy of effort explains many phenomena, such as the greater frequency of voiceless stops compared with voiced stops, but founders on other facts, such as the development in some languages of sounds harder to pronounce than those they are replacing. An entire book has discussed the correlation between the geographic distribution of the o-factor in blood and the development of the /th/ sounds in Europe to demonstrate that hereditary factors predispose populations to certain speech sounds. But, as a reviewer has pointed out, the two correspond “only because both reflect the distribution, movement and mixing of historic people and not for any causal relation” between genetics and phonetics. Theories that languages tend to fill holes in their linguistic patterns, that sounds are selected on the basis of auditory or articulatory distinctiveness, that unusual sounds gain in popularity because they are more expressive than ordinary ones, that the culture of a community (such as agricultural versus nomadic) affects its sound complement—each of these may contain a grain of truth and explain certain isolated phenomena. But no one thesis provides a single explanation of why one language evolves one set of sounds and another a different set, nor explains why each language favors certain sounds over others.
Note 2: Calculation of Redundancy
Determining the percentage of redundancy begins with the calculation of a quantity called “entropy.” Shannon borrowed this term from physics because the form of the equation that he developed for the amount of information in a set of utterances is identical with that used in physics to represent entropy. In both fields, entropy measures disorder, randomness, lack of structure. The greater the entropy, the greater the chaos. It marks a negative, dispersive tendency. Entropy is of great importance in physics because it figures in the second law of thermodynamics, perhaps the sovereign physical principle. This states that “entropy always increases”—that energy, in other words, always passes from the more organized state to the less organized, as, for example, a star dissipates its energy in radiation. This one-way flow has given entropy the epithet of “time’s arrow,” because an increase in entropy invariably means an increase in the age of an (isolated) physical system. If this process were to continue to its end within the universe, Sir James Jeans wrote, “there would be neither sunlight nor starlight, but only a cool glow of radiation uniformly diffused through space.” This state of ultimate maximum entropy has been called the “heat-death of the universe.”
Since an increase in entropy means an increase in anarchy, the language with the most entropy is the language with the greatest freedom. This is obviously a language with no rules at all to limit it. Such a language would naturally have no statistics to govern which letters would be used most often, which letters would be likely to follow a given letter, and so on. A text can be written in such a language by putting the 26 letters and the space (eliminating, for simplicity’s sake, signs of punctuation, capital letters, etc.) into an urn, drawing one item out, recording it, replacing it, stirring the elements, and redrawing. Chance alone controls, and the text would be purely random. It would look like the sample Shannon composed by just such a process:
XFOML RXKHRJFFJUJ ZLPWCFWKCYJ
FFJEYVKCQSGXYD QPAAMKBZAACIBZLHJQD
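A minimal sketch of the urn procedure in Python, assuming a 27-symbol alphabet of A through Z plus the word-space (the function name and the sample length are illustrative, not Shannon's):

```python
import random
import string

# The 27-item "alphabet": the letters A-Z plus the word-space.
ALPHABET = string.ascii_uppercase + " "

def urn_text(length):
    """Draw symbols independently and uniformly, with replacement,
    as if pulling slips from a well-stirred urn."""
    return "".join(random.choice(ALPHABET) for _ in range(length))

print(urn_text(60))  # a purely random text in the spirit of Shannon's sample
```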
The Argentine author Jorge Luis Borges has written a haunting short story about an imaginary and infinite Library of Babel whose volumes were written in just such a random fashion. It therefore contains every possible combination of letters, and, consequently, “all that it is given to express in all languages. Everything: the minutely detailed history of the future, the archangels’ autobiographies, the faithful catalogue of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true catalogues … the true story of your death, the translation of every book in all languages…. I cannot combine some characters—dhcmrlchtdj—which the divine Library has not foreseen.” When the people of the library realized that it was total, they rejoiced because they knew it contained the answers to the mysteries of the world. Their elation was followed by extreme depression: they could not find the answers. Thus the entropy of the library as a whole was as great as it could be. The coherent texts were as much the result of chance as a line of poetry accidentally tapped out by those typewriting monkeys of Émile Borel.
But natural languages are not random affairs. Their many rules impart a highly organized structure and so lower their entropy. In theory, to compute the entropy, one must 1) count each letter in the universe of discourse, figure its probability of occurrence by dividing its count by the total number of letters, multiply this probability by its logarithm and then add up all these products, changing the minus signs to plus; 2) count each letter-pair, figure its probability of occurrence, multiply the probability by its logarithm, total the products, change the signs, then divide by two (since entropy is specified on a per-letter basis); 3) count each trigram, figure its probability, total the products of their probabilities and the logarithms of these probabilities, change signs, and divide by three; 4) tally all the four-letter groups, figure the probability of each, multiply their probabilities by the logarithms of their probabilities, sum the products, change signs, and divide by four; 5) repeat this process with ever-increasing sizes of letter-groups up to the largest utterances that can give a valid probability of occurrence. Each step gives a closer approximation to the entropy of the language as a whole.
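The same counting procedure can be sketched in Python; the names below are illustrative, and a meaningful estimate would of course require a far larger sample than the short string used here:

```python
import math
from collections import Counter

def per_letter_entropy(text, n):
    """Entropy of n-letter groups in `text`, divided by n to put it on a
    per-letter basis, following the counting steps described above."""
    groups = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(groups)
    counts = Counter(groups)
    # -sum(p * log2 p): changing the minus signs to plus, as in step 1.
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / n

sample = "FACT IS STRANGER THAN FICTION "  # stand-in for a large corpus
for n in (1, 2, 3):
    print(n, round(per_letter_entropy(sample, n), 2))
```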
Shannon has actually carried out the first three steps on a sufficiently large sample. Using a 27-item alphabet (including the word-space) and English frequency tables, he found that the entropy for single letters is 4.03 bits* per letter. For digraphs, it falls to 3.32 bits per letter, and for trigrams to 3.1 bits per letter. The decrease stems from the fact that each letter influences what follows it: a t is more likely to drag an h behind it than an l. This predictability constitutes an increase in order and a decrease in entropy.
This direct technique cannot be extended very far in practice, because the frequency counts rapidly become unwieldy. By the time 12-letter groups are reached, more than a billion billion frequency pigeonholes would be needed. Consequently, Shannon resorted to indirect methods, which draw upon the fact that “anyone speaking a language possesses, implicitly, an enormous knowledge of the statistics of the language.” Mathematics can then extract the entropy from the results of experiments that tap this knowledge. In one experiment, Shannon counted how many guesses a subject needed to determine the right letter in an unknown text. The number beneath the letter gives the number of guesses one subject made:
Of the 102 symbols, the subject guessed right on the first try 79 times and required three or more guesses only 15 times. Most of these occur where the line of thought has more possibilities of branching out.
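The mechanics of the tally can be mimicked with a crude stand-in for the human subject: a “guesser” that always tries letters in a fixed order of overall frequency. This ignores the context a real subject exploits, and the guessing order below is an assumption, but it shows how the guess counts are collected:

```python
# Fixed guessing order (space first, then letters roughly by English
# frequency); this ordering is an assumption, not Shannon's procedure.
GUESS_ORDER = " ETAOINSHRDLCUMWFGYPBVKJXQZ"

def guess_counts(text):
    """For each symbol, count how many guesses the fixed-order 'subject'
    needs before hitting the right one (1 = correct on the first try)."""
    return [GUESS_ORDER.index(ch) + 1 for ch in text.upper()]

print(guess_counts("FACT IS STRANGER THAN FICTION"))
```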
In another experiment, subjects tried to complete a phrase from which the spaces and the vowels had been deleted, yielding a text half as long as the original:
F C T S S T R N G R T H N F C T N
Six subjects restored an average of 93 per cent of the deletions, and several reconstructed the full text: Fact is stranger than fiction.
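The deletion itself is easy to reproduce; a one-function sketch, assuming only the five vowels and the spaces are removed:

```python
def strip_vowels_and_spaces(text):
    """Delete the spaces and vowels, as in the completion test above."""
    return "".join(ch for ch in text.upper() if ch not in "AEIOU ")

print(strip_vowels_and_spaces("Fact is stranger than fiction"))
# FCTSSTRNGRTHNFCTN
```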
On the basis of tests like this, Shannon estimated that the entropy of English in 100-letter groups stands at about 1 bit per letter. It is this low because the cumulative effect of the long preceding passage quite substantially constrains the last letter. Actually, other tests indicate that entropy ceases to decline after about 32 letters are taken as a group. In other words, 32 letters exert virtually the same influence over those that follow them as 100 do.
Naturally, no such cohesion exists in the “urn language,” or that of the Library of Babel, where letters were generated independently. Nor are there any frequency variations, because chance will select all equally often. This “urn language,” whose entropy is strictly a function of the number of elements in its alphabet, becomes the standard against which natural languages are measured. The result of the comparison is redundancy. Technically speaking, redundancy is the ratio of the entropy of a language to the maximum value entropy can have with the same number of elements, subtracted from 1. It is expressed as a percentage. The maximum entropy of an alphabet of 27 items is the logarithm of 27, or 4.76 bits per letter; of 26 items, 4.70. With English’s entropy of 1 bit per letter, this gives a redundancy for English of 1 − 1/4.7, or 79 per cent, which is usually reduced to the simpler and more conservative 75 per cent.
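The arithmetic of that comparison, as a brief sketch (the figure of 1 bit per letter is the estimate quoted above; the function name is illustrative):

```python
import math

def redundancy(entropy_per_letter, alphabet_size):
    """Redundancy = 1 - (entropy / maximum entropy), where the maximum
    entropy is the base-2 logarithm of the alphabet size."""
    return 1 - entropy_per_letter / math.log2(alphabet_size)

print(round(redundancy(1.0, 27) * 100))  # 27 items: about 79 per cent
print(round(redundancy(1.0, 26) * 100))  # 26 items: also about 79 per cent
```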
* Note 1, at end of chapter, discusses other proposed explanations.
† Methods of determining this percentage are given in Note 2 at end of chapter.
* “Bits” is a blend-word for “binary digits.” Binary digits are the two figures in a system of numerical notation that uses only two different numbers to express quantities; the two digits are commonly 0 and 1. Ordinary notation uses decimal digits in its ten-number system. To count in bits, one goes through the same procedure with two numbers that one goes through in decimal digits with ten. Thus the quantities which in decimal notation are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, are in binary notation 0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011. Binary digits express results in terms of a choice of one of two exclusive possibilities, in terms of this or that, on or off, all or nothing, 1 or 0. Thus 4.03 bits per letter simply means that 4.03 yes-no choices must be made to determine the right letter. This form has a variety of conceptual advantages in information theory over the 10 alternatives presented by decimal digits. The binary is just another form of numerical notation, like the octal system used in computers and the duodecimal system sometimes urged for daily life because 12 has more divisors than 10. The binary system requires the use of logarithms to the base 2 instead of to the base 10 in calculations.
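A short sketch of the same counting in Python, using the built-in binary formatting (nothing here is specific to cryptology; it merely reproduces the figures in the footnote):

```python
import math

# Decimal 0 through 11 rendered in binary notation, matching the footnote.
print([format(n, "b") for n in range(12)])
# ['0', '1', '10', '11', '100', '101', '110', '111',
#  '1000', '1001', '1010', '1011']

# log2 of the alphabet size: the number of yes-no choices needed to single
# out one of 27 equally likely symbols.
print(round(math.log2(27), 2))  # about 4.75
```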
21
HETEROGENEOUS IMPULSES
FEW FALSE IDEAS have more firmly gripped the minds of so many intelligent men than the one that, if they just tried, they could invent a cipher that no one could break. Many have tried and, although only a fraction of their ciphers have been published or patented, the quantity and variety of even this small sample are astounding.
Émile Myszkowski, a retired French colonel, devised a kind of repeated-key transposition and published it in his Cryptographie indéchiffrable. Collon, a Belgian Army officer, proposed a number of fractionating systems. One Rozier marched his plaintext letters through the interior of a Vigenère tableau in a dizzily twisting path in an attempt to lose the cryptanalyst. The so-called Phillips cipher enciphers five letters monalphabetically in a 5 × 5 square, then shifts the lines of the square and repeats the process. The Amsco transposition cipher accepts both single letters and pairs as its plaintext elements. A. de Grandpré filled a 10 × 10 square with ten 10-letter words whose first letters form a mnemonic acrostic, then ranged coordinates on the outside and used these to encipher; the use of plaintext words inside provides homophones in approximately the proportion required to disguise the frequencies of normal plaintext. A French major, Louis-Marie-Jules Schneider, concocted an enormously complex polyalphabetic whose alphabets were generated one from the other; this was one of the systems William F. Friedman broke in evolving the principle of the index of coincidence. A mathematician named Arthur Porges devised a system based upon a continued fraction. The Nicodemus cipher sets out a plaintext beneath a keyword, enciphers it in Vigenère according to that keyword, and then transposes it vertically by keynumbers derived from the keyword. The Count de Mirabeau, an 18th-century French revolutionist, enciphered in a Polybius square whose sets of coordinates both ran from 1 to 5; he wrote each two-digit equivalent vertically and then transcribed all of the first digits and then all of the second, inserting numbers from 6 to 0 at will as nulls. Some amateurs just propose enciphering a message in Vigenère and superenciphering the text in Playfair. There have been autokey transpositions and a cipher invented by W. B. Homan that produces a cryptogram in which every letter of the alphabet occurs as often as every other.
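Of the systems in this catalogue, Mirabeau's is described fully enough to sketch. The square below and the rate at which nulls are sprinkled in are assumptions made only for illustration:

```python
import random

# A 5 x 5 Polybius square with I and J sharing a cell; Mirabeau's own
# arrangement of the letters is not specified here, so this one is assumed.
SQUARE = "ABCDEFGHIKLMNOPQRSTUVWXYZ"

def coords(letter):
    i = SQUARE.index("I" if letter == "J" else letter)
    return i // 5 + 1, i % 5 + 1          # both coordinates run from 1 to 5

def mirabeau_encipher(plaintext, null_rate=0.3):
    letters = [c for c in plaintext.upper() if c.isalpha()]
    pairs = [coords(c) for c in letters]   # two-digit equivalents, written vertically
    firsts = [str(a) for a, _ in pairs]    # the top row of digits
    seconds = [str(b) for _, b in pairs]   # the bottom row of digits
    out = []
    for d in firsts + seconds:             # all first digits, then all second
        out.append(d)
        if random.random() < null_rate:    # insert 6 to 0 at will as nulls
            out.append(random.choice("67890"))
    return "".join(out)

print(mirabeau_encipher("ATTACK AT DAWN"))
```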
Beyond these pencil-and-paper systems, the files of the patent offices bulge with quantities of cipher disks—probably the most popular single kind of cipher invention—and with gear arrangements, grilles, cylinders, mechanized tableaux, strip systems, and so on. (Most of these mechanisms produce substitution ciphers because of a very basic difference between substitution and transposition. A transposition cipher resembles what industrial engineers call a “batch” manufacturing process, in which quantities of material are cooked at a time, the product issuing in batches. This is because a transposition requires a whole group of letters that will all be mixed together, and it is hard for a mechanical device to store letters. A substitution cipher, on the other hand, is like a “continuous” process. Here the raw materials—letters in one case, ingredients in the other—flow continually, are not stored, and may be cut off at any point.)
Probably most ciphers get invented as a bit of recreation, as a part of the spell of interest in cryptology that so many people seem to go through. Sooner or later every cryptologist encounters an acquaintance who says, “It can’t be too hard to make a cipher that can’t be solved.” The friend then offers his theories, which often involve some crude sort of polyalphabeticity or a book code. Frequently he dredges up some system from his adolescence and, taking half an hour to put a ten-word message into that cipher, challenges the cryptologist to break it on the spot. William Jerdan, a 19th-century British journalist, told in his autobiography a very typical story of the birth of a cipher, reporting with a refreshing touch of humor on the dreams of glory that often accompany the nativity.
One evening, while Jerdan and his young friends were talking, the subject of ciphers came up. Jerdan boasted that “I myself could frame a system which nobody on earth could decypher and read” and bet a dinner on it. Then somebody pulled down an encyclopedia to show him the many systems that had been invented, and, said Jerdan, “when I retired to rest [I] was on no very pleasant terms with myself, for I had looked very like what I had no chance of inventing—a Cypher.” But in the morning he awoke “with a secret cypher concocted in my brain,” which he discussed with his friends, among them Thomas Wilde, a future Lord Chancellor. They agreed that “It ought to be laid before the government, and I cannot tell how immense a reward I was to reap for my wonderful discovery. No castle in the air was ever more stupendous and gorgeous than mine…. Wilde and I were now all agog for an audience of the Prime Minister, to put him in possession of the good fortune which had befallen his government, and ourselves in the way of wealth and promotion.” They did manage to describe the cipher to a government secretary, and many years later, Jerdan, visiting a high Foreign Office official, saw a cipher being used based on his principle. He naturally thought that it was his, but it may have been invented independently by someone else.
Nearly every inventor of a cipher system has been convinced of the unsolvability of his brainchild. (The tendency to claim this in patents has, however, been receding with the rise of cryptologic sophistication.) In 1744, Leonhard Euler, the great Swiss mathematician, sent to a friend a monalphabetic substitution cryptogram that had a few homophones, expressing his belief that it could not be deciphered. He was only slightly more naive than most inventors. A representative of the humanities, Walter W. Skeat, a distinguished English philologist and editor of Chaucer, proposed a cipher in 1896 that amounted to a Vigenère with key ABCDE; when hordes of amateur cryptanalysts knocked it off, he had the grace to bow and retire. Nearly all the cryptographic fossils entombed in dusty books or in old files of patent offices deserve their oblivion. They are too prone to error or too easy to solve or too cumbersome. Many an inventor delights in intricacy. Poorly endowed with empathy, he never considers the possibility that cipher clerks will not dote as lovingly upon the complex calculations of his cipher as he does; he fails to realize that to the clerks ciphering is not a pleasant after-hours recreation but a day-long, dull, boring job, about as exciting as adding up columns of figures, and that they would rather be out on a date with a girl friend.
Charles Babbage asserted that no man’s cipher was worth looking at unless the inventor had himself solved a very difficult cipher. This rule holds true in the great majority of cases and, if observed, would have saved cryptologists a great deal of time. But it would be like requiring Thomas Edison to pass a stiff examination in acoustical theory before anyone would deign to look at his phonograph. The Babbage rule would have deprived cryptologists of some of the most important features of modern cryptography, such as the Vernam mechanism, the rotor, the Hagelin machine. Cryptologists must process a lot of ore to get something valuable—but so must diamond miners.
In evaluating their ciphers, many inventors err by thinking that the cryptanalyst must retrace the decipherment steps in his solution and that, since some of these steps are recoverable only with the key, the cipher must remain inviolate. For example, a simple cipher decomposes the plaintext into the coordinates of a checkerboard, regroups these by uniting the second coordinate of each letter with the first coordinate of the next letter (combining the leftover coordinates of the first and last letters), and recomposes them as letters:
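A sketch of such a cipher in Python, assuming a 5 × 5 checkerboard with I and J merged; pairing the two leftover coordinates as shown is one reasonable reading of the description:

```python
SQUARE = "ABCDEFGHIKLMNOPQRSTUVWXYZ"       # 5 x 5 checkerboard, I/J merged

def to_coords(letter):
    i = SQUARE.index("I" if letter == "J" else letter)
    return i // 5, i % 5                   # (first, second) coordinate

def from_coords(first, second):
    return SQUARE[first * 5 + second]

def encipher(plaintext):
    letters = [c for c in plaintext.upper() if c.isalpha()]
    pairs = [to_coords(c) for c in letters]
    # Unite the second coordinate of each letter with the first coordinate
    # of the next letter...
    regrouped = [(pairs[i][1], pairs[i + 1][0]) for i in range(len(pairs) - 1)]
    # ...and combine the leftover coordinates of the last and first letters.
    regrouped.append((pairs[-1][1], pairs[0][0]))
    # Recompose the new coordinate pairs as letters.
    return "".join(from_coords(a, b) for a, b in regrouped)

print(encipher("SIMPLE CIPHER"))
```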