by Tom Mahon
Counting letter frequencies quickly distinguishes between the two types: with transposition ciphers the most and least common letters of the cipher will be the most and least common letters of the underlying language. I assumed English was used for this set of six messages, since the surrounding text (‘Dear Sir’, ‘Yours Faithfully’) is in English. The most common letters of the cipher, EARIOST, are common letters in English. The least common letters of the cipher, BQVKP, are uncommon letters in English. The proportion of vowels in standard English text is about 40 per cent, and in this cipher is 47 per cent – rather high, but within normal variation. I was quite confident I was looking at a transposition cipher.
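The diagnostic count described above can be sketched in a few lines of Python. This is only an illustration: the function name `letter_profile` is hypothetical, and treating A, E, I, O and U as the only vowels is an assumption. The sample is just the opening groups of the first cipher as quoted later in this chapter, so its figures are not representative of the whole message.

```python
from collections import Counter

VOWELS = set("AEIOU")  # assumption: Y is not counted as a vowel

def letter_profile(text):
    """Return (letters ranked by frequency, vowel proportion) for the letters in text."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    counts = Counter(letters)
    ranked = [letter for letter, _ in counts.most_common()]
    vowel_share = sum(counts[v] for v in VOWELS) / len(letters)
    return ranked, vowel_share

# Opening groups of the first cipher only, not the full 151 letters.
sample = "AEOOA IIIEO AEAEW LFRRD ELBAP RAEEA EIIIE AAAHO IFMFN"
ranked, vowel_share = letter_profile(sample)
print(ranked[:5], round(vowel_share, 2))
```

In a transposition cipher the ranked list should resemble that of ordinary English; in a substitution cipher it generally will not.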
Solving the First Cipher
The cipher may use any of scores of types of transposition: for example, the columnar transposition shown [p. 24] with the columns shuffled according to a secret key; pattern-based systems such as route transposition or railfence; the turning grille, using a square with cut-outs in which to write the message, turning it to each of four positions; nihilist transposition, where both the rows and columns of an array are shuffled; and many other variations on these themes. Each system uses a key: a secret piece of information intended to keep the message private even if the general system is known to the attacker, assuming the underlying cipher system is strong enough. The number of letters in this cipher serves to eliminate some of the possible systems: the 151 letters will not fill a square or rectangle evenly, so many of the common rectangle-based systems need not be considered.
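The point about 151 letters not filling a rectangle can be checked mechanically. The short sketch below (the function name `rectangle_shapes` is illustrative) lists every way a message of a given length could fill a complete block with at least two rows and two columns.

```python
def rectangle_shapes(n, min_side=2):
    """List (rows, columns) pairs with rows * columns == n and both sides >= min_side."""
    return [(r, n // r) for r in range(min_side, n // min_side + 1) if n % r == 0]

print(rectangle_shapes(151))  # 151 is prime, so the list is empty
print(rectangle_shapes(144))  # a 144-letter message has many possible shapes
```

Because 151 is prime, no complete rectangle is possible, which is why systems that tolerate an incomplete last row become the natural first candidates.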
I chose for my first attempts the columnar transposition system: it is simple to explain to a correspondent who may not be an experienced cipher clerk, it can be used for messages that do not fit in a complete rectangle, and it had been used rather widely before the 1920s, when these ciphers were composed. A cryptanalyst can solve normal columnar transpositions using only pencil and paper, depending on the length of the cipher, the length of the key, and in some cases the content of the message. The analysis can become more difficult if the encryption method is varied, if the key is very long, or if the message is short compared to the key length.
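For readers who want to experiment, here is a minimal sketch of ordinary columnar transposition with an incompletely filled block. It assumes the usual convention that the message is written in rows, that columns are read off in the alphabetical order of the key letters (ties broken left to right), and that the short columns are the rightmost ones. The function names are illustrative only.

```python
def column_order(key):
    """Column indices in read-out order: alphabetical by key letter, ties left to right."""
    return sorted(range(len(key)), key=lambda i: (key[i], i))

def encrypt(plain, key):
    """Write plain in rows under the key, then read the columns off in key order."""
    width = len(key)
    return "".join(plain[col::width] for col in column_order(key))

def decrypt(cipher, key):
    """Invert encrypt(), allowing for an incompletely filled last row."""
    width = len(key)
    full_rows, extra = divmod(len(cipher), width)
    columns = [""] * width
    pos = 0
    for col in column_order(key):
        # The leftmost `extra` columns hold one more letter than the rest.
        height = full_rows + (1 if col < extra else 0)
        columns[col] = cipher[pos:pos + height]
        pos += height
    rows = full_rows + (1 if extra else 0)
    return "".join(columns[c][r] for r in range(rows)
                   for c in range(width) if r < len(columns[c]))

message = "SENDSTUFFFORQMGTODUBLIN"
secret = encrypt(message, "MONARCHY")
print(secret, decrypt(secret, "MONARCHY"))
```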
A message using a ten-letter key, meaning the message block will have ten columns, would be relatively straightforward to solve if it were, say, 75 or more letters long and used no tricks. A typical manual attack would be to write the cipher message in a ten-column block and then cut it apart in vertical strips. Since the order of columns is not known, the cryptanalyst would not know in advance which columns were short and which were long in an incompletely-filled block, so extra letters would be added to the top and bottom of each column to allow for that difference. The cryptanalyst would then shuffle these columns around on a table, finding where they can be aligned to form the hidden message.
This process can be tedious to execute with pencil, paper and scissors, especially if many ciphers are to be attacked. Over the past forty years I have developed a wide array of computer programmes to help in my analysis and in many cases to solve common types of ciphers automatically. One of the most effective general-purpose automatic methods I call Shotgun Hillclimbing. This method picks a key length in what I consider a reasonable range, creates a key of that length with randomly chosen letters, ‘decrypts’ the message using this key, then progressively changes the key to try to get a decryption that looks more English-like. When it reaches a plateau where simple changes to the key no longer improve the result, it compares the result with the best found so far, then goes back to try a new random starting point. The efficacy of the process depends on a number of factors, including the difficulty of the cipher itself, the length of the key, the methods used to modify each successive key, and the method used to score a decryption on the English-like scale. The process itself is, in principle, much like the pencil, paper and scissors method described above, trying the columns in different combinations until words and phrases begin to appear.
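The general shape of such a search can be sketched as follows. This is not the programme described above: the scoring function (a count of common English bigrams), the key-modification step (swapping two key positions) and all names are assumptions chosen to keep the example short. A practical scorer would use much fuller statistics. The key is represented as a list giving the columns in read-out order rather than as letters.

```python
import itertools
import random

# Assumption: a short list of common English bigrams stands in for a real scoring table.
BIGRAMS = set("TH HE IN ER AN RE ON AT EN ND TI ES OR TE OF ED IS IT AL AR "
              "ST TO NT NG SE HA AS OU IO LE".split())

def decrypt(cipher, order):
    """Undo a columnar transposition; order lists the columns in read-out sequence."""
    width = len(order)
    full_rows, extra = divmod(len(cipher), width)
    columns, pos = [""] * width, 0
    for col in order:
        height = full_rows + (1 if col < extra else 0)
        columns[col] = cipher[pos:pos + height]
        pos += height
    return "".join(columns[c][r] for r in range(full_rows + 1)
                   for c in range(width) if r < len(columns[c]))

def score(text):
    """Crude English-likeness: number of adjacent letter pairs that are common bigrams."""
    return sum(text[i:i + 2] in BIGRAMS for i in range(len(text) - 1))

def climb(cipher, order):
    """Keep swapping pairs of key positions while any swap improves the score."""
    best = score(decrypt(cipher, order))
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            trial = score(decrypt(cipher, order))
            if trial > best:
                best, improved = trial, True
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
    return best, order

def shotgun(cipher, widths=range(8, 16), restarts=200, seed=0):
    """Restart the climb from random keys and keep the best plateau reached."""
    rng = random.Random(seed)
    best_score, best_order = -1, None
    for _ in range(restarts):
        order = list(range(rng.choice(list(widths))))
        rng.shuffle(order)
        result, order = climb(cipher, order)
        if result > best_score:
            best_score, best_order = result, list(order)
    return best_score, best_order, decrypt(cipher, best_order)
```

Calling `shotgun(cipher_text)` on a cipher written as an unbroken string of capitals returns the best score, the column order and the trial decryption. With so crude a scorer the result may well stall short of the true solution, which is one reason the choice of scoring method matters so much.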
I unleashed my Shotgun Hillclimbing programme on this cipher, treating it as a columnar transposition with a partially-filled block of between eight and fifteen columns, and it returned the following, successively better, trial solutions, each using twelve columns:
The process stopped with the last of these – no better solutions were found using a few hundred more starting keys. The programme produced the key ‘fdbjalhcgkei’: these letters give the conjectured order of the twelve columns of the cipher, with the ‘a’ of the key indicating that the beginning of the cipher text (AEOOA IIIEO …) goes down the fifth column.
This attempted solution looks rather close: we see some clear words such as ‘send stuff for’ that must be part of the original message. To see why the text is imperfect I set the message in a partially-filled block with twelve columns, yielding twelve rows of twelve letters and one row of seven, and used the programme’s proposed key:
[Table: the twelve-column block headed by the key, with rows numbered 1 to 13]
The cipher begins AEOOA IIIEO AEAEW, and starts down column a, then continues to column b:
[Table: the same block, rows 1 to 13, with the first groups entered in columns a and b]
The next few groups are LFRRD ELBAP RAEEA EIIIE AAAHO IFMFN, and these are filled in the same way, continuing after the EW in column b and going on to columns c and d.
[Table: the same block, rows 1 to 13, with the next groups entered through columns c and d]
As I filled these in I noted that all the letters in columns a and c are vowels. This would not happen by chance: since only 40 per cent of English letters are vowels, the odds against so many appearing in a row by chance are astronomical. This means that the person encrypting the message put the vowels in independently of the plain text, and we will soon see the result. Filling in the rest of the cipher text in order gives the following result:
[Table: the same block, rows 1 to 13, with all of the cipher text entered]
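To put a rough figure on the 'astronomical' odds against two all-vowel columns, the sketch below works out how many letters columns a and c hold and the chance that every one of them is a vowel. It assumes the short columns are the rightmost ones in the block and treats each letter as an independent 40 per cent draw, which is only an approximation; the function name is illustrative.

```python
def column_heights(n_letters, key):
    """Height of each column, indexed by its key letter, with short columns on the right."""
    full_rows, extra = divmod(n_letters, len(key))
    return {letter: full_rows + (1 if position < extra else 0)
            for position, letter in enumerate(key)}

heights = column_heights(151, "fdbjalhcgkei")
n_vowels = heights["a"] + heights["c"]   # 13 letters in column a, 12 in column c
chance = 0.40 ** n_vowels                # independence is an assumption
print(n_vowels, chance)
```

On these assumptions the chance of twenty-five vowels out of twenty-five is roughly one in nine billion.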
The meaning of the two columns of vowels is now clear: they were added to obfuscate the message. The sender and receiver would have arranged in advance on two columns of ‘duds’ (letters to be ignored), the sender could fill them in at random with vowels, and the receiver would know to ignore them. This also explains why the initial frequency count during the diagnosis showed more vowels than usual for English: the excess vowels were in the columns of duds. Removing these duds we see the result:
[Table: the same block, rows 1 to 13, with the two columns of duds removed]
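Removing the duds is a purely mechanical step once their positions are known. In the sketch below the dud positions are taken from where the letters a and c fall in the programme's key. The single row shown is made up for illustration, chosen to be consistent with the opening words of the plain text; it is not transcribed from the document.

```python
def strip_columns(rows, dud_positions):
    """Drop the letters at the given column positions (counting from 0) from every row."""
    duds = set(dud_positions)
    return ["".join(ch for i, ch in enumerate(row) if i not in duds) for row in rows]

key = "fdbjalhcgkei"
dud_positions = [key.index("a"), key.index("c")]   # the 5th and 8th columns of the block

illustrative_rows = ["THEAADDARESS"]               # hypothetical row, for illustration only
print(strip_columns(illustrative_rows, dud_positions))
```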
The programme has made an error in recovering the key: columns l and h have been switched. The result so far:
The address to which you will send stuff for QMG is Mrs Sweeney, Frudterer and Greengrocer, Five Harold’s Cross, Dublin. Try to make it ue to appear llke frhit.
The text is now clear, and there are four obvious errors: in Fruiterer, up, like and fruit. Referring back to the original document, we see the first error resulted from the sender overstriking the original D with an I on their typewriter, and the transcriber (me) reading the D instead. The second error is a result of the poor quality of the copy: the P in the typescript copy has a smudge on the bottom that I read as the bottom of an E. The next letter, the I of like, is clear on the typescript and was simply a transcription error. The final error, in frhit, results from another overstrike on the typescript – the correct U can be seen in retrospect, but is not obvious when transcribing it.
I solved the remaining five ciphers from this initial set the same way. The second in the set was from the same document and used the same key, again with two columns of duds. The next three used a different twelve-letter key, AHCKEDJLBFGI, but were simpler than the first two in that they did not use the columns of duds.9 The final cipher was relatively short – only fifty-eight letters:10
Figure 4. Short message from Twomey, 5 April 1927.
Although short messages can be difficult to decrypt, my programme had no trouble with this one, producing key LIAHKFDJBCGE (again with no duds) and plain text:
continued to take action against the undesirable Sunday newspapers.
I returned these solutions the same day by e-mail to Tom Mahon, who offered to send the rest of his papers for decryption.
The game was on!
Decrypting the Columnar Transposition Ciphers
When the ciphers began arriving, it became clear that the project was to be very extensive. In all, the corpus consists of about 1,300 individual cryptograms. Most of them are typed and clearly legible, but in many cases the quality of the copies – either too faint11 or too dark12 – led to challenges in transcription. In some cases the typewriter used had misaligned13 or dirty14 typebars, so that certain letters were obscured or ambiguous. Some of the messages are torn or stained in ways that obliterate a group of adjacent letters.15 The copying process itself led to other problems: in some cases the flimsy copy was folded or crumpled as it was photographed for the microfilm, making separation of the lines of text quite challenging,16 and in other cases the reproduction process cut cipher letters off one side or the other.17 Some of the messages are hand-written in various writing styles, and without the cues of connected English text it can be difficult even to identify the different letters of random-looking connected cursive text.18
Figure 5. Penmanship challenge, 4 October 1926.
Figure 6. Creases through cipher text, 25 October 1926.
Columnar transposition resists mutilation rather well – even adjacent missing letters in the cipher text come from different places in the plain text, so there is still a good chance to read through the garbles. For example, consider the final cryptogram shown in Figure 6, which appears to be a photostat of a crushed and creased onion-skin copy. Replacing the damaged letters with a hyphen, the cipher to be solved is:
-NOLT T-VNL IOXPT OULES AFTWO -S-RE GASAA IEOIS AAMEA OLGSO ERFLN MO-AU TE-ET EPHUM CTHOD NEIFO NT--R ONOVO HIIIY MYSYL ONPAE EVRHI NIP-- TERO- RHMHP EXT
Decrypting and adding word divisions, we see:
I may have -o go to californ-a next month for st-phen I will ha-e to appoint ma- to do a- -imthire unle-s yo- people get - m-ve on he is very anxious for results
We can read this quite easily from the context, since the gaps did not happen to fall in places that would make the decryption ambiguous.
In some cases we were able to read messages that could not be deciphered by the recipient. Sometimes the senders had used incorrect keys: either the wrong day’s key, or an obsolete key. At other times they used the correct key, but used it incorrectly, not quite taking the columns in alphabetical order. They occasionally botched the encryption by leaving out a letter or by combining two letters in a single cell of the cipher frame, either of which would make the decipherment much more difficult. We see several testy exchanges in the message traffic, exhorting one correspondent or another to take more care with their key protocol and encryption process, or criticising the length or volume of encrypted messages.
Since my decryption methods do not require me to know the intended key, I was more or less immune to the problem of senders using the wrong key. But for some of the botched encryption attempts I needed to resort to the same procedures that the original recipients would have needed to try, laying it out carefully on quadrille paper and sliding the columns up and down until the text came into alignment.
We eventually worked through the complete set of transposition ciphers, producing good decryptions of all but one:
Figure 7. The unsolved transposition cipher, 16 November 1926.
This message is identified as having fifty-two letters but only fifty-one appear in the cryptogram itself.19 I tried a number of approaches, including assuming the missing letter was in each of the fifty-two possible positions in turn (or in none of them, leaving fifty-one letters as shown), but none of my attacks succeeded. If you crack this one please let us know.
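For anyone tempted to try, generating the candidate texts is the easy part. The sketch below (the function name and the hyphen placeholder are illustrative choices) yields the cryptogram as it stands and then one version for each position where a letter might have dropped out. Each candidate would then be passed to whatever solver is being used.

```python
def with_missing_letter(cipher, placeholder="-"):
    """Yield the cipher unchanged, then with a placeholder inserted at each position."""
    yield cipher
    for i in range(len(cipher) + 1):
        yield cipher[:i] + placeholder + cipher[i:]

candidates = list(with_missing_letter("X" * 51))   # stand-in for the 51 letters received
print(len(candidates))                             # 1 unchanged + 52 insertions
```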
Recovering Transposition Keys
The process I used to break the transposition ciphers was very effective: a message of sixty or more letters typically falls in a matter of seconds to a completely automated programme with no human intervention required beyond supplying word breaks. Shorter messages presented more challenges: I sometimes needed to inspect the best results and intervene in various ways, such as telling the programme to keep a particular phrase and continue making changes with that phrase held constant. Several factors worked in my favour. A document frequently consists of a number of encrypted messages, and each message within the document was encrypted with the same key. This means I needed to break only one of these messages, and this would give me a key that would break the rest. Many of the keys were re-used across documents: the IRA used standard transposition keys for different brigades and battalions and even individuals, and these were used whenever a transmission was sent to or from these recipients. Particularly for foreign agents the IRA implemented a system for producing different daily keys based on a book and once even on a list of phrases sent in the clear. These frequent key changes improved the security but still allowed me to try recovered keys against other traffic sent on the same day to different recipients.
My procedure for breaking each cipher gave me a key that would allow me to read that cipher and others that used the same key, but it did not tell me the key that had actually been used to encrypt the message. The recovered ‘equivalent key’ simply gives the order for reading off the columns. If the keyword were MONARCHY, for example, an equivalent key showing the column order could be DFEAGBCH:
MONARCHY
DFEAGBCH
That is, column four (‘A’ in both cases) would be the first column to be read out of the message array, then column six (‘C’ in MONARCHY, ‘B’ in DFEAGBCH) and so on, retaining the order of the original word. Either of these keys will allow us to read the cipher, but if we deduce that MONARCHY was used, it can give us more insight into the way keys were chosen, and perhaps allow us to guess other keys to try on messages that continue to elude us. In this case several other words would match the alphabetical pattern, including MONARCHS, INLANDER and OUTBULLY, and the key that was actually intended might become obvious once we had a list of other recovered keys to compare with it.
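The relationship between a keyword and its equivalent key can be written as a short function. This is a sketch under the convention described above: letters are ranked alphabetically, and repeated letters are ranked from left to right. The name `equivalent_key` is illustrative, and the function only handles keys of up to twenty-six letters.

```python
import string

def equivalent_key(keyword):
    """Replace each key letter by A, B, C ... according to its alphabetical rank."""
    order = sorted(range(len(keyword)), key=lambda i: (keyword[i], i))
    labels = [""] * len(keyword)
    for rank, position in enumerate(order):
        labels[position] = string.ascii_uppercase[rank]
    return "".join(labels)

for word in ("MONARCHY", "MONARCHS", "INLANDER", "OUTBULLY"):
    print(word, equivalent_key(word))   # each line ends in DFEAGBCH
```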
Having broken the fifty-eight-letter message from the first set that I described above, I wanted to find out what actual key had been used to encrypt it.20 The recovered ‘equivalent key’ is LIAHKFDJBCGE, and the order of the letters determines which column of the message array must be read first. That is, column three (the ‘A’) is the first to be copied out, then column nine (‘B’) and so on. I assumed this column order was determined by a keyword or keyphrase. I assumed also that the keys were in English, since all the messages are in English. The third letter in the key must be the lowest letter in the alphabet that this key uses, and if that letter is used more than once, it would appear again as the ‘B’ in ninth place in LIAHKFDJBCGE. The ‘C’ after the ‘B’ will be a letter at least as far along in the alphabet as that represented by the ‘B’. Finally, the L must represent the highest letter in the alphabet used in this key, and since there is no higher letter to its right, it must be the only occurrence of that letter.
Using these restrictions on the keyword, we can write the alphabet repeatedly on a series of twelve vertical strips of paper and slide them accordingly, keeping these restrictions in mind – that is, column two must start no higher than column one, and so on – until a word begins to appear across several lines of the strips. During the period when these ciphers were used this was the standard way to recover the key. Now, however, we have more efficient methods: we can programme a computer to check each word or phrase in a list in turn to see whether it matches this pattern. For a key this long, very few words and phrases will match the restrictions forced by the pattern. I used a wordlist from an unabridged dictionary of 308,081 English words, and of these only one matched the pattern for this key: TRANSFERABLE.
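The wordlist check can be sketched as follows. The tiny list here is a made-up stand-in for a real dictionary file, and the function names are illustrative; spaces and punctuation are ignored so that phrases can be tested in the same way as single words.

```python
import string

def equivalent_key(keyword):
    """Replace each key letter by A, B, C ... by alphabetical rank, ties left to right."""
    order = sorted(range(len(keyword)), key=lambda i: (keyword[i], i))
    labels = [""] * len(keyword)
    for rank, position in enumerate(order):
        labels[position] = string.ascii_uppercase[rank]
    return "".join(labels)

def candidate_keys(phrases, target):
    """Return the words or phrases whose letter pattern matches the recovered key."""
    hits = []
    for phrase in phrases:
        letters = "".join(c for c in phrase.upper() if c.isalpha())
        if len(letters) == len(target) and equivalent_key(letters) == target:
            hits.append(phrase)
    return hits

wordlist = ["transferable", "intelligence", "headquarters", "monarchy"]  # hypothetical sample
print(candidate_keys(wordlist, "LIAHKFDJBCGE"))   # ['transferable']
```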
This procedure allowed me to find many of the one-word keys used by the IRA, but many others did not appear in my unabridged dictionary list. I postulated that short phrases were being used. It is much more difficult to find adequate lists of phrases on the internet, so I produced my own, making lists of phrases of a specified length from digital books. This time-honoured process has been used to good effect by generations of cryptographers, who would painstakingly count hundreds of thousands of letters to get good statistical distributions and find common phrases. I downloaded books from Project Gutenberg, a public service effort that distributes digital copies of books in the public domain.21 Their first set of twenty complete books was made available to the public in 1990 and 1991, and the three million words in them would make a fairly good start on any statistical project. However, my feeling is that if a thing is worth doing, it is worth overdoing. I downloaded Gutenberg’s production from 1990 through 2006 to run my statistics: 10,607 books in all, comprising over 89 million lines, 730 million words, and 4.4 billion letters.
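One simple way to build such phrase lists from running text is to slide along the words and keep every run whose letters add up to the required key length. The sketch below is an assumption about how this might be done, not a description of the programmes actually used; the function name is illustrative, and a word containing an apostrophe is split in two by the regular expression.

```python
import re

def phrases_of_length(text, n_letters):
    """Collect runs of consecutive words whose letters total exactly n_letters."""
    words = re.findall(r"[A-Za-z]+", text.upper())
    found = set()
    for start in range(len(words)):
        total = 0
        for end in range(start, len(words)):
            total += len(words[end])
            if total == n_letters:
                found.add(" ".join(words[start:end + 1]))
            if total >= n_letters:
                break   # adding more words can only overshoot
    return found

print(sorted(phrases_of_length("The cat sat on the mat.", 6)))
```

Run over a large collection of books, the resulting set can be fed to the same pattern test used for single dictionary words.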
When I ran my key-finding programme on twelve-letter words and phrases from this collection, again the only matching key it found was TRANSFERABLE. To identify the source of more keys I wrote to Bill Mason, another member of the American Cryptogram Association and one of our top cryptographic programmers. He reminded me that Google has made available for purchase a huge list of words and phrases gleaned from the World Wide Web. I bought this collection and wrote programmes to extract more potential keys from all this data. The collection is derived from over one trillion phrases, and is distributed on data DVDs as 24 gigabytes of compressed data. This made me very glad that I use a fast computer!22