Book Read Free

The Number Mysteries: A Mathematical Odyssey through Everyday Life

Page 15

by Marcus du Sautoy


  Figure 3.22 Packing crates is a mathematically complex problem.

  Can you find a combination of crates that will fill the truck as efficiently as possible? You have to find an algorithm that decides—given any number N and a set of smaller numbers n (1), n (2), . . . , n (r )—whether there is a choice of the smaller numbers that add up to the big number.

  These sorts of problems are not simply games: they crop up in business and industry when companies are faced with finding the most efficient solution to a practical problem. Wasted space or excess fuel costs companies money, and managers often need to solve one of these NP-problems. There are even some codes used in the telecommunications industry that depend on discovering the needle in the haystack to crack the code. So it’s not just mathematicians or obsessive game players who are interested in cracking this million-dollar problem.

  Whether it’s mathematically analyzing routes for a travelling salesman or arranging parties, coloring maps, or sweeping mines, this chapter’s million-dollar problem comes in so many different guises that there should be one version that appeals. But be warned: this problem may look like fun and games, but it is one of the hardest of all the million-dollar problems. Mathematicians believe that there is some essential complexity about all these problems that means there won’t be an efficient program to solve them. The problem is that it’s always more difficult to show why something doesn’t exist than to show that it does. But at least you’ll have fun trying to win this chapter’s million.

  SOLUTIONS

  Number Mysteries Lottery

  The winning numbers were 2, 3, 5, 7, 17, 42.

  The Traveling-Salesman Problem

  Here’s a route that covers 238 miles:

  Figure 3.23

  15 + 55 + 28 + 12 + 24 + 35 + 25 + 17 + 4 + 5 + 18 = 238

  Four

  THE CASE OF THE UNCRACKABLE CODE

  Ever since people first learned to communicate, they’ve been finding ever more fiendish ways to hide messages from their enemies. Maybe you used a private code to write your diary so that your brother or sister couldn’t read it, just as Leonardo da Vinci did. But codes aren’t only for keeping things secret: they also make sure that information is communicated without errors. And we can use mathematics to create ingenious ways to guarantee that the message that is received is the same as the message that was sent—which is vitally important in this age of electronic transactions.

  A code is simply a systematic way of arranging a set of symbols to convey a particular meaning. As soon as you start looking for codes, you discover that they are all around us: there are bar codes on everything we buy; codes enable us to store music in our MP3 players; codes let us browse the Internet. Even this book is written in code—English is simply a code formed with the 26 letters of the alphabet, and our “acceptable code words” are stored in the Oxford English Dictionary. Even our body contains codes—DNA is a code for reproducing a living creature made of four organic chemical substances called bases: adenine, guanine, cytosine, and thymine, or A, G, C, and T for short.

  In this chapter, I’ll show you how math has been used to create and break some of the cleverest codes around; how it lets us transmit information safely, efficiently, and secretly; and how it lets us do everything from photographing the planets from spacecraft to shopping on eBay. And at the end of the chapter, I’ll explain how cracking one of the million-dollar math problems could help you crack codes as well.

  HOW TO USE AN EGG TO SEND A SECRET MESSAGE

  In Italy in the sixteenth century, Giovanni Porta discovered that you could write a hidden message on a hard-boiled egg with an ink made by dissolving an ounce of alum in a pint of vinegar. The ink penetrates the shell and marks the hardened white inside, but while the writing stays on the egg white, it disappears from the shell—perfect for sending secret messages: to crack the code, you crack the egg! And this is just one of the many crazy ways people have come up with to hide secret messages.

  In 499 BC, a tyrant by the name of Histiaeus wanted to send a secret message to his nephew Aristagoras, encouraging him to rebel against the Persian king. Histiaeus was stationed in Susa, in modern-day Iran, and his nephew was back home in Miletus, now a part of Turkey. How could he get a message to his nephew without the Persian authorities intercepting it? He came up with a cunning plan. He shaved the head of his faithful servant and tattooed the message onto his bald head. Once the hair had grown back, Histiaeus sent the servant off to find his nephew. When the servant reached Miletus, the nephew shaved the servant’s head, read the message, and began a rebellion against the ruling Persian king.

  While the nephew only had to shave the messenger’s head to retrieve the message, we have to pity the recipients of secret messages sent by an ancient Chinese method. The message would be written on a piece of silk, which was then rolled up very tightly. The silk ball was covered in wax and given to the messenger to swallow. Recovering the message once it had reappeared was not a particularly pleasant affair.

  One of the most sophisticated ways of hiding a message was developed by the Spartans in 500 BC. They used a special wooden cylinder called a scytale around which they would wrap a thin strip of paper in a spiral. The secret message would then be written on the paper, lengthways down the scytale, but when the paper was unwrapped, the message looked like gobbledygook. It was only by winding the strip of paper around another scytale of exactly the same dimensions that all the letters lined up again correctly.

  These methods for sending secret messages are examples of steganography—the art of concealing—rather than coding. But however devious they were, once the message was discovered, the secret was out. So people started thinking about how to conceal the meaning of the message even if it was uncovered.

  HOW TO CRACK THE KAMA SUTRA CODES BY COUNTING

  B OBDFSOBDLNLBC, ILXS B QBLCDSV MV B QMSD, LE B OBXSV MH QBDDSVCE.

  LH FLE QBDDSVCE BVS OMVS QSVOBCSCD DFBC DFSLVE, LD LE ASNBGES DFSJ BVS OBTS ZLDF LTSBE. DFS OBDFSOBDLNLBC’E QBDDSVCE, ILXS DFS QBLCDSV’E MV DFS QMSD’E OGED AS ASBGDLHGI; DFS LTSBE ILXS DFS NMIMGVE MV DFS ZMVTE, OGED HLD DMUSDFSV LC B FBVOMCLMGE ZBJ. ASBGDJ LE DFS HLVED DSED: DFSVS LE CM QSVOBCSCD QIBNS LC DFS ZMVIT HMV GUIJ OBDFSOBDLNE.

  This looks like gibberish, but it’s a message written using one of the most popular codes ever created. Called the substitution cipher, it works by switching every letter of the alphabet with another one: so a might become P, t might become C, and so on. (I’ve used lowercase letters for those letters in the unencrypted message—the plaintext, as it’s known in the trade—and capitals for the letters in the ciphertext.) As long as the sender and receiver have agreed to the substitution key in advance, the receiver will be able to decipher the messages, but to everyone else, they’ll be a meaningless string of nonsense.

  The simplest version of these codes is called a Caesar shift—after Julius Caesar, who used this type of code to communicate with his generals during the Gallic Wars. Caesar shifts work by shifting each letter the same number of places along. So with a shift of three, a becomes D, b becomes E, and so on. From the Number Mysteries website, you can download and cut out your own cipher wheel for creating these simple Caesar shifts.

  Shifting each letter by the same number of places gives you only 25 possible ciphers, so once you’ve spotted that this is how the message has been encoded, it’s quite easy to unscramble. There’s a better way to encode a message: instead of simply shifting all the letters along together, we can mix things up and allow any letter to be substituted for any other. This method of encrypting messages was actually suggested some centuries before Julius Caesar, and surprisingly, it wasn’t in a military handbook but in the Kama Sutra. Although this ancient Sanskrit text is usually associated with a description of physical pleasures, it covers a host of other arts that the author believed a woman should be versed in, from conjuring and chess to bookbinding and even carpentry. The forty-fifth chapter is dedicated to the art of secret writing, and here, the substitution ciph
er is explained as a perfect way of concealing messages detailing liaisons between lovers.

  While there are only 25 Caesar shifts, if we allow ourselves to substitute any letter for any other, then the number of possible ciphers is somewhat greater. We have 26 choices for what to change the letter a into, and for each of those possibilities, there are 25 choices of where to send the letter b (one letter has already been used to encrypt the letter a ). So there are already 26 x 25 different ways to scramble the letters a and b. If we keep going, choosing different letters for the rest of the alphabet, we find that there are

  26 x 25 x 24 x 23 x 22 x 21 x 20 x 19 x 18 x 17 x 16 x 15 x 14 x 13 x 12 x 11 x 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1

  different Kama Sutra codes. As we saw on page 39, we can write this as 26! We should also remember to take 1 off this number because one choice will have been A for a, B for b, and so on through Z for z, which isn’t much of a code. When we multiply out 26! and take away 1, we arrive at the grand total of

  403,291,461,126,605,635,583,999,999

  different ciphers—over four hundred million billion billion possibilities.

  The passage at the beginning of this section is encrypted with one of those codes. To give you an idea of how many possible permutations there are, if I wrote out the passage using all the different possible codes, then the piece of paper would stretch from here to well beyond the outskirts of our Milky Way Galaxy. A computer checking one code a second since the big bang went off over thirteen billion years ago would still be just a fraction of the way through checking each of the codes—and a very small fraction at that.

  So the code looks virtually uncrackable. How on earth (or beyond) can you find out which of this vast number of possible codes I’ve used to encrypt my message? Amazingly, the way to do it is by a very simple bit of mathematics: counting.

  Table 4.1

  It was the Arabs in the time of the Abbasid dynasty who first developed cryptanalysis, as the science of code breaking is called. The ninth-century polymath Ya’qub al-Kindi recognized that in a piece of written text, some letters crop up again and again, while others are used rarely, as shown in the table. This is something that Scrabble players are well aware of: the letter e is worth only one point, as it is the most common letter in the English alphabet, while a z is worth ten points. In written texts, every letter has its own distinct “personality”—how often it appears, and in what combinations with others—but the key to al-Kindi’s analysis is to realize that a letter’s personality doesn’t change when it is represented by another symbol.

  So let’s make a start at cracking the code used to encrypt the text at the beginning of this section. Here’s a breakdown of the frequency of each of the letters used in the encrypted text:

  Table 4.2

  From the graph, we can see that the letter S occurs with a frequency of 13 percent, more than any other letter in the ciphertext, so there’s a good chance that this letter is the one used to encode the letter e. (Of course, you have to hope that I haven’t chosen a passage from Georges Perec’s novel A Void, which was written without using the letter e anywhere in the text.) The next most common letter in the ciphertext is D, with a frequency of 12 percent. The second most common letter in the English language is t, so this would be a good guess for the choice of D. Third in order of frequency in the encrypted message is B, which is used 10 percent of the time, so there is a strong possibility that it stands for the third most common letter in the English language: a.

  Let’s substitute these letters into the text and see what we get:

  a OatFeOatLNLaC, ILXe a QaLCteV MV a QMet, LE a OaXeV MH QatteVCE.

  LH FLE QatteVCE aVe OMVe QeVOaCeCt tFaC tFeLVE, Lt LE AeNaGEe tFeJ aVe OaTe ZLtF LTeaE. tFe OatFeOatLNLaC’E QatteVCE, ILXe tFe QaLCteV’E MV tFe QMet’E OGEt Ae AeaGtLHGI; tFe LTeaE ILXe tFe NMIMGVE MV tFe ZMVTE, OGEt HLt tMUetFeV LC a FaVOMCLMGE ZaJ. AeaGtJ LE tFe HLVEt teEt: tFeVe LE CM QeVOaCeCt QIaNe LC tFe ZMVIT HMV GUIJ OatFeOatLNE.

  You might say that this still looks like gobbledygook, but the fact that you see the letter a on its own several times is telling us that we’ve probably decoded this letter correctly. (B might turn out to stand for i, of course, in which case we’d have to go back and try again.) And we can see that the word tFe occurs quite often, so it’s a good bet that this is the word the. Indeed, the letter F accounts for 6 percent of the ciphertext, and the letter h occurs 6 percent of the time in English.

  We can see the word Lt, in which only the second letter has been decoded. There are only two two-letter words that end in t: at and it. We’ve already decoded a, so there’s a good chance that L should be decoded as i, and our frequency chart bears this out. L appears in the ciphertext with 8 percent frequency, and i appears in the English language with 7 percent frequency—a close match. This isn’t an exact science: the longer the text, the more in tune the frequencies will be, but we have to be flexible when we use this technique.

  Let’s put in our two new decodings:

  a OatheOatiNiaC, IiXe a QaiCteV MV a QMet, iE a OaXeV MH QatteVCE.

  iH hiE QatteVCE aVe OMVe QeVOaCeCt thaC theiVE, it iE AeNaGEe theJ aVe OaTe Zith iTeaE. the OatheOatiNiaC’E QatteVCE, IiXe the QaiCteV’E MV the QMet’E OGEt Ae AeaGtiHGI; the iTeaE IiXe the NMIMGVE MV the ZMVTE, OGEt Hit tMUetheV iC a haVOMCiMGE ZaJ. AeaGtJ iE the HiVEt teEt: theVe iE CM QeVOaCeCt QIaNe iC the ZMVIT HMV GUIJ OatheOatiNE.

  Gradually, the message is beginning to emerge. I’ll leave to you the task of unraveling the rest; the decoded text is at the end of the chapter if you want to check whether you’re right. I’ll give you a hint: it’s a couple of my favorite passages from A Mathematician’s Apology by the Cambridge mathematician G. H. Hardy. I read this book when I was in school, and it was one of the things that made me decide to become a mathematician.

  This simple mathematical trick of counting letters means that any message disguised with a substitution cipher cannot be made secret—as Mary Queen of Scots found to her cost. Using a code that substituted strange symbols for the letters, she wrote messages to her fellow conspirator, Anthony Babington, about their plans to assassinate Queen Elizabeth I:

  Figure 4.1 The Babington code.

  At first glance, the messages Mary sent looked impenetrable, but Elizabeth had at her court one of the master code breakers of Europe: Thomas Phelippes. He was not a handsome man, as the following description of him makes clear: “of low stature, slender every way, dark yellow haired on the head and clear yellow beard, even in the face with smallpox, of short sight.” Many people believed that Phelippes had to be in league with the devil to read such hieroglyphs, but his trick was to apply the same principle of frequency analysis. He cracked the code, and Mary was arrested and put on trial. The decoded letters were the evidence that ultimately led to her execution for conspiracy.

  If you’re faced with cracking a Kama Sutra code, the following web page is helpful in analyzing the frequency of different letters in the encrypted text: www.simonsingh.net/The_Black_Chamber/frequencypuzzle.htm.

  You can access it with your smartphone by scanning this code.

  HOW DID MATHEMATICIANS HELP TO WIN THE SECOND WORLD WAR?

  Once it was known that substitution codes had this inherent weakness, cryptographers started to devise more ingenious ways to thwart attacks based on counting letters. One idea was to vary the substitution cipher. Instead of using just one substitution cipher to encode the whole text, you could alternate between two different ciphers. Then, if you were encoding the word beef, say, the letter e would be encoded differently each time, because the first one would be encoded using one cipher, and the second one by another cipher. So beef might get encoded as PORK. The more secure you want your message to be, the more ciphers you can cycle through.

  Of course, there is a balance to be struck in cryptography between making things very secure and having a cipher that is usable. The most secure sort of cipher, called a one-time pad, uses a different substitution cipher for every single letter of the tex
t. It is almost uncrackable because there is absolutely nothing to help you to come to grips with the encrypted text. It is also very unwieldy because you need to use a different substitution cipher for each letter in the message.

  The sixteenth-century French diplomat Blaise de Vigenére believed that to hinder any frequency analysis, it was enough to cycle through just a few substitution ciphers. Although the Vigenére code, as it became known, was a much stronger form of encryption, it was not unbreakable, and the British mathematician Charles Babbage eventually found a way to crack it. Babbage is regarded as the grandfather of the computer age for his belief that machines can be used to automate calculations; a reconstruction of his “difference engine” calculating machine can be seen in London’s Science Museum. It was his systematic approach to problems that, in 1854, gave him the idea for cracking the Vigenére code.

  Babbage’s method depends on one of the great skills of the mathematician—pattern recognition. The first thing you need to spot is how many different substitution ciphers are being cycled through. Because the word the will usually appear often in any plaintext message, spotting repetitions of the same three-letter sequence can be the key to unraveling how many ciphers are being used. So, for example, you might spot AWR appearing frequently, and that there is always a gap of a multiple of four letters between occurrences of AWR. That would be a good indication that four ciphers are being used.

 

‹ Prev