The next layer is going to look just the same as this one, but cunningly placed so that each seed sits in the little triangular divot formed by three seeds below it. Then just keep adding more layers in the same way. It’s best to be a little careful here: only half the divots are going to support spheres in the next layer up, and at each stage you have a choice of which half of the divots you want to fill. The most customary choice, called the face-centered cubic lattice, has the nice property that every layer has the spheres placed directly over the spheres three layers below. According to Kepler, there is no denser way to pack spheres in space. And in the face-centered cubic lattice, each sphere touches exactly twelve others. As the pomegranate seeds grew, Kepler reasoned, each one would press against its twelve neighbors, flattening its surface near the point of contact and producing the twelve-sided figures he observed.
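If you want to check the "exactly twelve" claim yourself, here's a quick sketch in Python (mine, not Kepler's, and it leans on the standard trick of modeling the face-centered cubic lattice as the integer points whose coordinates add up to an even number):

```python
from itertools import product

# Face-centered cubic lattice, modeled as the integer points whose
# coordinates sum to an even number (a standard coordinate model).
points = [p for p in product(range(-2, 3), repeat=3)
          if sum(p) % 2 == 0 and p != (0, 0, 0)]

# Squared distance from the origin to the nearest lattice points
min_sq = min(x * x + y * y + z * z for x, y, z in points)
neighbors = sum(1 for x, y, z in points
                if x * x + y * y + z * z == min_sq)

print(min_sq, neighbors)   # 2 12 -- each sphere touches exactly twelve others
```

The twelve nearest points are the arrangements of (±1, ±1, 0), one for each of the twelve touching spheres.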
Whether Kepler was right about pomegranates I have no idea,* but his claim that the face-centered cubic lattice was the densest possible sphere packing became a topic of intense mathematical interest for centuries. Kepler offered no proof of his statement; apparently it just seemed right to him that the face-centered cubic lattice couldn’t be beat. Generations of grocers, who stack oranges in a face-centered cubic configuration without any worry as to whether their method is the absolute best possible, agree with him. Mathematicians, that demanding tribe, wanted absolute confirmation. And not just about circles and spheres; once you’re in the realm of pure mathematics, nothing stops you from going beyond circles and spheres to yet higher dimensions, packing the so-called hyperspheres of dimension greater than 3. Does the geometric story of high-dimensional sphere packings give insight into the theory of error-correcting codes, as the geometric story of the projective plane did? In this case, the flow has mostly been in the other direction;* the insights from coding theory have instigated progress in sphere packings. John Leech, in the 1960s, used one of Golay’s codes to build an incredibly dense packing of twenty-four-dimensional spheres, in a configuration now known as the Leech lattice. It’s a crowded place, the Leech lattice, where each of the twenty-four-dimensional spheres touches 196,560 of its neighbors. We still don’t know whether it’s the tightest possible twenty-four-dimensional packing, but in 2003, Henry Cohn* and Abhinav Kumar proved that if a denser lattice exists, it beats Leech by a factor of at most
1.00000000000000000000000000000165.
In other words: close enough.
You can be forgiven for not caring about twenty-four-dimensional spheres and how to smoosh them together, but here’s the thing: any mathematical object as startling as the Leech lattice is bound to be important. It turned out that the Leech lattice was very rich in symmetries of a truly exotic kind. The master group theorist John Conway, upon encountering the lattice in 1968, worked out all its symmetries in a twelve-hour spree of computation on a single giant roll of paper. These symmetries ended up forming some of the final pieces of the general theory of finite symmetry groups that preoccupied algebraists for much of the twentieth century.*
—
As for good old three-dimensional oranges, it turns out Kepler was right that his packing was the best possible—but that wasn’t proved for almost four hundred years, finally falling in 1998 to Thomas Hales, then a professor at the University of Michigan. Hales settled the matter by means of a difficult and delicate argument that reduced the problem to an analysis of a mere few thousand configurations of spheres, which he dealt with by means of a massive computer calculation. The difficult and delicate argument posed no problem for the math community; we’re used to those, and this part of Hales’s work was quickly judged and found correct. The massive computer calculation, on the other hand, was trickier. A proof can be checked down to the last detail, but a computer program is a different sort of thing. In principle, a human can check every line of code; but even having done so, how can you be sure the code ran correctly?
Mathematicians have almost universally accepted Hales’s proof, but Hales himself seems to have been stung by the initial discomfort with the proof’s reliance on computation. Since the resolution of the Kepler conjecture, he’s moved away from the geometry that made him famous and turned instead to the project of formal verification of proofs. Hales envisions, and is working to create, a future mathematics that looks very different from our own. In his view, mathematical proofs, whether computer-aided or carried out by humans with pencils, have gotten so complicated and interdependent that we can no longer reasonably have full confidence in their correctness. The classification of finite simple groups, the now-completed program of which Conway’s analysis of the Leech lattice formed a crucial part, is distributed over hundreds of papers by hundreds of authors, totaling some ten thousand pages. No human alive can be said to understand it all. So how can we be sure it’s really right?
Hales thinks we have no choice but to start over again, rebuilding the vast corpus of mathematical knowledge within a formal structure that can be verified by machine. If the code that checks the formal proofs is itself checkable (and this, Hales convincingly argues, is a feasible goal) then we can free ourselves forevermore from controversies like the one Hales endured over whether a proof is really a proof. And from there? The next step, maybe, is computers that can construct proofs, or even have ideas, without any human intervention at all.
If this actually happens, is mathematics over? Of course, if machines overtake and then surpass humans in all mental dimensions, using us as slaves or livestock or playthings, as some of the most extravagant futurists predict, then yeah, math is over, along with everything else. But short of that, I think mathematics will probably survive. After all, math has already been computer-aided for decades. Many calculations that once would have counted as “research” are now considered no more creative or praiseworthy than adding a series of ten-digit numbers; once your laptop can do it, it’s not mathematics anymore. But this hasn’t put mathematicians out of work. We’ve managed to stay just ahead of the ever-increasing sphere of computer dominance, like action heroes outracing a fireball.
And if machine intelligences of the future can take over from us much of the work we know as research now? We’ll reclassify that research as “computation.” And whatever we quantitatively minded humans are doing with our newly freed-up time, that’s what we’ll call “mathematics.”
The Hamming code is pretty good, but one might hope to do better still. After all, there’s something wasteful about Hamming’s code: even in the days of punched tape and mechanical relays, computers were reliable enough that almost all seven-bit blocks would come through unscathed. The code seems too conservative; surely we could get away with adding fewer failsafe bits to our message. And so we can; that’s what Shannon’s famous theorem proves. For example, if errors come at a rate of one per thousand bits, Shannon tells you there are codes that make each message only 1.2% longer than its unencoded form. And better yet, by making the basic blocks longer and longer, you can find codes that achieve this speed and satisfy any desired degree of reliability, however strict.
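Where does a figure like 1.2 percent come from? Shannon's theorem says that over a channel that flips each bit independently with probability p, the best achievable rate is 1 - H(p), where H is the binary entropy function. The text doesn't spell out the channel model, so take the little computation below as an illustration rather than Shannon's own arithmetic; with p set to one in a thousand it lands right on the quoted figure:

```python
from math import log2

p = 0.001                                 # one error per thousand bits (assumed binary symmetric channel)
H = -p * log2(p) - (1 - p) * log2(1 - p)  # binary entropy of the error rate
rate = 1 - H                              # Shannon's best achievable rate: message bits per transmitted bit
overhead = 1 / rate - 1                   # extra length per message bit

print(f"{overhead:.2%}")                  # 1.15%, which rounds to the 1.2% quoted above
```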
How did Shannon construct these excellent codes? Well, here’s the thing—he didn’t. When you encounter an intricate construction like Hamming’s, you’re naturally inclined to think an error-correcting code is a very special thing, designed and engineered and tweaked and retweaked until every pair of code words has been gingerly nudged apart without any other pair being forced together. Shannon’s genius was to see that this vision was totally wrong. Error-correcting codes are the opposite of special. What Shannon proved—and once he understood what to prove, it was really not so hard—was that almost all sets of code words exhibited the error-correcting property; in other words, a completely random code, with no design at all, was extremely likely to be an error-correcting code.
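You can watch this happen on a laptop. The sketch below is an illustration of the phenomenon, not Shannon's proof; the block length and the number of code words are chosen arbitrarily. It picks two hundred code words completely at random and measures how far apart the closest pair lands:

```python
import random
from itertools import combinations

n, k = 50, 200     # block length and number of code words, chosen just for illustration

# A code designed by nobody: every code word is a random 50-bit string.
code = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(k)]

def hamming(u, v):
    """Number of positions where two words differ."""
    return sum(a != b for a, b in zip(u, v))

min_distance = min(hamming(u, v) for u, v in combinations(code, 2))
print(min_distance)   # almost always well above 4, so several flipped bits per block are still correctable
```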
This was a startling development, to say the least. Imagine you were tasked with building a hovercraft; would your first approach be to throw a bunch of engine parts and rubber tubing on the ground at random, figuring the result would probably float?
Hamming, still impressed forty years later, said of Shannon’s proof in a 1986 talk:
Courage is one of the things that Shannon had supremely. You have only to think of his major theorem. He wants to create a method of coding, but he doesn’t know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, “What would the average random code do?” He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. Who but a man of infinite courage could have dared to think those thoughts? That is the characteristic of great scientists; they have courage. They will go forward under incredible circumstances; they think and continue to think.
If a random code was very likely to be an error-correcting code, what’s the point of Hamming? Why not just choose code words completely at random, secure in the knowledge that Shannon’s theorem makes it very likely your code corrects errors? Here’s one problem with that plan. It’s not enough for a code to be able to correct errors in principle; it has to be practical. If one of Shannon’s codes uses blocks of size 50, then the number of code words is the number of 0-1 strings fifty bits long, which is 2 to the 50th power, a little over a quadrillion. Big number. Your spacecraft receives a signal, which is supposed to be one of these quadrillion code words, or at least close to one. But which one? If you have to flip through the quadrillion code words one by one, you’re in big trouble. It’s the combinatorial explosion again, and in this context it forces on us another tradeoff. Codes that have a lot of structure, like the Hamming codes, tend to be easy to decode. But these very special codes, it turns out, are usually not as efficient as the completely random ones that Shannon studied! And in the decades between then and now, mathematicians have tried to ride that conceptual boundary between structure and randomness, laboring to construct codes random enough to be fast, but structured enough to be decodable.
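To see what "flip through the code words one by one" costs, here's a sketch of the naive nearest-neighbor decoder: compare the received block against every code word and keep the closest. That's fine for a toy code that fits in a Python list, and hopeless for a near-Shannon code of block length 50, which has something like 2 to the 50th code words to scan.

```python
def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def brute_force_decode(received, code):
    # Scan the whole code and keep the code word nearest to the received block.
    # The cost grows with the size of the code, which is the whole problem.
    return min(code, key=lambda word: hamming(received, word))

print(2 ** 50)   # 1125899906842624 -- the rough number of comparisons an unstructured length-50 code could demand
```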
The Hamming code is great for the Transylvanian lottery, but not so effective at Cash WinFall. The Transylvanian lottery has just seven numbers; Massachusetts offered forty-six. We’re going to need a bigger code. The best one I could find for the purpose was discovered by R. H. F. Denniston of the University of Leicester in 1976. And it’s a beauty.
Denniston wrote down a list of 285,384 six-number combinations from a choice of forty-eight numbers. The list starts like this:
1 2 48 3 4 8
2 3 48 4 5 9
1 2 48 3 6 32 . . .
The first two tickets have four numbers in common: 2, 3, 4, and 48. But—and here is the miracle of the Denniston system—you will never find any two of those 285,384 tickets that have five numbers in common. You can translate the Denniston system into a code, much as we did with the Fano plane: replace each ticket with a string of 48 1s and 0s, with a 0 in the places corresponding to the numbers on your ticket, and a 1 in the places corresponding to the numbers not on your ticket. So the first ticket above would translate into the codeword:
000011101111111111111111111111111111111111111110
Check for yourself: the fact that no two tickets agree on five out of six numbers means that this code, like the Hamming code, has no two code words separated by a Hamming distance of less than four.*
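Here's the translation and the distance check in a few lines of Python, using just the first two tickets written out above (the full list of 285,384 obviously isn't reproduced here):

```python
def ticket_to_codeword(ticket, n=48):
    """0 in the places whose numbers are on the ticket, 1 everywhere else."""
    return ''.join('0' if i in ticket else '1' for i in range(1, n + 1))

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

t1 = {1, 2, 48, 3, 4, 8}
t2 = {2, 3, 48, 4, 5, 9}

print(ticket_to_codeword(t1))   # 000011101111111111111111111111111111111111111110
print(hamming(ticket_to_codeword(t1), ticket_to_codeword(t2)))   # 4
```

Two tickets sharing four numbers differ in two places each, giving a Hamming distance of four; sharing five numbers would drop the distance to two, which is exactly what Denniston's list never allows.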
Another way to say this is that every five-number combination appears on at most one of Denniston’s tickets. And it gets better: in fact, every five-number combination appears on exactly one ticket.*
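The "exactly one" claim also explains the strange-looking size of the list: there are 1,712,304 different five-number combinations you can make from forty-eight numbers, each six-number ticket contains six of them, and if every combination shows up exactly once, the list has to have 1,712,304 / 6 = 285,384 tickets on it. Here's that arithmetic as a one-liner in Python:

```python
from math import comb

print(comb(48, 5) // comb(6, 5))   # 285384, exactly the number of tickets on Denniston's list
```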
As you can imagine, a lot of care is required in choosing the tickets on Denniston’s list. Denniston includes in his paper a computer program in ALGOL that verifies that his list really does have the magical property he claims, a rather advanced gesture for the 1970s. Still, he insists that the computer’s role in the collaboration is to be understood as strictly subordinate to his own: “I should like, indeed, to make it clear that all the results announced here were found without recourse to computers, even though I suggest that computers may be used to verify them.”
Cash WinFall has only forty-six numbers, so to play it Denniston-style, you have to destroy the beautiful symmetry a bit by throwing out all the tickets in Denniston’s system containing a 47 or a 48. This still leaves you with 217,833 tickets. Suppose you get together $435,666 out of the couch cushions and decide to play these numbers. What happens?
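Before answering, it's worth seeing where 217,833 comes from. Because every five-number combination sits on exactly one ticket, you can count the tickets containing a 47 (there are 178,365 five-number combinations that include 47, and each ticket with a 47 on it accounts for five of them, so 35,673 tickets), the same number containing a 48, and the 3,795 containing both, then subtract with inclusion-exclusion. Here's that bookkeeping in Python (mine, not Denniston's):

```python
from math import comb

total     = comb(48, 5) // comb(6, 5)   # 285,384 tickets in the full 48-number system
with_47   = comb(47, 4) // comb(5, 4)   # 35,673 tickets contain the number 47
with_48   = with_47                     # by symmetry, just as many contain 48
with_both = comb(46, 3) // comb(4, 3)   # 3,795 tickets contain both 47 and 48

print(total - with_47 - with_48 + with_both)   # 217833
```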
The lottery draws six numbers—say, 4, 7, 10, 11, 34, 46. In the unlikely event these match one of your tickets exactly, you win the jackpot. But even if not, you’re still in line to win a healthy pile of money for matching five of the six numbers. Do you have a ticket with 4, 7, 10, 11, 34 on it? One of Denniston’s tickets does, so the only way you can miss out is if the Denniston ticket with those five numbers was 4, 7, 10, 11, 34, 47 or 4, 7, 10, 11, 34, 48, and thus got trashed.
But what about a different five-number combination, like 4, 7, 10, 11, 46? Maybe you had bad luck the first time, because 4, 7, 10, 11, 34, 47 was one of Denniston’s tickets. But then 4, 7, 10, 11, 46, 47 cannot be on Denniston’s list, because it would agree in five places with a ticket you already know to be there. In other words, if evil 47 makes you miss out on one five-out-of-six prize, it can’t make you miss out on any others. The same goes for 48. So of the six possible five-number wins:
4, 7, 10, 11, 34
4, 7, 10, 11, 46
4, 7, 10, 34, 46
4, 7, 11, 34, 46
4, 10, 11, 34, 46
7, 10, 11, 34, 46
you are guaranteed to have at least four of them among your tickets.
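If you had Denniston's list loaded on a laptop, you could check the guarantee draw by draw. Here's a sketch; the variable tickets is hypothetical, standing in for the 217,833 surviving six-number combinations, which of course aren't reproduced here:

```python
from itertools import combinations

def five_out_of_six_prizes(draw, tickets):
    """Count how many of the draw's six five-number subsets appear on some ticket.
    The argument above says the answer is always at least four."""
    ticket_sets = [set(t) for t in tickets]
    return sum(
        any(set(five) <= t for t in ticket_sets)
        for five in combinations(sorted(draw), 5)
    )

# Hypothetical usage, once `tickets` holds the 217,833 surviving combinations:
# five_out_of_six_prizes({4, 7, 10, 11, 34, 46}, tickets)   # 4, 5, or 6
```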
In fact, if you buy the 217,833 Denniston tickets, you have a
2% chance of hitting the jackpot
72% chance of winning six of the five-out-of-six prizes
24% chance of winning five of the five-out-of-six prizes
2% chance of winning four of the five-out-of-six prizes
Compare this with the Selbee Quick Pick strategy of choosing tickets randomly. In that case, there’s a small chance, 0.3%, of getting shut out of the five-out-of-six prize tier entirely. Worse, there’s a 2% chance of getting just one of those prizes, a 6% chance of getting two, an 11% chance of getting three, and a 15% chance of getting four. The guaranteed returns of the Denniston strategy are replaced by risk. Naturally, that risk comes with an upside, too—team Selbee has a 32% chance of getting more than six of those prizes, impossible if you pick your tickets according to Denniston. The expected value of Selbee’s tickets is the same as that of Denniston’s, or anyone else’s. But the Denniston method shields the player from the winds of chance. In order to play the lottery without risk, it’s not enough to play hundreds of thousands of tickets; you have to play the right hundreds of thousands of tickets.
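Those Quick Pick percentages are easy to ballpark. A randomly chosen ticket matches exactly five of the six drawn numbers with probability 240 in 9,366,819 (pick five of the six winning numbers and one of the forty others), so among 217,833 random tickets the number of five-out-of-six prizes behaves roughly like a Poisson random variable with mean about 5.6. Here's that back-of-the-envelope model in Python; it's my approximation, not the book's exact calculation, which is why the figures land within a rounding error of the ones quoted:

```python
from math import comb, exp, factorial

n_tickets = 217_833
p_five = 6 * 40 / comb(46, 6)   # match five of the six drawn numbers plus one of the forty others
lam = n_tickets * p_five        # about 5.6 expected five-out-of-six prizes

def poisson(k):
    return exp(-lam) * lam ** k / factorial(k)

for k in range(5):
    print(k, f"{poisson(k):.1%}")   # roughly 0.4%, 2.1%, 5.9%, 10.9%, 15.2%
print(">6", f"{1 - sum(poisson(k) for k in range(7)):.0%}")   # about 33%, the chance of more than six prizes
```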
Is this strategy the reason Random Strategies spent the time to fill out hundreds of thousands of tickets by hand? Were they using Denniston’s system, developed in the spirit of utterly pure mathematics, to siphon money from the Lottery at no risk to themselves? Here’s where my reporting hit a wall. I was able to get in touch with Yuran Lu, but he didn’t know exactly how those tickets had been chosen; he told me only that they had a “go-to guy” in the dorm who handled all such algorithmic matters. I can’t be sure whether the go-to guy used the Denniston system, or something like it. But if he didn’t, I think he probably should have.
OKAY, FINE, YOU CAN PLAY POWERBALL
At this point, we’ve documented exhaustively how the choice to play the lottery is almost always a poor one in terms of expected number of dollars, and how, even in those rare cases where the expected monetary value of a lottery ticket exceeds its cost, great care is required in order to squeeze as much expected utility as possible out of the tickets you buy.
This leaves mathematically minded economists with one inconvenient fact to explain, the same one that baffled Adam Smith more than two hundred years ago: lotteries are very, very popular. The lottery is not the kind of situation Ellsberg studied, in which people face decisions against unknown and unknowable odds. The minuscule chance of winning the lottery is posted for all to see. The principle that people tend to make choices that more or less maximize their utility is a pillar of economics, and does an adequate job modeling behavior in everything from business practice to romantic choices. But not Powerball. This kind of irrational behavior is as unacceptable to a certain species of economist as the irrational magnitude of the hypotenuse was to the Pythagoreans. It doesn’t fit their model of what can be; and yet it is.
Economists are more flexible than Pythagoreans. Rather than angrily drowning the bearers of bad news, they adjust their models to fit reality. One popular account was offered by our old buddies Milton Friedman and Leonard Savage, who proposed that lottery players follow a squiggly utility curve, reflecting that people think about wealth in terms of classes, not numerical amounts. If you’re a middle-class worker who spends five bucks a week on the lottery, and you lose, that choice costs you a little money but doesn’t change your class position; despite the loss of money, the negative utility is pretty close to zero. But if you win, well, that moves you into a different stratum of society. You can think of this as the “deathbed” model—on your deathbed, will you care that you died with a little less money because you played the lottery? Probably not at all. Will you care that you retired at thirty-five and spent the rest of your life snorkeling off Cabo because you hit the Powerball? Yes. Yes you will.
In a bigger departure from classical theory, Daniel Kahneman and Amos Tversky suggested that people in general tend to follow a different path from the one the utility curve demands, not just when Daniel Ellsberg sticks an urn in front of them, but in the general course of life. Their “prospect theory,” for which Kahneman later won the Nobel Prize, is now seen as the founding document of behavioral economics, which aims to model with the greatest possible fidelity the way people do act, not the way that, according to an abstract notion of rationality, they should. In the Kahneman-Tversky theory, people tend to place more weight on low-probability events than a person obedient to the von Neumann-Morgenstern axioms would; so the allure of the jackpot exceeds what a strict expected utility calculation would license.