Algorithms to Live By

by Brian Christian


  9 Randomness

  When to Leave It to Chance

  I must admit that after many years of work in this area, the efficacy of randomness for so many algorithmic problems is absolutely mysterious to me. It is efficient, it works; but why and how is absolutely mysterious.

  —MICHAEL RABIN

  Randomness seems like the opposite of reason—a form of giving up on a problem, a last resort. Far from it. The surprising and increasingly important role of randomness in computer science shows us that making use of chance can be a deliberate and effective part of approaching the hardest sets of problems. In fact, there are times when nothing else will do.

  In contrast to the standard “deterministic” algorithms we typically imagine computers using, where one step follows from another in exactly the same way every time, a randomized algorithm uses randomly generated numbers to solve a problem. Recent work in computer science has shown that there are cases where randomized algorithms can produce good approximate answers to difficult questions faster than all known deterministic algorithms. And while they do not always guarantee the optimal solutions, randomized algorithms can get surprisingly close to them in a fraction of the time, just by strategically flipping a few coins while their deterministic cousins sweat it out.

  There is a deep message in the fact that on certain problems, randomized approaches can outperform even the best deterministic ones. Sometimes the best solution to a problem is to turn to chance rather than trying to fully reason out an answer.

  But merely knowing that randomness can be helpful isn’t good enough. You need to know when to rely on chance, in what way, and to what extent. The recent history of computer science provides some answers—though the story begins a couple of centuries earlier.

  Sampling

  In 1777, Georges-Louis Leclerc, Comte de Buffon, published the results of an interesting probabilistic analysis. If we drop a needle onto a lined piece of paper, he asked, how likely is it to cross one of the lines? Buffon’s work showed that if the needle is shorter than the gap between the lines, the answer is 2⁄π times the needle’s length divided by the length of the gap. For Buffon, deriving this formula was enough. But in 1812, Pierre-Simon Laplace, one of the heroes of chapter 6, pointed out that this result has another implication: one could estimate the value of π simply by dropping needles onto paper.

  Laplace’s proposal pointed to a profound general truth: when we want to know something about a complex quantity, we can estimate its value by sampling from it. This is exactly the kind of calculation that his work on Bayes’s Rule helps us to perform. In fact, following Laplace’s suggestion, several people have carried out exactly the experiment he suggested, confirming that it is possible—although not particularly efficient—to estimate the value of π in this hands-on way.*
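
  For the curious, the experiment is easy to simulate. The following is a minimal Python sketch of the needle-dropping estimate, assuming a needle shorter than the spacing between the lines; the particular lengths and the number of drops are arbitrary choices for illustration:

    import math
    import random

    def estimate_pi_buffon(n_drops=1_000_000, needle_len=1.0, gap=2.0):
        """Estimate pi by simulating Buffon's needle (assumes needle_len <= gap)."""
        crossings = 0
        for _ in range(n_drops):
            # A drop is described by the distance from the needle's center to the
            # nearest line and by the needle's angle relative to the lines.
            center = random.uniform(0, gap / 2)
            angle = random.uniform(0, math.pi / 2)
            if center <= (needle_len / 2) * math.sin(angle):
                crossings += 1
        # Buffon: P(cross) = (2 / pi) * (needle_len / gap); invert to recover pi.
        return 2 * needle_len / (gap * (crossings / n_drops))

    print(estimate_pi_buffon())  # roughly 3.14, give or take sampling error

  Throwing more virtual needles shrinks the error, but only slowly, which is part of why this remains a charming rather than a practical way to compute π.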

  Throwing thousands of needles onto lined paper makes for an interesting pastime (for some), but it took the development of the computer to make sampling into a practical method. Before, when mathematicians and physicists tried using randomness to solve problems, their calculations had to be laboriously worked out by hand, so it was hard to generate enough samples to yield accurate results. Computers—in particular, the computer developed in Los Alamos during World War II—made all the difference.

  Stanislaw “Stan” Ulam was one of the mathematicians who helped develop the atomic bomb. Having grown up in Poland, he moved to the United States in 1939, and joined the Manhattan Project in 1943. After a brief return to academia he was back at Los Alamos in 1946, working on the design of thermonuclear weapons. But he was also sick—he had contracted encephalitis, and had emergency brain surgery. And as he recovered from his illness he worried about whether he would regain his mathematical abilities.

  While convalescing, Ulam played a lot of cards, particularly solitaire (a.k.a. Klondike). As any solitaire player knows, some shuffles of the deck produce games that just can’t be won. So as Ulam played, he asked himself a natural question: what is the probability that a shuffled deck will yield a winnable game?

  In a game like solitaire, reasoning your way through the space of possibilities gets almost instantly overwhelming. Flip over the first card, and there are fifty-two possible games to keep track of; flip over the second, and there are fifty-one possibilities for each first card. That means we’re already up into thousands of possible games before we’ve even begun to play. F. Scott Fitzgerald once wrote that “the test of a first-rate intelligence is the ability to hold two opposing ideas in mind at the same time and still retain the ability to function.” That may be true, but no first-rate intelligence, human or otherwise, can possibly hold the eighty unvigintillion possible shuffled-deck orders in mind and have any hope of functioning.

  After trying some elaborate combinatorial calculations of this sort and giving up, Ulam landed on a different approach, beautiful in its simplicity: just play the game.

  I noticed that it may be much more practical to [try] … laying down the cards, or experimenting with the process and merely noticing what proportion comes out successfully, rather than to try to compute all the combinatorial possibilities which are an exponentially increasing number so great that, except in very elementary cases, there is no way to estimate it. This is intellectually surprising, and if not exactly humiliating, it gives one a feeling of modesty about the limits of rational or traditional thinking. In a sufficiently complicated problem, actual sampling is better than an examination of all the chains of possibilities.

  When he says “better,” note that he doesn’t necessarily mean that sampling will offer you more precise answers than exhaustive analysis: there will always be some error associated with a sampling process, though you can reduce it by ensuring your samples are indeed random and by taking more and more of them. What he means is that sampling is better because it gives you an answer at all, in cases where nothing else will.
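
  In code, Ulam’s recipe is barely more than a loop: shuffle the deck, play the game, tally the wins. The sketch below is a rough Python illustration; its winnable function is a made-up stand-in (a deck counts as “winnable” if no ace lands in the first quarter), since the real rules of Klondike would run to pages, but the shape of the estimate is the same:

    import random

    def winnable(deck):
        # Toy stand-in for actually playing the game; real Klondike logic would go here.
        return not any(card % 13 == 0 for card in deck[:13])

    def estimate_win_probability(n_games=100_000):
        deck = list(range(52))
        wins = 0
        for _ in range(n_games):
            random.shuffle(deck)   # one sample out of the 52! possible orderings
            if winnable(deck):
                wins += 1
        return wins / n_games

    print(estimate_win_probability())

  With a hundred thousand simulated deals, the estimate’s sampling error is already down to a fraction of a percentage point, and no enumeration of the 52! orderings is ever required.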

  Ulam’s insight—that sampling can succeed where analysis fails—was also crucial to solving some of the difficult nuclear physics problems that arose at Los Alamos. A nuclear reaction is a branching process, where possibilities multiply just as wildly as they do in cards: one particle splits in two, each of which may go on to strike others, causing them to split in turn, and so on. Exactly calculating the chances of some particular outcome of that process, with many, many particles interacting, is hard to the point of impossibility. But simulating it, with each interaction being like turning over a new card, provides an alternative.

  Ulam developed the idea further with John von Neumann, and worked with Nicholas Metropolis, another of the physicists from the Manhattan Project, on implementing the method on the Los Alamos computer. Metropolis named this approach—replacing exhaustive probability calculations with sample simulations—the Monte Carlo Method, after the Monte Carlo casino in Monaco, a place equally dependent on the vagaries of chance. The Los Alamos team was able to use it to solve key problems in nuclear physics. Today the Monte Carlo Method is one of the cornerstones of scientific computing.

  Many of these problems, like calculating the interactions of subatomic particles or the chances of winning at solitaire, are themselves intrinsically probabilistic, so solving them through a randomized approach like Monte Carlo makes a fair bit of sense. But perhaps the most surprising realization about the power of randomness is that it can be used in situations where chance seemingly plays no role at all. Even if you want the answer to a question that is strictly yes or no, true or false—no probabilities about it—rolling a few dice may still be part of the solution.

  Randomized Algorithms

  The first person to demonstrate the surprisingly broad applications of randomness in computer science was Michael Rabin. Born in 1931 in Breslau, Germany (which became Wrocław, Poland, at the end of World War II), Rabin was the descendant of a long line of rabbis. His family left Germany for Palestine in 1935, and there he was diverted from the rabbinical path his father had laid down for him by the beauty of mathematics—discovering Alan Turing’s work early in his undergraduate career at the Hebrew University and immigrating to the United States to begin a PhD at Princeton. Rabin would go on to win the Turing Award—the computer science equivalent of a Nobel—for extending theoretical computer science to accommodate “nondeterministic” cases, where a machine isn’t forced to pursue a single option but has multiple paths it might follow. On sabbatical in 1975, Rabin came to MIT, searching for a new research direction to pursue.

  He found it in one of the oldest problems of them all: how to identify prime numbers.

  Algorithms for finding prime numbers date back at least as far as ancient Greece, where mathematicians used a straightforward approach known as the Sieve of Eratosthenes. The Sieve of Eratosthenes works as follows: To find all the primes less than n, begin by writing down all the numbers from 1 to n in sequence. Then cross out all the numbers that are multiples of 2, besides 2 itself (4, 6, 8, 10, 12, and so on). Take the next smallest number that hasn’t been crossed out (in this case, 3), and cross out all multiples of that number (6, 9, 12, 15). Keep going like this, and the numbers that remain at the end are the primes.
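
  The same crossing-out procedure, as a short Python sketch; it uses the standard shortcuts of beginning each round of crossing-out at p squared and stopping once p passes the square root of n, which change nothing about the answer:

    def sieve_of_eratosthenes(n):
        """Return all primes up to and including n (assumes n >= 2)."""
        is_prime = [True] * (n + 1)
        is_prime[0] = is_prime[1] = False
        for p in range(2, int(n ** 0.5) + 1):
            if is_prime[p]:
                # Cross out every multiple of p; p itself survives.
                for multiple in range(p * p, n + 1, p):
                    is_prime[multiple] = False
        return [i for i, prime in enumerate(is_prime) if prime]

    print(sieve_of_eratosthenes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]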

  For millennia, the study of prime numbers was believed to be, as G. H. Hardy put it, “one of the most obviously useless branches” of mathematics. But it lurched into practicality in the twentieth century, becoming pivotal in cryptography and online security. As it happens, it is much easier to multiply primes together than to factor them back out. With big enough primes—say, a thousand digits—the multiplication can be done in a fraction of a second while the factoring could take literally millions of years; this makes for what is known as a “one-way function.” In modern encryption, for instance, secret primes known only to the sender and recipient get multiplied together to create huge composite numbers that can be transmitted publicly without fear, since factoring the product would take any eavesdropper way too long to be worth attempting. Thus virtually all secure communication online—be it commerce, banking, or email—begins with a hunt for prime numbers.

  This cryptographic application suddenly made algorithms for finding and checking primes incredibly important. And while the Sieve of Eratosthenes is effective, it is not efficient. If you want to check whether a particular number is prime—known as testing its “primality”—then following the sieve strategy requires trying to divide it by all the primes up to its square root.* Checking whether a six-digit number is prime would require dividing by all of the 168 primes less than 1,000—not so bad. But checking a twelve-digit number involves dividing by the 78,498 primes less than 1 million, and all that division quickly starts to get out of control. The primes used in modern cryptography are hundreds of digits long; forget about it.
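
  Trial division itself is easy enough to write; the trouble is how it scales. A minimal Python sketch, which divides by every odd candidate rather than only by primes (slightly more work per number, but it needs no precomputed list):

    def is_prime_trial_division(n):
        """Test primality by trying divisors up to the square root of n."""
        if n < 2:
            return False
        if n % 2 == 0:
            return n == 2
        d = 3
        while d * d <= n:
            if n % d == 0:
                return False   # found a divisor, so n is composite
            d += 2
        return True

    print(is_prime_trial_division(104729))  # True: 104729 is the 10,000th prime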

  At MIT, Rabin ran into Gary Miller, a recent graduate from the computer science department at Berkeley. In his PhD thesis, Miller had developed an intriguingly promising, much faster algorithm for testing primality—but there was one small problem: it didn’t always work.

  Miller had found a set of equations (expressed in terms of two numbers, n and x) that are always true if n is prime, regardless of what values you plug in for x. If they come out false even for a single value of x, then there’s no way n can be prime—in these cases, x is called a “witness” against primality. The problem, though, is false positives: even when n is not prime, the equations will still come out true some of the time. This seemed to leave Miller’s approach hanging.

  Rabin realized that this was a place where a step outside the usually deterministic world of computer science might be valuable. If the number n is actually nonprime, what fraction of the possible values of x would give a false positive and declare it a prime number? The answer, Rabin showed, is no more than one-quarter. So for a random value of x, if Miller’s equations come out true, there’s only a one-in-four chance that n isn’t actually prime. And crucially, each time we sample a new random x and Miller’s equations check out, the probability that n only seems prime, but isn’t really, drops by another factor of four. Repeat the procedure ten times, and the probability of a false positive is one in four to the tenth power—less than one in a million. Still not enough certainty? Check another five times and you’re down to one in a billion.
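
  What follows is a compact Python sketch of the resulting test. Miller’s equations amount to writing n − 1 as an odd number d times a power of two, then watching what happens to x raised to the d-th power, modulo n, as it is squared again and again; the number of rounds and the handful of small primes screened out up front are arbitrary choices for this illustration:

    import random

    def miller_rabin(n, rounds=10):
        """Probabilistic primality test: a False answer is definitive, while a
        True answer is wrong with probability at most (1/4) ** rounds."""
        if n < 2:
            return False
        for small in (2, 3, 5, 7, 11, 13):
            if n % small == 0:
                return n == small
        # Write n - 1 as 2^r * d with d odd.
        r, d = 0, n - 1
        while d % 2 == 0:
            r += 1
            d //= 2
        for _ in range(rounds):
            x = random.randrange(2, n - 1)      # a randomly chosen potential witness
            y = pow(x, d, n)
            if y == 1 or y == n - 1:
                continue                        # x fails to expose n; try another
            for _ in range(r - 1):
                y = pow(y, 2, n)
                if y == n - 1:
                    break
            else:
                return False                    # x is a witness: n is composite
        return True                             # "almost definitely prime"

    print(miller_rabin(2**61 - 1))              # True: a known Mersenne prime

  Each additional round costs one more modular exponentiation and buys at least another factor-of-four cut in the chance of being fooled.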

  Vaughan Pratt, another computer scientist at MIT, implemented Rabin’s algorithm and started getting results late one winter night, while Rabin was at home having friends over for a Hanukkah party. Rabin remembers getting a call around midnight:

  “Michael, this is Vaughan. I’m getting the output from these experiments. Take a pencil and paper and write this down.” And so he had that 2⁴⁰⁰ − 593 is prime. Denote the product of all primes p smaller than 300 by k. The numbers k × 338 + 821 and k × 338 + 823 are twin primes.* These constituted the largest twin primes known at the time. My hair stood on end. It was incredible. It was just incredible.

  The Miller-Rabin primality test, as it’s now known, provides a way to quickly identify even gigantic prime numbers with an arbitrary degree of certainty.

  Here we might ask a philosophical question—about what the meaning of “is” is. We’re so used to mathematics being a realm of certainty that it’s jarring to think that a number could be “probably prime” or “almost definitely prime.” How certain is certain enough? In practice, modern cryptographic systems, the ones that encrypt Internet connections and digital transactions, are tuned for a false positive rate of less than one in a million billion billion. In other words, that’s a decimal that begins with twenty-four zeros—less than one false prime for the number of grains of sand on Earth. This standard comes after a mere forty applications of the Miller-Rabin test. It’s true that you are never fully certain—but you can get awfully close, awfully quick.

  Though you may have never heard of the Miller-Rabin test, your laptop, tablet, and phone know it well. Several decades after its discovery, it is still the standard method used to find and check primes in many domains. It’s working behind the scenes whenever you use your credit card online, and almost any time secure communications are sent through the air or over wires.

  For decades after Miller and Rabin’s work, it wasn’t known whether there would ever be an efficient algorithm that allows testing primality in deterministic fashion, with absolute certainty. In 2002, one such method did get discovered by Manindra Agrawal, Neeraj Kayal, and Nitin Saxena at the Indian Institute of Technology—but randomized algorithms like Miller-Rabin are much faster and thus are still the ones used in practice today.

  And for some other problems, randomness still provides the only known route to efficient solutions. One curious example from mathematics is known as “polynomial identity testing.” If you have two polynomial expressions, such as 2x³ + 13x² + 22x + 8 and (2x + 1) × (x + 2) × (x + 4), working out whether those expressions are in fact the same function—by doing all the multiplication, then comparing the results—can be incredibly time-consuming, especially as the number of variables increases.

  Here again randomness offers a way forward: just generate some random xs and plug them in. If the two expressions are not the same, it would be a big coincidence if they gave the same answer for some randomly generated input. And an even bigger coincidence if they also gave identical answers for a second random input. And a bigger coincidence still if they did it for three random inputs in a row. Since there is no known deterministic algorithm for efficiently testing polynomial identity, this randomized method—with multiple observations quickly giving rise to near-certainty—is the only practical one we have.
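
  In code, the randomized check is nothing more than evaluation at random points. The ranges and trial count below are arbitrary; what makes the test trustworthy is that two genuinely different polynomials of modest degree can agree at only a few points, so repeated agreement at random ones is overwhelming evidence of identity:

    import random

    def probably_identical(f, g, trials=10, lo=-10**6, hi=10**6):
        """Randomized polynomial identity test by evaluation at random points."""
        for _ in range(trials):
            x = random.randint(lo, hi)
            if f(x) != g(x):
                return False        # one disagreement is a definitive "no"
        return True                 # almost certainly the same polynomial

    f = lambda x: 2 * x**3 + 13 * x**2 + 22 * x + 8
    g = lambda x: (2 * x + 1) * (x + 2) * (x + 4)
    print(probably_identical(f, g))  # True

  A degree-three polynomial that isn’t identically zero has at most three roots, so a chance agreement at even one random point out of two million candidates is already unlikely.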

  In Praise of Sampling

  The polynomial identity test shows that sometimes our effort is better spent checking random values—sampling from the two expressions we want to know about—than trying to untangle their inner workings. To some extent this seems reasonably intuitive. Given a pair of nondescript gadgets and asked whether they are two different devices or two copies of the same one, most of us would start pushing random buttons rather than crack open the cases to examine the wiring. And we aren’t particularly surprised when, say, a television drug lord knifes open a few bundles at random to be reasonably certain about the quality of the entire shipment.

  There are cases, though, where we don’t turn to randomness—and maybe we should.

  Arguably the most important political philosopher of the twentieth century was Harvard’s John Rawls, who set for himself the ambitious task of reconciling two seemingly opposite key ideas in his field: liberty and equality. Is a society more “just” when it’s more free, or more equal? And do the two really have to be mutually exclusive? Rawls offered a way of approaching this set of questions that he called the “veil of ignorance.” Imagine, he said, that you were about to be born, but didn’t know as whom: male or female, rich or poor, urban or rural, sick or healthy. And before learning your status, you had to choose what kind of society you’d live in. What would you want? By evaluating various social arrangements from behind the veil of ignorance, argued Rawls, we’d more readily come to a consensus about what an ideal one would look like.

  What Rawls’s thought experiment does not take into account, however, is the computational cost of making sense of a society from behind such a veil. How could we, in this hypothetical scenario, possibly hope to hold all of the relevant information in our heads? Set aside grand questions of justice and fairness for a moment and try to apply Rawls’s approach merely to, say, a proposed change in health insurance regulations. Take the probability of being born, perhaps, as someone who grows up to become a town clerk in the Midwest; multiply that by the distribution of the different health care plans available to government employees across various midwestern municipalities; multiply that by actuarial data that offer the probability of, for instance, a fractured tibia; multiply that by the average medical bill for the average procedure for a fractured tibia at a midwestern hospital given the distribution of possible insurance plans.… Okay, so would the proposed insurance revision be “good” or “bad” for the nation? We can barely hope to evaluate a single injured shin this way, let alone the lives of hundreds of millions.

 
