Algorithms to Live By

by Brian Christian


  solving the continuous versions of these problems: There are certain kinds of continuous optimization problems that can be solved in polynomial time; the most prominent example is linear programming problems, in which both the metric to be optimized and the constraints on the solution can be expressed as a linear function of the variables involved. See Khachiyan, “Polynomial Algorithms in Linear Programming,” and Karmarkar, “A New Polynomial-Time Algorithm for Linear Programming.” However, continuous optimization is no panacea: there are also classes of continuous optimization problems that are intractable. For example, see Pardalos and Schnitger, “Checking Local Optimality in Constrained Quadratic Programming is NP-hard.”

  at most twice as many invitations: Khot and Regev, “Vertex Cover Might Be Hard to Approximate to Within 2-ε.”

  quickly get us within a comfortable bound: For more on these approximations, see Vazirani, Approximation Algorithms.

  not a magic bullet: It’s still an open question within the field whether Continuous Relaxation even offers the best possible approximation for the minimum vertex cover (party invitations) problem, or whether better approximations can be found.

  “Inconceivable!”: The Princess Bride, screenplay by William Goldman; 20th Century Fox, 1987.

  computational technique called Lagrangian Relaxation: Lagrangian Relaxation (initially spelled “Lagrangean”) was given its name by Arthur M. Geoffrion of UCLA in “Lagrangean Relaxation for Integer Programming.” The idea itself is considered to have emerged in the work of Michael Held (of IBM) and Richard Karp (of UC Berkeley) on the traveling salesman problem in 1970—see Held and Karp, “The Traveling-Salesman Problem and Minimum Spanning Trees,” and Held and Karp, “The Traveling-Salesman Problem and Minimum Spanning Trees: Part II.” Earlier precursors, however, also exist—for instance, Lorie and Savage, “Three Problems in Rationing Capital”; Everett III, “Generalized Lagrange Multiplier Method”; and Gilmore and Gomory, “A Linear Programming Approach to the Cutting Stock Problem, Part II.” For an overview and reflections see Fisher, “The Lagrangian Relaxation Method for Solving Integer Programming Problems,” as well as Geoffrion, “Lagrangian Relaxation for Integer Programming.”

  “If you end up with fractional games”: Michael Trick, personal interview, November 26, 2013.

  “make-believe can never be reconciled”: Christopher Booker, “What Happens When the Great Fantasies, Like Wind Power or European Union, Collide with Reality?,” the Telegraph, April 9, 2011.

  9. RANDOMNESS

  “why and how is absolutely mysterious”: Quoted in Shasha and Rabin, “An Interview with Michael Rabin.”

  a randomized algorithm uses: Randomized algorithms are discussed in detail in Motwani and Raghavan, Randomized Algorithms, and Mitzenmacher and Upfal, Probability and Computing. Shorter but older introductions are provided by Karp, “An Introduction to Randomized Algorithms,” and Motwani and Raghavan, “Randomized Algorithms.”

  an interesting probabilistic analysis: Buffon, “Essai d’arithmétique morale.”

  simply by dropping needles onto paper: Laplace, Théorie analytique des probabilités.

  Lazzarini supposedly made 3,408 tosses: Lazzarini, “Un’applicazione del calcolo della probabilità.”

  makes Lazzarini’s report seem suspicious: For further discussion of Lazzarini’s results, see Gridgeman, “Geometric Probability and the Number π,” and Badger, “Lazzarini’s Lucky Approximation of π.”
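
  Buffon's insight lends itself to a quick simulation. The sketch below is purely illustrative (the function name and parameters are our own; the needle length and line spacing echo Lazzarini's reported 2.5 cm needle on lines 3 cm apart): a needle of length l dropped on lines a distance d apart (with l ≤ d) crosses a line with probability 2l/(πd), so the observed crossing rate yields an estimate of π.

```python
import math
import random

def buffon_pi(tosses, needle_len=2.5, line_gap=3.0, seed=0):
    """Estimate pi by simulating Buffon's needle.

    A needle of length l dropped on a floor ruled with lines a
    distance d apart (l <= d) crosses a line with probability
    2l / (pi * d), so pi ~= 2 * l * tosses / (d * crossings).
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(tosses):
        # Distance from the needle's center to the nearest line,
        # and the needle's angle relative to the lines.
        center = rng.uniform(0, line_gap / 2)
        angle = rng.uniform(0, math.pi / 2)
        if center <= (needle_len / 2) * math.sin(angle):
            crossings += 1
    return 2 * needle_len * tosses / (line_gap * crossings)

estimate = buffon_pi(1_000_000)
```

  With a million simulated tosses the estimate typically lands within a few thousandths of π, which is part of why Lazzarini's six-decimal accuracy from only 3,408 tosses looks so suspicious.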

  he had contracted encephalitis: Ulam’s story appears in Ulam, Adventures of a Mathematician.

  “the test of a first-rate intelligence”: Fitzgerald, “The Crack-Up.” Later collected with other essays in The Crack-Up.

  “it may be much more practical”: Ulam, Adventures of a Mathematician, pp. 196–197. Calculating the winning odds for Klondike solitaire remains an active area of research to this day, driven chiefly by Monte Carlo simulation. For an example of recent work in the area, see Bjarnason, Fern, and Tadepalli, “Lower Bounding Klondike Solitaire with Monte-Carlo Planning.”

  Metropolis named this approach: Metropolis claims the naming rights in a letter that appears in Hurd, “Note on Early Monte Carlo Computations.”

  descendant of a long line of rabbis: Shasha and Lazere, Out of Their Minds.

  multiple paths it might follow: Rabin’s key paper here, coauthored with Dana Scott, was “Finite Automata and Their Decision Problems.” We’ve already encountered one of the ways that this concept became central to theoretical computer science in our discussion of the complexity class of the traveling salesman problem in chapter 8; Rabin’s notion of “nondeterministic” computing is the “N” of NP.

  “one of the most obviously useless branches”: The quote is from Hardy, “Prime Numbers”; see also Hardy, Collected Works. For more about the influence of prime numbers in cryptography, see, e.g., Schneier, Applied Cryptography.

  In modern encryption, for instance: One widely used algorithm that is based on the multiplication of prime numbers is RSA, which stands for the initials of its inventors: Ron Rivest, Adi Shamir, and Leonard Adleman. See Rivest, Shamir, and Adleman, “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems.” Other cryptographic systems—e.g., Diffie-Hellman—also use prime numbers; see Diffie and Hellman, “New Directions in Cryptography.”

  The problem, though, is false positives: The possible breakthrough—or lack thereof—in Miller’s approach would come down to how easily these false positives could be dismissed. How many values of x do you need to check to be sure about a given number n? Miller showed that if the “generalized Riemann hypothesis” were true, the minimum number of potential witnesses that would need to be checked is O((log n)²)—far fewer than the number required by approaches like the Sieve of Eratosthenes. But here was the hitch: the generalized Riemann hypothesis was—and still is—unproven.

  (The Riemann hypothesis, first offered by the German mathematician Bernhard Riemann in 1859, concerns the properties of a complex mathematical function called the Riemann zeta function. This function is intimately related to the distribution of prime numbers, and in particular how regularly those numbers appear on the number line. If the hypothesis is true, then primes are well enough behaved as to guarantee the efficiency of Miller’s algorithm. But nobody knows if it’s true. In fact, the Riemann hypothesis is one of six major open problems in mathematics for whose solutions the Clay Mathematics Institute will award a “Millennium Prize” of $1 million. The question of whether P = NP, which we saw in chapter 8, is also a Millennium Prize problem.)

  “Michael, this is Vaughan”: Rabin tells this story in Shasha and Lazere, Out of Their Minds.

  quickly identify even gigantic prime numbers: Rabin’s paper on his primality test, “Probabilistic Algorithm for Testing Primality,” appeared a few years later. In parallel, Robert Solovay and Volker Strassen had developed a similar probabilistic algorithm based on a different set of equations that primes need to obey, although their algorithm was less efficient; see Solovay and Strassen, “A Fast Monte-Carlo Test for Primality.”
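
  The Miller-Rabin test at the heart of this story fits in a few lines. Here is a minimal illustrative sketch in Python (the function name and default round count are our own choices): each random witness a either proves n composite or lets it survive one more round, and a composite survives any single round with probability at most 1/4.

```python
import random

def is_probable_prime(n, rounds=40, seed=None):
    """Miller-Rabin randomized primality test.

    Returns False if n is definitely composite, True if n passes
    every round; a composite slips through all `rounds` rounds
    with probability at most 4**-rounds.
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as 2^r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    rng = random.Random(seed)
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)  # a^d mod n
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnessed that n is composite
    return True
```

  Forty rounds gives the error bound of at most 4⁻⁴⁰ per test that cryptographic standards lean on.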

  less than one in a million billion billion: The documentation for OpenSSL specifies a function to “perform a Miller-Rabin probabilistic primality test with … a number of iterations used … that yields a false positive rate of at most 2⁻⁸⁰ for random input”; see https://www.openssl.org/docs/crypto/BN_generate_prime.html. Likewise the US Federal Information Processing Standard (FIPS) specifies that its Digital Signature Standard (DSS) accept an error probability of 2⁻⁸⁰ (for 1,024-bit keys, at least); see Gallagher and Kerry, Digital Signature Standard. Forty Miller-Rabin tests are sufficient to achieve this bound, and work from the 1990s has suggested that in many cases as few as three Miller-Rabin tests will suffice. See Damgård, Landrock, and Pomerance, “Average Case Error Estimates for the Strong Probable Prime Test”; Burthe Jr., “Further Investigations with the Strong Probable Prime Test”; and Menezes, Van Oorschot, and Vanstone, Handbook of Applied Cryptography, as well as more recent discussion at http://security.stackexchange.com/questions/4544/how-many-iterations-of-rabin-miller-should-be-used-to-generate-cryptographic-saf.

  for the number of grains of sand: The number of grains of sand on Earth is estimated from various sources at between 10¹⁸ and 10²⁴.

  whether there would ever be an efficient algorithm: Here by “efficient” we are using the field’s standard definition, which is “polynomial-time,” as discussed in chapter 8.

  one such method did get discovered: Agrawal, Kayal, and Saxena, “PRIMES Is in P.”

  generate some random xs and plug them in: One of the key results on the role of randomness in polynomial identity testing is what’s called the “Schwartz–Zippel lemma.” See Schwartz, “Fast Probabilistic Algorithms for Verification of Polynomial Identities”; Zippel, “Probabilistic Algorithms for Sparse Polynomials”; and DeMillo and Lipton, “A Probabilistic Remark on Algebraic Program Testing.”
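
  The random-evaluation trick behind the Schwartz–Zippel lemma can be sketched directly. The following is an illustrative toy (the names and the size of the evaluation range are our own assumptions): two distinct polynomials of degree at most d agree on at most d points, so a random test point drawn from a much larger range exposes a genuine difference with high probability.

```python
import random

def probably_identical(p, q, degree_bound, trials=20, seed=1):
    """Randomized polynomial identity test (Schwartz-Zippel style).

    Disagreement at any evaluation point is conclusive proof the
    polynomials differ; agreement at every random trial point means
    they are almost certainly the same polynomial.
    """
    rng = random.Random(seed)
    range_size = 100 * degree_bound  # far more points than possible roots
    for _ in range(trials):
        x = rng.randrange(range_size)
        if p(x) != q(x):
            return False  # found a witness: definitely different
    return True

# (x + 1)^2 versus its expanded form: the same polynomial.
same = probably_identical(lambda x: (x + 1) ** 2,
                          lambda x: x * x + 2 * x + 1, degree_bound=2)
# (x + 1)^2 versus x^2 + 1: different, and a random point shows it.
different = probably_identical(lambda x: (x + 1) ** 2,
                               lambda x: x * x + 1, degree_bound=2)
```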

  the only practical one we have: Will an efficient deterministic algorithm for polynomial identity testing ever be found? More broadly, does an efficient deterministic algorithm have to exist anyplace we find a good randomized one? Or could there be problems that randomized algorithms can solve efficiently but that deterministic algorithms simply cannot? This is an interesting problem in theoretical computer science, and the answer to it is still unknown.

  One of the approaches that has been used to explore the relationship between randomized and deterministic algorithms is called derandomization—essentially, taking randomized algorithms and removing the randomness from them. In practice, it’s hard for a computer to get access to true randomness—so when people implement a randomized algorithm, they often use a deterministic procedure to generate numbers that obey certain statistical properties of true randomness. Derandomization makes this explicit, examining what happens when the randomness in randomized algorithms is replaced by the output of some other complex computational process.

  The study of derandomization shows that it’s possible to turn efficient randomized algorithms into efficient deterministic algorithms—provided you can find a function that is sufficiently complex that its output looks random but sufficiently simple that it can be computed efficiently. For details, see Impagliazzo and Wigderson, “P = BPP if E Requires Exponential Circuits,” and Impagliazzo and Wigderson, “Randomness vs. Time.”

  he called the “veil of ignorance”: The veil of ignorance is introduced in Rawls, A Theory of Justice.

  Rawls’s philosophical critics: Most prominent among Rawls’s critics was economist John Harsanyi; see, e.g., Harsanyi, “Can the Maximin Principle Serve as a Basis for Morality? A Critique of John Rawls’s Theory.”

  the civilization of Omelas: Le Guin, “The Ones Who Walk Away from Omelas.”

  These are worthy critiques: For more on what is sometimes called “the repugnant conclusion,” see Parfit, Reasons and Persons, as well as, for instance, Arrhenius, “An Impossibility Theorem in Population Axiology.”

  “concern of engineers rather than philosophers”: Aaronson, “Why Philosophers Should Care About Computational Complexity.”

  “noticed something you don’t often see”: Rebecca Lange, “Why So Few Stories?,” GiveDirectly blog, November 12, 2014, https://www.givedirectly.org/blog-post.html?id=2288694352161893466.

  “I mean Negative Capability”: John Keats, letter to George and Thomas Keats, December 21, 1817.

  “assurance sufficient for the purposes of human life”: John Stuart Mill, On Liberty (1859).

  “there should be a drinking game”: Michael Mitzenmacher, personal interview. November 22, 2013.

  well over a trillion distinct URLs: “We Knew the Web Was Big…” July 25, 2008, http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html.

  weighs in at about seventy-seven characters: Kelvin Tan, “Average Length of a URL (Part 2),” August 16, 2010, http://www.supermind.org/blog/740/average-length-of-a-url-part-2.

  the URL is entered into a set of equations: Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors.”
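
  Bloom’s scheme is compact enough to sketch. Below is a hypothetical minimal version (the sizes, hash construction, and names are our own, not Bloom’s original equations): k hash functions map each item to k bit positions; a lookup that finds all k bits set reports “probably present,” which can occasionally be a false positive, while any unset bit proves absence.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions set k bits per item.

    Lookups can return false positives (all k bits happen to be set
    by other items) but never false negatives.
    """

    def __init__(self, size=8192, k=4):
        self.size = size
        self.k = k
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("https://example.com/seen")
```

  The space saving is the point: the filter never stores the seventy-seven-character URLs themselves, only a few bits per URL.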

  shipped with a number of recent web browsers: Google Chrome until at least 2012 used a Bloom filter: see http://blog.alexyakunin.com/2010/03/nice-bloom-filter-application.html and https://chromiumcodereview.appspot.com/10896048/.

  part of cryptocurrencies like Bitcoin: Gavin Andresen, “Core Development Status Report #1,” November 1, 2012, https://bitcoinfoundation.org/2012/11/core-development-status-report-1/.

  “The river meanders”: Richard Kenney, “Hydrology; Lachrymation,” in The One-Strand River: Poems, 1994–2007 (New York: Knopf, 2008).

  use this approach when trying to decipher codes: See Berg-Kirkpatrick and Klein, “Decipherment with a Million Random Restarts.”

  called the Metropolis Algorithm: Sometimes also known as the Metropolis-Hastings Algorithm, this technique is described in Metropolis et al., “Equation of State Calculations by Fast Computing Machines,” and Hastings, “Monte Carlo Methods Using Markov Chains and Their Applications.” The Metropolis Algorithm was developed by Nicholas Metropolis and the two husband-and-wife teams of Marshall and Arianna Rosenbluth and Edward and Augusta Teller in the 1950s. Metropolis was the first author on the paper describing the algorithm, so today it is known as the Metropolis Algorithm—which is doubly ironic. For one thing, Metropolis apparently made little contribution to the development of the algorithm, being listed as an author out of courtesy, as the head of the computing laboratory (see Rosenbluth, Marshall Rosenbluth, Interviewed by Kai-Henrik Barth). What’s more, Metropolis himself liked giving things illustrative names: he claimed to have named the chemical elements technetium and astatine, as well as the MANIAC computer and the Monte Carlo technique itself (Hurd, “Note on Early Monte Carlo Computations”).

  “Growing a single crystal from a melt”: Kirkpatrick, Gelatt, and Vecchi, “Optimization by Simulated Annealing.”

  “The guy who was the best at IBM”: Scott Kirkpatrick, personal interview, September 2, 2014.

  Finally we’d start going only uphill: If this idea—starting out being willing to move around between options, then focusing more tightly on the good ones—sounds familiar, it should: optimizing a complex function requires facing the explore/exploit tradeoff. And randomness turns out to be a source of pretty good strategies for solving problems like multi-armed bandits as well as the kind of optimization problems that Kirkpatrick was focused on.
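
  The annealing recipe can be sketched in a few lines of code. The following toy is our own illustration (the cooling schedule, starting temperature, and bumpy test function are assumptions, not Kirkpatrick’s actual setup): downhill moves are always accepted, uphill moves with probability exp(−delta/T), and the temperature T cools toward zero so the search wanders freely at first and ends by going only downhill.

```python
import math
import random

def simulated_annealing(cost, neighbor, start, steps=20_000,
                        start_temp=10.0, seed=0):
    """Simulated annealing on a cost function to be minimized."""
    rng = random.Random(seed)
    current = best = start
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9  # linear cooling
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worsenings with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
    return best

# Toy landscape: a parabola with sinusoidal bumps (local minima).
f = lambda x: (x - 3) ** 2 + 2 * math.sin(5 * x)
step = lambda x, rng: x + rng.uniform(-0.5, 0.5)
solution = simulated_annealing(f, step, start=-10.0)
```

  Starting far from the optimum, the high-temperature phase lets the search hop over the sinusoidal bumps that would trap pure hill climbing.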

  If you recall, the multi-armed bandit offers us several different options—arms we can pull—that provide different, unknown payoffs. The challenge is to find the balance between trying new options (exploring) and pursuing the best option found so far (exploiting). It’s best to be more optimistic and exploratory early on, becoming more discerning and exploitative later. Pursuing such a strategy of gradually decreasing optimism about the alternatives promises the best outcome you can hope for—accumulating regrets at a decreasing rate, with your total regret rising as a logarithmic function of time.

  Randomness provides an alternative strategy to optimism. Intuitively, if the problem is one of balancing exploration and exploitation, why not simply do so explicitly? Spend some amount of your time exploring and some amount exploiting. And that’s exactly the strategy that multi-armed bandit experts call Epsilon Greedy.

  Epsilon Greedy has two parts—Epsilon and Greedy. The Epsilon part is that some small proportion of the time (the letter epsilon is used by mathematicians to denote a small number), you choose at random from among your options. The Greedy part is that the rest of the time you take the best option you have found so far. So walk into the restaurant and flip a coin (or roll a die, depending on your value of epsilon) to decide whether to try something new. If it says yes, close your eyes and point at the menu. If not, enjoy your current favorite.
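
  As a sketch (the restaurant becomes a set of Bernoulli “arms”; the payoffs, epsilon, and names here are our own illustrative choices), Epsilon Greedy is only a few lines:

```python
import random

def epsilon_greedy(payoff_probs, pulls=10_000, epsilon=0.1, seed=0):
    """Epsilon Greedy on Bernoulli bandit arms.

    With probability epsilon, pick an arm at random (explore);
    otherwise pull the arm with the best average payoff so far
    (exploit). Returns how often each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(payoff_probs)
    counts = [0] * n_arms    # pulls per arm
    totals = [0.0] * n_arms  # accumulated reward per arm
    for _ in range(pulls):
        if 0 in counts or rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        reward = 1 if rng.random() < payoff_probs[arm] else 0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = epsilon_greedy([0.3, 0.5, 0.7])  # arm 2 pays off most often
```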

  Unfortunately, multi-armed bandit researchers don’t particularly like Epsilon Greedy. It seems wasteful—you’re guaranteed to spend a proportion of your time trying new things even if the best becomes clear very quickly. If you follow Epsilon Greedy, then your regret increases linearly in the number of times you play. Each time you dine, there’s a chance that you’re going to choose something other than the best, so your average regret increases by the same amount every time. This linear growth is much worse than the logarithmic regret guaranteed by deterministic algorithms based on appropriately calibrated optimism.

  But if the simplicity of Epsilon Greedy is appealing, there is good news. There’s a simple variant of this algorithm—what we are dubbing Epsilon-Over-N Greedy—that does guarantee logarithmic regret, and performs well in practice (see Auer, Cesa-Bianchi, and Fischer, “Finite-Time Analysis of the Multiarmed Bandit Problem”). The trick is to decrease the chance of trying something new over time. The first time you make a choice, you choose at random with probability 1/1 (a.k.a. always). If that option is any good, then the second time you choose at random with probability 1/2 (a.k.a. flip a coin: heads you take the same option, tails you try something new). On visit three, you should pick the best thing with probability 2/3, and try something new with probability 1/3. On the Nth visit to the restaurant, you choose at random with probability 1/N, otherwise taking the best option discovered so far. By gradually decreasing the probability of trying something new, you hit the sweet spot between exploration and exploitation.
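
  A bare-bones rendering of the 1/N schedule (an illustrative simplification of ours; the schedule actually analyzed by Auer, Cesa-Bianchi, and Fischer is tuned more carefully than this):

```python
import random

def epsilon_over_n_greedy(payoff_probs, pulls=10_000, seed=0):
    """Epsilon-Over-N Greedy on Bernoulli bandit arms.

    On the Nth pull, explore a random arm with probability 1/N;
    otherwise exploit the best average payoff found so far. The
    decaying exploration rate is what tames regret growth.
    """
    rng = random.Random(seed)
    n_arms = len(payoff_probs)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for n in range(1, pulls + 1):
        if 0 in counts or rng.random() < 1 / n:
            arm = rng.randrange(n_arms)  # explore, ever more rarely
        else:
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        reward = 1 if rng.random() < payoff_probs[arm] else 0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = epsilon_over_n_greedy([0.3, 0.5, 0.7])
```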

  There’s also another, more sophisticated algorithm for playing the multi-armed bandit that likewise makes use of randomness. It’s called Thompson Sampling, named after William R. Thompson, the Yale physician who first posed the problem (back in 1933) of how to choose between two treatments (Thompson, “On the Likelihood That One Unknown Probability Exceeds Another”). Thompson’s solution was simple: using Bayes’s Rule, calculate the probability that each treatment is the best. Then choose that treatment with that probability. To begin with you know nothing, and you are equally likely to choose either treatment. As the data accumulate you come to favor one over the other, but some of the time you still choose the dispreferred treatment and have the chance to change your mind. As you become more certain that one treatment is better, you will end up almost always using that treatment. Thompson Sampling balances exploration and exploitation elegantly, and also guarantees that regret will increase only logarithmically (see Agrawal and Goyal, “Analysis of Thompson Sampling”).
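
  For yes/no payoffs, Thompson Sampling reduces to a little Beta-posterior bookkeeping. A minimal sketch, with the payoffs and names as illustrative assumptions of ours: drawing one sample from each arm’s posterior and pulling the arm with the highest sample chooses each arm with exactly the probability that it is currently believed to be the best.

```python
import random

def thompson_sampling(payoff_probs, pulls=5_000, seed=0):
    """Thompson Sampling for Bernoulli bandit arms.

    Track a Beta(wins + 1, losses + 1) posterior for each arm,
    sample once from each posterior, and pull the arm whose
    sample is highest.
    """
    rng = random.Random(seed)
    n_arms = len(payoff_probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    counts = [0] * n_arms
    for _ in range(pulls):
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                   for a in range(n_arms)]
        arm = samples.index(max(samples))  # probability-matched choice
        if rng.random() < payoff_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        counts[arm] += 1
    return counts

counts = thompson_sampling([0.3, 0.5, 0.7])
```

  As the posteriors sharpen, the inferior arms’ samples almost never win the comparison, so play concentrates on the best arm while still occasionally revisiting the others.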

 
