Algorithms to Live By

by Brian Christian


  four-thousand-year-old Sumerian clay tablet: A detailed analysis appears in Knuth, “Ancient Babylonian Algorithms.” Further information on the history of algorithms, with an emphasis on mathematical algorithms, appears in Chabert, Barbin, and Weeks, A History of Algorithms.

  strikes with the end of an antler: This technique is known as “soft hammer percussion.”

  “Science is a way of thinking”: Sagan, Broca’s Brain.

  the way we think about human rationality: The limitations of a classical conception of rationality—which assumes infinite computational capacity and infinite time to solve a problem—were famously pointed out by the psychologist, economist, and artificial intelligence pioneer Herbert Simon in the 1950s (Simon, Models of Man), ultimately leading to a Nobel Prize. Simon argued that “bounded rationality” could provide a better account of human behavior. Simon’s insight has been echoed in mathematics and computer science. Alan Turing’s colleague I. J. Good (famous for the concept of “the singularity” and for advising Stanley Kubrick about HAL 9000 for 2001: A Space Odyssey) called this sort of thinking “Type II Rationality.” Whereas classic, old-fashioned Type I Rationality just worries about getting the right answer, Type II Rationality takes into account the cost of getting that answer, recognizing that time is just as important a currency as accuracy. See Good, Good Thinking.

  Artificial intelligence experts of the twenty-first century have also argued that “bounded optimality”—choosing the algorithm that best trades off time and error—is the key to developing functional intelligent agents. This is a point made by, for instance, UC Berkeley computer scientist Stuart Russell—who literally cowrote the book on artificial intelligence (the bestselling textbook Artificial Intelligence: A Modern Approach)—and by Eric Horvitz, managing director at Microsoft Research. See, for example, Russell and Wefald, Do the Right Thing, and Horvitz and Zilberstein, “Computational Tradeoffs Under Bounded Resources.” Tom and his colleagues have used this approach to develop models of human cognition; see Griffiths, Lieder, and Goodman, “Rational Use of Cognitive Resources.”

  analogy to a human mathematician: In section 9 of Turing, “On Computable Numbers,” Turing justifies the choices made in defining what we now call a Turing machine by comparing them to operations that a person might carry out: a two-dimensional piece of paper becomes a one-dimensional tape, the person’s state of mind becomes the state of the machine, and symbols are written and read as the person or machine moves around on the paper. Computation is what a computer does, and at the time the only “computers” were people.

  we are irrational and error-prone: For example, see Gilovich, How We Know What Isn’t So; Ariely and Jones, Predictably Irrational; and Marcus, Kluge.

  1. OPTIMAL STOPPING

  “Though all Christians start”: From Kepler’s letter to “an unknown nobleman” on October 23, 1613; see, e.g., Baumgardt, Johannes Kepler.

  such a common phenomenon: The turkey drop is mentioned, among many other places, in http://www.npr.org/templates/story/story.php?storyId=120913056 and http://jezebel.com/5862181/technology-cant-stop-the-turkey-drop.

  In any optimal stopping problem: For more about the mathematics of optimal stopping, Ferguson, Optimal Stopping and Applications, is a wonderful reference.

  optimal stopping’s most famous puzzle: A detailed treatment of the nature and origins of the secretary problem appears in Ferguson, “Who Solved the Secretary Problem?”

  its first appearance in print: What Gardner writes about is a parlor game called the “Game of Googol,” apparently devised in 1958 by John Fox of the Minneapolis-Honeywell Regulator Company and Gerald Marnie of MIT. Here’s how it was described by Fox in his original letter to Gardner on May 11, 1959 (all letters to Gardner we quote are from Martin Gardner’s papers at Stanford University, series 1, box 5, folder 19):

  The first player writes down as many unique positive numbers on different slips of paper as he wishes. Then he shuffles them and turns them over one at a time. If the second player tells him to stop at a certain slip and the number on that slip is the largest number in the collection then the second player wins. If not, the first player wins.

  Fox further noted that the name of the game comes from the fact that the number “one googol” is often written on one of the slips (presumably to trick the opponent into thinking it’s the largest number, with “two googol” appearing somewhere else). He then claimed that the optimal strategy for the second player was to wait until half the slips had been turned over and then choose the first number larger than the largest in the first half, converging on a 34.7% chance of winning.
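Fox's claimed figure is easy to check numerically. The sketch below is our own illustration, not part of the notes: it plays the game with randomly drawn slips (rather than the adversarially chosen ones a real opponent would write) and estimates the win rate of the pass-half-then-leap rule, which in the large-n limit converges to (1/2) ln 2 ≈ 34.7%.

```python
import random

def play(n, cutoff, trials=100_000):
    """Estimate the win probability of a pass-then-leap strategy.

    Pass over the first `cutoff` slips, then stop at the first slip
    that beats everything seen so far. Win iff the slip stopped at
    holds the largest number of all n slips.
    """
    wins = 0
    for _ in range(trials):
        slips = [random.random() for _ in range(n)]
        best_seen = max(slips[:cutoff]) if cutoff else float("-inf")
        chosen = slips[-1]  # forced to take the last slip if we never stop
        for x in slips[cutoff:]:
            if x > best_seen:
                chosen = x
                break
        wins += chosen == max(slips)
    return wins / trials

# Fox's rule: wait out half the slips. With 100 slips the estimate
# lands close to the limiting value (1/2) * ln 2, i.e. about 34.7%.
print(play(100, 50))
```

Running the same function with a cutoff of 37 rather than 50 gives a slightly higher win rate, foreshadowing the 37% Rule discussed below.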

  Gardner wrote to Leo Moser, a mathematician at the University of Alberta, to get more information about the problem. Moser had written a journal article in 1956 that addressed a closely related problem (Moser, “On a Problem of Cayley”), originally proposed in 1875 by the influential British mathematician Arthur Cayley (Cayley, “Mathematical Questions”; Cayley, Collected Mathematical Papers). Here’s the version proposed by Cayley:

  A lottery is arranged as follows: There are n tickets representing a, b, c pounds respectively. A person draws once; looks at his ticket; and if he pleases, draws again (out of the remaining n − 1 tickets); looks at his ticket, and if he pleases draws again (out of the remaining n − 2 tickets); and so on, drawing in all not more than k times; and he receives the value of the last drawn ticket. Supposing that he regulates his drawings in the manner most advantageous to him according to the theory of probabilities, what is the value of his expectation?

  Moser added one more piece of information—that the tickets were equally likely to take on any value between 0 and 1.

  In Cayley’s problem and Moser’s slight reframing thereof (sometimes collectively referred to as the Cayley-Moser problem), the payoff is the value of the chosen ticket and the challenge is to find the strategy that gives the highest average payoff. It’s here that the problem explored by Cayley and Moser differs from the secretary problem (and the Game of Googol) by focusing on maximizing the average value of the number chosen, rather than the probability of finding the single largest number (when nothing but the best will do). Moser’s 1956 paper is notable not just for the neat solution it provides to this problem, but also because it’s the first place we see mention of the real-world consequences of optimal stopping. Moser talks about two possible scenarios:

  1. The tourist’s problem: A tourist traveling by car wants to stop for the night at one of n motels indicated on his road guide. He seeks the most comfortable accommodation but naturally does not want to retrace any part of his journey. What criterion should he use for stopping?

  2. The bachelor’s dilemma: A bachelor meets a girl who is willing to marry him and whose “worth” he can estimate. If he rejects her she will have none of him later but he is likely to meet other girls in the future and he estimates that he will have n chances in all. Under what circumstances should he marry?
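Moser's uniform-[0, 1] framing of Cayley's lottery yields to backward induction. The following sketch is our own illustration, using the standard recursion for this problem: with one draw left you expect 0.5, and with each additional draw you accept a value only if it beats the expected value of continuing, giving V = (1 + V²)/2.

```python
# Cayley-Moser problem, Moser's framing: ticket values are uniform on
# [0, 1], you may make a limited number of draws, and you receive the
# value of the last ticket drawn.
def moser_value(extra_draws):
    """Expected payoff under optimal play, with `extra_draws` optional
    draws remaining beyond the one you are forced to take."""
    v = 0.5  # with only the forced draw left, a uniform ticket averages 0.5
    for _ in range(extra_draws):
        # Accept the next value x iff x > v; otherwise continue.
        # E[max(x, v)] for uniform x is v**2 + (1 - v**2)/2 = (1 + v**2)/2.
        v = (1 + v * v) / 2
    return v

print(moser_value(0))  # 0.5
print(moser_value(1))  # 0.625
print(round(moser_value(2), 4))
```

The thresholds themselves are just the successive values of v: with two draws left you should keep any first ticket above 0.625, and so on.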

  The idea of entertaining a series of suitors—with the sexes of the protagonists reversed—duly made an appearance in Gardner’s 1960 column on the Game of Googol.

  Moser provided the correct solution—the 37% Rule—to Gardner, but his letter of August 26, 1959, suggested that the problem might have an earlier origin: “I also found it in some notes that R. E. Gaskell (of Boeing Aircraft in Seattle) distributed in January, 1959. He credits the problem to Dr. G. Marsaglia.”

  Gardner’s charitable interpretation was that Fox and Marnie were claiming the creation of the specific Game of Googol, not of the broader problem of which that game was an instance, a point that was carefully made in his column. But he received a variety of letters citing earlier instances of similar problems, and it’s clear that the problem was passed around among mathematicians.

  origins of the problem are surprisingly mysterious: Even Gilbert and Mosteller, “Recognizing the Maximum of a Sequence,” one of the most authoritative scientific papers on the secretary problem, admits that “efforts to discover the originator of this problem have been unsuccessful.” Ferguson, “Who Solved the Secretary Problem?,” provides an amusing and mathematically detailed history of the secretary problem, including some of its variants. Ferguson argued that in fact the problem described by Gardner hadn’t been solved. It should already be clear that lots of people solved the secretary problem of maximizing the probability of selecting the best from a sequence of applicants distinguished only by their relative ranks, but Ferguson pointed out that this is not actually the problem posed in the Game of Googol. First, the Googol player knows the values written on each slip of paper. Second, it’s a competitive game, with one player choosing numbers in a sequence designed to deceive the other. Ferguson has his own solution to this more challenging problem, but it’s complex enough that you will have to read the paper yourself!

  Mosteller recalled hearing about the problem: Gilbert and Mosteller, “Recognizing the Maximum of a Sequence.”

  Roger Pinkham of Rutgers wrote: Letter from Roger Pinkham to Martin Gardner, January 29, 1960.

  Flood’s influence on computer science: See Cook, In Pursuit of the Traveling Salesman; Poundstone, Prisoner’s Dilemma; and Flood, “Soft News.”

  considering the problem since 1949: Flood made this claim in a letter he wrote to Gardner on May 5, 1960. He enclosed a letter from May 5, 1958, in which he provided the correct solution, although he also indicated that Andrew Gleason, David Blackwell, and Herbert Robbins were rumored to have solved the problem in recent years.

  In a letter to Tom Ferguson dated May 12, 1988, Flood went into more detail about the origin of the problem. (The letter is on file in the Merrill Flood archive at the University of Michigan.) His daughter, recently graduated from high school, had entered a serious relationship with an older man, and Flood and his wife disapproved. His daughter was taking the minutes at a conference at George Washington University in January 1950, and Flood presented what he called the “fiancé problem” there. In his words, “I made no attempt to solve the problem at that time, but introduced it simply because I hoped that [she] would think in those terms a bit and it sounded like it might be a nice little easy mathematical problem.” Flood indicates that Herbert Robbins provided an approximate solution a few years later, before Flood himself figured out the exact solution.

  appears to be in a 1964 paper: The paper is Chow et al., “Optimal Selection Based on Relative Rank.”

  the best you’ve seen so far: In the literature, what we call “best yet” applicants are referred to (we think somewhat confusingly) as “candidates.”

  settles to 37% of the pool: The 37% Rule is derived by doing the same analysis for n applicants—working out the probability that setting a standard based on the first k applicants results in choosing the best applicant overall. This probability can be expressed in terms of the ratio of k to n, which we can call p. As n gets larger, the probability of choosing the best applicant converges to the mathematical function −p log p. This is maximized when p = 1/e. The value of e is 2.71828…, so 1/e is 0.367879441…, or just under 37%. And the mathematical coincidence—that the probability of success is the same as p—arises because log e is equal to 1. So if p = 1/e, −p log p is just 1/e. A well-explained version of the full derivation appears in Ferguson, “Who Solved the Secretary Problem?”
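The function in question is easy to explore numerically. A minimal sketch (our own illustration) scans −p ln p over a fine grid and confirms both parts of the coincidence: the maximizing p and the maximum value are each 1/e ≈ 0.368.

```python
import math

# Large-n success probability of the look-then-leap rule as a function
# of p = k/n, the fraction of applicants used to set the standard:
# P(p) = -p * ln(p).
def success_probability(p):
    return -p * math.log(p)

# Scan for the maximizing p on a fine grid over (0, 1).
best_p = max((i / 10_000 for i in range(1, 10_000)), key=success_probability)

print(round(best_p, 3))                       # 0.368, i.e. 1/e
print(round(success_probability(best_p), 3))  # 0.368 as well
```

Both printed values agree because, at p = 1/e, −p log p = −(1/e) · (−1) = 1/e.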

  one of the problem’s curious mathematical symmetries: Mathematicians John Gilbert and Frederick Mosteller call this symmetry “amusing” and discuss it at slightly greater length in Gilbert and Mosteller, “Recognizing the Maximum of a Sequence.”

  “The passion between the sexes”: Malthus, An Essay on the Principle of Population.

  “married the first man I ever kissed”: Attributed by many sources, e.g., Thomas, Front Row at the White House.

  a graduate student, looking for love: Michael Trick’s blog post on meeting his wife is “Finding Love Optimally,” Michael Trick’s Operations Research Blog, February 27, 2011, http://mat.tepper.cmu.edu/blog/?p=1392.

  the number of applicants or the time: The 37% Rule applies directly to the time period of one’s search only when the applicants are uniformly distributed across time. Otherwise, you’ll want to aim more precisely for 37% of the distribution over time. See Bruss, “A Unified Approach to a Class of Best Choice Problems.”

  the 37% Rule gave age 26.1 years: The analysis of waiting until at least age 26 to propose (37% of the way from 18 to 40) first appears in Lindley, “Dynamic Programming and Decision Theory,” which is presumably where Trick encountered this idea.

  courting a total of eleven women: Kepler’s story is covered in detail in Koestler, The Watershed, and in Baumgardt, Johannes Kepler, as well as in Connor, Kepler’s Witch. Most of what we know about Kepler’s search for a second wife comes from one letter in particular, which Kepler wrote to “an unknown nobleman” from Linz, Austria, on October 23, 1613.

  propose early and often: Smith, “A Secretary Problem with Uncertain Employment,” showed that if the probability of a proposal being accepted is q, then the strategy that maximizes the probability of finding the best applicant is to look at a proportion of applicants equal to q^(1/(1−q)) and then make offers to each applicant better than those seen so far. This proportion is always less than 1/e, so you improve your chances by making more offers. Unfortunately, those chances are still worse than if you weren’t getting rejected: the probability of ending up with the best applicant is also q^(1/(1−q)), and hence less than that given by the 37% Rule.

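As an illustration (our own sketch, assuming q denotes the probability that an offer is accepted, the parameterization under which the formula reduces to the 37% Rule as q approaches 1):

```python
import math

def offer_threshold(q):
    """Fraction of applicants to observe before making offers, when
    each offer is accepted with probability q (Smith's variant)."""
    return q ** (1 / (1 - q))

# With offers accepted only half the time (q = 0.5), start making
# offers after just a quarter of the pool: 0.5 ** 2 = 0.25.
print(offer_threshold(0.5))  # 0.25

# As q approaches 1, the threshold climbs back toward 1/e.
print(round(offer_threshold(0.999), 3))  # 0.368
```

The same expression also gives the probability of ending up with the best applicant, so at q = 0.5 that probability drops from about 37% to 25%.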
  until you’ve seen 61% of applicants: If delayed proposals are allowed, the optimal strategy depends on the probability of an immediate proposal being accepted, q, and the probability of a delayed proposal being accepted, p. The proportion of candidates to initially pass over is then given by a fairly daunting formula in q and p. This integrated treatment of rejection and recall comes from Petruccelli, “Best-Choice Problems Involving Uncertainty,” although recalling past candidates was considered earlier by Yang, “Recognizing the Maximum of a Random Sequence.”

  This formula simplifies when we make particular choices for q and p. If p = 0, so delayed proposals are always rejected, we get back the rule for the secretary problem with rejection. As we approach q = 1, with immediate proposals always being accepted, the proportion at which to begin making offers tends toward e^(p−1), which is always greater than 1/e (which can be rewritten as e^(−1)). This means that having the potential to make offers to applicants who have been passed over should result in spending more time passing over applicants—something that is quite intuitive. In the main text we assume that immediate proposals are always accepted (q = 1) but delayed proposals are rejected half the time (p = 0.5). Then you should pass over 61% of applicants and make an offer to the best yet who follows, going back at the end and making an offer to the best overall if necessary.
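The 61% figure follows directly from the limiting expression e^(p−1). A quick check (our own sketch):

```python
import math

def recall_threshold(p):
    """Fraction of applicants to pass over when immediate proposals are
    always accepted (q = 1) and delayed proposals are accepted with
    probability p: the limiting expression e**(p - 1)."""
    return math.exp(p - 1)

print(round(recall_threshold(0.5), 2))  # 0.61: the 61% figure from the text
print(round(recall_threshold(0.0), 3))  # 0.368: back to the 37% Rule
```

At p = 0 the option to go back is worthless (delayed proposals are always rejected), so the threshold collapses back to 1/e, exactly as the note says.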

  Another possibility considered by Petruccelli is that the probability of rejection increases with time, as the ardor of applicants decreases. If the probability of an offer being accepted by an applicant is q·p^s, where s is the number of “steps” into the past required to reach that applicant, then the optimal strategy depends on q, p, and the number of applicants, n. If q/(1 − p) is more than n − 1 then it’s best to play a waiting game, observing all applicants and then making an offer to the best. Otherwise, observe a proportion equal to q^(1/(1−q)) and make an offer to the next applicant better than those seen so far. Interestingly, this is exactly the same strategy (with the same probability of success) as that when p = 0, meaning that if the probability of rejection increases with time, there is no benefit to being able to go back to a previous candidate.

  “No buildup of experience is needed”: Gilbert and Mosteller, “Recognizing the Maximum of a Sequence.”

  use the Threshold Rule: The general strategy for solving optimal stopping problems like the full information game is to start at the end and reason backward—a principle that is called “backward induction.” For instance, imagine a game where you roll a die, and have the option either to stick with that number or roll again a maximum of k times (we took this example from Hill, “Knowing When to Stop”). What’s the optimal strategy? We can figure it out by working backward. If k = 0, you don’t have an option—you have to stick with your roll, and you will average 3.5 points (the average value of a die roll, (1 + 2 + 3 + 4 + 5 + 6)/6). If k = 1, then you should only keep a roll that beats that average—a 4 or higher. If you get a 1, 2, or 3, you’re better off chancing that final roll. Following this strategy, there’s a 50% chance you stop with a 4, 5, or 6 (for an average of 5) and a 50% chance you go on to the final roll (for an average of 3.5). So your average score at k = 1 is 4.25, and you should only keep a roll at k = 2 if it beats that score—a 5 or higher. And so on.
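The backward induction above can be written out in a few lines. This sketch is our own illustration of the same computation:

```python
# Backward induction for the die-rolling game: with k rerolls left,
# keep the current face only if it beats the expected value of
# continuing optimally; otherwise roll again.
def expected_value(k):
    if k == 0:
        return 3.5  # forced to keep the final roll: (1+2+3+4+5+6)/6
    cont = expected_value(k - 1)  # value of giving up this roll
    # Average over the six faces, keeping a face iff it beats cont.
    return sum(max(face, cont) for face in range(1, 7)) / 6

for k in range(4):
    print(k, round(expected_value(k), 4))
# 0 3.5
# 1 4.25
# 2 4.6667
# 3 4.9444
```

The thresholds in the text fall out directly: keep a 4 or better when the continuation is worth 3.5, a 5 or better when it is worth 4.25, and so on.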

  Backward induction thus answers an age-old question. “A bird in the hand is worth two in the bush,” we say, but is 2.0 the right coefficient here? The math suggests that the right number of birds in the bush actually depends on the quality of the bird in the hand. Replacing birds with dice for convenience, a roll of 1, 2, or 3 isn’t even worth as much as a single die “in the bush.” But a roll of 4 is worth one die in the bush, while a roll of 5 is worth two, three, or even four dice in the bush. And a roll of 6 is worth even more than the entire contents of an infinitely large dice bush—whatever that is.

 
