Algorithms to Live By

Page 39

by Brian Christian

Kenny Rogers famously advised: “The Gambler” is best known as sung by Kenny Rogers on his 1978 album of the same name, but it was originally written and performed by Don Schlitz. The Rogers recording of the song would go on to reach the top spot on the Billboard country charts, and win the 1980 Grammy for Best Male Country Vocal Performance.

“I breathed a very long sigh of relief”: Gould, “The Median Isn’t the Message.”

asking people to make predictions: Griffiths and Tenenbaum, “Optimal Predictions in Everyday Cognition.”

people’s prior distributions across a broad swath: Studies have examined, for example, how we manage to identify moving shapes from the patterns of light that fall on the retina, infer causal relationships from the interactions between objects, and learn the meaning of new words after seeing them just a few times. See, respectively, Weiss, Simoncelli, and Adelson, “Motion Illusions as Optimal Percepts”; Griffiths et al., “Bayes and Blickets”; Xu and Tenenbaum, “Word Learning as Bayesian Inference.”

famous “marshmallow test”: Mischel, Ebbesen, and Raskoff Zeiss, “Cognitive and Attentional Mechanisms in Delay of Gratification.”

all depends on what kind of situation: McGuire and Kable, “Decision Makers Calibrate Behavioral Persistence on the Basis of Time-Interval Experience,” and McGuire and Kable, “Rational Temporal Predictions Can Underlie Apparent Failures to Delay Gratification.”

grew into young adults who were more successful: Mischel, Shoda, and Rodriguez, “Delay of Gratification in Children.”

how prior experiences might affect behavior: Kidd, Palmeri, and Aslin, “Rational Snacking.”

Carnegie Hall even half full: According to figures from the Aviation Safety Network (personal correspondence), the number of fatalities “on board US-owned aircraft that are capable of carrying 12+ passengers, also including corporate jets and military transport planes” during the period 2000–2014 was 1,369, and adding the 2014 figure again to estimate deaths in 2015 yields a total estimate of 1,393 through the end of 2015. Carnegie Hall’s famous Isaac Stern Auditorium seats 2,804; see http://www.carnegiehall.org/Information/Stern-Auditorium-Perelman-Stage/.

greater than the entire population of Wyoming: According to the National Highway Traffic Safety Administration, 543,407 people died in car accidents in the United States in the years 2000–2013. See http://www-fars.nhtsa.dot.gov. Repeating the 2013 figure to estimate deaths in 2014 and 2015 yields an estimate of 608,845 deaths through the end of 2015. The 2014 population of Wyoming, as estimated by the US Census Bureau, was 584,153. See http://quickfacts.census.gov/qfd/states/56000.html.

gun violence on American news: Glassner, “Narrative Techniques of Fear Mongering.”

7. OVERFITTING

“Marry—Marry—Marry Q.E.D.”: This note by Darwin is dated April 7, 1838; see, e.g., Darwin, The Correspondence of Charles Darwin, Volume 2: 1837–1843.

“Moral or Prudential Algebra”: Franklin’s letter to Joseph Priestley, London, September 19, 1772.

“Anything you can do I can do better”: “Anything You Can Do,” composed by Irving Berlin, in Annie Get Your Gun, 1946.

what you know and what you don’t: In the language of machine-learning researchers: the “training” and the “test.”

a recent study conducted in Germany: Lucas et al., “Reexamining Adaptation and the Set Point Model of Happiness.”

our job is to figure out the formula: For math aficionados, we’re trying to find the best polynomial function for capturing this relationship. Taking time since marriage to be x and satisfaction to be y, the one-predictor model is y = ax + b. The two-predictor model is y = ax2 + bx + c, and the nine-predictor model finds the best coefficients for all values of x up to x9, estimating a polynomial of degree 9.

through each and every point on the chart: In fact, it’s a mathematical truth that you can always draw a polynomial of degree n − 1 through any n points.

people’s baseline level of satisfaction: Lucas et al., “Reexamining Adaptation and the Set Point Model of Happiness.”

not always better to use a more complex model: Statisticians refer to the various factors in the model as “predictors.” A model that’s too simple, such as a straight line attempting to fit a curve, is said to exhibit “bias.” The opposite kind of systemic error, where a model is made too complicated and therefore gyrates wildly because of small changes in the data, is known as “variance.”

The surprise is that these two kinds of errors—bias and variance—can be complementary. Reducing bias (making the model more flexible and complicated) can increase variance. And increasing bias (simplifying the model and fitting the data less tightly) can sometimes reduce variance.

Like the famous Heisenberg uncertainty principle of particle physics, which says that the more you know about a particle’s momentum the less you know about its position, the so-called bias-variance tradeoff expresses a deep and fundamental bound on how good a model can be—on what it’s possible to know and to predict. This notion is found in various places in the machine-learning literature. See, for instance, Geman, Bienenstock, and Doursat, “Neural Networks and the Bias/Variance Dilemma,” and Grenander, “On Empirical Spectral Analysis of Stochastic Processes.”

in the Book of Kings: The bronze snake, known as Nehushtan, gets destroyed in 2 Kings 18:4.

“pay good money to remove the tattoos”: Gilbert, Stumbling on Happiness.

duels less than fifty years ago: If you’re not too fainthearted, you can watch video of a duel fought in 1967 at http://passerelle-production.u-bourgogne.fr/web/atip_insulte/Video/archive_duel_france.swf.

as athletes overfit their tactics: For an interesting example of very deliberately overfitting fencing, see Harmenberg, Epee 2.0.

“Incentive structures work”: Brent Schlender, “The Lost Steve Jobs Tapes,” Fast Company, May 2012, http://www.fastcompany.com/1826869/lost-steve-jobs-tapes.

“whatever the CEO decides to measure”: Sam Altman, “Welcome, and Ideas, Products, Teams and Execution Part I,” Stanford CS183B, Fall 2014, “How to Start a Startup,” http://startupclass.samaltman.com/courses/lec01/.

Ridgway cataloged a host of such: Ridgway, “Dysfunctional Consequences of Performance Measurements.”

At a job-placement firm: In this tale, Ridgway is himself citing Blau, The Dynamics of Bureaucracy.

“Friends don’t let friends measure Page Views”: Avinash Kaushik, “You Are What You Measure, So Choose Your KPIs (Incentives) Wisely!” http://www.kaushik.net/avinash/measure-choose-smarter-kpis-incentives/.

“dead cops were found”: Grossman and Christensen, On Combat. See http://www.killology.com/on_combat_ch2.htm.

officer instinctively grabbed the gun: Ibid.

“If you can’t explain it simply”: This quotation is frequently attributed to Albert Einstein, although this attribution is likely to be apocryphal.

Tikhonov proposed one answer: See, e.g., Tikhonov and Arsenin, Solution of Ill-Posed Problems.

invented in 1996 by biostatistician Robert Tibshirani: Tibshirani, “Regression Shrinkage and Selection via the Lasso.”

human brain burns about a fifth: For more on the human brain’s energy consumption see, e.g., Raichle and Gusnard, “Appraising the Brain’s Energy Budget,” which in turn cites, e.g., Clarke and Sokoloff, “Circulation and Energy Metabolism of the Brain.”

brains try to minimize the number of neurons: Using this neurally inspired strategy (known as “sparse coding”), researchers have developed artificial neurons that have properties similar to those found in the visual cortex. See Olshausen and Field, “Emergence of Simple-Cell Receptive Field Properties.”

groundbreaking “mean-variance portfolio optimization”: The work for which Markowitz was awarded the Nobel Prize appears in his paper “Portfolio Selection” and his book Portfolio Selection: Efficient Diversification of Investments.

“I split my contributions fifty-fifty”: Harry Markowitz, as quoted in Jason Zweig, “How the
Big Brains Invest at TIAA–CREF,” Money 27(1): 114, January 1998.

“less information, computation, and time”: Gigerenzer and Brighton, “Homo Heuristicus.”

more than quadrupled from the mid-1990s to 2013: From Soyfoods Association of North America, “Sales and Trends,” http://www.soyfoods.org/soy-products/sales-and-trends, which in turn cites research “conducted by Katahdin Ventures.”

“Nuts are trendy now”: Vanessa Wong, “Drinkable Almonds,” Bloomberg Businessweek, August 21, 2013.

an astounding three-hundred-fold since 2004: Lisa Roolant, “Why Coconut Water Is Now a $1 Billion Industry,” TransferWise, https://transferwise.com/blog/2014-05/why-coconut-water-is-now-a-1-billion-industry/.

“jumped from invisible to unavoidable”: David Segal, “For Coconut Waters, a Street Fight for Shelf Space,” New York Times, July 26, 2014.

the kale market grew by 40%: “Sales of Kale Soar as Celebrity Chefs Highlight Health Benefits,” The Telegraph, March 25, 2013

Pizza Hut, which put it in their salad bars: Ayla Withee, “Kale: One Easy Way to Add More Superfoods to Your Diet,” Boston Magazine, May 31, 2012.

early vertebrates’ bodies twisted 180 degrees: Kinsbourne, “Somatic Twist.” Further discussion of body and organ structure in primitive vertebrates can be found in Lowe et al., “Dorsoventral Patterning in Hemichordates.” A more approachable overview is Kelly Zalocusky, “Ask a Neuroscientist: Why Does the Nervous System Decussate?,” Stanford Neuroblog, December 12, 2013, https://neuroscience.stanford.edu/news/ask-neuroscientist-why-does-nervous-system-decussate.

jawbones were apparently repurposed: See, for example, “Jaws to Ears in the Ancestors of Mammals,” Understanding Evolution, http://evolution.berkeley.edu/evolibrary/article/evograms_05.

“the premise that we can’t measure what matters”: “The Scary World of Mr Mintzberg,” interview with Simon Caulkin, Guardian, January 25, 2003, http://www.theguardian.com/business/2003/jan/26/theobserver.observerbusiness11.

“one’s whole life like a neuter bee”: Darwin, The Correspondence of Charles Darwin, Volume 2: 1837–1843.

“When? Soon or Late”: Ibid.

8. RELAXATION

“successfully design a peptidic inhibitor”: Meghan Peterson (née Bellows), personal interview, September 23, 2014.

about 11107 possible seating plans: More precisely, there would be 11107 possibilities if we were choosing a table assignment for each person independently. The number is a little less once we take into account the constraint that only 10 people can sit at each table. But it’s still huge.

Bellows was pleased with the computer’s results: The formal framework that Meghan Bellows used to solve her wedding seating chart is described in Bellows and Peterson, “Finding an Optimal Seating Chart.”

Lincoln worked as a “prairie lawyer”: You can read more about Lincoln’s circuit in Fraker, “The Real Lincoln Highway.”

“the postal messenger problem”: Menger, “Das botenproblem,” contains a lecture given by Menger on the subject in Vienna on February 5, 1930. For a fuller history of the traveling salesman problem see Schrijver, “On the History of Combinatorial Optimization,” as well as Cook’s very readable book In Pursuit of the Traveling Salesman.

fellow mathematician Merrill Flood: Flood, “The Traveling-Salesman Problem.”

iconic name first appeared in print: Robinson, On the Hamiltonian Game.

“impossibility results would also be valuable”: Flood, “The Traveling-Salesman Problem.”

“no good algorithm for the traveling salesman problem”: Edmonds, “Optimum Branchings.”

what makes a problem feasible: Cobham, “The Intrinsic Computational Difficulty of Functions,” explicitly considers the question of what should be considered an “efficient” algorithm. Similarly, Edmonds, “Paths, Trees, and Flowers,” explains why a solution to a difficult problem is significant and, in making the case for this particular solution, establishes a general framework for what makes algorithms good.

the field’s de facto out-of-bounds marker: There are, in fact, algorithms that run slower than polynomial time but faster than exponential time; these “superpolynomial” runtimes also put them outside the set of efficient algorithms.

either efficiently solvable or not: The set of efficiently solvable problems in computer science is called P, short for “polynomial time.” The controversially liminal set of problems, meanwhile, is known as NP, for “nondeterministic polynomial.” Problems in NP can have their solutions verified efficiently once found, but whether every problem that can be easily verified can also be easily solved is unknown. For instance, if someone shows you a route and says that it’s less than 1,000 miles, the claim is easy to check—but finding a route less than 1,000 miles, or proving that it’s impossible, is another feat entirely. The question of whether P = NP (i.e., whether it’s possible to jump efficiently to the solutions of NP problems) is the greatest unsolved mystery in computer science.

The main advance toward a solution has been the demonstration that there are certain problems with a special status: if one of them can be solved efficiently, then any problem in NP can be solved efficiently and P = NP (Cook, “The Complexity of Theorem-Proving Procedures”). These are known as “NP-hard” problems. In the absence of an answer to whether P = NP, problems in NP cannot be solved efficiently, which is why we refer to them as “intractable.” (In “A Terminological Proposal,” Donald Knuth suggested this as an appropriate label for NP-hard problems, in addition to offering a live turkey to anybody who could prove P = NP.) The intractable scheduling problems that Eugene Lawler encountered in chapter 5 fall into this category. An NP-hard problem that is itself in NP is known as “NP-complete.” See Karp, “Reducibility Among Combinatorial Problems,” for the classic result showing that a version of the traveling salesman problem is NP-complete, and Fortnow, The Golden Ticket: P, NP, and the Search for the Impossible, for an accessible introduction to P and NP.

most computer scientists believe that there aren’t any: In a 2002 survey of one hundred leading theoretical computer scientists, sixty-one thought P ≠ NP and only nine thought P = NP (Gasarch, “The P =? NP Poll”). While proving P = NP could be done by exhibiting a polynomial-time algorithm for an NP-complete problem, proving P ≠ NP requires making complex arguments about the limits of polynomial-time algorithms, and there wasn’t much agreement among the people surveyed about exactly what kind of mathematics will be needed to solve this problem. But about half of them did think the issue would be resolved before 2060.

What’s more, many other optimization problems: This includes versions of vertex cover and set cover—two problems identified as belonging to NP in Karp, “Reducibility Among Combinatorial Problems,” where twenty-one problems were famously shown to be in this set. By the end of the 1970s, computer scientists had identified some three hundred NP-complete problems (Garey and Johnson, Computers and Intractability), and the list has grown significantly since then. These include some problems that are very familiar to humans. In 2003, Sudoku was shown to be NP-complete (Yato and Seta, “Complexity and Completeness”), as was maximizing the number of cleared rows in Tetris, even with perfect knowledge of future pieces (Demaine, Hohenberger, and Liben-Nowell, “Tetris Is Hard, Even to Approximate”). In 2012, determining whether there exists a path to the end of the level in platformer games like Super Mario Brothers was officially added to the list (Aloupis, Demaine, and Guo, “Classic Nintendo Games are (NP-) Hard”).

“you still have to fight it”: Jan Karel Lenstra, personal interview, September 2, 2014.

“The perfect is the enemy of the good”: Voltaire’s couplet Dans ses écrits, un sage Italien / Dit que le mieux est l’ennemi du bien (“In his writings, an Italian sage / Says the perfect is the enemy of the good”) appears at the start of his poem “La Bégueule.” Voltaire had earlier cited the Italian expression “Le meglio è l’inimico del bene” in his 1764 Dictionnaire philosophique.

thei
r minds also turn to relaxation: Shaw, An Introduction to Relaxation Methods; Henderson, Discrete Relaxation Techniques. Caveat lector: the math is intense enough that these make for far-from-relaxing reading.

for Lincoln’s judicial circuit: The towns of Lincoln’s judicial circuit are derived from the 1847–1853 map of the 8th Judicial Circuit in the Journal of the Abraham Lincoln Association. See http://quod.lib.u mich.edu/j/jala/images/fraker_fig01a.jpg.

essentially no time at all: Well, okay, a little bit of time—linear in the number of cities if you’re lucky, linearithmic if you’re not. Pettie and Ramachandran, “An Optimal Minimum Spanning Tree Algorithm.”

the spanning tree, with its free backtracking: Approaching the traveling salesman problem via the minimum spanning tree is discussed in Christofides, Worst-Case Analysis of a New Heuristic.

visits every single town on Earth: For more on the state of the art in the all-world-cities traveling salesman problem (the so-called “World TSP”), an up-to-date report can be found at http://www.math.uwaterloo.ca/tsp/world/. For more on the traveling salesman problem in general, Cook, In Pursuit of the Traveling Salesman, is a good general reference, and Lawler et al., The Traveling Salesman Problem, will satisfy those who want to go deeper.

finding the minimal set of locations: This classic discrete optimization problem is known as the “set cover” problem.

“when you can’t do half of this”: Laura Albert McLay, personal interview, September 16, 2014.

let you lick the fewest envelopes: In computer science, this is known as the “vertex cover” problem. It’s a kind of cousin to the set cover problem, where instead of seeking the smallest number of fire stations whose coverage includes everyone, the goal is to find the smallest number of people who are connected to everyone else.

‹ Prev Next ›