Algorithms to Live By

by Brian Christian


  Rawls’s philosophical critics have argued at length about how exactly we are supposed to leverage the information obtained from the veil of ignorance. Should we be trying, for instance, to maximize mean happiness, median happiness, total happiness, or something else? Each of these approaches, famously, leaves itself open to pernicious dystopias—such as the civilization of Omelas imagined by writer Ursula K. Le Guin, in which prosperity and harmony abound but a single child is forced to live in abject misery. These are worthy critiques, and Rawls deliberately sidesteps them by leaving open the question of what to do with the information we get from behind the veil. Perhaps the bigger question, though, is how to gather that information in the first place.

  The answer may well come from computer science. MIT’s Scott Aaronson says he’s surprised that computer scientists haven’t yet had more influence on philosophy. Part of the reason, he suspects, is just their “failure to communicate what they can add to philosophy’s conceptual arsenal.” He elaborates:

  One might think that, once we know something is computable, whether it takes 10 seconds or 20 seconds to compute is obviously the concern of engineers rather than philosophers. But that conclusion would not be so obvious, if the question were one of 10 seconds versus 10^(10^10) seconds! And indeed, in complexity theory, the quantitative gaps we care about are usually so vast that one has to consider them qualitative gaps as well. Think, for example, of the difference between reading a 400-page book and reading every possible such book, or between writing down a thousand-digit number and counting to that number.

  Computer science gives us a way to articulate the complexity of evaluating all possible social provisions for something like an injured shin. But fortunately, it also provides tools for dealing with that complexity. And the sampling-based Monte Carlo algorithms are some of the most useful approaches in that toolbox.

  When we need to make sense of, say, national health care reform—a vast apparatus too complex to be readily understood—our political leaders typically offer us two things: cherry-picked personal anecdotes and aggregate summary statistics. The anecdotes, of course, are rich and vivid, but they’re unrepresentative. Almost any piece of legislation, no matter how enlightened or misguided, will leave someone better off and someone worse off, so carefully selected stories don’t offer any perspective on broader patterns. Aggregate statistics, on the other hand, are the reverse: comprehensive but thin. We might learn, for instance, whether average premiums fell nationwide, but not how that change works out on a more granular level: they might go down for most but, Omelas-style, leave some specific group—undergraduates, or Alaskans, or pregnant women—in dire straits. A statistic can only tell us part of the story, obscuring any underlying heterogeneity. And often we don’t even know which statistic we need.

  Since neither sweeping statistics nor politicians’ favorite stories can truly guide us through thousands of pages of proposed legislation, a Monte Carlo–informed computer scientist would propose a different approach: sampling. A close examination of random samples can be one of the most effective means of making sense of something too complex to be comprehended directly. When it comes to handling a qualitatively unmanageable problem, something so thorny and complicated that it can’t be digested whole—solitaire or atomic fission, primality testing or public policy—sampling offers one of the simplest, and also the best, ways of cutting through the difficulties.
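  To make the idea concrete, here is a minimal sketch in Python; the population, the "student" subgroup, and every number in it are invented for illustration. Rather than reading every record, we draw a modest random sample and look both at the aggregate and at the individual cases the sample happens to contain.

```python
import random

random.seed(0)

# A made-up population: most households see their costs fall, but a small
# subgroup (here, arbitrarily, "students") is left worse off.
population = []
for i in range(100_000):
    group = "student" if i % 50 == 0 else "other"
    change = random.gauss(90, 20) if group == "student" else random.gauss(-40, 15)
    population.append((group, change))

# Instead of reading all 100,000 records, inspect a random sample of 1,000.
sample = random.sample(population, 1_000)
average = sum(change for _, change in sample) / len(sample)
worse_off = [group for group, change in sample if change > 0]

print(f"estimated average change: {average:+.1f}")
if worse_off:
    print(f"sampled cases left worse off: {len(worse_off)}, "
          f"mostly in group '{max(set(worse_off), key=worse_off.count)}'")
```

  Even a sample of a thousand is enough both to estimate the headline number and to surface the Omelas-style subgroup that the aggregate alone would hide.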

  We can see this approach at work with the charity GiveDirectly, which distributes unconditional cash transfers to people living in extreme poverty in Kenya and Uganda. It has attracted attention for rethinking conventional charity practices on a number of levels: not only in its unusual mission, but in the level of transparency and accountability it brings to its own process. And the latest element of the status quo that it’s challenging is success stories.

  “If you regularly check our website, blog, or Facebook page,” writes program assistant Rebecca Lange, “you may have noticed something you don’t often see: stories and photos of our recipients.” The problem isn’t that the glowing stories proffered by other charities aren’t true. Rather, the very fact that they were deliberately chosen to showcase successes makes it unclear how much information can be gleaned from them. So GiveDirectly decided to put a twist on this conventional practice as well.

  Every Wednesday, the GiveDirectly team selects a cash recipient at random, sends out a field officer to interview them, and publishes the field officer’s notes verbatim, no matter what. For instance, here’s their first such interview, with a woman named Mary, who used the money for a tin roof:*

  She was able to make a better house and that was a tinned house. She was also able to buy a sofa set for her own house. Her life has changed because she used to have a leaking roof soaking up everything in the house whenever it rained. But because of the transfer she was able to make a better tinned house.

  “We hope that this gives you confidence in all types of information we share with you,” Lange writes, “and maybe even inspires you to hold other organizations to a higher bar.”

  The Three-Part Tradeoff

  At once it struck me what quality went to form a Man of Achievement, especially in Literature, and which Shakespeare possessed so enormously—I mean Negative Capability, that is, when a man is capable of being in uncertainties, mysteries, doubts, without any irritable reaching after fact and reason.

  —JOHN KEATS

  There is no such thing as absolute certainty, but there is assurance sufficient for the purposes of human life.

  —JOHN STUART MILL

  Computer science is often a matter of negotiating tradeoffs. In our discussion of sorting in chapter 3, for instance, we noted the tradeoff between time spent up front on sorting versus the time spent later on searching. And in the discussion of caching in chapter 4, we explored the tradeoff of taking up extra space—caches for caches for caches—to save time.

  Time and space are at the root of the most familiar tradeoffs in computer science, but recent work on randomized algorithms shows that there’s also another variable to consider: certainty. As Harvard’s Michael Mitzenmacher puts it, “What we’re going to do is come up with an answer which saves you in time and space and trades off this third dimension: error probability.” Asked for his favorite example of this tradeoff into uncertainty, he doesn’t hesitate. “A colleague just said that there should be a drinking game that every time this term appears on one of my slides, you have to take a drink. Have you ever heard of Bloom filters?”

  To understand the idea behind a Bloom filter, Mitzenmacher says, consider a search engine like Google, trying to crawl the entire web and index every possible URL. The web comprises well over a trillion distinct URLs, and the average URL weighs in at about seventy-seven characters. When the search engine looks at some URL, how can it check whether that page has already been processed? Just storing a list of all the URLs that have been visited would take a huge amount of space, and repeatedly searching that list (even if it were fully sorted) could prove a nightmare. In fact, it could well be that the cure is worse than the disease: in other words, checking every time to make sure that we’re not reindexing a page might be more time-consuming than just indexing the occasional page twice.

  But what if we only needed to be mostly sure this URL was new to us? That’s where the Bloom filter comes in. Named for its inventor, Burton H. Bloom, a Bloom filter works much like the Rabin-Miller primality test: the URL is entered into a set of equations that essentially check for “witnesses” to its novelty. (Rather than proclaim “n is not prime,” these equations say “I have not seen n before.”) If you’re willing to tolerate an error rate of just 1% or 2%, storing your findings in a probabilistic data structure like a Bloom filter will save you significant amounts of both time and space. And the usefulness of such filters is not confined to search engines: Bloom filters have shipped with a number of recent web browsers to check URLs against a list of known malicious websites, and they are also an important part of cryptocurrencies like Bitcoin.
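  To see the mechanics, here is a minimal Bloom filter sketch in Python. The bit-array size, the number of hash functions, and the example URLs are all assumptions for illustration, not how any particular search engine or browser actually implements it.

```python
import hashlib

class BloomFilter:
    """A bit array plus k hash functions. Queries can return false positives
    (claiming to have seen an item that was never added) but never false
    negatives."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive k pseudo-independent bit positions from one cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_contains(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

crawled = BloomFilter()
crawled.add("https://example.com/page1")
print(crawled.probably_contains("https://example.com/page1"))  # True
print(crawled.probably_contains("https://example.com/page2"))  # almost certainly False
```

  The filter never forgets a URL it has seen, but it will occasionally claim to have seen one it hasn’t; enlarging the bit array or tuning the number of hashes trades space against that error rate.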

  Says Mitzenmacher, “The idea of the error tradeoff space—I think the issue is that people don’t associate that with computing. They think computers are supposed to give you the answer. So when you hear in your algorithms class, ‘It’s supposed to give you one answer; it might not be the right answer’—I like to think that when [students] hear that, it focuses them. I think people don’t realize in their own lives how much they do that and accept that.”

  Hills, Valleys, and Traps

  The river meanders because it can’t think.

  —RICHARD KENNEY

  Randomness has also proven itself to be a powerful weapon for solving discrete optimization problems, like assembling the calendar for NCAA basketball or finding the shortest route for a traveling salesman. In the previous chapter we saw how relaxation can play a big role in cutting such problems down to size, but the tactical use of randomness has emerged as an arguably even more important technique.

  Imagine you’re putting together a globe-trotting ten-city vacation, your own version of the traveling salesman problem: you’ll start and finish in San Francisco and visit Seattle, Los Angeles, New York, Buenos Aires, London, Amsterdam, Copenhagen, Istanbul, Delhi, and Kyoto. You might not be too worried about the total length of the route, but you probably do want to minimize the monetary cost of the trip. The first thing to note here is that even though ten cities hardly sounds like a lot, the number of possible itineraries is ten factorial: more than three and a half million. In other words, there’s no practical way for you to simply check every permutation and pick the lowest price. You have to work smarter than that.

  For your first attempt at an itinerary, you might look at taking the cheapest flight out of San Francisco (let’s say it’s Seattle), then taking the cheapest flight from there to any of the other remaining cities (call it Los Angeles), then the cheapest from there (say, New York), and so forth, until you’re at your tenth city and you fly from there back to San Francisco. This is an example of a so-called greedy algorithm, which you can also think of as a “myopic algorithm”: one that shortsightedly takes the best thing available every step of the way. In scheduling theory, as we saw in chapter 5, a greedy algorithm—for instance, always doing the shortest job available, without looking or planning beyond—can sometimes be all that a problem requires. In this case, for the traveling salesman problem, the solution given by the greedy algorithm probably isn’t terrible, but it’s likely to be far from the best you can do.
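  As a sketch of that greedy, “myopic” strategy in Python—the fare table below is randomly invented, standing in for whatever real prices you would look up:

```python
import random

CITIES = ["Seattle", "Los Angeles", "New York", "Buenos Aires", "London",
          "Amsterdam", "Copenhagen", "Istanbul", "Delhi", "Kyoto"]
HOME = "San Francisco"

random.seed(1)
# Hypothetical one-way fares between every pair of cities.
FARES = {(a, b): random.randint(100, 1500)
         for a in CITIES + [HOME] for b in CITIES + [HOME] if a != b}

def cost(route):
    return sum(FARES[(a, b)] for a, b in zip(route, route[1:]))

def greedy_itinerary():
    route, remaining, here = [HOME], set(CITIES), HOME
    while remaining:
        # Shortsightedly take the cheapest flight out of the current city.
        nxt = min(remaining, key=lambda city: FARES[(here, city)])
        route.append(nxt)
        remaining.remove(nxt)
        here = nxt
    route.append(HOME)  # from the tenth city, fly back home
    return route

route = greedy_itinerary()
print(cost(route), route)
```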

  Once you’ve assembled a baseline itinerary, you might test some alternatives by making slight perturbations to the city sequence and seeing if that makes an improvement. For instance, if we are going first to Seattle, then to Los Angeles, we can try doing those cities in reverse order: L.A. first, then Seattle. For any given itinerary, we can make eleven such two-city flip-flops; let’s say we try them all and then go with the one that gives us the best savings. From here we’ve got a new itinerary to work with, and we can start permuting that one, again looking for the best local improvement. This is an algorithm known as Hill Climbing—since the search through a space of solutions, some better and some worse, is commonly thought of in terms of a landscape with hills and valleys, where your goal is to reach the highest peak.
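  Continuing the same sketch, Hill Climbing over the adjacent flip-flop neighborhood might look like this (it reuses FARES, cost, and greedy_itinerary from above; since we are minimizing price, “climbing” here means descending in cost):

```python
def hill_climb(route):
    """Repeatedly try every adjacent two-city flip-flop, keep the single best
    one, and stop once no swap improves the itinerary (a local optimum)."""
    route = list(route)
    while True:
        best, best_cost = None, cost(route)
        # Flip each adjacent pair of cities; the San Francisco endpoints stay put.
        for i in range(1, len(route) - 2):
            candidate = route[:i] + [route[i + 1], route[i]] + route[i + 2:]
            if cost(candidate) < best_cost:
                best, best_cost = candidate, cost(candidate)
        if best is None:          # no flip-flop helps: we're at a local optimum
            return route
        route = best

local_best = hill_climb(greedy_itinerary())
print(cost(local_best), local_best)
```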

  Eventually you will end up with a solution that is better than all of its permutations; no matter which adjacent stops you flip, nothing beats it. It’s here that the hill climbing stops. Does this mean you’ve definitely found the single best possible itinerary, though? Sadly, no. You may have found only a so-called “local maximum,” not the global maximum of all the possibilities. The hill-climbing landscape is a misty one. You can know that you’re standing on a mountaintop because the ground falls away in all directions—but there might be a higher mountain just across the next valley, hidden behind clouds.

  [Figure: an “error landscape,” depicting how solution quality can vary across different possibilities.]

  Consider the lobster stuck in the lobster trap: poor beast, he doesn’t realize that exiting the cage means backtracking to the cage’s center, that he needs to go deeper into the cage to make it out. A lobster trap is nothing other than a local maximum made of wire—a local maximum that kills.

  In the case of vacation planning, local maxima are fortunately less fatal, but they have the same character. Even once we’ve found a solution that can’t be improved by any small tweaks, it’s possible that we are still missing the global maximum. The true best itinerary may require a radical overhaul of the trip: doing entire continents in a different order, for instance, or proceeding westward instead of eastward. We may need to temporarily worsen our solution if we want to continue searching for improvements. And randomness provides a strategy—actually, several strategies—for doing just that.

  Out of the Local Maximum

  One approach is to augment Hill Climbing with what’s known as “jitter”: if it looks like you’re stuck, mix things up a little. Make a few random small changes (even if they are for the worse), then go back to Hill Climbing; see if you end up at a higher peak.

  Another approach is to completely scramble our solution when we reach a local maximum, and start Hill Climbing anew from this random new starting point. This algorithm is known, appropriately enough, as “Random-Restart Hill Climbing”—or, more colorfully, as “Shotgun Hill Climbing.” It’s a strategy that proves very effective when there are lots of local maxima in a problem. For example, computer scientists use this approach when trying to decipher codes, since there are lots of ways to begin decrypting a message that look promising at first but end up being dead ends. In decryption, having a text that looks somewhat close to sensible English doesn’t necessarily mean that you’re even on the right track. So sometimes it’s best not to get too attached to an initial direction that shows promise, and simply start over from scratch.
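  A rough Python sketch of both ideas, reusing the itinerary helpers from the earlier sketches (the number of swaps and restarts below are arbitrary assumptions):

```python
import random

def jitter(route, swaps=3):
    """Make a few random adjacent swaps, even if they hurt, then hand the
    result back to hill_climb and see whether it reaches a better optimum."""
    route = list(route)
    for _ in range(swaps):
        i = random.randrange(1, len(route) - 2)
        route[i], route[i + 1] = route[i + 1], route[i]
    return route

def random_restart_hill_climb(restarts=20):
    """Shotgun Hill Climbing: start over from a fresh random itinerary each
    time, and keep the best local optimum found across all the restarts."""
    best = None
    for _ in range(restarts):
        start = [HOME] + random.sample(CITIES, len(CITIES)) + [HOME]
        candidate = hill_climb(start)
        if best is None or cost(candidate) < cost(best):
            best = candidate
    return best

print(cost(hill_climb(jitter(hill_climb(greedy_itinerary())))))
print(cost(random_restart_hill_climb()))
```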

  But there’s also a third approach: instead of turning to full-bore randomness when you’re stuck, use a little bit of randomness every time you make a decision. This technique, developed by the same Los Alamos team that came up with the Monte Carlo Method, is called the Metropolis Algorithm. The Metropolis Algorithm is like Hill Climbing, trying out different small-scale tweaks on a solution, but with one important difference: at any given point, it will potentially accept bad tweaks as well as good ones.

  We can imagine applying this to our vacation planning problem. Again, we try to tweak our proposed solution by jiggling around the positions of different cities. If a randomly generated tweak to our travel route results in an improvement, then we always accept it, and continue tweaking from there. But if the alteration would make things a little worse, there’s still a chance that we go with it anyway (although the worse the alteration is, the smaller the chance). That way, we won’t get stuck in any local maximum for very long: eventually we’ll try another nearby solution, even though it’s more expensive, and potentially be on our way to coming up with a new and better plan.
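  In code, the only change from plain Hill Climbing is the acceptance rule. Here is a sketch continuing the itinerary example; the “temperature” constant is an arbitrary assumption that sets how tolerant we are of worse moves.

```python
import math
import random

def metropolis(route, temperature=80.0, steps=20_000):
    """Metropolis-style search: always accept an improving swap, and accept a
    worsening one with a probability that shrinks the worse the change is."""
    route, current = list(route), cost(route)
    for _ in range(steps):
        i = random.randrange(1, len(route) - 2)
        candidate = route[:i] + [route[i + 1], route[i]] + route[i + 2:]
        delta = cost(candidate) - current      # positive means "more expensive"
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            route, current = candidate, current + delta
    return route

print(cost(metropolis(greedy_itinerary())))
```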

  Whether it’s jitter, random restarts, or being open to occasional worsening, randomness is incredibly useful for avoiding local maxima. Chance is not just a viable way of dealing with tough optimization problems; in many cases, it’s essential. Some questions linger, however. How much randomness should you use? And when? And—given that strategies such as the Metropolis Algorithm can permute our itinerary pretty much ad infinitum—how do you ever know that you’re done? For researchers working on optimization, a surprisingly definitive answer to these questions would come from another field entirely.

  Simulated Annealing

  In the late 1970s and early ’80s, Scott Kirkpatrick considered himself a physicist, not a computer scientist. In particular, Kirkpatrick was interested in statistical physics, which uses randomness as a way to explain certain natural phenomena—for instance, the physics of annealing, the way that materials change state as they are heated and cooled. Perhaps the most interesting characteristic of annealing is that how quickly or slowly a material is cooled tends to have a tremendous impact on its final structure. As Kirkpatrick explains:

  Growing a single crystal from a melt [is] done by careful annealing, first melting the substance, then lowering the temperature slowly, and spending a long time at temperatures in the vicinity of the freezing point. If this is not done, and the substance is allowed to get out of equilibrium, the resulting crystal will have many defects, or the substance may form a glass, with no crystalline order.

  Kirkpatrick was then working at IBM, where one of the biggest, trickiest, and most hallowed problems was how to lay out the circuits on the chips that IBM was manufacturing. The problem was ungainly and intractable: there was an enormous range of possible solutions to consider, and some tricky constraints. It was better in general for the components to be close together, for instance—but not too close, or there would be no room for the wires. And any time you moved anything, you’d have to recompute how all the wires would run in the new hypothetical layout.

  At the time, this process was led by something of a cryptic guru-type figure within IBM. As Kirkpatrick recalls, “The guy who was the best at IBM at squeezing more circuits on a chip … he had the most mysterious way of explaining what he was doing. He didn’t like to really tell you.”

  Kirkpatrick’s friend and IBM colleague Dan Gelatt was fascinated by the problem, and quickly hooked Kirkpatrick, who had a flash of insight. “The way to study [physical systems] was to warm them up then cool them down, and let the system organize itself. From that background, it seemed like a perfectly natural thing to treat all kinds of optimization problems as if the degrees of freedom that you were trying to organize were little atoms, or spins, or what have you.”
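  The algorithm that grew out of this insight, Simulated Annealing, is in essence the Metropolis rule run while the temperature is gradually lowered: wide-ranging, tolerant exploration at first, then an increasingly picky settling-down. A minimal sketch, again continuing the itinerary example, with an entirely made-up cooling schedule rather than anything IBM actually used:

```python
import math
import random

def simulated_annealing(route, start_temp=500.0, cooling=0.999, steps=50_000):
    """Run the Metropolis-style search while slowly lowering the temperature,
    so the search roams freely early on and settles into a (hopefully deep)
    minimum as it cools."""
    route, current, temp = list(route), cost(route), start_temp
    best, best_cost = list(route), current
    for _ in range(steps):
        i = random.randrange(1, len(route) - 2)
        candidate = route[:i] + [route[i + 1], route[i]] + route[i + 2:]
        delta = cost(candidate) - current
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            route, current = candidate, current + delta
            if current < best_cost:
                best, best_cost = list(route), current
        temp *= cooling                         # cool down a little each step
    return best

print(cost(simulated_annealing(greedy_itinerary())))
```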

 
