Algorithms to Live By

by Brian Christian


  Computer science illustrates the fundamental limitations of this kind of reasoning with what’s called the “halting problem.” As Alan Turing proved in 1936, a computer program can never tell you for sure whether another program might end up calculating forever without end—except by simulating the operation of that program and thus potentially going off the deep end itself. (Accordingly, programmers will never have automated tools that can tell them whether their software will freeze.) This is one of the foundational results in all of computer science, on which many other proofs hang.* Simply put, any time a system—be it a machine or a mind—simulates the workings of something as complex as itself, it finds its resources totally maxed out, more or less by definition. Computer scientists have a term for this potentially endless journey into the hall of mirrors, minds simulating minds simulating minds: “recursion.”
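Turing's argument can be sketched in a few lines of Python. This is a toy rendering of the proof by contradiction, not anything from the book: suppose a perfect halt-detecting oracle existed, then build a program that does the opposite of whatever the oracle predicts about it.

```python
def halts(program, data):
    """Hypothetical oracle: True iff program(data) eventually stops.
    Turing showed no such function can be written; we assume it
    exists here only to derive the contradiction below."""
    raise NotImplementedError("no such oracle can exist")

def troublemaker(program):
    # Ask the oracle what `program` does when fed its own source,
    # then do the opposite.
    if halts(program, program):
        while True:      # oracle says "halts"? Then loop forever.
            pass
    else:
        return           # oracle says "loops"? Then stop at once.

# Now ask: does troublemaker(troublemaker) halt? If the oracle says
# yes, troublemaker loops; if it says no, troublemaker halts. Either
# answer is wrong, so the oracle (and any automated freeze-detecting
# tool built on it) cannot exist.
```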

  “In poker, you never play your hand,” James Bond says in Casino Royale; “you play the man across from you.” In fact, what you really play is a theoretically infinite recursion. There’s your own hand and the hand you believe your opponent to have; then the hand you believe your opponent believes you have, and the hand you believe your opponent believes you to believe he has … and on it goes. “I don’t know if this is an actual game-theory term,” says the world’s top-rated poker player, Dan Smith, “but poker players call it ‘leveling.’ Level one is ‘I know.’ Two is ‘you know that I know.’ Three, ‘I know that you know that I know.’ There are situations where it just comes up where you are like, ‘Wow, this is a really silly spot to bluff but if he knows that it is a silly spot to bluff then he won’t call me and that’s where it’s the clever spot to bluff.’ Those things happen.”

  One of the most memorable bluffs in high-level poker occurred when Tom Dwan wagered $479,500 on Texas Hold ’Em’s absolute worst possible hand, the 2–7—while literally telling his opponent, Sammy George, that he was holding it. “You don’t have deuce-seven,” George replied. “You don’t have deuce-seven.” George folded, and Dwan—with, yes, deuce-seven—took the pot.

  In poker, recursion is a dangerous game. You don’t want to get caught one step behind your opponent, of course—but there’s also an imperative not to get too far ahead of them either. “There’s a rule that you really only want to play one level above your opponent,” explains poker professional Vanessa Rousso. “If you play too far above your opponent, you’re going to think they have information that they don’t actually have—[and] they won’t be able to glean the information that you want them to glean from your actions.” Sometimes poker pros will deliberately bait their opponent into a convoluted recursion, meanwhile playing completely by-the-book, unpsychological poker themselves. This is known as luring them into “a leveling war against themselves.”

  (Luring an opponent into fruitless recursion can be an effective strategy in other games, too. One of the most colorful, bizarre, and fascinating episodes in the history of man-vs.-machine chess came in a 2008 blitz showdown between American grandmaster Hikaru Nakamura and leading computer chess program Rybka. In a game where each side got just three minutes on the clock to play all of their moves or automatically lose, the advantage surely seemed to be on the side of the computer—capable of evaluating millions of positions every second, and of making its move without twitching a muscle. But Nakamura immediately gridlocked the board, and proceeded to make repetitive, meaningless moves as fast as he could click. Meanwhile, the computer wasted precious moments fruitlessly searching for winning variations that didn’t exist and doggedly trying to anticipate all the possible future moves by Nakamura, who himself was simply doing the chess equivalent of twiddling his thumbs. When the computer had nearly exhausted its time and began flailing so as not to lose by the clock, Nakamura finally opened the position and crashed through.)

  Given recursion’s dangers, how do poker professionals break out of it? They use game theory. “Sometimes you can come up with reasons to make exploitive [leveling] plays, but a lot of the time you are just making inferior plays for reasons that are really just noise,” Dan Smith explains. “I try really hard to have a base level of theory understanding in most situations.… I always start by knowing or trying to know what Nash is.”

  So what is Nash?

  Reaching Equilibrium

  You know the rules, and so do I.…

  We know the game and we’re gonna play it.

  —RICK ASTLEY

  Game theory covers an incredibly broad spectrum of scenarios of cooperation and competition, but the field began with those resembling heads-up poker: two-person contests where one player’s gain is another player’s loss. Mathematicians analyzing these games seek to identify a so-called equilibrium: that is, a set of strategies that both players can follow such that neither player would want to change their own play, given the play of their opponent. It’s called an equilibrium because it’s stable—no amount of further reflection by either player will bring them to different choices. I’m content with my strategy, given yours, and you’re content with your strategy, given mine.

  In rock-paper-scissors, for example, the equilibrium tells us, perhaps unexcitingly, to choose one of the eponymous hand gestures completely at random, each roughly a third of the time. What makes this equilibrium stable is that, once both players adopt this 1⁄3 - 1⁄3 - 1⁄3 strategy, there is nothing better for either to do than stick with it. (If we tried playing, say, more rock, our opponent would quickly notice and start playing more paper, which would make us play more scissors, and so forth until we both settled into the 1⁄3 - 1⁄3 - 1⁄3 equilibrium again.)
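The claim that nothing beats the uniform mix can be checked by direct computation. A small sketch (my own illustration, not from the book): against an opponent playing each gesture a third of the time, every possible deviation earns exactly zero in expectation.

```python
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 for a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def expected(my_mix, their_mix):
    """Expected payoff of one probability mix against another."""
    return sum(p * q * payoff(a, b)
               for a, p in zip(MOVES, my_mix)
               for b, q in zip(MOVES, their_mix))

uniform = (1/3, 1/3, 1/3)
for i, move in enumerate(MOVES):
    pure = tuple(1.0 if j == i else 0.0 for j in range(3))
    # Every pure deviation earns exactly 0 against the uniform mix,
    # so no deviation helps: the hallmark of an equilibrium.
    print(move, expected(pure, uniform))
```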

  In one of the seminal results in game theory, the mathematician John Nash proved in 1951 that every two-player game has at least one equilibrium. This major discovery would earn Nash the Nobel Prize in Economics in 1994 (and lead to the book and film A Beautiful Mind, about Nash’s life). Such an equilibrium is now often spoken of as the “Nash equilibrium”—the “Nash” that Dan Smith always tries to keep track of.

  On the face of it, the fact that a Nash equilibrium always exists in two-player games would seem to bring us some relief from the hall-of-mirrors recursions that characterize poker and many other familiar contests. When we feel ourselves falling down the recursive rabbit hole, we always have an option to step out of our opponent’s head and look for the equilibrium, going directly to the best strategy, assuming rational play. In rock-paper-scissors, scrutinizing your opponent’s face for signs of what they might throw next may not be worthwhile, if you know that simply throwing at random is an unbeatable strategy in the long run.

  More generally, the Nash equilibrium offers a prediction of the stable long-term outcome of any set of rules or incentives. As such, it provides an invaluable tool for both predicting and shaping economic policy, as well as social policy in general. As Nobel laureate economist Roger Myerson puts it, the Nash equilibrium “has had a fundamental and pervasive impact in economics and the social sciences which is comparable to that of the discovery of the DNA double helix in the biological sciences.”

  Computer science, however, has complicated this story. Put broadly, the object of study in mathematics is truth; the object of study in computer science is complexity. As we’ve seen, it’s not enough for a problem to have a solution if that problem is intractable.

  In a game-theory context, knowing that an equilibrium exists doesn’t actually tell us what it is—or how to get there. As UC Berkeley computer scientist Christos Papadimitriou writes, game theory “predicts the agents’ equilibrium behavior typically with no regard to the ways in which such a state will be reached—a consideration that would be a computer scientist’s foremost concern.” Stanford’s Tim Roughgarden echoes the sentiment of being unsatisfied with Nash’s proof that equilibria always exist. “Okay,” he says, “but we’re computer scientists, right? Give us something we can use. Don’t just tell me that it’s there; tell me how to find it.” And so, the original field of game theory begat algorithmic game theory—that is, the study of theoretically ideal strategies for games became the study of how machines (and people) come up with strategies for games.
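Roughgarden's "tell me how to find it" has an easy answer only in tiny cases. For a 2×2 game with a fully mixed equilibrium, each player mixes at whatever rate makes the other player indifferent between their two actions. A sketch of that calculation (my own illustration, using the matching-pennies game rather than an example from the book):

```python
def mixed_equilibrium_2x2(A, B):
    """Solve for the fully mixed equilibrium of a 2x2 bimatrix game.
    A[i][j] is the row player's payoff, B[i][j] the column player's,
    when row plays i and column plays j. Assumes the indifference
    equations have a valid solution (a fully mixed equilibrium)."""
    # p = probability row plays action 0, chosen so the column player
    # is indifferent: p*B[0][0] + (1-p)*B[1][0] == p*B[0][1] + (1-p)*B[1][1]
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[0][1] - B[1][0] + B[1][1])
    # q = probability column plays action 0, making the row player indifferent.
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return p, q

# Matching pennies: row wins on a match, column wins on a mismatch.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
print(mixed_equilibrium_2x2(A, B))  # (0.5, 0.5): both flip fair coins
```

The catch, as the next paragraphs explain, is that this kind of direct solving does not scale: for games of realistic size, no efficient procedure is known.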

  As it turns out, asking too many questions about Nash equilibria gets you into computational trouble in a hurry. By the end of the twentieth century, determining whether a game has more than one equilibrium, or an equilibrium that gives a player a certain payoff, or an equilibrium that involves taking a particular action, had all been proved to be intractable problems. Then, from 2005 to 2008, Papadimitriou and his colleagues proved that simply finding Nash equilibria is intractable as well.

  Simple games like rock-paper-scissors may have equilibria visible at a glance, but in games of real-world complexity it’s now clear we cannot take for granted that the participants will be able to discover or reach the game’s equilibrium. This, in turn, means that the game’s designers can’t necessarily use the equilibrium to predict how the players will behave. The ramifications of this sobering result are profound: Nash equilibria have held a hallowed place within economic theory as a way to model and predict market behavior, but that place might not be deserved. As Papadimitriou explains, “If an equilibrium concept is not efficiently computable, much of its credibility as a prediction of the behavior of rational agents is lost.” MIT’s Scott Aaronson agrees. “In my opinion,” he says, “if the theorem that Nash equilibria exist is considered relevant to debates about (say) free markets versus government intervention, then the theorem that finding those equilibria is [intractable] should be considered relevant also.” The predictive abilities of Nash equilibria only matter if those equilibria can actually be found by the players. To quote eBay’s former director of research, Kamal Jain, “If your laptop cannot find it, neither can the market.”

  Dominant Strategies, for Better or Worse

  Even when we can reach an equilibrium, just because it’s stable doesn’t make it good. It may seem paradoxical, but the equilibrium strategy—where neither player is willing to change tack—is by no means necessarily the strategy that leads to the best outcomes for the players. Nowhere is that better illustrated than in game theory’s most famous, provocative, and controversial two-player game: “the prisoner’s dilemma.”

  The prisoner’s dilemma works as follows. Imagine that you and a co-conspirator have been arrested after robbing a bank, and are being held in separate jail cells. Now you must decide whether to “cooperate” with each other—by remaining silent and admitting nothing—or to “defect” from your partnership by ratting out the other to the police. You know that if you both cooperate with each other and keep silent, the state doesn’t have enough evidence to convict either one of you, so you’ll both walk free, splitting the loot—half a million dollars each, let’s say. If one of you defects and informs on the other, and the other says nothing, the informer goes free and gets the entire million dollars, while the silent one is convicted as the sole perpetrator of the crime and receives a ten-year sentence. If you both inform on each other, then you’ll share the blame and split the sentence: five years each.

  Here’s the problem. No matter what your accomplice does, it’s always better for you to defect.

  If your accomplice has ratted you out, ratting them out in turn will give you five years of your life back—you’ll get the shared sentence (five years) rather than serving the whole thing yourself (ten years). And if your accomplice has stayed quiet, turning them in will net you the full million dollars—you won’t have to split it. No matter what, you’re always better off defecting than cooperating, regardless of what your accomplice decides. To do otherwise will always make you worse off, no matter what.

  In fact, this makes defection not merely the equilibrium strategy but what’s known as a dominant strategy. A dominant strategy avoids recursion altogether, by being the best response to all of your opponent’s possible strategies—so you don’t even need to trouble yourself getting inside their head at all. A dominant strategy is a powerful thing.

  But now we’ve arrived at the paradox. If everyone does the rational thing and follows the dominant strategy, the story ends with both of you serving five years of hard time—which, compared to freedom and a cool half million apiece, is dramatically worse for everyone involved. How could that have happened?

  This has emerged as one of the major insights of traditional game theory: the equilibrium for a set of players, all acting rationally in their own interest, may not be the outcome that is actually best for those players.
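The arithmetic behind the dilemma can be checked mechanically. A toy sketch with the numbers from the setup above (half a million dollars as +0.5, jail terms as negative years; the units are apples and oranges, but any payoffs with the same ordering tell the same story):

```python
COOPERATE, DEFECT = 0, 1
PAYOFF = {  # PAYOFF[my_move][their_move] = my payoff
    COOPERATE: {COOPERATE: 0.5, DEFECT: -10},
    DEFECT:    {COOPERATE: 1.0, DEFECT: -5},
}

def is_dominant(move):
    """A move is dominant if it beats the alternative against every
    possible move by the opponent."""
    other = 1 - move
    return all(PAYOFF[move][theirs] > PAYOFF[other][theirs]
               for theirs in (COOPERATE, DEFECT))

print(is_dominant(DEFECT))     # True: defecting wins either way
print(is_dominant(COOPERATE))  # False
# Yet mutual defection pays -5 each, far worse than mutual
# cooperation's +0.5 each: the stable outcome is not the good one.
```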

  Algorithmic game theory, in keeping with the principles of computer science, has taken this insight and quantified it, creating a measure called “the price of anarchy.” The price of anarchy measures the gap between cooperation (a centrally designed or coordinated solution) and competition (where each participant is independently trying to maximize the outcome for themselves). In a game like the prisoner’s dilemma, this price is effectively infinite: increasing the amount of cash at stake and lengthening the jail sentences can make the gap between possible outcomes arbitrarily wide, even as the dominant strategy stays the same. There’s no limit to how painful things can get for the players if they don’t coordinate. But in other games, as algorithmic game theorists would discover, the price of anarchy is not nearly so bad.

  For instance, consider traffic. Whether it’s individual commuters trying to make their way through the daily bumper-to-bumper, or routers shuffling TCP packets across the Internet, everyone in the system merely wants what’s easiest for them personally. Drivers just want to take the fastest route, whatever it is, and routers just want to shuffle along their packets with minimal effort—but in both cases this can result in overcrowding along critical pathways, creating congestion that harms everyone. How much harm, though? Surprisingly, Tim Roughgarden and Cornell’s Éva Tardos proved in 2002 that the “selfish routing” approach has a price of anarchy that’s a mere 4/3. That is, a free-for-all is only 33% worse than perfect top-down coordination.
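The 4/3 figure can be seen in the canonical two-road example often used to illustrate selfish routing (Pigou's example; the toy latencies below are my own rendering, not the book's). One road is wide but slow, the other fast but congestible, and selfish drivers all pile onto the congestible one.

```python
def average_cost(p):
    """One unit of traffic travels from A to B over two parallel roads:
    a wide road with fixed travel time 1, and a narrow road whose
    travel time equals the fraction p of traffic using it."""
    return p * p + (1 - p) * 1   # narrow-road users pay p, the rest pay 1

# Selfish equilibrium: as long as the narrow road carries less than all
# the traffic, it is strictly faster, so every driver individually
# prefers it; the only stable outcome is everyone on the narrow road.
selfish = average_cost(1.0)   # average travel time 1.0

# Socially optimal split: minimize p^2 + (1 - p), best at p = 1/2,
# found here by a simple grid search.
optimal = min(average_cost(p / 1000) for p in range(1001))   # 0.75

print(selfish / optimal)   # 1.333..., i.e. the 4/3 price of anarchy
```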

  Roughgarden and Tardos’s work has deep implications both for urban planning of physical traffic and for network infrastructure. Selfish routing’s low price of anarchy may explain, for instance, why the Internet works as well as it does without any central authority managing the routing of individual packets. Even if such coordination were possible, it wouldn’t add very much.

  When it comes to traffic of the human kind, the low price of anarchy cuts both ways. The good news is that the lack of centralized coordination is making your commute at most only 33% worse. On the other hand, if you’re hoping that networked, self-driving autonomous cars will bring us a future of traffic utopia, it may be disheartening to learn that today’s selfish, uncoordinated drivers are already pretty close to optimal. It’s true that self-driving cars should reduce the number of road accidents and may be able to drive more closely together, both of which would speed up traffic. But from a congestion standpoint, the fact that anarchy is only 4/3 as congested as perfect coordination means that perfectly coordinated commutes will only be 3/4 as congested as they are now. It’s a bit like the famous line by James Branch Cabell: “The optimist proclaims that we live in the best of all possible worlds; and the pessimist fears this is true.” Congestion will always be a problem solvable more by planners and by overall demand than by the decisions of individual drivers, human or computer, selfish or cooperative.

  Quantifying the price of anarchy has given the field a concrete and rigorous way to assess the pros and cons of decentralized systems, which has broad implications across any number of domains where people find themselves involved in game-playing (whether they know it or not). A low price of anarchy means the system is, for better or worse, about as good on its own as it would be if it were carefully managed. A high price of anarchy, on the other hand, means that things have the potential to turn out fine if they’re carefully coordinated—but that without some form of intervention, we are courting disaster. The prisoner’s dilemma is clearly of this latter type. Unfortunately, so are many of the most critical games the world must play.
  The Tragedy of the Commons

  In 1968, the ecologist Garrett Hardin took the two-player prisoner’s dilemma and imagined scaling it up to involve all the members of a farming village. Hardin invited his readers to picture a “commons” of public lawn—available to be grazed by everyone’s livestock, but with finite capacity. In theory, all the villagers should graze only as many animals as would leave some grass for everyone. In practice, though, the benefits of grazing a little bit more than that accrue directly to you, while the harms seem too small to be of consequence. Yet if everyone follows this logic of using just slightly more of the commons than they should, a dreadful equilibrium results: a completely devastated lawn, and no grass for anyone’s livestock thereafter.
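Hardin's logic can be put in toy-model form (the numbers and the grass-value function below are my own assumptions, not Hardin's): each extra animal is always worth grazing from one villager's point of view, yet if everyone grazes to capacity, every villager ends up with nothing.

```python
CAPACITY = 100   # assumed: beyond this many animals, no grass survives
VILLAGERS = 10

def per_animal_value(total):
    """Value of grazing one animal when `total` animals share the
    commons: full value 1.0 on an empty commons, zero at capacity."""
    return max(0.0, 1 - total / CAPACITY)

def payoff(my_animals, others_animals):
    total = my_animals + others_animals
    return my_animals * per_animal_value(total)

fair_share = 5   # everyone grazing 5 leaves half the grass standing
others = fair_share * (VILLAGERS - 1)

# Adding one more animal is always better *for you*...
print(payoff(fair_share, others))       # 5 animals: payoff 2.5
print(payoff(fair_share + 1, others))   # 6 animals: payoff ~2.94
# ...but if everyone reasons this way, grazing creeps toward capacity
# and every villager's payoff collapses to zero.
print(payoff(10, 10 * (VILLAGERS - 1)))  # full commons: payoff 0.0
```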

  Hardin called this the “tragedy of the commons,” and it has become one of the primary lenses through which economists, political scientists, and the environmental movement view large-scale ecological crises like pollution and climate change. “When I was a kid, there was this thing called leaded gasoline,” says Avrim Blum, Carnegie Mellon computer scientist and game theorist. “Leaded was ten cents cheaper or something, but it pollutes the environment.… Given what everyone else is doing, how much worse really are you personally [health-wise] if you put leaded gasoline in your own car? Not that much worse. It’s the prisoner’s dilemma.” The same is true at the corporate and national levels. A recent newspaper headline put the trouble succinctly: “Stable climate demands most fossil fuels stay in the ground, but whose?” Every corporation (and, to some degree, every nation) is better off being a bit more reckless than their peers for the sake of competitive advantage. Yet if they all act more recklessly, it leads to a ravaged Earth, and all for nothing: there’s no economic advantage for anyone relative to where they started.

 
