The Perfect Bet

by Adam Kucharski


  When it comes to rock-paper-scissors, machines are much better than humans at coming up with the unpredictable moves required for an optimal game theory strategy. Such a strategy is inherently defensive, of course, because it aims to limit potential losses against a perfect opponent. But the rock-paper-scissors bot on the New York Times website was not playing a perfect opponent. It was playing error-prone humans, who have memory issues and can’t generate random numbers. The bot therefore deviated from a random strategy and started to hunt down weaknesses.

  The computer had two main advantages over its human opponents. First, it could accurately remember what the human had done in previous rounds. It could recall which sequences of moves the person had played, for example, and which patterns the person was fond of. And that’s when the second advantage kicked in.

  The computer wasn’t just using information on its current opponent. It was drawing on knowledge gained during two hundred thousand rounds of rock-paper-scissors against humans. The database came from Shawn Bayern, a law professor and an ex-computer scientist whose website runs a massive online rock-paper-scissors tournament. The competition is still going, with over half a million rounds played to date (the computer has won the majority of them). The data meant the bot could compare its current opponent with others it had played. Given a particular sequence of moves, it could work out what humans tended to do next. Rather than being interested only in randomness, the machine was instead building a picture of its opponent.
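
  To make the idea concrete, here is a minimal sketch of this kind of pattern-matching play, assuming a simple frequency table rather than whatever the Times bot actually runs: record what the opponent played after each recent sequence of moves, predict the most common follow-up, and throw the move that beats it.

```python
import random
from collections import defaultdict

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class PatternBot:
    """Track what the opponent played after each recent sequence of
    their moves; predict the most common follow-up and counter it."""

    def __init__(self, memory=2):
        self.memory = memory
        self.history = []  # opponent's moves so far
        self.counts = defaultdict(lambda: defaultdict(int))

    def play(self):
        context = tuple(self.history[-self.memory:])
        followers = self.counts.get(context)
        if not followers:
            return random.choice(list(BEATS))  # no data yet: stay random
        predicted = max(followers, key=followers.get)
        return BEATS[predicted]  # play whatever beats the prediction

    def observe(self, opponent_move):
        context = tuple(self.history[-self.memory:])
        self.counts[context][opponent_move] += 1
        self.history.append(opponent_move)
```

  Seeding the frequency table with a large archive of past games, as the Times bot did with Bayern's database, lets a bot like this exploit population-level habits from the very first round.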

  Such approaches can be particularly important in games like poker, which can have more than two players. Recall that, in game theory, optimal strategies are said to be in Nash equilibrium: no single player will gain anything by picking a different strategy. Neil Burch, one of the researchers in the University of Alberta poker group, points out that it makes sense to look for such strategies if you have a single opponent. If the game is zero-sum—with everything you lose going to your opponent, and vice versa—then a Nash equilibrium strategy will limit your losses. What’s more, if your opponent deviates from an equilibrium strategy, your opponent will lose out. “In two player games that are zero-sum, there’s a really good reason to say that a Nash equilibrium is the correct thing to play,” Burch said. However, it isn’t necessarily the best option when more players join the game. “In a three player game, that can fall apart.”
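
  The two-player zero-sum case is easy to check in miniature. In rock-paper-scissors the equilibrium strategy is to play each move with probability one-third, and the sketch below (illustrative payoff numbers) confirms that no pure deviation gains anything against it, which is exactly what Nash equilibrium means.

```python
MOVES = ["rock", "paper", "scissors"]
# PAYOFF[i][j]: row player's winnings when row plays i and column plays j
# (+1 win, 0 tie, -1 loss). Zero-sum: the column player receives -PAYOFF.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

mix = [1 / 3, 1 / 3, 1 / 3]  # the Nash equilibrium strategy

for i, move in enumerate(MOVES):
    # expected payoff of always playing `move` against the equilibrium mix
    ev = sum(PAYOFF[i][j] * mix[j] for j in range(3))
    print(f"{move}: {ev:+.2f}")  # all +0.00: no unilateral deviation gains
```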

  Nash’s theorem says players will lose out if they change their strategy unilaterally. But it doesn’t say what will happen if two players swap tactics together. For instance, two of the players could decide to gang up on the third. When von Neumann and Morgenstern wrote their book on game theory, they noted that such coalitions work only when there are at least three players. “In a two-person game there are not enough players to go around,” they said. “A coalition absorbs at least two players, and then nobody is left to oppose.” Turing also acknowledged the potential role of coalitions in poker. “It is only etiquette, sense of fair play etc. which prevents this happening in actual games,” he said.

  There are two main ways to form coalitions in poker. The most blatant way to collude would be for two or more players to reveal their cards to each other. When one of them received a strong hand, they all could push the bets up gradually to squeeze more money out of their opponents. Naturally, this approach is much easier to employ while playing online. Parisa Mazrooei and colleagues at the University of Alberta suggest such collusion should really be referred to as “cheating” because players are using strategies outside the rules of the game, which stipulate that cards remain hidden.

  The alternative would be for colluders to keep their cards to themselves but give signals to other players when they had a strong hand. Technically, they would be operating within the rules (if not inside the boundaries of fair play). Colluding players often follow certain betting patterns to improve their chances. If one player bets big, for example, the others follow to drive opponents out of the game. Human players would have to remember such signals, but things are easier for bots, which can have access to the exact set of programmed rules used by fellow colluders.

  There are reports of unscrupulous players using both types of approach in online poker rooms. However, it can be difficult to detect collusion. If a player is matching another’s bets, gradually inflating the pot, that player might be manipulating the game to help their teammate. Or the person could be just a naive beginner, trying to bluff to victory. “In any form of poker, there exist a large variety of strategy combinations that are mutually beneficial to those that apply them,” Frederik Dahl has pointed out. “If they apply such strategies on purpose, we would say that they cheat by co-operating, but if it happens just by accident, we would not.”
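
  Spotting this statistically is as hard as Dahl suggests, because collusive raising looks like ordinary aggression. As a purely illustrative first pass (the log format and helper names below are invented for this sketch, not taken from any real detection system), one could compare how often a player raises immediately after a specific partner raises with that player's overall raise rate:

```python
def raise_rate_after(actions, leader, follower):
    """actions: list of (player, action) tuples in table order
    (a hypothetical log format, invented for this sketch)."""
    followed = raised = 0
    for (p1, a1), (p2, a2) in zip(actions, actions[1:]):
        if p1 == leader and a1 == "raise" and p2 == follower:
            followed += 1
            raised += (a2 == "raise")
    return raised / followed if followed else 0.0

def baseline_raise_rate(actions, player):
    """The player's raise frequency across all of their actions."""
    acts = [a for p, a in actions if p == player]
    return acts.count("raise") / len(acts) if acts else 0.0

log = [("A", "raise"), ("B", "raise"), ("C", "fold"),
       ("A", "raise"), ("B", "raise"), ("A", "call"), ("B", "fold")]
print(raise_rate_after(log, "A", "B"))  # 1.0: B always raises right after A
print(baseline_raise_rate(log, "B"))    # ~0.67: B's overall raise rate
```

  A large gap between the two rates, sustained over thousands of hands, would merit a closer look, though as Dahl notes it could still arise by accident.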

  That’s the problem with using game theory in poker: coalitions don’t always have to be deliberate. They might just result from the strategies players choose. In many situations, there is more than one Nash equilibrium. Take driving a car. There are two equilibrium strategies: if everyone drives on the left, you will lose out if you make a unilateral decision to drive on the other side; if driving on the right is in vogue, the left is no longer the best choice.

  Depending on the location of your driver’s seat, one of these equilibriums will be preferable to the other. If your car is left-hand drive, for instance, you’ll probably prefer it if everyone drives on the right. Obviously, having a driver’s seat inconveniently positioned on the “wrong” side of the car won’t be enough to make you change the side you drive on. But the situation is still a bit like having everyone else in a coalition against you (if you’re feeling particularly grumpy about things). Drive on the other side of the road and you’ll clearly lose out, so you just have to put up with the situation.
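
  A few lines of code show the multiplicity. In a toy version of the driving game (payoffs invented for illustration), deviating from either convention is worse than conforming, so both "everyone left" and "everyone right" pass the equilibrium test:

```python
# Your payoff given (your side, the side everyone else drives on).
# Matching the convention is fine; defying it is disastrous.
payoff = {("left", "left"): 1, ("right", "right"): 1,
          ("left", "right"): -10, ("right", "left"): -10}

for convention in ("left", "right"):
    other = "right" if convention == "left" else "left"
    conform = payoff[(convention, convention)]
    deviate = payoff[(other, convention)]
    print(f"{convention}: equilibrium = {conform > deviate}")  # True for both
```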

  The same problem crops up in poker. As well as causing inconvenience, it can cost players money. Three poker players could choose Nash equilibrium strategies, and when these strategies are put together, it may turn out that two players have selected tactics that just so happen to pick on the third player. This is why three-player poker is so difficult to tackle from a game theory point of view. Not only is the game far more complicated, with more potential moves to analyze; it’s also not clear that hunting for the Nash equilibrium is always the best approach. “Even if you could compute one,” Michael Johanson said, “it wouldn’t necessarily be useful.”

  There are other drawbacks, too. Game theory can show you how to minimize your losses against a perfect opponent. But if your opponent has flaws—or if there are more than two players in the game—you might want to deviate from the “optimal” Nash equilibrium strategy and instead take advantage of weaknesses. One way to do this would be to start off with an equilibrium strategy, and then gradually tweak your tactics as you learn more about your opponent. Such approaches can be risky, however. Tuomas Sandholm at Carnegie Mellon University points out that players must strike a balance between exploitation and exploitability. Ideally, you want to exploit, taking as much as possible from weak opponents, without being exploitable and coming unstuck against strong players. Defensive strategies—such as the Nash equilibrium, and the tactics employed by Dahl’s poker bot—are not very exploitable. Strong players will struggle to beat them. However, this comes at the price of not exploiting weak opponents much; bad players will get off lightly. It would therefore make sense to vary strategy depending on the opponent. As the old saying goes, “Don’t play the cards, play the person.”

  Unfortunately, learning to exploit opponents can in turn leave players vulnerable to exploitability. Sandholm calls it the “get taught and exploited problem.” For example, suppose your opponent appears to play aggressively at first. When you notice this, you might adjust tactics to try to take advantage of this aggression. However, at this point the opponent could suddenly become more conservative and exploit the fact that you believe—incorrectly—that you’re facing an aggressive player.

  Researchers can judge the effects of such problems by measuring the exploitability of their bots. This is the maximum amount they could expect to lose if the bot had made completely the wrong assumptions about its opponent. Along with PhD student Sam Ganzfried, Sandholm has been developing “hybrid” bots, which combine defensive Nash equilibrium tactics with opponent modeling. “We would like to only attempt to exploit weak opponents,” they said, “while playing the equilibrium against strong opponents.”
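
  In a small matrix game the measurement is simple enough to sketch. Exploitability is the best-response value against your strategy: how much a perfectly informed opponent wins per round, on average. The rock-paper-scissors numbers below are illustrative, standing in for the vastly larger poker computation:

```python
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # row player's winnings in rock-paper-scissors

def exploitability(mix):
    """Best-response value against `mix`: the most a perfectly informed
    opponent can win per round, on average, in this zero-sum game."""
    opponent_ev = [sum(mix[i] * -PAYOFF[i][j] for i in range(3))
                   for j in range(3)]
    return max(opponent_ev)

print(exploitability([1/3, 1/3, 1/3]))    # 0.0: the equilibrium is safe
print(exploitability([0.5, 0.25, 0.25]))  # 0.25: a rock-heavy leak to punish
```

  A hybrid bot, in these terms, stays near the zero-exploitability strategy until its opponent model is confident enough to justify accepting a measurable leak.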

  IT’S CLEAR THAT POKER programs are getting better and better. Every year, the bots in the Annual Computer Poker Competition become smarter, and Vegas is filling up with poker machines that can beat most casino visitors. But have computers truly overtaken humans? Are the best bots really better than all people?

  According to Sandholm, it’s hard to say whether the crossover has happened, for several reasons. To start with, you’d have to identify who the best human is. Unfortunately, it’s hard to rank players definitively: poker doesn’t have a clear Garry Kasparov or Marion Tinsley. “We don’t really know who the best human is,” Sandholm said. Games against humans are also difficult to arrange. Although there is a computer competition every year, Sandholm points out that mixed matches are far less common. “It’s hard to get pros to do these man-machine matches.”

  There has been the occasional match-up. In 2007, professional players Phil Laak and Ali Eslami took on Polaris, a bot created by the University of Alberta group, in a series of two-player poker games. Polaris was designed to be hard to beat. Rather than attempting to exploit its opponents, it employed a strategy that was close to Nash equilibrium.

  At the time, some of the poker community thought Laak and Eslami were strange choices for the match. Laak had a reputation for being hyperactive at the poker table, jumping around, rolling on the floor, doing push-ups. In contrast, Eslami was fairly unknown, having appeared in relatively few televised tournaments. But Laak and Eslami had skills that the researchers needed. Not only were they good players, they were able to say what they were thinking during a game, and they were comfortable with the unusual setup involved in a man-versus-machine match.

  The venue for the match was an artificial intelligence conference in Vancouver, Canada, and the game was limit Texas hold’em: the same game Dahl’s bot would later play in Vegas. Although Laak and Eslami would play Polaris in separate games, their scores would be combined at the end of each session. It was to be humans versus the machine, with Laak and Eslami playing as a team against Polaris. To minimize the effects of luck, the card deals were mirrored: whichever cards were dealt to Polaris in one game, the human player got in the other (and vice versa). The organizers also imposed a clear margin of victory. To win, a team needed to finish with at least $250 more chips than their opponent.
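
  The mirroring is a variance-reduction trick worth a quick illustration. In the toy model below (invented numbers, with the luck term cancelling exactly, which real play only approximates because the two players act differently on the same cards), summing a team's two mirrored results strips out card luck and leaves the skill difference:

```python
import random
import statistics

def session(hands, skill_edge, mirrored):
    """Toy session: each hand's result = small skill edge + big luck swing.
    Mirrored deals play each hand twice with the luck term flipped."""
    total = 0.0
    for _ in range(hands):
        luck = random.gauss(0, 10)
        if mirrored:
            total += (skill_edge + luck) + (skill_edge - luck)
        else:
            total += skill_edge + luck
    return total

plain = [session(500, 0.05, mirrored=False) for _ in range(1000)]
mirror = [session(500, 0.05, mirrored=True) for _ in range(1000)]
print(statistics.stdev(plain))   # large: luck dominates a 500-hand session
print(statistics.stdev(mirror))  # near zero: mirroring cancels the swings
```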

  On the first day, there were two sessions of play, each consisting of five hundred hands. The first session ended as a draw (Polaris had finished seventy dollars up, which was not enough to be considered a victory). In the second session, Laak was lucky enough to be dealt good cards against Polaris, which meant the computer received the same strong cards in the game against Eslami. Polaris capitalized on the advantage more than Laak did, and the bot ended the day with a clear win over the human team.

  That night, Laak and Eslami met up to discuss the thousand poker hands they’d just played. The Alberta team gave them the logbook of the day’s play, including all the hands that had been dealt. This helped the pair dissect the games they’d played. When they returned to the tables the next day, the humans had a much better idea of how to tackle Polaris, and they won the final two sessions. Even so, the humans were modest about their victory. “This was not a win for us,” Eslami said at the time. “We survived. I played the best heads-up poker I’ve ever played and we just narrowly won.”

  The following year, there was a second man-machine competition, with a new set of human opponents. This time, seven human players would take on the University of Alberta’s bot in Las Vegas. The humans were arguably some of the best around; several had career winnings totaling more than $1 million. But they would not be facing the same Polaris that lost the previous year. This was Polaris 2.0. It was more advanced and better trained. Since the games against Laak and Eslami, Polaris had played over eight billion games against itself. It was now better at exploring the vast space of possible moves, which meant there were fewer weak links in its strategy for opponents to target.

  Polaris 2.0 also put more emphasis on learning. During a match, the bot would develop a model of its opponent. It would identify which type of strategy a player was using and tailor its game to target weaknesses. Players couldn’t beat Polaris by discussing tactics between games as Laak and Eslami did, because Polaris would be playing differently against each one of them. Nor could humans regain the advantage by altering their own playing style. If Polaris noticed its opponent had changed strategy, the bot would adapt to the new tactics. Michael Bowling, who headed up the Alberta team, said that many of the human players struggled against Polaris’s new box of tricks; they had never seen an opponent switch strategy like that.

  As before, the players paired up to take on Polaris in the limit version of Texas hold’em. There were four matches in total, spread over four days. The first two went badly for Polaris, with one a draw and the other a victory for the humans. But this time the humans did not finish strongly; Polaris won the final two matches, and with them the competition.

  Whereas Polaris 2.0 drifted away from an optimal strategy to exploit its opponents, the next challenge for the Alberta team was how to make a bot that was truly unbeatable. Their existing bots could only compute an approximate Nash equilibrium, which meant there might have been a strategy out there that could beat them. Bowling and his colleagues therefore set out to find a set of flawless tactics, which in the long run would not lose money against any opponent.

  Using the regret minimization approach we encountered in the previous chapter, the Alberta researchers honed the bots, and then got them to play each other again and again, at a rate of about two thousand games per second. Eventually, the bots learned how to avoid being exploited, even by a perfect player. In 2015, the team unveiled their unbeatable poker program—named Cepheus—in the journal Science. With a nod to the group’s checkers research, the paper was titled “Heads-Up Limit Hold’em Poker Is Solved.”
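
  Regret matching, a basic regret-minimization rule, fits in a few lines. In self-play, each bot plays actions in proportion to its accumulated positive regret (how much better each action would have done than what it actually played), and the average strategy drifts toward equilibrium. The sketch below uses rock-paper-scissors rather than poker, since applying the idea across poker's full game tree is far more involved:

```python
import random

PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # row player's winnings; column player gets the negative

def strategy(regret):
    """Play in proportion to positive regret; uniform if there is none."""
    pos = [max(r, 0.0) for r in regret]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

ROUNDS = 100_000
regret_a, regret_b = [0.0] * 3, [0.0] * 3
avg_a = [0.0] * 3
for _ in range(ROUNDS):
    strat_a, strat_b = strategy(regret_a), strategy(regret_b)
    a = random.choices(range(3), weights=strat_a)[0]
    b = random.choices(range(3), weights=strat_b)[0]
    for i in range(3):  # regret: payoff action i would have earned, minus actual
        regret_a[i] += PAYOFF[i][b] - PAYOFF[a][b]
        regret_b[i] += -PAYOFF[a][i] + PAYOFF[a][b]
        avg_a[i] += strat_a[i]
print([round(s / ROUNDS, 3) for s in avg_a])  # approaches [1/3, 1/3, 1/3]
```

  Roughly speaking, the Alberta group's counterfactual regret minimization applies this kind of update at every decision point in poker's game tree, which is why so many billions of self-play hands were involved.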

  Some of the findings lined up with conventional wisdom. The team showed that in heads-up poker, the dealer—who goes first—holds the advantage as well as the cards. They also found that Cepheus rarely “limps,” choosing to raise or fold on its first go rather than simply calling its opponent’s bet. According to Johanson, as the bot homed in on the optimal strategy it also started to come up with some unexpected tactics. “Every now and then we find differences between what the program chooses and what human wisdom says.” For example, the final version of Cepheus chooses to play hands—such as a four and six with different suits—that many humans would fold. In 2013, the team also noticed their bot would occasionally place the minimum allowable bet rather than putting down a larger stake. Given the extent of the bot’s training, this was apparently the optimal thing to do. But Burch points out that human players would see things differently. Although the computer had decided it was a smart tactic, most humans would view it as annoying. “It’s almost a nuisance bet,” Burch said. The polished version of Cepheus is also reluctant to bet big initially. Even when it has the best hand (a pair of aces), it will bet the maximum possible amount only 0.01 percent of the time.

  Cepheus has shown that, even in complex situations, it can be possible to find an optimal strategy. The researchers point to a range of scenarios in which such algorithms could be useful, from designing coast guard patrols to medical treatments. But this was not the only reason for the research. The team ended their Science paper with a quote from Alan Turing: “It would be disingenuous of us to disguise the fact that the principal motive which prompted the work was the sheer fun of the thing.”

  Despite the breakthrough, not everyone was convinced that it represented the ultimate victory of the artificial over the biological. Michael Johanson says that many human players view limit poker as the easy option, because there is a cap on how much players can raise the betting. It means the boundaries are well defined, the possibilities constrained.

  No-limit poker is seen as the bigger challenge. Players can raise by any amount and can go all in whenever they want. This creates more options, and more subtleties. The game therefore has a reputation for being more of an art than a science. And that’s why Johanson would love to see a computer win. “It would attack the mystique that poker is all about psychology,” he said, “and that computers couldn’t possibly do it.”

  Sandholm says it won’t be long before two-player no-limit poker has fallen to the machines. “We’re working on that very actively,” he said. “We may already have bots that are better than the best pros.” Indeed, Carnegie Mellon’s bot Tartanian put in a very strong showing in the 2014 computer poker competition. There were two types of contest for no-limit poker, and Tartanian came out on top in both. As well as winning the knockout competition, it prevailed in the total bankroll contest. Tartanian could survive when it needed to but also rack up plenty of chips against weaker opponents.

  As bots get better—and beat more humans—players could end up learning poker from machines. Chess grandmasters already use computers to hone their skills in training. If they want to know how to play a particularly tricky position, the machines can show them the best way to proceed. Chess computers can come up with strategies that stretch further into the future than we humans can manage.

 
