The Perfect Bet

by Adam Kucharski


  The argument also showed that major results aren’t always appreciated in their original format. Despite his defense of Borel’s work, Fréchet didn’t think the minimax work was particularly special because mathematicians already knew about the idea, albeit in a different form. It was only when von Neumann applied the minimax concept to games that its value became apparent. As Ferguson discovered when he applied game theory to poker, sometimes an idea that seems unremarkable to scientists can prove extremely powerful when used in a different context.

  While the fiery debate between von Neumann and Fréchet sparked and crackled, John Nash was busy finishing his doctorate at Princeton. By establishing the Nash equilibrium, he had managed to extend von Neumann’s work, making it applicable to a wider range of situations. Whereas von Neumann had looked at zero-sum games with two players, Nash showed that optimal strategies exist even if there are multiple players and uneven payoffs. But knowing perfect strategies always exist is just the start for poker players. The next problem is working out how to find them.

  MOST PEOPLE WHO HAVE a go at creating poker bots don’t rummage through game theory to find optimal strategies. Instead, they often start off with rule-based approaches. For each situation that could crop up in a game, the creator puts together a series of “if this happens, do that” instructions. The behavior of a rule-based bot therefore depends on its creator’s betting style and how the creator thinks a good player should act.
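
  To make the idea concrete, here is a minimal sketch of what such a rule-based bot might look like in code. The situations, thresholds, and actions are invented for illustration, not taken from any real program.

```python
# A minimal sketch of a rule-based poker bot: every situation is matched
# against hand-written "if this happens, do that" instructions. The hand
# strengths, thresholds, and actions are illustrative inventions.

def rule_based_action(hand_strength, opponents_remaining, facing_raise):
    """Return an action from a fixed list of hand-written rules."""
    if hand_strength > 0.8:
        return "raise"                    # very strong hand: always raise
    if hand_strength > 0.5 and not facing_raise:
        return "bet"                      # decent hand, no pressure: bet
    if facing_raise and opponents_remaining > 2:
        return "fold"                     # weak spot against many players
    return "check"                        # default when no rule fires

print(rule_based_action(0.9, opponents_remaining=3, facing_raise=True))  # raise
print(rule_based_action(0.4, opponents_remaining=4, facing_raise=True))  # fold
```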

  While earning his master’s degree in 2003, computer scientist Robert Follek put together a rule-based poker program called SoarBot. He built it using a set of artificial decision-making methods known as “Soar,” which had been developed by researchers at the University of Michigan. During a poker game, SoarBot acted in three phases. First, it noted the current situation, including the pocket cards it had been dealt, the values of the communal cards, and the number of players who had folded. With this information, it then ran through all its preprogrammed rules and identified all those that were relevant to the present situation.

  After collecting the available options, it entered a decision phase, choosing what to do based on preferences Follek had given it. This decision-making process could be problematic. Occasionally, the set of preferences turned out to be incomplete, with SoarBot either failing to identify any suitable options or being unable to choose between two potential moves. The predetermined preferences could also be inconsistent. Because Follek had input each one individually, sometimes the program would end up containing two preferences that were contradictory. For instance, one rule might tell SoarBot to bet in a given situation while another simultaneously tried to get it to fold.
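
  A toy version of this three-phase loop shows how both failure modes arise. The rules and preference scores below are hypothetical: no rule may match a situation at all, or two matching rules may propose contradictory moves with equally strong preferences.

```python
# A sketch of the three phases described above: note the situation, propose
# every rule that matches, then decide using preference scores. The rules
# and scores are hypothetical, chosen to expose both failure modes.

RULES = [
    (lambda s: s["pair_in_pocket"], "bet"),
    (lambda s: s["players_folded"] >= 3, "bet"),
    (lambda s: s["facing_big_raise"], "fold"),
]
PREFERENCES = {"bet": 2, "fold": 2, "check": 1}  # equal scores: trouble ahead

def act(situation):
    proposals = {action for condition, action in RULES if condition(situation)}
    if not proposals:
        raise LookupError("incomplete: no rule covers this situation")
    best_score = max(PREFERENCES[a] for a in proposals)
    tied = sorted(a for a in proposals if PREFERENCES[a] == best_score)
    if len(tied) > 1:
        raise ValueError(f"inconsistent: can't choose between {tied}")
    return tied[0]

# A pocket pair facing a big raise triggers both "bet" and "fold", and the
# equal preference scores leave the bot stuck.
try:
    act({"pair_in_pocket": True, "players_folded": 0, "facing_big_raise": True})
except (LookupError, ValueError) as problem:
    print(problem)   # inconsistent: can't choose between ['bet', 'fold']
```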

  Even if more rules were added manually, the program would still hit upon an inconsistency or incompleteness from time to time. This type of problem is well known to mathematicians. One year after obtaining his PhD in 1930, Kurt Gödel published a theorem pointing out that the rules that governed arithmetic could not be both complete and consistent. His discovery shook the research community. At the time, leading mathematicians were trying to construct a robust system of rules and assumptions for the subject. They hoped this would clear up a few logical anomalies that had recently been spotted. Led by David Hilbert, who had been von Neumann’s mentor in Germany, these researchers wanted to find a set of rules that was complete—so that all mathematical statements could be proved using only these rules—and consistent, with none of the rules contradicting one another. But Gödel’s incompleteness theorem showed that this was impossible: whichever set of rules was specified, there would always be situations in which additional rules were needed.

  Gödel’s logical rigor caused problems outside academia, too. While studying for his American citizenship assessment in 1948, he told his sponsor Oskar Morgenstern that he’d spotted some inconsistencies in the US Constitution. According to Gödel, the contradictions created a legal path for someone to become a dictator. Morgenstern told him it would be unwise to bring it up in the interview.

  FORTUNATELY FOR FOLLEK, THE team that originally developed the Soar technology had found a way around Gödel’s problem. When a bot ran into trouble, it would teach itself an additional rule. So, if Follek’s SoarBot couldn’t decide what to do, it could instead pick an arbitrary option and add the choice to its set of rules. Next time the same situation popped up, it could simply search through its memory to find out what it did last time. This type of “machine learning,” with the bot adding new rules as it went along, allowed it to avoid the pitfalls Gödel described.
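
  Extending the hypothetical act() sketch above, the fix might look something like this: when the decision phase hits an impasse, pick an arbitrary legal action, record it as a new rule, and reuse it the next time the same situation appears.

```python
# A sketch of the "learn a new rule when stuck" fix, built on the act()
# function above. On an impasse, choose arbitrarily, remember the choice,
# and treat it as a rule from then on.
import random

learned = {}   # maps a remembered situation to the action chosen last time

def act_with_learning(situation, legal_actions=("bet", "check", "fold")):
    key = frozenset(situation.items())
    if key in learned:
        return learned[key]            # reuse the rule added last time
    try:
        return act(situation)          # the ordinary three-phase decision
    except (LookupError, ValueError):
        choice = random.choice(legal_actions)
        learned[key] = choice          # add the arbitrary choice as a new rule
        return choice
```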

  When Follek let SoarBot compete against human and computer opponents, it became clear his program wasn’t a potential champion. “It played much better than the worst human players,” he said, “and much worse than the best human and software players.” Actually, SoarBot played about as well as Follek did. Although he’d read up on poker strategies, his weakness as a player limited the success of his bot.

  From 2004 onward, poker bots grew in popularity thanks to the arrival of cheap software that let players put together their own bot. By tweaking the settings, they could decide which rules the program should follow. With well-chosen rules, these bots could beat some opponents. But, as Follek found, the structure of rule-based bots means that they are generally only as good as their creator. And judging by bots’ success rates online, most creators aren’t very good at poker.

  BECAUSE RULE-BASED TACTICS CAN be difficult to get right, some people have turned to game theory to improve their computer players. But it’s tricky to find the optimal tactics for a game as complicated as Texas hold’em poker. Because a huge number of different possible situations could arise, it is very difficult to compute the ideal Nash equilibrium strategy. One way around the problem is to simplify things, creating an abstract version of the game. Just as stripped-down versions of poker helped von Neumann and Ferguson understand the game, making simplifications can also help find tactics that are close to the true optimal strategy.

  A common approach is to collect similar poker hands into “buckets.” For example, we could work out the probability a given pair of pocket cards would beat a random other hand in a showdown, and then put hands with comparable winning probabilities into the same bucket. Such approximations dramatically reduce the number of potential scenarios we have to look at.
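
  As a rough sketch of the idea, the snippet below sorts hands into buckets by win probability. The probabilities are illustrative placeholders, not real showdown odds.

```python
# A sketch of "bucketing": hands with similar showdown win probabilities
# are grouped together, so a strategy only needs to cover each bucket.
# The probabilities below are illustrative placeholders.

win_probability = {
    "AA": 0.85, "KK": 0.82, "QQ": 0.80,   # big pocket pairs
    "AKs": 0.67, "AQs": 0.66,             # strong suited high cards
    "T9s": 0.54, "98s": 0.52,             # suited connectors
    "72o": 0.35, "83o": 0.37,             # weak offsuit hands
}

def bucket(hand, num_buckets=5):
    """Map a hand to one of num_buckets groups by its win probability."""
    p = win_probability[hand]
    return min(int(p * num_buckets), num_buckets - 1)

for hand in win_probability:
    print(hand, "-> bucket", bucket(hand))
# AA, KK, and QQ share a bucket, as do T9s and 98s: the bot now faces
# five groups of hands instead of every hand individually.
```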

  Bucketing also crops up in other casino games. Because the aim of blackjack is to get as near to twenty-one as possible, knowing whether the next card is likely to be high or low can give a player an advantage. Card counters get ahold of this information by keeping track of the cards that have already been dealt, and hence which ones remain. But with casinos using up to six decks at once, it’s impractical to memorize each individual card as it appears. Instead, counters often group cards into categories. For instance, they might split them into three buckets: high, low, and neutral. As the game progresses, they keep a count of the type of cards they have already seen. When a high card is dealt, they add one to the count; when a low card arrives, they subtract one.
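
  A minimal version of such a counter, following the sign convention in the text (the popular hi-lo system flips the signs, but the information tracked is the same; the rank boundaries here are one common choice):

```python
# A sketch of the three-bucket count described above: +1 for a high card,
# -1 for a low card, 0 for neutral, matching the convention in the text.

def count_value(rank):
    if rank in {"10", "J", "Q", "K", "A"}:
        return +1          # a high card has been seen
    if rank in {"2", "3", "4", "5", "6"}:
        return -1          # a low card has been seen
    return 0               # 7, 8, 9 count as neutral

dealt = ["5", "K", "2", "9", "A", "3"]
count = sum(count_value(c) for c in dealt)
print(count)   # -1: slightly more low cards than high cards have appeared
```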

  In blackjack, bucketing provides only an estimate of the true count: the fewer buckets a player uses, the less accurate the count will be. Likewise, bucketing won’t give poker players a perfect strategy. Rather, it leads to what are known as “near-equilibrium strategies,” some of which are closer to the true optimum than others. Just as penalty takers can improve their odds by deviating from a simple pure strategy, these not-quite-perfect poker strategies can be categorized by how much players would gain by altering their tactics.
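
  In the standard notation of game theory (none of which appears in the book itself), such a near-equilibrium is called an ε-equilibrium: no player can gain more than some small amount ε by switching strategies. The smaller the ε, the closer the strategy is to the true optimum.

```latex
% An epsilon-equilibrium in standard notation (symbols assumed, not from
% the book): a strategy profile \sigma such that no player i can gain more
% than \varepsilon by deviating to any alternative strategy \sigma_i'.
\[
u_i(\sigma_i', \sigma_{-i}) \le u_i(\sigma) + \varepsilon
\quad \text{for every player } i \text{ and every alternative } \sigma_i'
\]
```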

  Even with the inclusion of bucketing, we still need a way to work out a near-equilibrium strategy for poker. One way to do this is to use a technique known as “regret minimization.” First, we create a virtual player and give it a random initial strategy. So, it might start off by folding half of the time in a given situation, betting the other half, and never checking. Then we simulate lots and lots of games and allow the player to update its strategy based on how much it regrets its choices. For instance, if the opponent folds prematurely, the player might regret betting big. Over time, the player will work to minimize the amount of regret it has, and in the process approach the optimal strategy.
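
  Here is a small self-play sketch of regret matching, run on rock-paper-scissors because it is simple enough to fit in a few lines. Poker bots apply the same idea on a vastly larger scale; this is not the exact algorithm any particular bot uses.

```python
# Regret matching in self-play on rock-paper-scissors: each player keeps a
# table of accumulated regrets and plays each action in proportion to its
# positive regret. The average strategy drifts toward the equilibrium mix.
import random

ACTIONS = ["rock", "paper", "scissors"]

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
    return 0 if a == b else (1 if beats[a] == b else -1)

def strategy_from(regrets):
    """Play each action in proportion to its accumulated positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1 / 3] * 3            # no positive regrets yet: play uniformly
    return [p / total for p in positive]

regrets = [[0.0] * 3, [0.0] * 3]       # one regret table per player
strategy_sum = [[0.0] * 3, [0.0] * 3]  # running total of strategies used

for _ in range(100_000):
    strategies = [strategy_from(regrets[p]) for p in (0, 1)]
    chosen = [random.choices(range(3), weights=strategies[p])[0] for p in (0, 1)]
    for p in (0, 1):
        played, seen = chosen[p], chosen[1 - p]
        got = payoff(ACTIONS[played], ACTIONS[seen])
        for alt in range(3):
            # "How much better would alt have done than what I played?"
            regrets[p][alt] += payoff(ACTIONS[alt], ACTIONS[seen]) - got
        for a in range(3):
            strategy_sum[p][a] += strategies[p][a]

average = [s / sum(strategy_sum[0]) for s in strategy_sum[0]]
print([round(x, 3) for x in average])   # close to [0.333, 0.333, 0.333]
```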

  Minimizing regret means asking, “How would I have felt if I’d done things differently?” It turns out that the ability to answer this question can be critical when playing games of chance. In 2000, researchers at the University of Iowa reported that people who had injuries to parts of the brain that are related to regret—such as the orbitofrontal cortex—performed very differently in betting games compared to those without brain damage. It wasn’t because the injured players failed to remember previous bad decisions. In many cases, patients with orbitofrontal damage still had a good working memory: when asked to sort a series of cards, or match up different symbols, they had few problems. The difficulties came when they had to deal with uncertainty and use their past experiences to weigh the risks involved. The researchers found that when the feeling of regret was missing from patients’ decision-making process, they struggled to master games involving an element of risk. Rather than simply looking forward to try to maximize payoffs, it appears that it is sometimes necessary to look back at what might have happened and use hindsight to refine a strategy. This contrasts with much economic theory, in which the focus is often on expected gains, with people trying to maximize future payoffs.

  Regret minimization is becoming a powerful tool for artificial players. By repeatedly playing games and reevaluating past decisions, bots can construct near-equilibrium strategies for poker. The resulting strategies are far more successful than simple rule-based methods. Yet such approaches still rely on making estimates, which means that against a perfect poker bot, a near-equilibrium strategy will struggle. But how easy is it to make a perfect bot for a complex game?

  GAME THEORY WORKS BEST in straightforward games in which all information is known. Tic-tac-toe is a good example: after a few games, most people work out the Nash equilibrium. This is because there aren’t many ways in which a game can progress: if a player gets three in a row, the game is over; players must take it in turns; and it doesn’t matter which way the board is oriented. So, although there are 3⁹ = 19,683 ways to place X’s and O’s on a three-by-three board, only a hundred or so of these combinations are actually relevant.
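
  The first reduction step is easy to check by brute force: of the 3⁹ grids, only a fraction can ever occur in a legal game, and symmetry and sensible play shrink the relevant set much further.

```python
# Count the tic-tac-toe grids that can actually occur in play, out of the
# 3^9 = 19,683 ways to fill nine squares with X, O, or blank.
from itertools import product

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def wins(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)

legal = 0
for board in product("XO.", repeat=9):
    x, o = board.count("X"), board.count("O")
    if x - o not in (0, 1):
        continue                  # X moves first, so the counts must balance
    x_won, o_won = wins(board, "X"), wins(board, "O")
    if x_won and o_won:
        continue                  # both players can't have three in a row
    if x_won and x != o + 1:
        continue                  # X's winning move must have been the last
    if o_won and x != o:
        continue                  # O's winning move must have been the last
    legal += 1

print(legal, "of", 3**9, "grids can occur in play")   # 5478 of 19683
```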

  Because tic-tac-toe is so simple, it’s fairly easy to work out the perfect way to react to an opponent’s move. And once both players know the ideal strategy, the game will always result in a draw. Checkers, however, is far from simple. Even the best players have failed to find the perfect strategy. But if anyone could have spotted it, it would have been Marion Tinsley.

  A mathematics professor from Florida, Tinsley had a reputation for being unbeatable. He won his first world championship in 1955, holding it for four years before choosing to retire, citing a lack of decent competition. Upon his return to the championships in 1975, he immediately regained his title, thrashing all opposition. Fourteen years later, however, Tinsley’s interest in the game had again started to wane. Then he heard about a piece of software being developed at the University of Alberta in Canada.

  Jonathan Schaeffer is now Dean of Science at the university, but back in 1989 he was a young professor in the department of computer science. He’d become interested in checkers after spending time looking at chess programs. Like chess, checkers is played on an eight-by-eight board. Pieces move forward diagonally and capture opposition pieces by leapfrogging. Upon reaching the other side of the board, they become kings and can move backward and forward. The simplicity of its rules makes checkers interesting to game theorists because it is relatively easy to understand, and players can predict the consequences of a move in detail. Perhaps a computer could even be trained to win?

  Schaeffer decided to name the fledgling checkers project “Chinook” after the warm winds that occasionally sweep over the Canadian prairies. The name was a pun, inspired by the fact that the English call the game “draughts.” Helped by a team of fellow computer scientists and checkers enthusiasts, Schaeffer quickly got to work on the first challenge: how to deal with the complexity of the game. There are around 10²⁰ possible positions in checkers. That’s 1 followed by twenty zeros: if you collected the sand from all of the world’s beaches, you’d end up with about that number of grains.

  To navigate this enormous selection of possibilities, the team got Chinook to follow a minimax approach, hunting down strategies that would be least costly. At each point in the game, there were a certain number of moves Chinook could make. Each of these branched out into another set of options, depending on what its opponent did. As the game progressed, Chinook “pruned” this decision tree, removing weak branches that were likely to lose it the game and examining stronger, potentially winning, branches in detail.
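
  The pruning idea can be sketched on a toy game tree (checkers itself is far too large): branches that cannot change the outcome are cut off without ever being examined.

```python
# A sketch of minimax with pruning on a tiny hand-made game tree. Nested
# lists are choice points, numbers are payoffs to the maximizing player.

def minimax(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Return the game value of node, pruning hopeless branches."""
    if isinstance(node, (int, float)):
        return node                      # a leaf: the position's payoff
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, minimax(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break                    # prune: the opponent won't allow this
        return best
    best = float("inf")
    for child in node:
        best = min(best, minimax(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break                        # prune: we already have better
    return best

tree = [[3, 5], [2, [9, 7]], [1, 4]]
print(minimax(tree, maximizing=True))    # 3: the best the maximizer can force
```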

  Chinook also had a few tricks lined up especially for human opponents. When it spotted strategies that would eventually lead to a draw against a perfect computer opponent, it didn’t necessarily ignore them. If the draw lay at the end of a long, tangled branch of options, there was a chance a human might make a mistake somewhere along the way. Unlike many game-playing programs, Chinook would often pick these antihuman strategies over an option that was actually better according to game theory.

  Chinook played its first tournament in 1990, coming in second in the US National Checker Championship. This should have meant it qualified for the World Championships, but the American Checkers Federation and the English Draughts Association did not want a computer to compete. Fortunately, Tinsley didn’t share their view. After a handful of unofficial games in 1990, he decided that he liked Chinook’s aggressive playing style. Whereas human players would try to force a draw against him, Chinook took risks. Determined to play the computer in a tournament, Tinsley resigned his championship title. Reluctantly, the authorities decided to allow the computer match, and in 1992 Chinook played Tinsley in a “Man-Machine World Championship.” Out of 39 games, Tinsley won 4 to Chinook’s 2, with 33 draws.

  Despite holding their own against Tinsley, Schaeffer and his team wanted to go one better. They wanted to make Chinook unbeatable. Chinook depended on detailed predictions, which made it very good but still vulnerable to chance. If they could remove this element of luck, they would have the perfect checkers player.

  It might seem odd that checkers involves luck. As long as an identical series of moves are made, the game will always end with the same result. To use the mathematical term, the game is “deterministic”: it is not affected by randomness like poker is. Yet when Chinook played checkers, it couldn’t control the result purely through its actions, which meant that it could be beaten. In theory, it was even possible to lose to a completely incompetent opponent.

  To understand why, we must look at another piece of Émile Borel’s research. As well as his work on game theory, Borel was interested in very unlikely events. To illustrate how seemingly rare things will almost certainly happen if we wait long enough, he coined the infinite monkey theorem. The premise of this theorem is simple. Suppose a monkey is hammering away randomly at a typewriter (without smashing it up, as happened when a University of Plymouth team attempted this with a real monkey in 2003) and does so for an infinitely long period of time. If the monkey continues to bash away at the keys, eventually it will be almost sure to type out the complete works of Shakespeare. By sheer chance, says the theorem, the monkey will at some point hit the right letters in the order needed to reproduce all thirty-seven of the Bard’s plays.

  No monkey would ever live to an infinite age, let alone sit at a typewriter for that long. So, it’s best to think of the monkey as a metaphor for a random letter generator, churning out an arbitrary sequence of characters. Because the letters are random, there’s a chance—albeit a small one—that the first ones the monkey types are “Who’s there,” the opening line of Hamlet. The monkey might then get lucky and keep typing the correct letters until it’s reproduced all of the plays. This is extremely unlikely, but it could happen. Alternatively, the monkey might type reams of utter nonsense and then finally hit good fortune with the right combination of letters. It might even type gibberish for billions of years before eventually typing the correct letters in the correct order.

  Individually, each one of these events is incredibly unlikely. But because there are so many ways the monkey could end up typing out the complete works of Shakespeare—an infinite number of ways, in fact—the chances of it happening eventually are extremely high. Actually, it’s almost certain to happen.
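
  The calculation behind this near-certainty is short, if we write it in standard notation (the symbols are assumptions, not from the book):

```latex
% Split the monkey's infinite output into independent blocks, each as long
% as the complete works (N characters). With an alphabet of size A, each
% block matches the target exactly with probability p = A^{-N} > 0, so the
% chance of failing in every one of k blocks is
\[
\Pr(\text{no match in } k \text{ blocks}) = (1 - p)^k \longrightarrow 0
\quad \text{as } k \to \infty,
\]
% meaning the full text eventually appears with probability one.
```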

  Now suppose we replaced the typewriter with a checkers board and taught our hypothetical monkey the basic rules of the game. It would therefore make a series of totally random—but valid—moves. And the infinite monkey theorem tells us that because Chinook relied on predictions, the monkey would eventually strike upon a winning combination of moves. Whereas a computer can always force a draw in tic-tac-toe, victory in checkers depended on what Chinook’s opponent did. Part of the game was therefore out of its hands. In other words, winning required luck.

  CHINOOK PLAYED ITS LAST competitive game in 1996. But Schaeffer and his colleagues did not fully retire their champion software. Instead, they set it to work finding a checkers strategy that would never lose, no matter what the computer’s opponent did. The results were finally announced in 2007, when the Alberta researchers published a paper announcing “Checkers Is Solved.”

  There are three levels of solution for a game like checkers. The most detailed, a “strong solution,” describes the final outcome when perfect players pick up any game at any point, including ones where errors have already been made. This means that, whatever the starting position, we always know the optimal strategy from that point onward. Although this type of solution requires a huge amount of computation, people have found strong solutions for relatively simple games like tic-tac-toe and Connect Four.

 
