The Perfect Bet

by Adam Kucharski

  The next type of solution arises when the optimal result is known, but we only know how to reach it if we play from the start of the game. These “weak solutions” are particularly common for complicated games, where it is only feasible to look at what happens if both players make perfect moves throughout.

  The most basic, an “ultraweak solution,” reveals the final result when both players make a perfect sequence of moves but doesn’t show what those moves are. For instance, although strong solutions have been found for Connect Four and tic-tac-toe, John Nash showed in 1949 that when any such get-so-many-in-a-row-style game is played perfectly, the player who goes second will never win. Even if we can’t find the optimal strategy, we can prove this claim is true by looking at what happens if we assume that it isn’t and showing that our incorrect assumption leads to a logical dead end. Mathematicians call such an approach “proof by contradiction.”

  To kick off our proof, let’s suppose there is a winning sequence of moves for the second player. The first player can turn the situation to their advantage by making their opening move completely at random, waiting for the second player to reply, and then “stealing” the second player’s winning strategy from that point on. In effect, the first player has turned into the second player. This “strategy-stealing” approach works because having the randomly placed extra counter on the board at the start can only improve the first player’s chance of winning.

  By adopting the second player’s winning strategy, the first player will end up victorious. However, we assumed at the start that the second player has a winning strategy. Both players therefore win, which is clearly a contradiction. So, the only logical outcome is that the second player can never win.
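  For readers who want the argument in more formal dress, here is one way the strategy-stealing proof might be written out. It is simply a compact restatement of the reasoning above, not Nash’s original presentation.

```latex
% Strategy-stealing, written as a proof by contradiction (requires amsthm).
% A compact restatement of the argument in the text, not Nash's own notation.
\begin{proof}[Sketch]
Suppose, for contradiction, that the second player has a winning strategy $S$.
Let the first player open with an arbitrary move $m_0$ and thereafter play
exactly as $S$ dictates, ignoring $m_0$. If $S$ ever calls for a move on the
square already occupied by $m_0$, the first player instead plays another
arbitrary move and treats \emph{that} as the ignored extra counter. In a
get-so-many-in-a-row game, an extra counter on the board can never harm its
owner, so the first player wins every game that $S$ wins. But $S$ was assumed
to guarantee victory for the second player, so both players win the same game,
a contradiction. Hence the second player has no winning strategy.
\end{proof}
```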

  Knowing that a game has an ultraweak solution is interesting but doesn’t really help a player win in practice. In contrast, strong solutions, despite guaranteeing an optimal strategy, can be difficult to find when games have a lot of possible combinations of moves. Because checkers is around a million times more complicated than Connect Four, Schaeffer and colleagues focused on finding a weak solution.

  When it played Marion Tinsley, Chinook made decisions in one of two ways. Early in the game, it searched through possible moves, looking ahead to see where they might lead. In the final stages, when there were fewer pieces left on the board and hence fewer possibilities to analyze, Chinook instead referred to its “endgame database” of perfect strategies. Tinsley also had a remarkable understanding of the endgame, which is partly why he was so hard to beat. This became apparent in one of his early games against Chinook in 1990. Chinook had just made its tenth move when Tinsley said, “You’re going to regret that.” Twenty-six moves later, Chinook resigned.

  The challenge for the Alberta team was getting the two approaches to meet in the middle. In 1992, Chinook could only look ahead seventeen moves, and its endgame database only had information for situations in which there were fewer than six pieces on the board. What happened in between came down to guesswork.

  Thanks to increases in computing power, by 2007 Chinook was able to search far enough into the future, and assemble a large enough endgame database, to trace out the perfect strategy from start to finish. The result, which was published in the journal Science, was a remarkable achievement. Yet the strategy might never have been found had it not been for the matches against Tinsley. The Alberta team later said that the Chinook project “might have died in 1990 because of a lack of human competition.”

  Even though the strategy is perfect, Schaeffer wouldn’t recommend using it in a game against less-skilled opponents. Chinook’s early matches against humans showed that it is often beneficial to deviate from the optimal strategy if doing so increases the chances of an opponent making a mistake. This is because most players cannot see dozens of moves ahead like Chinook can. The potential for errors is even greater in games like chess and poker, where nobody knows the perfect strategy. This raises an important question: What happens when we apply game theory to games that are too complicated to fully learn?

  ALONG WITH TOBIAS GALLA, a physicist at the University of Manchester, Doyne Farmer has started to question how game theory holds up when games aren’t simple. Game theory relies on the assumption that all players are rational. In other words, they are aware of the effects of the various decisions they could make, and they choose the one that benefits them most. In simple games, like tic-tac-toe or the prisoner’s dilemma, it’s easy to make sense of the possible options, which means players’ strategies almost always end up in Nash equilibrium. But what happens when games are too complicated to fully grasp?

  The complexity of chess and many forms of poker means that players, be they human or machine, haven’t yet found the optimal strategy. A similar problem crops up in financial markets. Although crucial information—from share prices to bond yields—is widely available, the interactions among banks and brokers that bump and buffet the markets are too intricate to fully understand.

  Poker bots try to get around the problem of complexity by “learning” a set of strategies prior to playing an actual game. But in real life, players often learn strategies during a game. Economists have suggested that people tend to pick strategies using “experience-weighted attraction,” preferring past actions that were successful to those that were not. Galla and Farmer wondered whether this process of learning helps players find the Nash equilibrium when games are difficult. They were also curious to see what happens if the game doesn’t settle down to an optimal outcome. What sort of behavior should we expect instead?

  Galla and Farmer developed a game in which two computer players could each choose from fifty possible moves. Depending on which combination the two players picked, they each got a specific payoff, which had been assigned randomly before the game started. The values of these predetermined payoffs decided how competitive the game was. The payoffs varied between being zero-sum, with one player’s losses equal to the other’s gain, to being identical for both players. The extent of players’ memory could also vary. In some games, players took account of every previous move during the learning process; in others, they put less emphasis on events further in the past.
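  To make the setup concrete, here is a minimal Python sketch of this kind of learning experiment. It follows the broad shape of the game described above, but the payoff construction, the update rule, and every parameter value (the competitiveness setting, the memory decay, the decisiveness of choices) are illustrative assumptions rather than the exact model Galla and Farmer published.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 50        # moves available to each player
GAMMA = -0.7  # competitiveness: -1 is zero-sum, +1 is identical payoffs (illustrative)
ALPHA = 0.01  # memory decay: 0 means every past move counts equally
BETA = 0.05   # how sharply attractions translate into choices
STEPS = 5000

# Random payoffs, fixed before play: payoff_a[i, j] is player A's reward
# when A plays i and B plays j. payoff_b is built so that GAMMA tunes the
# game between zero-sum and fully shared interests.
x = rng.standard_normal((N, N))
y = rng.standard_normal((N, N))
payoff_a = x
payoff_b = GAMMA * x.T + np.sqrt(1 - GAMMA**2) * y.T

q_a = np.zeros(N)  # "attractions": running scores for each possible move
q_b = np.zeros(N)

history = []
for t in range(STEPS):
    # Moves with higher attractions are chosen more often.
    p_a = np.exp(BETA * (q_a - q_a.max()))
    p_a /= p_a.sum()
    p_b = np.exp(BETA * (q_b - q_b.max()))
    p_b /= p_b.sum()
    i = rng.choice(N, p=p_a)
    j = rng.choice(N, p=p_b)
    history.append((i, j))
    # Experience-weighted update: old attractions fade (imperfect memory),
    # and every move is reinforced by what it would have earned against
    # the opponent's actual choice.
    q_a = (1 - ALPHA) * q_a + payoff_a[:, j]
    q_b = (1 - ALPHA) * q_b + payoff_b[:, i]

print("last ten joint moves:", history[-10:])
```

  Rerunning this sketch with a small memory decay and a strongly competitive setting tends to produce move frequencies that keep wandering rather than settling, which is the flavor of the instability described next.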

  For each degree of competitiveness and memory, the researchers looked at how players’ choices changed over time as they learned to pick moves with better outcomes. When players had poor memories, the same decisions soon cropped up again and again, with players often descending into tit-for-tat behavior. But when the players both had a good memory and the game was competitive, something odd happened. Rather than settle down to equilibrium, the decisions fluctuated wildly. Like the roulette balls Farmer had attempted to track while a student, the players’ choices bounced around unpredictably. The researchers found that as the number of players increased, this chaotic decision making became more common. When games are complicated, it seems that it may be impossible to anticipate players’ choices.

  Other patterns also emerged, including ones that had previously been spotted in real-life games. When mathematician Benoit Mandelbrot looked at financial markets in the early 1960s, he noticed that volatile periods in stock markets tended to cluster together. “Large changes tend to be followed by large changes,” he noted, “and small changes tend to be followed by small changes.” The appearance of “clustered volatility” has intrigued economists ever since. Galla and Farmer spotted the phenomenon in their game, too, suggesting the pattern may just be a consequence of lots of people trying to learn the complexities of the financial markets.

  Of course, Galla and Farmer made several assumptions about how we learn and how games are structured. But even if real life is different, we should not ignore the results. “Even if it turns out that we are wrong,” they said, “explaining why we are wrong will hopefully stimulate game theorists to think more carefully about the generic properties of real games.”

  ALTHOUGH GAME THEORY CAN help us identify the optimal strategy, it’s not always the best approach to use when players are error-prone or have to learn. The Chinook team knew this, which is why they ensured the program picked strategies that would entice its opponents into making mistakes. Chris Ferguson was also aware of the issue. As well as employing game theory, he looked for changes in body language, adjusting his betting if players became nervous or overconfident. Players don’t just need to anticipate how the perfect opponent behaves; they need to predict how any opponent will behave.

  As we shall see in the next chapter, researchers are now delving deeper into artificial learning and intelligence. For some of them, the work has been years in the making. In 2003, an expert human player competed against one of the leading poker bots. Although the bot used game theory strategies to make decisions, it could not predict the changing behavior of its competitors. Afterward, the human player told the bot’s creators, “You have a very strong program. Once you add opponent modeling to it, it will kill everyone.”

  7

  THE MODEL OPPONENT

  WHEN IT CAME TO THE GAME SHOW JEOPARDY! KEN JENNINGS and Brad Rutter were the best. It was 2011, and Rutter had netted the most prize money, while Jennings had gone a record seventy-four appearances without defeat. Thanks to their ability to dissect the show’s famous general knowledge clues, they had won over $5 million between them.

  On Valentine’s Day that year, Jennings and Rutter returned for a special edition of the show. They would face a new opponent, named Watson, who had never appeared on Jeopardy! before. Over the course of three episodes, Jennings, Rutter, and Watson answered questions on literature, history, music, and sports. It didn’t take long for the newcomer to edge into the lead. Despite struggling with the “Name the Decade” round, Watson dominated when it came to the Beatles and the history of the Olympics. Although there was a last-minute surge from Jennings, the ex-champions could not keep up. By the end of the show, Watson had racked up over $77,000, more than Jennings and Rutter combined. It was the first time Rutter had ever lost.

  Watson didn’t celebrate the win, but its makers did. Named after Thomas Watson, founder of IBM, the machine was the culmination of seven years of work. The idea for Watson had come during a company dinner in 2004. During the meal, an eerie silence had descended over the restaurant. Charles Lickel, IBM’s research manager, realized the lack of conversation was caused by something happening on the room’s television screens. Everyone was watching Ken Jennings on his phenomenal Jeopardy! winning streak. As Lickel looked at the screen, he realized the game could be a good test for IBM’s expertise. The firm had a history of taking on human games—their Deep Blue computer had beaten chess grandmaster Garry Kasparov in 1997—but it hadn’t tackled a game like Jeopardy! before.

  To win Jeopardy!, players need knowledge, wit, and a talent for wordplay. The show is essentially an inverted quiz. Contestants get clues about the answer and have to tell the host what the question is. So, if a clue is “5,280,” the correct response might be “How many feet are there in a mile?”

  The finished version of Watson would use dozens of different techniques to interpret the clues and search for the correct response. It had access to the entire contents of Wikipedia and $3 million worth of computer processors to crunch the information.

  Analyzing human language and juggling data can be useful in other less glitzy environments, too. Since Watson’s victory, IBM has updated the software so that it can trawl medical databases and help with decision making in hospitals. Banks are also planning to use it to answer customer questions, while universities are hoping to employ Watson to direct student queries. By studying cookbooks, Watson is even helping chefs find new flavor combinations. In 2015, IBM collected some of the results into a “cognitive computing cookbook,” which includes recipes such as a burrito with chocolate, cinnamon, and edamame beans.

  Although Watson’s feats on Jeopardy! were impressive, the show is not the ultimate test for thinking machines. There is another, arguably much bigger, challenge for artificial intelligence out there, one that predates Watson, and even Deep Blue. While Deep Blue’s predecessor “Deep Thought” was gradually clambering up the chess rankings in the early 1990s, a young researcher named Darse Billings arrived at the University of Alberta. He joined the computer science department, where Jonathan Schaeffer and his team had recently developed the successful Chinook checkers program. Perhaps chess would make a good next target? Billings had other ideas. “Chess is easy,” he said. “Let’s try poker.”

  EACH SUMMER, THE WORLD’S best poker bots gather to play a tournament. In recent years, three competitors have dominated. First, there is the University of Alberta group, which currently has about a dozen researchers working on poker programs. Next, there is a team from Carnegie Mellon University in Pittsburgh, Pennsylvania, just down the road from where Michael Kent used to work while developing his sports predictions. Tuomas Sandholm, a professor in computer science, heads up the group and their work on champion bot “Tartanian.” Finally, there is Eric Jackson, an independent researcher, who has created a program named “Slumbot.”

  The tournament consists of several different competitions, with teams tailoring their bots’ personalities to each one. Some competitions are knockouts. In each round two bots go head-to-head, and the one with the smallest pile of chips at the end gets eliminated. To win these competitions, bots need strong survival instincts. They need to win only enough to get through to the next round: greed, as it were, is not good. In other matches, however, the winning bot is the one that gathers the most cash overall. Computer players therefore need to squeeze as much as they can out of their opponents. Bots need to go on the offensive and find ways to take advantage of the others.

  Most of the bots in the competition have spent years in development, training across millions, if not billions, of games. Yet there are no big prize pots waiting for the winners. The creators might get bragging rights, but they won’t leave with Vegas-sized rewards. So, why are these programs useful?

  Whenever a computer plays poker, it is solving a problem that’s familiar to all of us: how to deal with missing information. In games like chess, information is not an issue. Players can see everything. They know where the pieces are and what moves their opponent has made. Luck creeps into the game not because players can’t observe events but because they are unable to process the available information. That is why there is a chance (albeit tiny) that a grandmaster could lose to a monkey picking random moves.

  With a good game-playing algorithm—and a lot of computer power—it’s possible to get around the information-processing problem. That’s how Schaeffer and his colleagues found the perfect strategy for checkers and how a computer might one day solve chess. Such machines can beat their opponents with brute force, crunching through every possible set of moves. But poker is different. No matter how good players are, each has to cope with the fact that opponents’ cards are hidden. Although the game has rules and limits, there are always unknown factors. The same problem crops up in many aspects of life. Negotiations, auctions, bargaining: they are all incomplete-information games. “Poker is a perfect microcosm of many situations we encounter in the real world,” Schaeffer said.

  WHILE WORKING AT LOS Alamos during the Second World War, Stanislaw Ulam, Nick Metropolis, John von Neumann, and others would often play poker late into the night. The games were not particularly intense. The stakes were small, and the conversation light. Ulam said it was “a bath of refreshing foolishness from the very serious and important business that was the raison d’être of Los Alamos.” During one of their games, Metropolis won ten dollars from von Neumann. He was delighted to beat a man who’d written an entire book on game theory. Metropolis used half the money to buy a copy of von Neumann’s Theory of Games and Economic Behavior and stuck the remaining five dollars inside the cover to mark the win.

  Even before von Neumann had published his book on game theory, his research into poker was well known. In 1937, von Neumann had presented his work in a lecture at Princeton University. Among the attendees, there would almost certainly have been a young British mathematician by the name of Alan Turing. At the time Turing was a graduate student visiting from the University of Cambridge. He had come to the United States to work on mathematical logic. Although he was disappointed Kurt Gödel was no longer at the university, Turing generally enjoyed his time at Princeton, even if he did find certain American habits puzzling. “Whenever you thank them for anything they say ‘You’re welcome,’” he told his mother in a letter. “I rather liked it at first, thinking I was welcome, but I now find it comes back like a ball against a wall, and I become positively apprehensive.”

  After spending a year at Princeton, Turing returned to England. Although he was based mainly in Cambridge, he also took up a part-time position with the Government Code and Cypher School at nearby Bletchley Park. When the Second World War broke out in autumn 1939, Turing found himself at the forefront of the British effort to break enemy codes. During that period, the German military encrypted radio messages using so-called Enigma machines. These typewriter-like contraptions had a series of rotors that converted keystrokes into coded text. The complexity of the encryption was a major obstacle for the code breakers at Bletchley Park. Even if Turing and his colleagues had clues about the messages—for example, certain “crib” words that were likely to appear in the text—there were still thousands of possible rotor settings to trawl through. To solve the problem, Turing designed a computer-like machine called the “bombe” to do the hard work. Once code breakers had found a crib, the bombe could identify the Enigma settings that produced the code and decipher the rest of the message.

  Breaking the Enigma code was probably Turing’s most famous achievement, but much like von Neumann he was also interested in games. Von Neumann’s research on poker certainly grabbed Turing’s attention. When Turing died in 1954, he left his friend Robin Gandy a collection of papers. Among them was a half-finished manuscript entitled “The Game of Poker,” in which Turing had tried to build on von Neumann’s simple analysis of the game.
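  Von Neumann’s “simple analysis” stripped poker down to a bare toy game: each player’s hand is just a random number between 0 and 1, the first player decides whether to bet, and the second decides whether to call. The Python sketch below simulates one common textbook telling of that model; the betting, bluffing, and calling thresholds are illustrative choices meant to show why bluffing with the very worst hands pays, not von Neumann’s optimal values.

```python
import numpy as np

rng = np.random.default_rng(1)

ANTE, BET = 1.0, 1.0  # each player antes 1; player 1 may add a bet of 1
TRIALS = 200_000

# Illustrative (not optimal) thresholds:
VALUE = 0.7   # player 1 bets with hands above this...
BLUFF = 0.1   # ...and, in the bluffing strategy, also below this
CALL = 0.5    # player 2 calls a bet with hands above this

def profit(x, y, bets):
    """Player 1's net winnings for one deal of the toy game."""
    if not bets:                  # check: showdown for the antes
        return ANTE if x > y else -ANTE
    if y < CALL:                  # player 2 folds: player 1 takes the antes
        return ANTE
    return (ANTE + BET) if x > y else -(ANTE + BET)  # call: bigger showdown

x = rng.random(TRIALS)
y = rng.random(TRIALS)

honest = np.mean([profit(a, b, a > VALUE) for a, b in zip(x, y)])
bluffer = np.mean([profit(a, b, a > VALUE or a < BLUFF) for a, b in zip(x, y)])

print(f"bet strong hands only:  {honest:+.4f} per hand")
print(f"also bluff worst hands: {bluffer:+.4f} per hand")
```

  Checking a hopeless hand loses the showdown almost every time, whereas a bluff wins the pot outright whenever the opponent folds, which is why the bluffing line comes out ahead. Von Neumann’s contribution was to show that some amount of bluffing is not merely profitable but part of the mathematically optimal strategy.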

 
