by Haim Shapira
Mathematicians call this kind of diagram a ‘game matrix’ – they don’t like using terms such as ‘table’ or ‘chart’ for fear that ordinary people might understand what it’s about, God forbid.
Honestly, so far it sounds like a most boring story, and it’s hard to understand why so many people have written about it. It becomes interesting when we start wondering how it should be played. At first glance, the answer is clear: they should both keep silent, spend a year at the taxpayer’s expense, and go free even sooner if they are pardoned for being model prisoners. End of story. Yet if things were so simple, no one would have cared about the prisoners’ dilemma. The truth is that anything could happen here.
To truly understand the dilemma, let’s step into A’s shoes for a moment:
‘I don’t know what B might say or has said, but I know that he has only two options: silence or betrayal. If B keeps silent and I take the Fifth, I’ll spend a year in prison; but if I betray him, I walk! I mean, if B decides to keep his mouth shut, I walk. I should throw him under the bus.
‘On the other hand, if B gives me up and I keep silent, I’ll rot in jail. Twenty years is a hell of a long time, so if he talks, I should too. That would give me only 18 years. Better than 20, no?
‘Hey! I got it! Betrayal is my best option either way because I’ll either serve no time at all or 2 years less, and 2 years are 730 days on the outside! Man, I’m smart!’
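For readers who enjoy seeing such arguments checked mechanically, here is a minimal Python sketch – my own illustration, not part of the original story – that encodes the matrix as years in prison and confirms that, whatever B does, A serves fewer years by betraying:

```python
# A minimal sketch (my illustration, not from the story itself). The matrix
# gives years in prison for (A's move, B's move), so lower is better.
S, B = "silence", "betrayal"
years = {
    (S, S): (1, 1),    # both keep silent: one year each
    (S, B): (20, 0),   # A silent, B betrays: A serves 20, B walks
    (B, S): (0, 20),   # A betrays, B silent: A walks, B serves 20
    (B, B): (18, 18),  # mutual betrayal: 18 years each
}

# Betrayal 'dominates' silence for A: whatever B plays, A does better by
# betraying. By symmetry, the same holds for B.
for b_move in (S, B):
    assert years[(B, b_move)][0] < years[(S, b_move)][0]
print("Whatever B does, A serves fewer years by betraying.")
```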
As presented above, it’s a symmetrical game: that is, both players are equal. This means, of course, that B will spend his time in a holding cell doing the very same calculations and reaching the same conclusion, realizing that betrayal is his best option. Where does that leave us? Both players rationally looked after themselves, and the result is bad for both. The rules of the game place them behind bars for 18 years. I can even imagine A and B strolling the prison courtyard a year later, eyeing each other strangely, scratching their heads, and wondering: ‘How the hell did this happen? This is most peculiar. If only we’d had a better notion of the Prisoner’s Dilemma and how it’s played, we would have been free men right now.’
Where did A and B go wrong? Were they even wrong? After all, if we follow their logic, it would seem that both arguably did the right thing: each chose to look after himself first and realized that betrayal was his best choice, regardless of what the other prisoner did. So they each betrayed the other, and neither gained from that move. In fact, both lost.
My intelligent readers must have realized by now that the result – players following the ‘betrayal’ strategy and paying its price (18, 18) – is also the Nash Equilibrium.
The Nash Equilibrium is a set of strategies whereby no player regrets their chosen strategy and its results – after the fact. (Remember that players have control only over their own decisions.)
That is, if the other player had chosen betrayal, it was right for me to do the exact same. The (18, 18) outcome is the Nash Equilibrium because once both players had chosen the betrayal strategy, if one of them decided at the last minute that he’d rather keep silent, he’d spend 20 instead of 18 years in prison – that is, he would lose and live to regret his move. At the same time, he wouldn’t regret the betrayal strategy – which is the Nash strategy. So the question is not about winning or losing, but about (not) regretting one’s choice once the other player’s choice is known.
Silence, on the other hand, is not a Nash strategy, because if you knew that the other player had chosen silence, you’d be better off betraying him. Doing so, your jail term is eliminated and you gain more than you would have if you’d chosen silence. This example shows that, among other things, the Nash strategy may not be judicious, since you’d end up spending 18 years in jail when you could have served just one. In fact, the Prisoner’s Dilemma contains a conflict between a personal, individual rationale and a collective, group rationale. Each prisoner makes his own best choice, but as a group … they both suffer.
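Here is a small companion sketch (again just an illustration) that tests the ‘no regrets’ definition directly on the same matrix:

```python
# A pair of strategies is a Nash Equilibrium if neither player could serve
# fewer years by changing only their own move (illustrative sketch).
S, B = "silence", "betrayal"
years = {(S, S): (1, 1), (S, B): (20, 0),
         (B, S): (0, 20), (B, B): (18, 18)}
other = {S: B, B: S}

def is_nash(a_move, b_move):
    a_years, b_years = years[(a_move, b_move)]
    regret_a = years[(other[a_move], b_move)][0] < a_years
    regret_b = years[(a_move, other[b_move])][1] < b_years
    return not (regret_a or regret_b)

print(is_nash(B, B))  # True: switching to silence means 20 years, not 18
print(is_nash(S, S))  # False: a lone betrayer walks free instead of serving 1
```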
When each player makes his own best choice and gives no thought at all to the consequences of his actions for the other players, the outcome may be catastrophic for all. In many situations, egotistical behaviour is not only morally problematic but strategically unwise.
So how do we solve this conundrum?
Here’s one option: Suppose A and B are no ordinary criminals, but members of a tough crime organization. On the day they took the oath, the Godfather warned them: ‘You may have heard a thing or two about the Prisoner’s Dilemma, or even read a scientific paper about it. So I now must tell you: there’ll be no betrayal strategy in our business. If you betray another member of the organization,’ he almost whispered, ‘you’ll keep silent for a very long time … as in: forever. And you’ll not be the only one, because my guys will silence everyone you care about, for good. I do love the sound of silence, you know.’
Given that piece of information, there’s not much of a dilemma left. Both prisoners will take the Fifth, and will even gain from this because they’ll only serve one year. In other words, narrowing down the number of choices can actually improve the result – which conflicts with the popular belief that having more options is always preferable. So when the Godfather orders his imprisoned minions to play deaf and dumb, the result is good for both prisoners … though it’s not as good for the police and for law-abiding citizens.
Another (legalistic) example of an enforceable agreement that could solve the Prisoner’s Dilemma is the bill of exchange, a tool of the business world. Trader A writes a bill that orders his bank to pay Trader B a given sum, but only if the goods that the latter delivers perfectly match the bill of lading that A and B have signed. Thus, Trader A authorizes the bank to supervise both himself and Trader B. Once A deposits the money in the bank, he can no longer cheat (betray) B, because only the bank – not A – may decide whether B’s goods match the bill of lading. If, however, Trader B should choose to cheat (betray), he won’t receive a dime; whereas if B honours the agreement (keeps silent) and the goods he sends are as A and B agreed, he’ll be paid in full.
In real life (if there’s such a thing), people very often face similar dilemmas, and the results show that in life and in simulation games people do actually tend to betray each other.
Even Giacomo Puccini’s Tosca includes a classic case of the Prisoner’s Dilemma that ends with a two-way betrayal. Scarpia, the evil chief of police, promises Tosca that if she makes love to him, he won’t kill her lover: he’ll only use blank bullets. Tosca agrees to sleep with him. But they both cheat. Tosca stabs Scarpia with a knife instead, while he shoots her lover with real bullets. She commits suicide in the end. What a classic operatic ending! And the music!
In the Prisoner’s Dilemma, and perhaps in Tosca too, it’s clear that even if the players agree not to betray each other (because they are familiar with the dilemma), they might still have a hard time keeping the deal. Suppose that before the two prisoners are each sent to their cell, they agree that theirs is a case of the Prisoner’s Dilemma and they decide never to turn state witness, even if they are made such an offer: they’ll keep their mouths shut and serve their one-year term. Once they are separated, however, each of them alone cannot help wondering whether the other prisoner will indeed keep his word. In this case, the result would be the same: again they realize that betraying is a better option. If A betrays B, A walks; but if both betray each other, they serve only 18 years, not 20. So even if they had a deal, they both cheat.
This may seem totally irrational, with disastrous results. A rational prisoner may reach the conclusion that if the other prisoner thinks the way he does, and if he too understands that 18 years in prison are so much worse than one, he’ll decide to remain silent. Certain Game Theory experts indeed believe that rational players would both shut up. Personally, I don’t understand why. After all, if I were in such a situation, I wouldn’t make an unsafe assumption about the other player’s thoughts and I’d realize that betrayal is my better option. Although I hate to admit it, I’d betray the other guy, and since he would betray me too, we’d both be spending years behind bars, trying to figure out what went wrong.
Does the Prisoner’s Dilemma mean that human cooperation is never possible, or at least not under threat of prison or similar? What is the meaning of this Prisoner’s Dilemma? It would seem that the conclusion can’t be avoided. In this game, and under similar circumstances, people are likely to betray each other. On the other hand, we know that people do cooperate with each other, and not only after a hearty chat with a leading Mafioso. How can we reconcile this apparent contradiction?
When I first wondered about this, I couldn’t find the answer until I remembered how things were in the army and what happened when I started driving. In my years of service, I could always ask people to do me serious favours, and they often responded favourably. I could ask the members of my company to take my place on some missions, and I even went on home leave when it was someone else’s turn. Then my tour of duty ended and I got my driver’s licence, late in life as it was. I remember driving for the first time and reaching a stop sign. I stopped and waited for someone to give way and let me join the traffic, and then … nothing happened! Nada! A million cars drove by and no one even slowed down to let me take my lane. What was that all about? How come people had been willing to do big, much bigger things for me, and yet here it was impossible to get others to do this one small thing – to slow down for a moment so I could drive on? I couldn’t find the answer until I read about the differences between the Prisoner’s Dilemma as a one-shot game and the iterated version, and then I got it.
We must make a distinction between players who play the Prisoner’s Dilemma once and never meet again, and others who play it repeatedly. The first version will inevitably end in mutual betrayal. The iterated Prisoner’s Dilemma version (that is, a repeated game) is, however, inherently different. When I asked my army friends for a favour, they consciously or subconsciously knew that we’d be playing this game again, and that I’d reciprocate all favours given to me. In repeated games, players expect to be rewarded for letting others ‘win’ occasionally. When someone gives way on the road for me, I don’t have time to stop and write down their licence plate to enable me to return the favour next time we meet on the road. That would be irrational. However, people tend to cooperate when faced with what Robert Axelrod called the ‘shadow of the future’ – when further encounters are expected, as real possibilities, we change the way we think.
A popular experiment, conducted in quite a few executive training workshops, is based on the Prisoner’s Dilemma. Participants are paired up; each player is given, say, $500 and a bunch of cards that read S and B, and they are told they’ll play the following game with each other 50 times. The aim is to lose as few dollars as possible, and the rules are framed to hide the fact that this is the Prisoner’s Dilemma in disguise. If both players in a pair choose the S card (and agree to remain silent), $1 is deducted from each player’s $500 (as in one year in prison); if both choose B (and betray each other), they lose $18 each; and if one chooses S and the other chooses B, the latter keeps his entire $500 while $20 is deducted from the former’s pot. Let me stress this: every pair plays the game 50 times.
Most players understand the rules of the game quite quickly – after all, they are executives – but that rarely helps them. Failing to see the catch, they make the same calculations people do when they play only once and conclude that, regardless of what the other player does, B is their best choice. Yet after they have played and lost $18, and another $18, and $18 more, they realize that this strategy is very wrong, because if they lose $18 times 50, they will not only lose their entire allowance (the original $500) but will also owe $400 to the conductor of the experiment. It’s at this stage, most often around the third round, that we begin seeing attempts at cooperation. Players strategically choose S and hope that their partner will get the hint and do the same, which should help them to keep most of their $500.
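A rough simulation of the workshop’s bookkeeping – my own sketch, using the dollar amounts given above – shows how fast all-out betrayal ruins the players, and how much of the $500 survives once cooperation sets in:

```python
# A rough sketch of the workshop game (illustrative, not the book's code).
# Losses per round: S/S costs each player $1; B/B costs each $18; in a
# split, the betrayer loses nothing and the silent player loses $20.
LOSS = {("S", "S"): (1, 1), ("S", "B"): (20, 0),
        ("B", "S"): (0, 20), ("B", "B"): (18, 18)}

def play(moves_1, moves_2, start=500):
    balance_1, balance_2 = start, start
    for m1, m2 in zip(moves_1, moves_2):
        loss_1, loss_2 = LOSS[(m1, m2)]
        balance_1 -= loss_1
        balance_2 -= loss_2
    return balance_1, balance_2

# Fifty rounds of mutual betrayal: each player ends up $400 in debt.
print(play(["B"] * 50, ["B"] * 50))                          # (-400, -400)
# Three rounds of B, then cooperation: most of the $500 survives.
print(play(["B"] * 3 + ["S"] * 47, ["B"] * 3 + ["S"] * 47))  # (399, 399)
```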
I guess Abba Eban, the late Israeli statesman, was right when he said: ‘History teaches us that men and nations behave wisely once they have exhausted all other alternatives.’
In the iterated Prisoner’s Dilemma there’s a catch when we approach round 50. At that stage, I could tell myself that there’s no longer a reason for me to signal that I want cooperation. After all, regardless of the other player’s choice, if I choose betrayal, I lose less money. Yet once you start thinking like that, you set off a chain of reasoning that unravels backwards: because I believe that the results of round 50 are inevitable, I realize that there’s no reason for me to cooperate at round 49 either, since we’ll probably both betray each other there, so I choose betrayal then. By that logic, the same consideration now applies to round 48 as well! Now we have a new paradox: if both players are so very rational, perhaps they should opt for betrayal right from the start.
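The unravelling can be spelled out mechanically. Here’s a toy sketch (my illustration, with an assumed 50-round horizon) that starts from the known last round and works backwards:

```python
# A toy sketch (my illustration) of the backward-unravelling argument for a
# game with a known number of rounds. We start from the final round, where
# betrayal is clearly best, and work backwards.
TOTAL_ROUNDS = 50
rational_move = {TOTAL_ROUNDS: "B"}  # no future to protect: betray
for r in range(TOTAL_ROUNDS - 1, 0, -1):
    # If betrayal is already inevitable from round r + 1 onwards, silence
    # in round r buys no future goodwill, so betrayal is best here too.
    if rational_move[r + 1] == "B":
        rational_move[r] = "B"
print(rational_move[1])  # 'B': the argument collapses back to round 1
```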
Hence, calculating backwards may not be worth it. It only complicates things. This is the so-called ‘surprise quiz paradox’ or the ‘unexpected hanging’ game, and this is how it goes. In the last class on a Friday, the teacher announces there’ll be a surprise quiz next week. The students all go pale, but Joe speaks up: ‘Sir,’ he says, ‘you can’t have a surprise quiz next week.’ ‘Why not?’ the teacher asks. ‘It’s obvious,’ says Joe. ‘The quiz can’t take place next Friday because if there’s no quiz by Thursday, we’ll know it will happen on Friday, and so it wouldn’t be a surprise. The same will happen on Thursday because, if there’s no quiz on Monday, Tuesday, and Wednesday, and we have already ruled out Friday, it must be on Thursday – so now we know and you won’t surprise us, Sir.’
Though the definition of a surprise quiz is not quite clear, and though the teacher was convinced by Joe that a surprise quiz next week would be impossible, he still surprised his students, who trusted too much in Joe’s logic, and gave the quiz on Tuesday.
The same logic applies to the Prisoner’s Dilemma when played a known number of times (when I give workshops, I usually don’t specify the number of rounds in advance) because players will start thinking like Surprise Quiz Joe. That kind of backtracking will only take us to a dead end.
The aforementioned Robert Axelrod is a political scientist at the University of Michigan, but he studied mathematics as well and gained fame when he ran computerized Prisoner’s Dilemma tournaments (you can read about them in his 1984 book, The Evolution of Cooperation). He asked various people, all intelligent and wise, to send him clever strategies for the iterated Prisoner’s Dilemma game, defining the rules as follows: if both players are silent, each earns 3 points; if both play traitors, they win a point each; and if they split, the traitor wins 5 points while the strong and silent type ends up with zero. Axelrod declared there would be 200 rounds for each game and asked people to suggest strategies. What did he mean by ‘strategy’?
In iterated Prisoner’s Dilemma games, there are many strategic possibilities. ‘Always silent’ is one of the simplest strategies, but it’s clearly unwise, because the opposite player can easily capitalize on the fact that his betrayal goes unpunished. ‘Always betray’ is a much tougher strategy. All kinds of bizarre alternative strategies may be chosen here: for example, betray or keep silent every other time, or toss a coin and play S or B randomly.
It should be clear to you, my clever reader, by now that the best strategy would be to make moves in reaction to the opponent’s choices. Indeed, in the first ‘Olympic’ computerized Prisoner’s Dilemma tournament the winning strategy was ‘tit for tat’. It was also the shortest submitted: only four lines of Basic.
That strategy came from Anatol Rapoport (1911–2007), who was born in Russia and worked in the USA. According to this template, you’re supposed to keep silent in the first round – that is, play nice. Then, from the second round on, you simply echo your opponent’s previous move: if he or she kept silent in the first round, you choose S in the second. Ask not what you can do to your opponent, but what they did to you first; then follow suit. The ‘tit for tat’ strategy earned 500 points on average, which is quite high: remember that if both players choose S, they earn 3 points per round each, which means that 600 points per game is very good indeed. That strategy was rated the highest.
Interestingly, the most complicated strategy, with the longest description, was rated the lowest. The second Olympiad featured a ‘tit for two tats’ approach: if the other player betrays you, you give them a chance to atone for their sins, and only if they choose B again do you follow up with a B of your own. This is even ‘nicer’ than the original tit-for-tat strategy, but it may be too nice for your own good, because it scored quite low.
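Rapoport’s original entry was written in Basic; here is a compact modern sketch – mine, not his code – of tit for tat and tit for two tats, scored with the tournament payoffs quoted above:

```python
# Illustrative sketch of two 'nice' strategies under Axelrod's payoffs:
# S/S -> 3 points each, B/B -> 1 each, split -> 5 for the traitor, 0 silent.
POINTS = {("S", "S"): (3, 3), ("S", "B"): (0, 5),
          ("B", "S"): (5, 0), ("B", "B"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    # Play nice in round one, then echo the opponent's previous move.
    return their_hist[-1] if their_hist else "S"

def tit_for_two_tats(my_hist, their_hist):
    # One degree more forgiving: betray only after two betrayals in a row.
    return "B" if their_hist[-2:] == ["B", "B"] else "S"

def match(strategy_1, strategy_2, rounds=200):
    hist_1, hist_2 = [], []
    score_1 = score_2 = 0
    for _ in range(rounds):
        m1 = strategy_1(hist_1, hist_2)
        m2 = strategy_2(hist_2, hist_1)
        p1, p2 = POINTS[(m1, m2)]
        hist_1.append(m1)
        hist_2.append(m2)
        score_1 += p1
        score_2 += p2
    return score_1, score_2

print(match(tit_for_tat, tit_for_tat))       # (600, 600): full cooperation
print(match(tit_for_tat, tit_for_two_tats))  # (600, 600): both stay nice
```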
When hearing about tit for tat, people who know nothing about Game Theory usually protest: ‘That’s a great discovery? This is what people normally do.’ After all, the tit-for-tat strategy is not some astonishing Nobel-sized mathematical discovery but an observation on ordinary human behaviour: you be nice to me and I’ll be nice to you; you be unkind to me and I’ll pay you in kind; tit for tat and all that.
Axelrod further discovered that for the tit-for-tat strategy to succeed, the players must follow four rules:
1 Play nice. Never be the first to betray.
2 Always react to betrayals. Blind optimism is not a good idea.
3 Be forgiving. Once the opponent stops betraying, you should stop too.
4 Don’t be envious. There’ll be specific rounds you won’t win, but you’ll succeed overall.
Another interesting version of the Prisoner’s Dilemma is the one played by several players at once, not just two. One of many examples of this multi-player variant is whaling. Every country whose economy relies heavily on whaling wants severe restrictions to apply to all the other countries (the silent strategy) while its own fishermen hunt to their hearts’ content (the betrayal strategy). The problem here is quite obvious: if all the whaling countries go for the B strategy, the result is disastrous for every one of them (not to mention that whales might be driven to extinction in the process). This is a case of a multi-player Prisoner’s Dilemma game. The same can apply to forestry, to climate-change negotiations (the temptation payoff of tearing up the agreement and polluting heavily is there, but everyone is better off if all parties agree to cut pollution), or to more prosaic issues such as condominium (block of flats) maintenance fees: To pay or not to pay? That is the question. Of course, every tenant would love it if all the tenants paid their condo fees – all except them, that is. If that happened, all would be well: the gardens would bloom, the lobby would be well lit, the elevators would run smoothly, and yet they would have paid nothing! Trouble starts when more and more tenants (all of them, eventually) start thinking that perhaps they too shouldn’t pay the fees, and stop paying. Imagine the elevators and gardens of such condos.
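The condominium example can be written down just like the two-player game. Here is a minimal sketch, with made-up illustrative numbers, showing that free-riding dominates for each individual tenant even though everyone paying beats nobody paying:

```python
# A minimal sketch of the condo fees as an N-player Prisoner's Dilemma.
# The numbers are illustrative assumptions: each fee of $100 produces
# $150 of shared value (gardens, lobby, elevators), split over all tenants.
N = 10
FEE = 100
SHARED_VALUE_PER_FEE = 150  # total value one paid fee adds to the building

def payoff(i_pay: bool, others_paying: int) -> float:
    payers = others_paying + (1 if i_pay else 0)
    my_share = payers * SHARED_VALUE_PER_FEE / N  # my slice of the common good
    return my_share - (FEE if i_pay else 0)

# Free-riding dominates: whatever the others do, not paying is better...
for others in range(N):
    assert payoff(False, others) > payoff(True, others)

# ...yet everyone paying beats nobody paying:
print(payoff(True, N - 1))  # 50.0 each if all pay
print(payoff(False, 0))     # 0.0 each if nobody pays
```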