## Some people just can’t stay out of the questioning room

In class, we discussed the prisoner’s dilemma, in which two players play a single-round game where they can choose whether or not to rat each other out to the police. One such payoff matrix for this game is as follows:

Where cooperate/defect are defined in terms of the players’ treatment of each other. The graphic also shows that, knowing the payoffs of each strategy, each player would choose to defect. However, what would the implications be if the game were not only played once with the two players, but over and over again–with the players remembering how the past games were played? This “iterated prisoner’s dilemma” game allows us to examine strategies across multiple games, as opposed to within just one.

The repeated nature of such a game means a completely cooperative and completely defective strategy leads to very large variance, and leads to the consideration of many different strategies that take past games into account. Some are as follows:

- Tit-for-Tat, in which a player will cooperate first, then act as the opponent acted in the previous round. For example, if the opponent defected first, this player will defect next round.
- Always cooperate, self-explanatory.
- Always defect, self-explanatory.
- Grudger, where a player cooperates until the opponent defects, from which point the player defects to the end of times

These are the more extreme/straightforward strategies a player a take. However, there can be middle ground in determining how one should act. For example, the grudger might instead forgive the opponent after some cooperation, the tit-for-tat might start with a defect before playing the opponent’s previous move or always defect on the last (or last few) move(s), if the number of games is known, etc. Depending on the algorithm used to determine the player’s next move, calculations can become rather complicated. Unsurprisingly, some of these strategies are more prone to failure/exploitation than the others. For example, 100% cooperate strategy loses significantly against an always defect strategy, and two always defecting players will have a far worse time than two always cooperating players. So how do we, without outright empirically testing strategies against one another, determine what strategy will generally perform the best across iterations of the game?

After actually doing said empirical testing, it was determined that the top iterated prisoner’s dilemma strategies shared the following traits:

- Niceness, i.e. players don’t defect before the opponent does (note that two nice players would be cooperative always)
- Retaliation, i.e. players will respond to defection in kind (rather than unconditional nice-ness)
- Forgiving, i.e. players won’t go on a defection rampage and will possible return to being nice after the opponent satisfies some conditions determined by how it players subsequently (so, not grudger)
- Non-envying, i.e. players don’t particularly care about scoring higher than the opponents.

It turns out that the Tit-For-Tat strategy is the simplest one that satisfies all of these traits. It starts nice, and only defects right after the other player defects. It is also forgiving in that if the opponent stops defecting, so will the player. Lastly, it’s clearly non-envying since the strategy is invariant of the players’ current scores. In experimentation, tit-for-tat has in fact been shown to be a tried-and-tested winning(or very close to winning) strategy against most other strategies. There have been many such competitions, and their findings are quite interesting, frequently presenting variants of largely tit-for-tat strategies to be amongst the top, and available to public, such as at:

- http://lesswrong.com/lw/7f2/prisoners_dilemma_tournament_results/
- http://arxiv.org/ftp/cs/papers/0609/0609017.pdf

Personally speaking, in one of my computer science classes in high school, students were asked to program for this iterated game strategies that were later pitted against each other, as in the aforementioned iterated prisoner’s dilemma tournaments. After repeatedly filtering the top-scoring prisoners and pitting those best performers against each other. We found that the tit-for-tat strategy, which was not only straightforward in its logic, but as a result very simple to code, did in fact lead to among the highest scoring prisoners.

Sources:

Some strategies, and proposed variants | http://www.iterated-prisoners-dilemma.net/prisoners-dilemma-strategies.shtml

Obligatory Wikipedia | http://en.wikipedia.org/wiki/Prisoner’s_dilemma#The_iterated_prisoners.27_dilemma

Wikipedia has pretty comprehensive references too | http://en.wikipedia.org/wiki/Prisoner’s_dilemma#References