Iterated Prisoner’s Dilemma and Long-Term Strategies
http://www.lifl.fr/IPD/references/from_lifl/alife5/html/graduels.html
In class, Professor Easley mentioned the prisoner’s dilemma, a well-known game where players can cooperate (“deny”) or defect (“confess”). The prisoner’s dilemma is remarkable in that it has a single Nash Equilibrium where both players defect, yet the payout for mutual cooperation is strictly better. In a single game, the safest choice for each player, then, is to defect. But, it may also be the case that the same players play multiple rounds of the same prisoner’s dilemma game with each other. In that case, each player can choose to base their next move on his opponent’s previous moves, and in this way he can create a long-term strategy that applies knowledge about how his opponent tends to move. In the iterated prisoner’s dilemma, each player’s score is the sum of his scores for each round.
A simple strategy, for example, would be to always defect. If the other player always cooperates, then the defector will obviously win in the end. However, in real-life scenarios, it’s unlikely that a player would continue to cooperate if his opponent were to consistently defect, as defecting would offer a better payout to that player, and he would learn quickly to defect in future rounds. On the other hand, two players who always cooperate with each other would earn more points than two players who always defect.
Because iterated prisoner’s dilemma does not have a simple solution, researchers have come up with numerous strategies to maximize a player’s score in the iterated prisoner’s dilemma. They assume that the payout for cooperation is greater than the sum of the payouts when one player defects and the other cooperates, in order to avoid strategies that alternate between cooperation and defection. The above link suggests a strategy, gradual, that appears to offer the best performance.
In this paper, Beaufils et al. test the effectiveness of strategies by simulating a round robin tournament where each of the twelve strategies plays a large number of games with the eleven other strategies. The total score for a strategy is the sum of the scores it receives playing with each of the other strategies. Implementing a large number of strategies ensures that a simple strategy such as one that always defects will ultimately not receive as many points as it could, hence the winning strategy is also the most versatile strategy. Some of the other strategies implemented include:
- Cooperate: cooperates
- Random: cooperates or defects with equal probability
- Tit-for-tat: cooperates, and then plays on each turn the opponent’s previous move. This is a simple strategy that is consistently shown to be one of the most versatile.
The gradual strategy implemented in the article is an extension of tit-for-tat: the player cooperates, and if the opponent defects, he will defect for n turns and then cooperate twice, where n is the number of defections that the opponent has made in total.
The simulation that the article ran returns the following tables:
Table of scores
coop | def | rand | tft | spite | p_nst | p_kn | sft_mj | mist | prob | grad | pav | |
coop | 3000 | 0 | 1481 | 3000 | 3000 | 999 | 2001 | 3000 | 2997 | 6 | 3000 | 3000 |
def | 5000 | 1000 | 3003 | 1004 | 1004 | 2332 | 3668 | 1004 | 1000 | 1008 | 1340 | 3000 |
rand | 4012 | 499 | 2228 | 2250 | 505 | 1667 | 2824 | 1980 | 2240 | 1581 | 940 | 2239 |
tft | 3000 | 999 | 2248 | 3000 | 3000 | 1998 | 2667 | 3000 | 2500 | 2999 | 3000 | 3000 |
spite | 3000 | 999 | 3010 | 3000 | 3000 | 2331 | 3663 | 3000 | 1003 | 1007 | 3000 | 3000 |
p_nst | 4334 | 667 | 2502 | 2003 | 671 | 1666 | 3335 | 671 | 1999 | 2006 | 979 | 3002 |
p_kn | 3666 | 333 | 2024 | 2667 | 343 | 1665 | 2334 | 3666 | 2664 | 2664 | 767 | 2003 |
sft_mj | 3000 | 999 | 2380 | 3000 | 3000 | 2331 | 2001 | 3000 | 2500 | 2999 | 3000 | 3000 |
mist | 3002 | 1000 | 2244 | 2500 | 1003 | 1999 | 2669 | 2500 | 1000 | 3000 | 3001 | 2003 |
prob | 4996 | 998 | 2522 | 2999 | 1002 | 1996 | 2669 | 2999 | 2995 | 1004 | 2999 | 1998 |
grad | 3000 | 915 | 2815 | 3000 | 3000 | 2219 | 3472 | 3000 | 2996 | 2999 | 3000 | 3000 |
pav | 3000 | 500 | 2244 | 3000 | 3000 | 1332 | 2833 | 3000 | 1998 | 2003 | 3000 | 300 |
Final scores
|
These scores support that gradual offers better performance than tit-for-tat, although the authors also concede that more complex strategies, such as when n is replaced by a more complicated function such as n(n+1)/2, may offer better performance than gradual.
An interesting thing to note here is that the defect-oriented strategies scored significantly lower than the “kinder” strategies; in particular, defect scored noticeably lower than cooperate and mistrust (same as tit-for-tat except defects on the first round) scored significantly lower than tit-for-tat. Even though the Nash equilibrium for a single instance of this game suggests that both players defect, when long-term payoffs are to be calculated, an effective strategy is one that is by nature cooperative but that also reacts to defections so as not to be taken advantage of. The tit-for-tat strategy is the most basic example of this. Other, more complex strategies such as gradual begin to describe the sorts of interactions we can expect between humans in day-to-day dealings, where many “games” are played more than once and people have the ability to remember past interactions with others.