Skip to main content



Iterated Prisoner’s Dilemma and Long-Term Strategies

http://www.lifl.fr/IPD/references/from_lifl/alife5/html/graduels.html

 

In class, Professor Easley mentioned the prisoner’s dilemma, a well-known game where players can cooperate (“deny”) or defect (“confess”). The prisoner’s dilemma is remarkable in that it has a single Nash Equilibrium where both players defect, yet the payout for mutual cooperation is strictly better. In a single game, the safest choice for each player, then, is to defect. But, it may also be the case that the same players play multiple rounds of the same prisoner’s dilemma game with each other. In that case, each player can choose to base their next move on his opponent’s previous moves, and in this way he can create a long-term strategy that applies knowledge about how his opponent tends to move. In the iterated prisoner’s dilemma, each player’s score is the sum of his scores for each round.

A simple strategy, for example, would be to always defect. If the other player always cooperates, then the defector will obviously win in the end. However, in real-life scenarios, it’s unlikely that a player would continue to cooperate if his opponent were to consistently defect, as defecting would offer a better payout to that player, and he would learn quickly to defect in future rounds. On the other hand, two players who always cooperate with each other would earn more points than two players who always defect.

Because iterated prisoner’s dilemma does not have a simple solution, researchers have come up with numerous strategies to maximize a player’s score in the iterated prisoner’s dilemma. They assume that the payout for cooperation is greater than the sum of the payouts when one player defects and the other cooperates, in order to avoid strategies that alternate between cooperation and defection. The above link suggests a strategy, gradual, that appears to offer the best performance.

In this paper, Beaufils et al. test the effectiveness of strategies by simulating a round robin tournament where each of the twelve strategies plays a large number of games with the eleven other strategies. The total score for a strategy is the sum of the scores it receives playing with each of the other strategies. Implementing a large number of strategies ensures that a simple strategy such as one that always defects will ultimately not receive as many points as it could, hence the winning strategy is also the most versatile strategy. Some of the other strategies implemented include:

 

  • Cooperate: cooperates
  • Random: cooperates or defects with equal probability
  • Tit-for-tat: cooperates, and then plays on each turn the opponent’s previous move. This is a simple strategy that is consistently shown to be one of the most versatile.

 

The gradual strategy implemented in the article is an extension of tit-for-tat: the player cooperates, and if the opponent defects, he will defect for n turns and then cooperate twice, where n is the number of defections that the opponent has made in total.

 

 

The simulation that the article ran returns the following tables:

 

Table of scores

coop def rand tft spite p_nst p_kn sft_mj mist prob grad pav
coop 3000 0 1481 3000 3000 999 2001 3000 2997 6 3000 3000
def 5000 1000 3003 1004 1004 2332 3668 1004 1000 1008 1340 3000
rand 4012 499 2228 2250 505 1667 2824 1980 2240 1581 940 2239
tft 3000 999 2248 3000 3000 1998 2667 3000 2500 2999 3000 3000
spite 3000 999 3010 3000 3000 2331 3663 3000 1003 1007 3000 3000
p_nst 4334 667 2502 2003 671 1666 3335 671 1999 2006 979 3002
p_kn 3666 333 2024 2667 343 1665 2334 3666 2664 2664 767 2003
sft_mj 3000 999 2380 3000 3000 2331 2001 3000 2500 2999 3000 3000
mist 3002 1000 2244 2500 1003 1999 2669 2500 1000 3000 3001 2003
prob 4996 998 2522 2999 1002 1996 2669 2999 2995 1004 2999 1998
grad 3000 915 2815 3000 3000 2219 3472 3000 2996 2999 3000 3000
pav 3000 500 2244 3000 3000 1332 2833 3000 1998 2003 3000 300

 

 

Final scores

strategy final score
gradual 33416
tit-for-tat 31411
soft_majo 31210
spite 30013
prober 29177
pavlov 28910
mistrust 25921
cooperate 25484
per_kind 24796
defect 24363
per_nasty 23835
random 22965

 

These scores support that gradual offers better performance than tit-for-tat, although the authors also concede that more complex strategies, such as when n is replaced by a more complicated function such as n(n+1)/2, may offer better performance than gradual.

An interesting thing to note here is that the defect-oriented strategies scored significantly lower than the “kinder” strategies; in particular, defect scored noticeably lower than cooperate and mistrust (same as tit-for-tat except defects on the first round) scored significantly lower than tit-for-tat. Even though the Nash equilibrium for a single instance of this game suggests that both players defect, when long-term payoffs are to be calculated, an effective strategy is one that is by nature cooperative but that also reacts to defections so as not to be taken advantage of. The tit-for-tat strategy is the most basic example of this. Other, more complex strategies such as gradual begin to describe the sorts of interactions we can expect between humans in day-to-day dealings, where many “games” are played more than once and people have the ability to remember past interactions with others.

Comments

Leave a Reply

Blogging Calendar

September 2015
M T W T F S S
 123456
78910111213
14151617181920
21222324252627
282930  

Archives