Iterated Prisoner’s Dilemma and Long-Term Strategies : Networks Course blog for INFO 2040/CS 2850/Econ 2040/SOC 2090

Iterated Prisoner’s Dilemma and Long-Term Strategies

http://www.lifl.fr/IPD/references/from_lifl/alife5/html/graduels.html

In class, Professor Easley mentioned the prisoner’s dilemma, a well-known game where players can cooperate (“deny”) or defect (“confess”). The prisoner’s dilemma is remarkable in that it has a single Nash Equilibrium where both players defect, yet the payout for mutual cooperation is strictly better. In a single game, the safest choice for each player, then, is to defect. But, it may also be the case that the same players play multiple rounds of the same prisoner’s dilemma game with each other. In that case, each player can choose to base their next move on his opponent’s previous moves, and in this way he can create a long-term strategy that applies knowledge about how his opponent tends to move. In the iterated prisoner’s dilemma, each player’s score is the sum of his scores for each round.

A simple strategy, for example, would be to always defect. If the other player always cooperates, then the defector will obviously win in the end. However, in real-life scenarios, it’s unlikely that a player would continue to cooperate if his opponent were to consistently defect, as defecting would offer a better payout to that player, and he would learn quickly to defect in future rounds. On the other hand, two players who always cooperate with each other would earn more points than two players who always defect.

Because iterated prisoner’s dilemma does not have a simple solution, researchers have come up with numerous strategies to maximize a player’s score in the iterated prisoner’s dilemma. They assume that the payout for cooperation is greater than the sum of the payouts when one player defects and the other cooperates, in order to avoid strategies that alternate between cooperation and defection. The above link suggests a strategy, gradual, that appears to offer the best performance.

In this paper, Beaufils et al. test the effectiveness of strategies by simulating a round robin tournament where each of the twelve strategies plays a large number of games with the eleven other strategies. The total score for a strategy is the sum of the scores it receives playing with each of the other strategies. Implementing a large number of strategies ensures that a simple strategy such as one that always defects will ultimately not receive as many points as it could, hence the winning strategy is also the most versatile strategy. Some of the other strategies implemented include:

Cooperate: cooperates
Random: cooperates or defects with equal probability
Tit-for-tat: cooperates, and then plays on each turn the opponent’s previous move. This is a simple strategy that is consistently shown to be one of the most versatile.

The gradual strategy implemented in the article is an extension of tit-for-tat: the player cooperates, and if the opponent defects, he will defect for n turns and then cooperate twice, where n is the number of defections that the opponent has made in total.

The simulation that the article ran returns the following tables:

Table of scores

	coop	def	rand	tft	spite	p_nst	p_kn	sft_mj	mist	prob	grad	pav
coop	3000	0	1481	3000	3000	999	2001	3000	2997	6	3000	3000
def	5000	1000	3003	1004	1004	2332	3668	1004	1000	1008	1340	3000
rand	4012	499	2228	2250	505	1667	2824	1980	2240	1581	940	2239
tft	3000	999	2248	3000	3000	1998	2667	3000	2500	2999	3000	3000
spite	3000	999	3010	3000	3000	2331	3663	3000	1003	1007	3000	3000
p_nst	4334	667	2502	2003	671	1666	3335	671	1999	2006	979	3002
p_kn	3666	333	2024	2667	343	1665	2334	3666	2664	2664	767	2003
sft_mj	3000	999	2380	3000	3000	2331	2001	3000	2500	2999	3000	3000
mist	3002	1000	2244	2500	1003	1999	2669	2500	1000	3000	3001	2003
prob	4996	998	2522	2999	1002	1996	2669	2999	2995	1004	2999	1998
grad	3000	915	2815	3000	3000	2219	3472	3000	2996	2999	3000	3000
pav	3000	500	2244	3000	3000	1332	2833	3000	1998	2003	3000	300

Final scores

strategy	final score
gradual	33416
tit-for-tat	31411
soft_majo	31210
spite	30013
prober	29177
pavlov	28910
mistrust	25921
cooperate	25484
per_kind	24796
defect	24363
per_nasty	23835
random	22965

These scores support that gradual offers better performance than tit-for-tat, although the authors also concede that more complex strategies, such as when n is replaced by a more complicated function such as n(n+1)/2, may offer better performance than gradual.

An interesting thing to note here is that the defect-oriented strategies scored significantly lower than the “kinder” strategies; in particular, defect scored noticeably lower than cooperate and mistrust (same as tit-for-tat except defects on the first round) scored significantly lower than tit-for-tat. Even though the Nash equilibrium for a single instance of this game suggests that both players defect, when long-term payoffs are to be calculated, an effective strategy is one that is by nature cooperative but that also reacts to defections so as not to be taken advantage of. The tit-for-tat strategy is the most basic example of this. Other, more complex strategies such as gradual begin to describe the sorts of interactions we can expect between humans in day-to-day dealings, where many “games” are played more than once and people have the ability to remember past interactions with others.

September 11, 2015 | category: Uncategorized

Networks

Iterated Prisoner’s Dilemma and Long-Term Strategies

Comments

Leave a Reply

Blogging Calendar

Archives