## An Application of Automata in Game Theory

https://courses.cs.cornell.edu/cs2800/2017sp/handouts/pass_tseng_discmath.pdf (p. 128-130)

A while ago, while going through the supplementary reading for CS 2800 on deterministic finite automata (DFAs), I came across a connection between DFAs and game theory. The general idea is that we can model certain games using DFAs (in this case, the prisoner's dilemma). Seeing a crossover between two courses I am taking concurrently interested me greatly, so in this post I discuss the use of DFAs in the prisoner's dilemma, a topic we covered earlier in the semester. The reading's language and acronyms differ a little from ours, so I will use theirs and explain accordingly. We start with a regular 2-player prisoner's dilemma game. Each player can cooperate (C) or defect (D), with certain utilities based on the outcome (here, "utilities" mean what we call payoffs in our class).

As we know, if both players play C, they are both fine, since they both get out of jail. However, if one player plays C, the other should play D to maximize their payoff. If they both play D, however, they are both stuck in jail. The Nash equilibrium here is (D,D), as discussed before. Here is where things change. Suppose we play the game 100 times. We now need a model describing what happens at the nth round. Thus we can model the total utility of the player as U = u^1 + δu^2 + δ^2 u^3 + ... + δ^(n-1) u^n, i.e., the sum of δ^(i-1) u^i over the rounds. Here u^i is the utility of the player at round i, and 0 < δ < 1 is the discount factor that accounts for inflation and interest over time. Now we can think in terms of two stores on the same street: playing C means both stores continue business as usual, and playing D means one store burns the other down. What's the Nash equilibrium here? If one store cooperates all the time, the other store should defect on the 100th round. Knowing this, the first store would defect on the 99th round. Continuing this induction argument, we see that the only NE is again defecting in every round, as expected from our study of the prisoner's dilemma.
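To make the discounting concrete, here is a minimal Python sketch (my own, not from the reading) that computes the discounted total utility from a list of per-round payoffs:

```python
# Sketch of the discounted total utility U = sum over i of delta^(i-1) * u^i,
# where u^i is the payoff in round i and 0 < delta < 1 is the discount factor.

def total_utility(round_payoffs, delta):
    """Discounted sum of per-round payoffs."""
    return sum(delta ** i * u for i, u in enumerate(round_payoffs))

# Example: a payoff of 1 per round for 3 rounds with delta = 0.5:
# 1 + 0.5 + 0.25 = 1.75
print(total_utility([1, 1, 1], 0.5))
```

Note how a small δ makes early rounds count far more than late ones, which is exactly why the discount factor matters in the repeated game.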

A further question we can ask is: what would happen in real life in this scenario? Can we really expect both stores to burn each other down as soon as possible? It turns out a common strategy is the "tit-for-tat" strategy, which simply means that a player bases their decision on the other player's decision in the previous round. Thus the player in round n would cooperate if the other player cooperated in round n-1. Now we can change our game-theoretic model and introduce DFAs to compute the players' decisions. The input of the DFA is the decision of the other player in the previous round, with more states of the DFA corresponding to higher cost for the players. Then the tit-for-tat strategy is a DFA with one state s and the transition function f(s,C) = (s,C) and f(s,D) = (s,D), where the input is the current state and the other player's previous decision, and the output is the next state and the player's own decision. Now if we face a player following the tit-for-tat strategy, the best strategy would be to cooperate until round 99 and defect at round 100. However, we know that this would require more states, as we have to count to round 100, which would mean a higher cost for us. Thus if we restrict the players to a 1-state DFA, then both players would follow tit-for-tat, i.e., this DFA would be the strict Nash equilibrium.
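The one-state machine above can be sketched in a few lines of Python (a hypothetical implementation of the transition function f, not taken from the handout): the single state never changes, and the player's decision simply echoes the opponent's previous move.

```python
# One-state tit-for-tat machine: f(s, C) = (s, C), f(s, D) = (s, D).
# The output pair is (next state, our decision this round).

def tit_for_tat_step(state, opponent_prev):
    """Single-state transition: our decision echoes the opponent's last move."""
    return state, opponent_prev

# Play a few rounds, cooperating in round 1 by convention.
state, my_move = "s", "C"
history = []
for their_move in ["C", "D", "D", "C"]:
    history.append(my_move)
    state, my_move = tit_for_tat_step(state, their_move)

# Our moves lag the opponent's by one round: ['C', 'C', 'D', 'D']
print(history)
```

The point of the construction is visible here: no counter, no extra state, so under the "more states cost more" restriction this is the cheapest machine available.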

I wanted to add that, setting aside the one-state construction and going strictly by the definition of tit-for-tat (TFT), where a player responds based on the decision of the other player in the previous round, the DFA would look like this:

This two-state DFA represents TFT by definition: the player chooses C if the other player chose C in the previous round, and chooses D if the other player chose D in the previous round.
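As a sketch (again my own, hypothetical encoding rather than the handout's), the two-state version can be written as an explicit transition table, with the current state doubling as the player's move this round:

```python
# Two-state TFT DFA as a transition table. State "C" means "the opponent
# cooperated last round" (so we play C); state "D" means they defected.

TFT = {
    ("C", "C"): "C",  # opponent cooperated again -> stay cooperative
    ("C", "D"): "D",  # opponent defected -> switch to the D state
    ("D", "C"): "C",  # opponent cooperated -> forgive, back to C
    ("D", "D"): "D",  # opponent kept defecting -> keep defecting
}

def play(opponent_moves, start="C"):
    """Run the DFA over the opponent's moves; our move each round is the state."""
    state, moves = start, []
    for m in opponent_moves:
        moves.append(state)       # our move this round
        state = TFT[(state, m)]   # transition on the opponent's move
    return moves

# Each of our moves copies the opponent's previous move: ['C', 'C', 'D', 'C']
print(play(["C", "D", "C", "D"]))
```

Behaviorally this is the same strategy as the one-state machine; the extra state just makes the "remember the opponent's last move" bookkeeping explicit in the DFA's state set.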

All in all, exploring the connection between DFAs and game theory was very interesting, because I did not think there was a connection between them when I first learned each topic. However, I learned that when we consider the number of rounds, i.e., let the number of rounds grow toward infinity, automata are a great tool for modeling long-term behavior. This makes me think that we could also bring in Markov chains and dynamical systems from linear algebra to study the long-term behavior of games. Overall, this reading was great because it further solidified my understanding of game theory and the prisoner's dilemma by having me connect prior knowledge from a different domain to it.