Game Theory and Decision Networks: Texas Hold’em Artificial Intelligence

Since the success of IBM’s chess-playing supercomputer, Deep Blue, artificial intelligence (AI) has repeatedly shown that it can win games against human competitors, and poker is one of these games. A major development in AI for Texas Hold’em came from combining network theory with game theory. Researchers at Monash University in Australia developed an AI system for Texas Hold’em poker called the Bayesian Poker Program (BPP).

The AI uses a Bayesian decision network, which is essentially a directed network of random variables (nodes) linked by conditional dependencies (edges). Each variable has a set of possible values. For example, in the figure below, BPP_Action is the computer’s action in each round of betting and can take the value fold, check, call, bet, raise, pay small blind, pay large blind, or pass. The edges are directional: an edge into a node shows that the node’s value depends on the variable at the other end. For example, Winnings depends on the Boolean variable BPP_Win, and OPP_Action (the opponent’s action) depends on the pot size, the computer’s action, the opponent’s current hand, the community cards and the current round of betting. The network is dynamic in that each node may take a different value during each round of betting, which in effect makes a link positive or negative, weak or strong.

Figure 1. A decision network for poker
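As a rough illustration of the structure described above, the two dependencies named in the text can be written down as parent sets and checked for a valid evaluation order. This is only a sketch: BPP_Action, BPP_Win, OPP_Action and Winnings are names from the post, while Pot, OPP_Current, Board and Round are hypothetical labels chosen here for the pot size, the opponent’s hand, the community cards and the betting round. It does not reproduce the full network from the paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a node; its value is the set of nodes it depends on (its parents).
# Only the two dependencies spelled out in the text are encoded here.
# Pot, OPP_Current, Board and Round are hypothetical labels, not the paper's names.
PARENTS = {
    "Winnings": {"BPP_Win"},
    "OPP_Action": {"Pot", "BPP_Action", "OPP_Current", "Board", "Round"},
}

# The eight values that BPP_Action can take, as listed in the text.
BPP_ACTIONS = [
    "fold", "check", "call", "bet", "raise",
    "pay small blind", "pay large blind", "pass",
]


def evaluation_order(parents):
    """Return the nodes with every parent listed before its children.

    TopologicalSorter raises CycleError if the graph is not acyclic,
    which a Bayesian network must be.
    """
    return list(TopologicalSorter(parents).static_order())


order = evaluation_order(PARENTS)
```

In any order returned, BPP_Action comes before OPP_Action and BPP_Win comes before Winnings, which mirrors the direction of the edges.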

Because of the complex probabilistic nature of the network, game theory is used extensively in the AI to determine which strategy to employ. In this game, the players are the computer and the human opponent, the strategies are the eight betting options, and there is a dynamic payoff matrix that changes when the players place bets and new cards are dealt. If a player wins the hand, that player’s payoff is the total amount the other player put in the pot; if a player loses, the payoff is the negative of what that player put in the pot. Therefore, after the final card is played, the winner’s payoff can be calculated as the pot size minus the portion the winner contributed to the pot. Before the last card is dealt, however, the BPP player computes the expected value of its payoff based on the probability that its hand will improve. It analyzes the plays made by the opponent in the past to estimate the strength of the opponent’s hand and, at the same time, predicts a dominant strategy or best response. Finally, the BPP performs whichever of the eight actions listed above it has predicted to be the best response to its opponent’s play.
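The payoff rule above can be made concrete with a small sketch. The function names and the two-player, no-tie simplification are assumptions made here for illustration; the post does not describe how BPP actually estimates the win probability, so `p_win` is simply passed in.

```python
def showdown_payoff(won, own_contribution, opponent_contribution):
    """Payoff once the hand is decided (two players, ties ignored).

    The winner gains what the opponent put in the pot, which equals
    pot size minus the winner's own contribution. The loser loses
    what the loser put in.
    """
    return opponent_contribution if won else -own_contribution


def expected_payoff(p_win, own_contribution, opponent_contribution):
    """Expected payoff before the outcome is known.

    p_win is an assumed, externally supplied probability of winning.
    """
    win_part = p_win * showdown_payoff(True, own_contribution, opponent_contribution)
    lose_part = (1.0 - p_win) * showdown_payoff(False, own_contribution, opponent_contribution)
    return win_part + lose_part


# Example: a pot of 100 in which this player put 40 and the opponent put 60.
pot, own = 100, 40
win_payoff = showdown_payoff(True, own, pot - own)
ev_even_odds = expected_payoff(0.5, own, pot - own)
```

With these numbers the winner’s payoff is 60, the same as the pot of 100 minus the 40 contributed. At a 50% chance of winning, the expected payoff is 0.5 × 60 − 0.5 × 40 = 10.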

Over the course of several hands, it is important that the BPP does not act in a predictable manner that the human opponent can easily counter. This can be accomplished through the use of a mixed strategy. The BPP player uses “betting curves” to randomize its actions, varying the value of its bets, the frequency of raises and re-raises, the size of bets to which it will fold and the probability that it will bluff. This mixed strategy is very important to a functional AI because, without it, the human player could easily take advantage of the computer’s predictable behavior.
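One way to picture a randomized betting rule is a curve that maps an estimated win probability to a probability of betting, followed by a random draw. The logistic shape, the `midpoint` and `steepness` parameters and the two-action choice below are all assumptions made for illustration. They are not the betting curves used in BPP, which the post does not specify.

```python
import math
import random


def bet_probability(p_win, midpoint=0.5, steepness=10.0):
    """Hypothetical betting curve: chance of betting given a win estimate.

    A logistic curve is used only as an example shape. It never reaches
    0 or 1, so weak hands are occasionally bet (a bluff) and strong
    hands are occasionally checked.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (p_win - midpoint)))


def choose_action(p_win, rng):
    """Sample 'bet' or 'check' according to the curve (a mixed strategy)."""
    return "bet" if rng.random() < bet_probability(p_win) else "check"


# A seeded generator keeps the example reproducible.
rng = random.Random(0)
sampled = [choose_action(0.2, rng) for _ in range(1000)]
bluff_count = sampled.count("bet")
```

Because the action is drawn at random, the same weak hand is bet on some occasions and checked on others, so an opponent cannot read the hand strength from the action alone.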

Reference:

Ann E. Nicholson, Kevin B. Korb and Darren Boulton. Using Bayesian Decision Networks to Play Texas Hold’em Poker.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.5081&rep=rep1&type=pdf
