Case-Based Strategies in Computer Poker


Jonathan Rubin (a) and Ian Watson (a)
(a) Department of Computer Science, University of Auckland Game AI Group
E-mail: jrubin01@gmail.com,
E-mail: ian@cs.auckland.ac.nz

The state-of-the-art within Artificial Intelligence has directly benefited from research conducted within the computer poker domain. One such success has been the advancement of bottom up equilibrium finding algorithms via computational game theory. On the other hand, alternative top down approaches that attempt to generalise decisions observed within a collection of data have not received as much attention. In this work we employ a top down approach in order to construct case-based strategies within three computer poker domains. Our analysis begins within the simplest variation of Texas Hold’em poker, i.e. two-player, limit Hold’em. We trace the evolution of our case-based architecture and evaluate the effect that modifications have on strategy performance. The end result of our experimentation is a coherent framework for producing strong case-based strategies based on the observation and generalisation of expert decisions. The lessons learned within this domain offer valuable insights, which we use to apply the framework to the more complicated domains of two-player, no-limit Hold’em and multi-player, limit Hold’em. For each domain we present results obtained from the Annual Computer Poker Competition, where the best poker agents in the world are challenged against each other. We also present results against human opposition.

Keywords: Imperfect Information Games, Game AI, Case-Based Reasoning

AI Communications 25 (2012) 19–48
DOI 10.3233/AIC-2012-0513
ISSN 0921-7126, IOS Press. All rights reserved


1. Introduction

The state-of-the-art within Artificial Intelligence (AI) research has directly benefited from research conducted within the computer poker domain. Perhaps its most notable achievement has been the advancement of equilibrium finding algorithms via computational game theory. State-of-the-art equilibrium finding algorithms are now able to solve mathematical models that were once prohibitively large. Furthermore, empirical results tend to support the intuition that solving larger models results in better quality strategies¹. However, equilibrium finding algorithms are only one of many approaches available within the computer poker test-bed. Alternative approaches such as imperfect information game tree search [8] and, more recently, Monte-Carlo tree search [36] have also received attention from researchers in order to handle challenges within the computer poker domain that cannot be suitably addressed by equilibrium finding algorithms, such as dynamic adaptation to changing game conditions.

¹ See [38] for a discussion of why this is not always the case.

The algorithms mentioned above take a bottom up approach to constructing sophisticated strategies within the computer poker domain. While the details of each algorithm differ, they roughly achieve their goal by enumerating (or sampling) a state space together with its pay-off values in order to identify a distribution over actions that achieves the greatest expected value. An alternative top down procedure attempts to construct sophisticated strategies by generalising decisions observed within a collection of data. This lazier top down approach offers its own set of problems in the domain of computer poker. In particular, any top down approach is a slave to its data, so quality data is a necessity. While massive amounts of data from online poker sites are available [25], the quality of the decisions contained within this data is usually questionable. The imperfect information world of the poker domain can often mean that valuable information may be missing from this data. Moreover, the stochastic nature of the poker domain ensures that it is not enough to simply rely on outcome information in order to determine decision quality.

Despite the problems described above, top down approaches within the computer poker domain have still managed to produce strong strategies [4,28]. In fact, empirical evidence from international computer poker competitions [1] suggests that, in a few cases, top down approaches have managed to outperform their bottom up counterparts. In this work we describe one such top down approach that we have used to construct sophisticated strategies within the computer poker domain. Our case-based approach can be used to produce strategies for a range of sub-domains within the computer poker environment, including both limit and no-limit betting structures as well as two-player and multi-player matches. The case-based strategies produced by our approach have achieved 1st place finishes for our agent (Sartre) at the Annual Computer Poker Competition (ACPC) [1]. The ACPC is the premier computer poker event and the agents submitted typically represent the current state-of-the-art in computer poker research.

We have applied and evaluated case-based strategies within the game of Texas Hold’em. Texas Hold’em is currently the most popular poker variation. To achieve strong performance, players must be able to successfully deal with imperfect information, i.e. they cannot see their opponents’ hidden cards. Also, chance events occur in the domain via the random distribution of playing cards. Texas Hold’em can be played as a two-person game or a multi-player game. There are multiple variations on the type of betting structures used that can dramatically alter the dynamics of the game and hence the strategies that must be employed for successful play. For instance, a limit game restricts the size of the bets allowed to predefined values. On the other hand, a no-limit game imposes no such restriction.

In this work we present case-based strategies in three poker domains. Our analysis begins within the simplest variation of Texas Hold’em, i.e. two-player, limit Hold’em. Here we trace the evolution of our case-based architecture and evaluate the effect that modifications have on strategy performance. The end result of our experimentation in the two-player, limit Hold’em domain is a coherent framework for producing strong case-based strategies, based on the observation and generalisation of expert decisions. The lessons learned within this domain offer valuable insights, which we use to apply the framework to the more complicated domains of two-player, no-limit Hold’em and multi-player, limit Hold’em. We describe the difficulties that these more complicated domains impose and how our framework deals with these issues. For each of the three poker sub-domains mentioned above we produce strategies that have been extensively evaluated. In particular, we present results from Annual Computer Poker Competitions for the years 2009–2011 and illustrate the performance trajectory of our case-based strategies against the best available opposition.

The remainder of this document proceeds as follows. Section 2 describes the rules of Texas Hold’em poker, highlighting the differences between the different variations available. Section 3 provides the necessary background and details some related work. Section 4 further recaps the benefits of the poker domain as a test-bed for artificial intelligence research and provides the motivation for the use of case-based strategies as opposed to alternative algorithms. Section 5 details the initial evolution of our case-based architecture for computer poker in the two-player, limit Hold’em domain. Experimental results are presented and discussed. Sections 6 and 7 extrapolate the resulting framework to the more complicated domains of two-player, no-limit Hold’em and multi-player, limit Hold’em. Once again, results are presented and discussed for each separate domain. Finally, Section 8 concludes the document.


2. Texas Hold’em

Here we briefly describe the game of Texas Hold’em, highlighting some of the common terms which are used throughout this work. For more detailed information on Texas Hold’em consult [33], or for further information on poker in general see [32].

Texas Hold’em can be played either as a two-player game or a multi-player game. When a game consists only of two players it is often referred to as a heads up match. Game play consists of four stages: preflop, flop, turn and river. During each stage a round of betting occurs. The first round of play is the preflop, where all players at the table are dealt two hole cards, which only they can see. Before any betting takes place, two forced bets are contributed to the pot, i.e. the small blind and the big blind. The big blind is typically double that of the small blind. In a heads up match, the dealer acts first preflop. In a multi-player match the player to the left of the big blind acts first preflop. In both heads up and multi-player matches, the dealer is the last to act on the post-flop betting rounds (i.e. the flop, turn and river). The legal betting actions are fold, check/call or bet/raise. These possible betting actions are common to all variations of poker and are described in more detail below:

Fold: When a player contributes no further chips to the pot and abandons their hand and any right to contest the chips that have been added to the pot.
Check/Call: When a player commits the minimum amount of chips possible in order to stay in the hand and continues to contest the pot. A check requires a commitment of zero further chips, whereas a call requires an amount greater than zero.
Bet/Raise: When a player commits greater than the minimum amount of chips necessary to stay in the hand. When the player could have checked, but decides to invest further chips in the pot, this is known as a bet. When the player could have called a bet, but decides to invest further chips in the pot, this is known as a raise.

In a limit game all bets are in increments of a certain amount. In a no-limit game a player may bet any amount up to the total value of chips that they possess. For example, assuming a player begins a match with 1000 chips, after paying a forced small blind of one chip they then have the option to either fold, call one more chip or raise by contributing anywhere between 3 and 999 extra chips². In a standard game of heads-up, no-limit poker, both players’ chip stacks would fluctuate between hands, e.g. a win from a previous hand would ensure that one player had a larger chip stack to play with on the next hand. In order to reduce the variance that this structure imposes, a variation known as Doyle’s Game is played where the starting stacks of both players are reset to a specified amount at the beginning of every hand.

² The minimum raise would involve paying 1 more chip to match the big blind and then committing at least another 2 chips as the minimum legal raise.

Once the round of betting is complete, as long as at least two players still remain in the hand, play continues on to the next stage. Each post-flop stage involves the drawing of community cards from the shuffled deck of cards as follows: flop, 3 community cards; turn, 1 community card; river, 1 community card. All players combine their hole cards with the public community cards to form their best five card poker hand. A showdown occurs after the river where the remaining players reveal their hole cards and the player with the best hand wins all the chips in the pot. If both players’ hands are of equal value, the pot is split between them.


3. Background

3.1. Strategy Types

As mentioned in the introduction, many AI researchers working in the computer poker domain have focused their efforts on creating strong strategies via bottom up, equilibrium finding algorithms. When equilibrium finding algorithms are applied to the computer poker domain, they produce ε-Nash equilibria. ε-Nash equilibria are robust, static strategies that limit their exploitability (ε) against worst-case opponents. A pair of strategies is said to be an ε-Nash equilibrium if neither strategy can gain more than ε by deviating. In this context, a strategy refers to a probability distribution over available actions at every decision point. Two state-of-the-art equilibrium finding algorithms are Counterfactual Regret Minimisation (CFRM) [39,18] and the Excessive Gap Technique (EGT) [13]. CFRM is an iterative, regret minimising algorithm that was developed by the University of Alberta Computer Poker Research Group (CPRG)³. The EGT algorithm, developed by Andrew Gilpin and Tuomas Sandholm from Carnegie Mellon University, is an adapted version of Nesterov’s excessive gap technique [21], which has been specialised for two-player, zero-sum, imperfect information games.

³ http://poker.cs.ualberta.ca/

The ε-Nash equilibrium strategies produced via CFRM and EGT are solid, unwavering strategies that do not adapt given further observations made by challenging particular opponents. An alternative strategy type is one that attempts to exploit perceived weaknesses in its opponents’ strategies by dynamically adapting its own strategy given further observations. This type of strategy is known as an exploitive (or maximal) strategy. Exploitive strategies typically select their actions based on information they have observed about their opponent. Therefore, constructing an exploitive strategy typically involves the added difficulty of generating accurate opponent models.

3.2. Strategy Evaluation and the Annual Computer Poker Competition

Both ε-Nash equilibrium based strategies and exploitive strategies have received attention in the computer poker literature [14,15,7,8,17]. Overall a larger focus has been applied to equilibrium finding approaches. This is especially true regarding agents entered into the Annual Computer Poker Competition. Since 2006, the ACPC has been held every year at conferences such as AAAI and IJCAI. The agents submitted to the competition typically represent the strongest computer poker agents in the world for that particular year. Since 2009, the ACPC has evaluated agents in the following variations of Texas Hold’em:

    1. Two-player, Limit Hold’em.
    2. Two-player, No-Limit Hold’em.
    3. Three-player, Limit Hold’em.

In this work, we restrict our attention to these three sub-domains. Agents are evaluated by playing many hands against each other in a round-robin tournament structure. The ACPC employs two winner determination procedures:

1. Total Bankroll. As its name implies, the total bankroll winner determination simply records the overall profit or loss of each agent and uses this to rank competitors. In this division, agents that are able to achieve larger bankrolls are ranked higher than those with lower profits. This winner determination procedure does not take into account how an agent achieves its overall profit or loss; for instance, it is possible that the winning agent could win a large amount against one competitor, but lose to all other competitors.
2. Bankroll Instant Run-Off. On the other hand, the instant run-off division uses a recursive winner determination algorithm that repeatedly removes the agents that performed the worst against a current pool of players. This way agents that achieve large profits by exploiting weak opponents are not favoured, as they are in the total bankroll division.

As poker is a stochastic game that consists of chance events, the variance can often be large, especially between agents that are close in strength. This requires many hands to be played in order to arrive at statistically significant conclusions. Due to the large variance involved, the ACPC employs a duplicate match structure, whereby all players end up playing the same set of hands. For example, in a two-player match a set of N hands is played. This is then followed by dealing the same set of N hands a second time, but having both players switch seats so that they receive the cards their opponent received previously. As both players are exposed to the same set of hands, this reduces the amount of variance involved in the game by ensuring one player does not receive a larger proportion of higher quality hands than the other. A two-player match involves two seat enumerations, whereas a three-player duplicate match involves six seat enumerations to ensure each player is exposed to the same scenario as their opponents. For three players (ABC) the following seat enumerations need to take place:

    ABC  ACB
    CAB  CBA
    BCA  BAC


4. Research Motivation

This work describes the use of case-based strategies in games. Our approach makes use of the Case-based Reasoning (CBR) methodology [26,19]. The CBR methodology encodes problems, and their solutions, as cases. CBR attempts to solve new problems or scenarios by locating similar past problems and re-using or adapting their solutions for the current situation. Case-based strategies are top down strategies, in that they are constructed by processing and analysing a set of training data. Common game scenarios, together with their playing decisions, are captured as a collection of cases, referred to as the case-base. Each case attempts to capture important game state information that is likely to have an impact on the final playing decision. The training data can be either real-world data, e.g. from online poker casinos, or artificially generated data, for instance from hand history logs generated by the ACPC. Case-based strategies attempt to generalise the game playing decisions recorded within the data via the use of similarity metrics that determine whether two game playing scenarios are sufficiently similar to each other, such that their decisions can be re-used.

Case-based strategies can be created by training on data generated from a range of expert players or by isolating the decisions of a single expert player. Where a case-based strategy is produced by training on and generalising the decisions of a single expert player, we refer to the agent produced as an expert imitator. In this way, case-based strategies can be produced that attempt to imitate different styles of play simply by training on separate datasets generated by observing the decisions of expert players, each with their own style. The lazy learning [2] of case-based reasoning is particularly suited to expert imitation where observations of expert play can be recorded and stored for use at decision time.

Case-based approaches have been applied and evaluated in a variety of gaming environments. CHEBR [24] was a case-based checkers player that acquired experience by simply playing games of checkers in real-time. In the RoboCup soccer domain, [11] used case-based reasoning to construct a team of agents that observes and imitates the behaviour of other agents. Case-based planning [16] has been investigated and evaluated in the domain of real-time strategy games [3,22,23,34]. The case-based tactician (CaT) described in [3] selects tactics based on a state lattice and the outcome of performing the chosen tactic. The CaT system was shown to successfully learn over time. The Darmok architecture described by [22,23] pieces together fragments of plans in order to produce an overall playing strategy. Performance of the strategies produced by the Darmok architecture was improved by first classifying the situation it found itself in and having this affect plan retrieval [20]. Combining CBR with other AI approaches has also produced successful results. In [31] transfer learning was investigated in a real-time strategy game environment by merging CBR with reinforcement learning. Also, [6] combined CBR with reinforcement learning to produce an agent that could respond rapidly to changes in conditions of a domination game.

The stochastic, imperfect information world of Texas Hold’em poker is used as a test-bed to evaluate and analyse our case-based strategies. Texas Hold’em offers a rich environment that allows the opportunity to apply an abundance of strategies ranging from basic concepts to sophisticated strategies and counter-strategies. Moreover, the rules of Texas Hold’em poker are incredibly simple. Contrast this with CBR related research into complex environments such as real-time strategy games [3,20,22,23], which offer similar issues to deal with (uncertainty, chance, deception) but don’t encapsulate this within a simple set of rules, boundaries and performance metrics. Successes and failures achieved by applying case-based strategies to the game of poker may provide valuable insights for CBR researchers using complex strategy games as their domain, where immediate success is harder to evaluate. Furthermore, it is hoped that results may also generalise to domains outside the range of games altogether, to complex real world domains where hidden information, chance and deception are commonplace.

One of the major benefits of using case-based strategies within the domain of computer poker is the simplicity of the approach. Top down case-based strategies don’t require the construction of massive, complex mathematical models that some other approaches rely on [13,30,27]. Instead, an autonomous agent can be created simply via the observation of expert play and the encoding of observed actions into cases. Below we outline some further reasons why case-based strategies are suited to the domain of computer poker and hence worthy of investigation. The reasons listed are loosely based on Sycara’s [35] identification of characteristics of a domain where case-based reasoning is most applicable (these were later adjusted by [37]).

    1. A case is easily defined in the domain.
       A case is easily identified as a previous scenario an (expert) player has encountered in the past and the action (solution) associated with that scenario, such as whether to fold, call or raise. Each case can also record a final outcome from the hand, i.e. how many chips a player won or lost.
    2. Expert human poker players compare current problems to past cases.
       It makes sense that poker experts make their decisions based on experience. An expert poker player will normally have played many games and encountered many different scenarios; they can then draw on this experience to determine what action to take for a current problem.
    3. Cases are available as training data.                Nash equilibrium for the game. In fact, it proves
       While many cases are available to train a            impossible to reasonably store this strategy by to-
       case-based strategy, the quality of their solu-      day’s hardware standards [18]. For these reasons
       tions can vary considerably. The context of          alternative approaches, such as case-based strate-
       the past problem needs to be taken into ac-          gies, can prove useful given their ability for gener-
       count and applied to similar contexts in the         alisation.
       future. As the system gathers more experi-              Over the years we have conducted an exten-
       ence it can also record its own cases, together      sive amount of experimentation on the use of case-
       with their observed outcomes.                        based strategies, using two-player, limit Hold’em
    4. Case comparisons can be done effectively.            as our test-bed. In particular we have investigated
       Cases are compared by determining the sim-           and measured the effect that changes have on areas
       ilarity of their local features. There are many      such as feature and solution representation, simi-
       features that can be chosen to represent a           larity metrics, system training and the use of dif-
       case. Many of the salient features in the poker      ferent decision making policies. Modifications have
       domain (e.g. hand strength) are easily com-          ranged from the very minor, e.g. training on dif-
       parable via standard metrics. Other features,        ferent sets of data to the more dramatic, e.g. the
       such as betting history, require more involved       development of custom betting sequence similar-
       similarity metrics, but are still directly com-      ity metrics. For each modification and addition to
       parable.                                             the architecture we have extensively evaluated the
    5. Solutions can be generalised.                        strategies produced via self-play experiments, as
       For case-based strategies to be successful, the      well as by challenging a range of third-party, arti-
       re-use or adaptation of similar cases’ solu-         ficial agents and human opposition. Due to space
       tions should produce a solution that is (rea-        limitations we restrict our attention to the changes
       sonably) similar to the actual, known solu-          that had the greatest affect on the system architec-
       tion (if one exists) of the target case in question. This underpins one of CBR’s main
       assumptions: that similar cases have similar solutions. We present empirical evidence
       that suggests the above assumption is reasonable in the computer poker domain.

5. Two-Player, Limit Texas Hold’em

   We begin with the application of case-based strategies within the domain of two-player,
limit Texas Hold’em. Two-player, limit Hold’em offers a beneficial starting point for the
experimentation and evaluation of case-based strategies within computer poker. Play is limited
to two players and a restricted betting structure is imposed, whereby all bets and raises are
limited to pre-specified amounts. The above restrictions limit the size of the state space,
compared to Hold’em variations that allow no-limit betting and multiple opponents. However,
while the size of the domain is reduced, compared to more complex poker domains, the two-player
limit Hold’em domain is still very large. The game tree consists of approximately 10^18 game
states and, given the standards of current hardware, it is intractable to derive a true

ture and its performance. We have named our system Sartre (Similarity Assessment Reasoning for
Texas hold’em via Recall of Experience) and we trace the evolution of its architecture below.

5.1. Overview

   In order to generalise betting decisions from a set of (artificial or real-world) training
data, a collection of cases must first be constructed and stored. This requires deciding on a
feature and solution representation for each case, for example by identifying the salient
attribute-value pairs that describe the environment at the time a case was recorded. Each case
should attempt to capture important information about the current environment that is likely
to have an impact on the final solution. After a collection of cases has been established,
decisions can be made by searching the case-base and locating similar scenarios for which
solutions have been recorded in the past. This requires the use of local similarity metrics
for each feature.
   Given a target case, t, that describes the immediate game environment, a source case, s ∈ S,
where S is the entire collection of previously recorded cases, and a set of features, F, global
similarity is computed by summing each feature’s local similarity contribution, sim_f, and
dividing by the total number of features:

    G(t, s) = \sum_{f \in F} \frac{\mathrm{sim}_f(t_f, s_f)}{|F|}        (1)

Fig. 1. Overview of the architecture used to produce case-based strategies. The numbers identify
the six key areas within the architecture where the effects of maintenance have been evaluated.

Table 1
Preflop and postflop case feature representation.

        Preflop             Postflop
   1.   Hole Cards          Hand Strength
   2.   Betting Sequence    Betting Sequence
   3.                       Board Texture
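Equation 1 and the retrieval step described in Section 5.1 can be sketched in a few lines of
Python. This is only an illustrative sketch and not the Sartre implementation: cases are
assumed to be plain dictionaries keyed by feature name, and the names global_similarity and
most_similar, as well as the way local metrics are passed in, are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Assumption: a case is a mapping from feature name to feature value.
Case = Dict[str, Hashable]
# Assumption: a local similarity metric maps two feature values to [0, 1].
LocalSim = Callable[[Hashable, Hashable], float]


def global_similarity(target: Case, source: Case,
                      local_sims: Dict[str, LocalSim]) -> float:
    """Equation 1: sum the local similarities and divide by |F|."""
    total = sum(sim(target[f], source[f]) for f, sim in local_sims.items())
    return total / len(local_sims)


def most_similar(target: Case, case_base: List[Case],
                 local_sims: Dict[str, LocalSim]) -> Tuple[float, Case]:
    """Search the case-base and return (similarity, case) for the closest case."""
    scored = ((global_similarity(target, s, local_sims), s) for s in case_base)
    return max(scored, key=lambda pair: pair[0])
```

A caller would supply one local metric per feature in F, for example an exact-match metric for
the betting sequence and a distance-based metric for hand strength; the global score is then
simply the unweighted mean of the per-feature scores.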
   Fig. 1 provides a pictorial representation of the architecture we have used to produce
case-based strategies. The six areas that have been labelled in Fig. 1 identify six key areas
within the architecture where maintenance has had the most impact and led to positive effects
on system performance. They are:

  1.   Feature Representation
  2.   Similarity Metrics
  3.   Solution Representation
  4.   Case Retrieval
  5.   Solution Re-Use Policies, and
  6.   System Training

5.2. Architecture Evolution

  Here we describe some of the changes that have taken place within the six key areas of our
case-based architecture, identified above. Where possible, we provide a comparative evaluation
for the maintenance performed, in order to measure the impact that changes had on the
performance of the case-based strategies produced.

5.2.1. Feature Representation
   The first area of the system architecture that we discuss is the feature representation used
within a case (see Fig. 1, Point 1). We highlight results that have influenced changes to the
representation over time. In order to construct a case-based strategy a case representation is
required that establishes the type of information that will be recorded for each game scenario.
Our case-based strategies use a simple attribute-value representation to describe a set of case
features. Table 1 lists the features used within our case representation. A separate
representation is used for preflop and postflop cases, given the differences between these two
stages of the game. The features listed in Table 1 were chosen by the authors as they concisely
capture all the necessary public game information, as well as the player’s personal, hidden
information.
   Each feature is explained in more detail below:

Preflop
1. Hole Cards: the personal hidden cards of the player, represented by 1 out of 169 equivalence
    classes.
2. Betting Sequence: a sequence of characters that represent the betting actions witnessed
    until the current decision point, where actions can be selected from the set,
    A_limit = {f, c, r}.

Postflop
1. Hand Strength: a description of the player’s hand strength given a combination of their
    personal cards and the public community cards.
2. Betting Sequence: identical to the preflop sequence, however with the addition of round
    delimiters to distinguish betting from previous rounds, A_limit ∪ {−}.

3. Board Texture: a description of the public community cards that are revealed during the
    postflop rounds.

  While the case features themselves have remained relatively unchanged throughout the
architecture’s evolution, the actual values that each feature records have been experimented
with to determine the effect on final performance. For example, we have compared and evaluated
the use of different metrics for the hand strength feature from Table 1. Fig. 2 depicts the
result of a comparison between three hand strength feature values. In this experiment, the
feature values for betting sequence and board texture were held constant, while the hand
strength value was varied. The values used to represent hand strength were as follows:

CATEGORIES: Uses expert defined categories to classify hand strength. Hands are assigned into
    categories by mapping a player’s personal cards and the available board cards into one of a
    number of predefined categories. Each category represents the type of hand the player
    currently has, together with information about the drawing potential of the hand, i.e.
    whether the hand has the ability to improve with future community cards. In total 284
    categories were defined (see footnote 4).
E[HS]: Expected hand strength is a one-dimensional, numerical metric. The E[HS] metric computes
    the probability of winning at showdown against a random hand. This is given by enumerating
    all possible combinations of community cards and determining the proportion of the time the
    player’s hand wins against the set of all possible opponent holdings. Given the large
    variety of values that can be produced by the E[HS] metric, bucketing takes place where
    similar values are mapped into a discrete set of buckets that contain hands of similar
    strength. Here we use a total of 20 buckets for each postflop round.
E[HS^2]: The final metric evaluated involves squaring the expected hand strength. Johanson [18]
    points out that squaring the expected hand strength (E[HS^2]) typically gives better
    results, as this assigns higher hand strength values to hands with greater potential.
    Typically in poker, hands with similar strength values, but differences in potential, are
    required to be played in strategically different ways [33]. Once again bucketing is used
    where the derived E[HS^2] values are mapped into 1 of 20 unique buckets for each postflop
    round.

  4 A listing of all 284 categories can be found at the following website:
http://www.cs.auckland.ac.nz/research/gameai/sartreinfo.html

   The resulting case-based strategies were evaluated by challenging the computerised opponent
Fell Omen 2 [10]. Fell Omen 2 is a solid two-player limit Hold’em agent that plays an ε-Nash
equilibrium type strategy. Fell Omen 2 was made publicly available by its creator Ian Fellows
and has become widely used as an agent for strategy evaluation [12]. The results depicted in
Fig. 2 are measured in small bets per hand (sb/h), i.e. the total number of small bets won or
lost divided by the total number of hands played. Each data point records the outcome of three
matches, where 3000 duplicate hands were played. The 95% confidence intervals for each data
point are also shown.
   Results were recorded for various levels of case-base usage to get an idea of how well the
system is able to generalise decisions. The results in Fig. 2 show that (when using a full
case-base) the use of E[HS^2] for the hand strength feature produces the strongest strategies,
followed by the use of CATEGORIES and finally E[HS]. The poor performance of E[HS] is likely
due to the fact that this metric does not fully capture the importance of a hand’s future
potential. When only a portion of the case-base is used it becomes more important for the
system to be able to recognise similar attribute values in order to make appropriate decisions.
Both E[HS] and E[HS^2] are able to generalise well. However, the results show that decision
generalisation begins to break down when using CATEGORIES. This has to do with the similarity
metrics used. In particular, the CATEGORIES strategy in Fig. 2 is actually a baseline strategy
that used overly simplified similarity metrics for each of its feature values. Next we discuss
the area of similarity assessment within the system architecture, which is intimately tied to
the particular values chosen within the feature representation.

5.2.2. Similarity Assessment
   For each feature that is used to represent a case, a corresponding local similarity metric,
sim_f(f1, f2), is required that determines how similar two feature values, f1 and f2, are to
each other.
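As a rough illustration of the numerical hand strength values compared in Section 5.2.1, and of
what a local similarity metric over them can look like, the following Python sketch may help.
It rests on several assumptions that are not confirmed by the text: E[HS^2] is taken to be the
mean of the squared per-board hand strengths, buckets are assumed to be uniform over [0, 1] and
numbered from 1, and the per-board hand strength samples are assumed to be available already.
All function names and sample values are hypothetical. The distance-based metric follows the
form of Equation 2 in Section 5.3.2, with k = 1 as an assumed default.

```python
from statistics import mean
from typing import Sequence


def expected_hs(board_strengths: Sequence[float]) -> float:
    """E[HS]: mean hand strength over the possible future boards."""
    return mean(board_strengths)


def expected_hs_squared(board_strengths: Sequence[float]) -> float:
    """E[HS^2] (assumed reading): mean of the squared per-board strengths."""
    return mean(hs * hs for hs in board_strengths)


def bucket(value: float, num_buckets: int = 20) -> int:
    """Map a value in [0, 1] to a uniform bucket in 1..num_buckets."""
    return min(int(value * num_buckets) + 1, num_buckets)


def all_or_nothing(f1: int, f2: int) -> float:
    """Trivial metric: 1 if the values are identical, otherwise 0."""
    return 1.0 if f1 == f2 else 0.0


def bucket_distance(f1: int, f2: int, num_buckets: int = 20,
                    k: float = 1.0) -> float:
    """Distance-based metric in the form max{1 - k * |f1 - f2| / T, 0}."""
    return max(1.0 - k * abs(f1 - f2) / num_buckets, 0.0)


# Hypothetical samples: a made hand and a drawing hand with the same E[HS].
made_hand = [0.5, 0.5, 0.5, 0.5]
drawing_hand = [0.1, 0.1, 0.9, 0.9]
made_bucket = bucket(expected_hs_squared(made_hand))
draw_bucket = bucket(expected_hs_squared(drawing_hand))
```

In this toy example both hands have the same E[HS], but the drawing hand receives a higher
E[HS^2] and therefore a higher bucket, which is the effect attributed to E[HS^2] above. The
all-or-nothing metric scores the two buckets as completely dissimilar, whereas the
distance-based metric still treats them as fairly close.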

Fig. 2. The performance of three separate case-based strategies produced by altering the value
used to represent hand strength. Results are measured in sb/h and were obtained by challenging
Fell Omen 2.

The use of different representations for the hand strength feature in Fig. 2 also requires the
use of separate similarity metrics. The CATEGORIES strategy in Fig. 2 employs a trivial
all-or-nothing similarity metric for each of its features. If two feature values are identical,
a similarity score of 1 is assigned. On the other hand, if the two feature values differ at
all, a similarity value of 0 is assigned. This was done to get an initial idea of how the
system performed using the most basic of similarity retrieval measures. The performance of this
baseline system could then be used to determine how improvements to local similarity metrics
affected overall performance.
   The degradation of performance observed in Fig. 2 for the CATEGORIES strategy (as the
proportion of case-base usage decreases) is due to the use of all-or-nothing similarity
assessment. The use of the overly simplified all-or-nothing similarity metric meant that the
system’s ability to retrieve similar cases could often fail, leaving the system without a
solution for the current game state. When this occurred a default-policy was relied upon to
provide the system with an action. The default-policy used by the system was an always-call
policy, whereby the system would first attempt to check if possible, otherwise it would call an
opponent’s bet. This default-policy was selected by the authors as it was believed to be
preferable to other trivial default policies, such as always-fold, which would always result in
a loss for the system.
   The other two strategies in Fig. 2 (E[HS] and E[HS^2]) do not use trivial all-or-nothing
similarity. Instead the hand strength features use a similarity metric based on Euclidean
distance. Both the E[HS] and E[HS^2] strategies also employ informed similarity metrics for
their betting sequence and board texture features. Recall that the betting sequence feature is
represented as a sequence of characters that lists the playing decisions that have been
witnessed so far for the current hand. This requires the use of a non-trivial metric to
determine similarity between two non-identical sequences. Here we developed a custom similarity
metric that involves the identification of stepped levels of similarity, based on the number of
bets/raises made by each player. The exact details of this metric are presented in Section
5.3.2. Finally, for completeness, we determine similarity between different board texture
classes via the use of hand-picked similarity values.
   Fig. 2 shows that, compared to the CATEGORIES strategy, the E[HS] and E[HS^2] strategies

do a much better job of decision generalisation as the usable portion of the case-base is
reduced. The eventual strategies produced do not suffer the dramatic performance degradation
that occurs with the use of all-or-nothing similarity.

5.2.3. Solution Representation
   As well as recording feature values, each case also needs to specify a solution. The most
obvious solution representation is a single betting action, a ∈ A_limit. As well as a betting
action, the solution can also record the actual outcome, i.e. the numerical result, o ∈

Table 2
Total cases stored for each playing round using single value solution representation compared
to vector valued solutions

   Round      Total Cases - Single    Total Cases - Vector
   Preflop          201,335                    857
   Flop             300,577                  6,743
   Turn             281,529                 35,464
   River            216,597                 52,088
   Total          1,000,038                 95,152

to decrease the number of cases required to be

tion from a single action solution representation to a vector valued solution representation
(as described in Section 5.2.3). Initially, a variable value of k was allowed, whereby the
total number of similar cases retrieved varied with each search of the case-base. Recall that a
case representation that encodes solutions as single actions results in a redundant case-base
that contains multiple cases with the exact same feature values. The solutions of those cases
may or may not advocate different playing decisions. Given this representation, a final
probability vector was required to be created on-the-fly at runtime by retrieving all identical
cases and merging their solutions. Hence, the number of retrieved cases, k, could vary between
0 and N. When k > 0, the normalised entries of the probability vector were used to make a final
playing decision. However, if k = 0, the always-call default-policy was used.
   Once the solution representation was updated to record action vectors (instead of single
decisions) a variable k value was no longer required. Instead, the algorithm was updated to
simply always retrieve the nearest neighbour, i.e. k = 1. Given further improvements to the
similarity metrics used, the use of a default-policy was no longer required as it was no longer
possible to encounter scenarios where no similar cases could be retrieved. Instead, the most
similar neighbour was always returned, no matter what the similarity value. This has resulted
in a much more robust system that is actually capable of generalising decisions recorded in the
training data, as opposed to the initial prototype system which offered no ability for graceful
degradation, given dissimilar case retrieval.

5.2.5. Solution Re-use Policies
  The fifth area of the architecture that we discuss (Fig. 1, Point 5) concerns the choice of a
final playing decision via the use of separate policies, given a retrieved case and its
solution. Consider the probabilistic action vector, A = (a_1, a_2, ..., a_n), and a
corresponding outcome vector, O = (o_1, o_2, ..., o_n). There are various ways to use the
information contained in the vectors to make a final playing decision. We have identified and
empirically evaluated several different policies for re-using decision information, which we
label solution re-use policies. Below we outline three solution re-use policies, which have
been used for making final decisions by our case-based strategies.

1. Probabilistic: The first solution re-use policy simply selects a betting action
    probabilistically, given the proportions specified within the action vector,
    P(a_i) = a_i, for i = 1 ... n. Betting decisions that have greater proportions within the
    vector will be made more often than those with lower proportions. In a game-theoretic
    sense, this policy corresponds to a mixed strategy.
2. Max-frequency: Given an action vector A = (a_1, a_2, ..., a_n), the max-frequency solution
    re-use policy selects the action that corresponds to arg max_i a_i, i.e. it selects the
    action that was made most often and ignores all other actions. In a game-theoretic sense,
    this policy corresponds to a pure strategy.
3. Best-Outcome: Instead of using the values contained within the action vector, the
    best-outcome solution re-use policy selects an action, given the values contained within
    the outcome vector, O = (o_1, o_2, ..., o_n). The final playing decision is given by the
    action, a_i, that corresponds to arg max_i o_i, i.e. the action that corresponds to the
    maximum entry in the outcome vector.

   Given the three solution re-use policies described above, it is desirable to know which
policies produce the strongest strategies. Table 3 presents the results of self-play
experiments where the three solution re-use policies were challenged against each other. A
round robin tournament structure was used, where each policy challenged every other policy. The
figures presented are from the row player’s perspective and are in small bets per hand. Each
match consists of 3 separate duplicate matches of 3000 hands. Hence, in total 18,000 hands of
poker were played between each pair of competitors. All results are statistically significant
with a 95% level of confidence.
   Table 3 shows that the max-frequency policy outperforms its probabilistic and best-outcome
counterparts. Of the three, best-outcome fares the worst, losing all of its matches. The
results indicate that simply re-using the most commonly made decision results in better
performance than mixing from a probability vector and that choosing the decision that resulted
in the best outcome was the worst solution re-use policy. Moreover, these results are
representative of further experiments involving other third-party computerised agents.
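The three solution re-use policies of Section 5.2.5 can be sketched as follows. This is a
minimal illustration rather than the authors’ code: it assumes that the action and outcome
vectors are triples ordered as (fold, call, raise), which the text does not state explicitly,
and that unknown outcomes are encoded as negative infinity. The function names are
hypothetical.

```python
import random
from typing import Sequence

# Assumption: vector entries are ordered as fold, call, raise.
ACTIONS = ("f", "c", "r")


def probabilistic(action_vector: Sequence[float], rng=random) -> str:
    """Mixed strategy: sample an action with P(a_i) = a_i."""
    return rng.choices(ACTIONS, weights=action_vector, k=1)[0]


def max_frequency(action_vector: Sequence[float]) -> str:
    """Pure strategy: pick the action with the largest proportion.

    Ties are broken in favour of the earliest entry (an arbitrary choice).
    """
    best = max(range(len(ACTIONS)), key=lambda i: action_vector[i])
    return ACTIONS[best]


def best_outcome(outcome_vector: Sequence[float]) -> str:
    """Pick the action whose recorded average outcome is largest."""
    best = max(range(len(ACTIONS)), key=lambda i: outcome_vector[i])
    return ACTIONS[best]


# Hypothetical retrieved solution: never fold, call and raise equally often.
action_vector = (0.0, 0.5, 0.5)
outcome_vector = (float("-inf"), 4.3, 15.6)
chosen_by_frequency = max_frequency(action_vector)
chosen_by_outcome = best_outcome(outcome_vector)
```

Passing a seeded random.Random instance as rng makes the probabilistic policy reproducible,
which is convenient when comparing policies over duplicate matches.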

Table 3
Results of experiments between solution re-use policies. The values shown are in sb/h with 95%
confidence intervals.

                    Max-frequency      Probabilistic      Best-outcome       Average
   Max-frequency                       0.011 ± 0.005      0.076 ± 0.008      0.044 ± 0.006
   Probabilistic    −0.011 ± 0.005                        0.036 ± 0.009      0.012 ± 0.004
   Best-outcome     −0.076 ± 0.008     −0.036 ± 0.009                        −0.056 ± 0.005

   One of the reasons for the poor performance of best-outcome is likely that good outcomes
don’t necessarily represent good betting decisions and vice-versa. The reason for the success
of the max-frequency policy is less obvious. In our opinion, this has to do with the type of
opponent being challenged, i.e. agents that play a static, non-exploitive strategy, such as
those listed in Table 3, as well as strategies that attempt to approximate a Nash equilibrium.
As an equilibrium-based strategy does not attempt to exploit any bias in its opponent’s
strategy, it will only gain when the opponent ends up making a mistake by selecting an
inappropriate action. The action that was made most often is unlikely to be an inappropriate
action, therefore sticking to this decision avoids any exploration errors made by choosing
other (possibly inappropriate) actions. Moreover, biasing playing decisions towards this action
is likely to go unpunished when challenging a non-exploitive agent. On the other hand, against
an exploitive opponent the bias imposed by choosing only one action is likely to be detrimental
to performance in the long run and therefore it would become more important to mix up
decisions.

5.2.6. System Training
   How the system is trained is the final key area of the architecture that we discuss, in
regard to system maintenance. One of the major benefits of producing case-based strategies via
expert imitation is that different types of strategies can be produced by simply modifying the
data that is used to train the system. Decisions that were made by an expert player can be
extracted from hand history logs and used to train a case-based strategy. Experts can be either
human or other artificial agents.
   In order to train a case-based strategy, perfect information is required, i.e. the data
needs to record the hidden card information of the expert player. Typically, data collected
from online poker sites only contains this information when the original expert played a hand
that resulted in a showdown. For hands that were folded before a showdown, this information is
lost. It is difficult to train a strategy on data where this information is missing. More
importantly, any attempt to train a system on only the data where showdowns occurred would
result in biased actions, as the decision to fold would never be encountered.
   It is for these reasons that our case-based strategies have been trained on data made
publicly available from the Annual Computer Poker Competition [1]. This data records hand
history logs for all matches played between computerised agents at a particular year’s
competition. The data contains perfect information for every hand played and therefore can
easily be used to train an imitation-based system. Furthermore, the computerised agents that
participate at the ACPC each year are expected to improve in playing strength over the years
and hence re-training the system on updated data should have a follow-on effect on performance
for any imitation strategies produced from the data. Our case-based strategies have typically
selected subsets of data to train on, based on the decisions made by the agents that have
performed the best in either of the two winner determination methods used by the ACPC.
   There are both advantages and disadvantages to producing strategies that rely on
generalising decisions from training data. While this provides a convenient mechanism for
easily upgrading a system’s play, there is an inherent reliance on the quality of the
underlying data in order to produce reasonable strategies. Furthermore, it is reasonable to
assume that strategies produced in this way are typically only expected to do as well as the
original expert(s) they are trained on.

5.3. A Framework for Producing Case-Based Strategies in Two-Player, Limit Texas Hold’em

  For the six key areas of our architecture (described above) maintenance was guided via
comparative evaluation and overall impact on performance. The outcome of this intensive,
systematic maintenance is the establishment of a final framework for producing case-based
strategies in the domain of two-player, limit Hold’em.
   Here we present the details of the final framework we have established for producing
case-based strategies. The following sections illustrate the details of our framework by
specifying the following sufficient components:

  1. A representation for encoding cases and game state information
  2. The corresponding similarity metrics required for decision generalisation.

5.3.1. Case Representation
   Table 4 depicts the final post-flop case representation used to capture game state
information. A single case is represented by a collection of attribute-value pairs. Separate
case-bases are constructed for the separate rounds of play by processing a collection of hand
histories and recording values for each of the three attributes listed in Table 4. The
attributes have been selected by the authors as they capture all the necessary information
required to make a betting decision. Each of the post-flop attribute-value pairs is now
described in more detail:

1. Hand Strength: The quality of a player’s hand is represented in our framework by calculating
    the E[HS^2] of the player’s cards and then mapping these values into 1 out of 50 evenly
    divided buckets, i.e. uniform bucketing.
2. Betting Sequence: The betting sequence is represented as a string. It records all observed
    actions that have taken place in the current round, as well as previous rounds. Characters
    in the string are selected from the set of allowable actions, A_limit = {f, c, r}; rounds
    are delimited by a hyphen.
3. Board Texture: The board texture refers to important information available, given the
    combination of the publicly available community cards. In total, nine board texture
    categories were selected by the authors. These categories are displayed in Table 5 and are
    believed to represent salient information that any human player would notice. Specifically,
    the categories focus on whether it is possible that an opponent has made a flush (five
    cards of the same suit) or a straight (five cards of sequential rank), or a combination of
    both. The categories are broken up into possible and highly-possible distinctions. A
    category labelled possible refers to the situation where the opponent requires two of their
    personal cards in order to make their flush or straight. On the other hand, a
    highly-possible category only requires the opponent to use one of their personal cards to
    make their hand, making it more likely they have a straight or flush.

Table 4
A case is made up of three attribute-value pairs, which describe the current state of the game.
A solution consists of an action and outcome triple, which records the average numerical value
of applying the action (-∞ refers to an unknown outcome).

   Attribute              Type       Example
   1. Hand Strength       Integer    1 – 50
   2. Betting Sequence    String     rc-c, crrc-crrc-cc-, r, ...
   3. Board Texture       Class      No-Salient, Flush-Possible, Straight-Possible,
                                     Flush-Highly-Possible, ...
   Action                 Triple     (0.0, 0.5, 0.5), (1.0, 0.0, 0.0), ...
   Outcome                Triple     (-∞, 4.3, 15.6), (-2.0, -∞, -∞), ...

5.3.2. Similarity Metrics
   Each feature requires a corresponding local similarity metric in order to generalise
decisions contained in a set of data. Here we present the metrics specified by our framework.

1. Hand Strength: Equation 2 specifies the metric used to determine similarity between two hand
    strength buckets (f1, f2).

       sim(f_1, f_2) = \max\left\{ 1 - k \cdot \frac{|f_1 - f_2|}{T},\ 0 \right\}    (2)

    Here, T refers to the total number of buckets that have been defined,
    where f1, f2 ∈ [1, T] and k is a scalar parameter used to adjust the
    rate at which similarity should decrease.

              A     B     C     D     E     F     G     H     I
         A    1     0     0     0     0     0     0     0     0
         B    0     1     0.8   0.7   0     0     0     0     0
         C    0     0.8   1     0.7   0     0     0     0     0
         D    0     0.7   0.7   1     0     0     0     0     0
         E    0     0     0     0     1     0.8   0.7   0     0.6
         F    0     0     0     0     0.8   1     0.7   0     0.5
         G    0     0     0     0     0.7   0.7   1     0.8   0.8
         H    0     0     0     0     0     0     0.8   1     0.8
         I    0     0     0     0     0.6   0.5   0.8   0.8   1

                   Fig. 3. Board texture similarity matrix.

2. Betting Sequence: To determine similarity between two betting sequences
    we developed a custom similarity metric that involves the
    identification of stepped levels of similarity, based on the number of
    bets/raises made by each player. The first level of similarity (level0)
    refers to the situation when one betting sequence exactly matches that of
    another. If the sequences do not exactly match, the next level of
    similarity (level1) is evaluated. If two distinct betting sequences
    exactly match for the active betting round, and for all previous betting
    rounds the total number of bets/raises made by each player is equal, then
    level1 similarity is satisfied and a value of 0.9 is assigned. Consider
    the following example where the active betting round is the turn and the
    two betting sequences are:
     1. crrc-crrrrc-cr
     2. rrc-rrrrc-cr
    Here, level0 similarity is clearly not satisfied as the sequences do not
    match exactly. However, for the active betting round (cr) the sequences
    do match. Furthermore, during the preflop (1. crrc and 2. rrc) both
    players made 1 raise each, albeit in a different order. During the flop
    (1. crrrrc and 2. rrrrc) both players make 2 raises each (4 in total).
    Given that the number of bets/raises in the previous rounds (preflop and
    flop) match, these two betting sequences would be assigned a similarity
    value of 0.9.
    If level1 similarity was not satisfied the next level (level2) would be
    evaluated. Level2 similarity is less strict than level1 similarity as the
    previous betting rounds are no longer differentiated. Consider the river
    betting sequences:
     1. rrc-cc-cc-rrr
     2. cc-rc-crc-rrr
    Once again the sequences for the active round (rrr) match exactly. This
    time, the number of bets/raises in the preflop round is not equal (the
    same applies for the flop and the turn). Therefore, level1 similarity is
    not satisfied. However, the number of raises encountered for all the
    previous betting rounds combined (1. rrc-cc-cc and 2. cc-rc-crc) is the
    same for each player, namely 1 raise by each player. Hence, level2
    similarity is satisfied and a similarity value of 0.8 would be assigned.
    Finally, if level0, level1 and level2 are not satisfied, level3 is
    reached, where a similarity value of 0 is assigned.
3. Board Texture: To determine similarity between board texture categories a
    similarity matrix was derived. Matrix rows and columns in Fig. 3
    represent the different categories defined in Table 5. Diagonal entries
    refer to two sets of community cards that map to the same category, in
    which case similarity is always 1. Non-diagonal entries refer to
    similarity values between two dissimilar categories. These values were
    hand-picked by the authors. The matrix given in Fig. 3 is symmetric.

                                Table 5
                           Board Texture Key
         A    No salient
         B    Flush possible
         C    Straight possible
         D    Flush possible, straight possible
         E    Straight highly possible
         F    Flush possible, straight highly possible
         G    Flush highly possible
         H    Flush highly possible, straight possible
         I    Flush highly possible, straight highly possible
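
The three local similarity metrics of Section 5.3.2 can be sketched in Python as follows. This is only an illustrative sketch and not the authors' implementation: all function and constant names are hypothetical, the default k = 1 is an assumption (no value of k is stated here), a level0 match is assumed to score 1, and raises are attributed to players by assuming heads-up order of play (the dealer acts first preflop, the other player acts first on later rounds). Sequences with a different number of betting rounds are assumed to score 0.

```python
# Illustrative sketch; names, defaults and the seating convention are assumptions.

BOARD_CATEGORIES = "ABCDEFGHI"  # category keys from Table 5
BOARD_SIMILARITY = [            # values copied from Fig. 3
    [1, 0,   0,   0,   0,   0,   0,   0,   0],
    [0, 1,   0.8, 0.7, 0,   0,   0,   0,   0],
    [0, 0.8, 1,   0.7, 0,   0,   0,   0,   0],
    [0, 0.7, 0.7, 1,   0,   0,   0,   0,   0],
    [0, 0,   0,   0,   1,   0.8, 0.7, 0,   0.6],
    [0, 0,   0,   0,   0.8, 1,   0.7, 0,   0.5],
    [0, 0,   0,   0,   0.7, 0.7, 1,   0.8, 0.8],
    [0, 0,   0,   0,   0,   0,   0.8, 1,   0.8],
    [0, 0,   0,   0,   0.6, 0.5, 0.8, 0.8, 1],
]


def hand_strength_similarity(f1, f2, total_buckets=50, k=1.0):
    """Equation 2: similarity decays linearly with bucket distance, floored at 0."""
    return max(1.0 - k * abs(f1 - f2) / total_buckets, 0.0)


def board_texture_similarity(cat1, cat2):
    """Look up two Table 5 categories in the Fig. 3 matrix."""
    row = BOARD_CATEGORIES.index(cat1)
    col = BOARD_CATEGORIES.index(cat2)
    return BOARD_SIMILARITY[row][col]


def raises_per_player(round_actions, round_index):
    """Count 'r' actions in one round as a (dealer, non_dealer) pair."""
    # Assumption: dealer acts first preflop (round 0), non-dealer first afterwards.
    first_actor = 0 if round_index == 0 else 1
    counts = [0, 0]
    for i, action in enumerate(round_actions):
        if action == "r":
            counts[(first_actor + i) % 2] += 1
    return tuple(counts)


def combined_raises(per_round_counts):
    """Sum per-round (dealer, non_dealer) raise counts over all rounds."""
    return tuple(sum(player) for player in zip(*per_round_counts))


def betting_sequence_similarity(seq1, seq2):
    """Stepped similarity: level0, level1 (0.9), level2 (0.8), level3 (0)."""
    if seq1 == seq2:
        return 1.0  # level0: exact match (value of 1 is assumed)
    rounds1, rounds2 = seq1.split("-"), seq2.split("-")
    # Levels 1 and 2 both require the active (last) round to match exactly.
    if len(rounds1) != len(rounds2) or rounds1[-1] != rounds2[-1]:
        return 0.0
    prev1 = [raises_per_player(r, i) for i, r in enumerate(rounds1[:-1])]
    prev2 = [raises_per_player(r, i) for i, r in enumerate(rounds2[:-1])]
    if prev1 == prev2:
        return 0.9  # level1: per-round raise counts agree for each player
    if combined_raises(prev1) == combined_raises(prev2):
        return 0.8  # level2: only the combined raise counts agree
    return 0.0      # level3
```

Under these assumptions the two worked examples from the text are reproduced: the turn sequences crrc-crrrrc-cr and rrc-rrrrc-cr score 0.9, and the river sequences rrc-cc-cc-rrr and cc-rc-crc-rrr score 0.8.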

5.4. Experimental Results

   We now present a series of experimental results collected in the domain of
two-player, limit Texas Hold'em. The results presented are obtained from
annual computer poker competitions and data collected by challenging human
opposition. For each evaluated case-based strategy, we provide an architecture
snapshot that captures the relevant details of the parameters used for each of
the six key architecture areas that were previously discussed.

5.4.1. 2009 IJCAI Computer Poker Competition
   We begin with the results of the 2009 ACPC, held at the International Joint
Conference on Artificial Intelligence. Here, we submitted our case-based
agent, Sartre, for the first time, to challenge other computerised agents
submitted from all over the world. The following architecture snapshot depicts
the details of the submitted agent:
  1. Feature Representation
       (a) Hand Strength – categories
       (b) Betting Sequence – string
       (c) Board Texture – categories
  2. Similarity Assessment – all-or-nothing
  3. Solution Representation – single
  4. Case Retrieval – variable k
  5. Re-Use Policy – max-frequency
  6. System Training – Hyperborean-08

   The architecture snapshot above represents a baseline strategy where
maintenance had yet to be performed. Each of the entries listed above
corresponds to one of the six key architecture areas introduced in Section
5.2. Notice that trivial all-or-nothing similarity was employed along with a
single action solution representation, which resulted in a redundant
case-base. The value for system training refers to the original expert whose
decisions were used to train the system.
   The final results are displayed in Table 6. The competition consisted of
two winner determination methods: bankroll instant run-off and total bankroll.
Each agent played between 75 and 120 duplicate matches against every other
agent in order to obtain the average values displayed. Each match consisted of
3000 duplicate hands. The values presented are the number of small bets per
hand won or lost. Our case-based agent, Sartre, achieved a 7th place finish in
the instant run-off division and a 6th place finish in the total bankroll
division.

5.4.2. 2010 AAAI Computer Poker Competition
   Following the maintenance experiments presented in Section 5.2, an updated
case-based strategy was submitted to the 2010 ACPC, held at the Twenty-Fourth
AAAI Conference on Artificial Intelligence. Our entry, once again named
Sartre, used the following architecture snapshot:
  1. Feature Representation
       (a) Hand Strength – 50 buckets E[HS^2]
       (b) Betting Sequence – string
       (c) Board Texture – categories
  2. Similarity Assessment
       (a) Hand Strength – Euclidean
       (b) Betting Sequence – custom
       (c) Board Texture – matrix
  3. Solution Representation – vector
  4. Case Retrieval – k = 1
  5. Re-Use Policy – probabilistic
  6. System Training – MANZANA

   Here a vector valued solution representation was used together with
improved similarity assessment. Given the updated solution representation, a
single nearest neighbour, k = 1, was retrieved via the k-NN algorithm. A
probabilistic solution re-use policy was employed and the system was trained
on the decisions of the winner of the 2009 total bankroll division. The final
results are presented in Table 7. Once again two winner determination
divisions are presented and the values are depicted in small bets per hand
with 95% confidence intervals. Given the improvements, Sartre was able to
achieve a 6th place finish in the runoff division and a 3rd place finish in
the total bankroll division.

5.4.3. 2011 AAAI Computer Poker Competition
   The 2011 ACPC was held at the Twenty-Fifth AAAI Conference on Artificial
Intelligence. Our entry to the competition is represented by the following
architecture snapshot:
  1. Feature Representation
       (a) Hand Strength – 50 buckets E[HS^2]
       (b) Betting Sequence – string
       (c) Board Texture – categories
  2. Similarity Assessment
       (a) Hand Strength – Euclidean
       (b) Betting Sequence – custom
       (c) Board Texture – matrix