An approach of solving early phase multiplayer No-Limit Hold'em poker using empirical Bayesian statistics and vector spaces

Page created by Wesley Russell
 
CONTINUE READING
An approach of solving early phase multiplayer
  No-Limit Hold’em poker using empirical
    Bayesian statistics and vector spaces
       A thesis presented by Damiaan Reijnaers (10804137)
        for the degree of Bachelor of Science in Artificial Intelligence

                                  Credits: 18 EC

                             University of Amsterdam
                                Faculty of Science
                                Science Park 904
                              1098 XH Amsterdam

                                    Supervisor
                                dr. M.A.F. Lewis
                   Institute for Logic, Language and Computation
                                  Faculty of Science
                               University of Amsterdam
                                   Science Park 107
                                 1098 XG Amsterdam

                                March 20th, 2020

                                        1
Abstract
This thesis aims to propose an efficient methodology for the purpose of
representation, and decision-making based on comparison, of early-game
states in No-Limit Texas Hold’em poker. The paper presents a predictive
model in the form of an augmented decision tree based on the distance
between geometrically represented game situations in an euclidean vector
space, wherein opponent models contribute as situational characteristics.
This research identifies a shortcoming in existing work on opponent
modelling, and solves it by introducing a mixed technique based on
Bayesian statistics and beta-binomial regression. An implementation is
suggested and tested. By using the proposed method, it can be concluded
that an artificial agent is able to distinguish between different game
situations and to make (a variety of) decisions based on situational factors.

                                     2
Contents
Abstract                                                                                                                                                                         2

1 Introduction                                                                                                                                                                   5

2 Putting the idea into perspective                                                                                                                                             6

3 Method                                                                                                                                                                         8
  3.1 Proposed methodology . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    9
      3.1.1 General definitions . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    9
      3.1.2 Opponent modelling . . . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   10
      3.1.3 Situational space . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   15
      3.1.4 Decision trees . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   18
  3.2 Suggested implementation . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   21
      3.2.1 Dataset . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   21
      3.2.2 Population . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   21
      3.2.3 Decisions based on situations . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   22
  3.3 Experiments and results . . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   26
      3.3.1 Weights . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   26
      3.3.2 Estimated hole card ranges . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   26
      3.3.3 Predictions for actions and raise sizing    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   27

4 Conclusion and discussion                                                                                                                                                     28

Bibliography                                                                                                                                                                    31

Appendix A - Relevant rules of Texas Hold’em Poker                                                                                                                              33

Appendix B - Glossary of used poker terms                                                                                                                                       35

Appendix C - Questionnaire concerning quality of experiment results                                                                                                             37

                                                            3
List of Figures
  1    Dependency graph for variables in a poker game . . . . . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    9
  2    Visualisation of VPIP- and PFR/VPIP-values for players in dataset .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   11
  3    Clusters of player characteristics . . . . . . . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   13
  4    Relation between number of observations and player metrics . . . .          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   14
  5    Beta distributions resulting from beta-binomial regression . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   15
  6    Generated situational space for a raise from first position . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   16
  7    Relation between folds and raising size . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   19
  8    Sketch of situation with raise, call and fold to agent on BU . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   20
  9    Sketch of situation with raise from first position, folds to agent on SB     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   27
  10   Part of decision tree after agent re-raises . . . . . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   27

List of Tables
  1    Estimations for player metrics using various methods . . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   22
  2    Analysis of parameter d in equation 14 . . . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   23
  3    Observations of situations in used dataset . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   23
  4    Estimated weights for raise from first position . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   26
  5    Estimated hole card range for a player who raised from first position.       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   27

List of Algorithms
  1    Estimating a raising size for the agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                                         25

                                                          4
1     Introduction                                          The game’s complexity does not necessarily implicate
                                                            that a computer can never be profitable playing games of
The rise of Deep Blue, solving chess in 1996; and the poker. Since there is a clear distinction between profitable
recent victory of AlphaGo, ‘solving’ the game of Go and non-profitable players, it can be assumed that at least
in 2017—the human player against whom AlphaGo a part of the game is not based on chance. And, however
competed actually won one out of three rounds, which small this part in the distribution between skill and chance
can be called a victory for AlphaGo, but can merely be might be, it is big enough to turn hundreds of online play-
called a definitive ‘solution’—has made developments ers into winners of tens of thousands of US dollars3. Two
towards solving Texas Hold’em poker more relevant. studies involving groups of instructed and non-instructed
Occasionally, programs such as PokerSnowie gain a poker players reinforced the statement of a greater role
considerable amount of attention by making progress of skill than chance (DeDonno and Detterman, 2008).
towards creating an ‘unbeatable poker algorithm1.’ Moreover, although broadly sceptical on the matter,
However, poker is a totally different game, which, unlike Meyer et al. (2013) as well states that “experts seem to
Chess and Go, involves a great deal of chance. It is a game be better able to minimize losses when confronted with
with imperfect and unreliable information, involving disadvantageous conditions.” Nevertheless, it is worth
opponent modelling, risk management and deception – noticing that besides concluding “that the outcomes of
this illustrates the relevance of the game to significant poker games are predominantly determined by chance,”
areas in AI research (Billings et al., 1998).               Meyer et al. nuances this conclusion by stating that their
                                                            conclusion applies “at least to short game sequences,”
Despite the relatively narrow scope of this thesis, thus leaving long game sequences open for discussion.
multiple different disciplines of AI are involved in DeDonno and Detterman filled up this gap by showing
this paper. Moreover, the methods presented in this that skill is the determining factor in long-term outcome.
paper are relevant for broader use than just poker. The This is the same advice professional poker coaches teach
to-be-introduced formulas for beta-binomial regression their students: “it is about focusing on small advantages
in section 3.1.2 are applicable for any problem in which to win in the long run4 , 5.”
the estimation of long-term probabilities plays a role –
an example may be the ‘batting average’ of baseball or In contrast to the work discussed above, this thesis
cricket players. This is the number of a player’s ‘hits’ will not yield an answer to the debate whether poker is
divided by their ‘at bats’2 (Robinson, 2017). Vector a game of chance or skill, but will instead focus on how
spaces (further explained in section 3.1.3) can be used as a computer could flawlessly take over certain aspects
an approach for a wide variety of problems containing a of the game, using the theory of Bayesian statistics and
large or an infinite number of (game) states that can be vector spaces. The aim of this thesis is to answer the
represented by numbers, such as drug repositioning and following question: how can an agent efficiently make use of
stock markets (Manchanda and Anand 2017; Bai et al. the theory of Bayesian statistics and vector spaces to represent
2018, p. 217). For this reason, concepts in this paper are and compare game situations in order to make decisions in
represented both symbolically, using first-order logic, and the early stages of a multi-player No-Limit Texas Hold’em
mathematically, in an attempt to make the concepts more game? In order to adequately come up with an answer to
accessible for other purposes than poker.                   this question, this research question will be subdivided
                                                            into two subquestions: How can opponents be modelled
                                                            by considering a population of players? and How can these
                                                                           opponent models be used to make decisions based on publicly
                                                                           available information?
   1PokerSnowie, Challenge PokerSnowie, https://www.pokersno
wie.com/blog/taxonomy/term/13, Accessed on February 27th, 2020
   2Major League Baseball, What is a Batting Average (AVG)?, http://           3HighstakesDB, Biggest Poker Winners - Top Money Winners in On-
m.mlb.com/glossary/standard-stats/batting-average, Ac-                     line Poker, https://www.highstakesdb.com/poker-players.asp
cessed on February 27th, 2020                                              x?sortby=winners, accessed on February 25, 2020

                                                                       5
The approach presented in this paper is based on a                2      Putting the idea into perspective
couple of personal observations. At first, if for a willing
and competent human, taking part in a relatively small            The significance of poker as a testbed for Artificial
sample of poker hands (a couple of hundred thousand or            Intelligence has led to extensive research in the field,
million) suffice to turn the person into a winning player,          which yielded a wide variety of attempted approaches
a computer can certainly do it with the same number of            towards ‘solving the game.’ These include reinforcement
hands and probably less. Secondly, when playing against           learning (Dahl, 2001) and neural networks (Davidson
an unknown opponent, one uses the knowledge gained                1999; Billings et al. 2002, p. 226-227). The approach
previously by playing against other opponents, which              which is presented in this thesis will solely try to
indicates the use of a ‘population’ – a statement supported       maximize estimated expected value for a decision tree,
by Chen and Ankenman (2006, p. 38, 67).                           using statistics directly derived from past observations
                                                                  (further explained in section 3.1.4) – this highly benefits
It should be noted that it is assumed that the reader             the explainability of choices the agent makes.
is aware of the rules of Texas Hold’em, which is necessary
to understand the essence of the method presented in              According to popular opinion within the commu-
this paper. A brief introduction to the rules of the game         nity of poker players; within their community, poker
relevant for this thesis are outlined in Appendix A.              players are divided into two groups: mathematical players
Throughout the document, conventional poker terms will            (players who base their decisions on calculations) and
be used. A list of these terms and their explanations can         intuitive players (players who base their decisions on a
be found in the glossary in Appendix B. This document             certain ‘gut feeling’). I challenge the distinction between
will often use terms such as ’the agent’ or ’the program,’        these two ‘types’ of players and treat them as equivalent
referring to a hypothetical program implementing the              – mathematical players actively make use of existing
strategy described in this thesis.                                theorems and formulas, while intuitive players apply
                                                                  these same methods subconsciously. Because of the
                                                                  uncertain and complex nature of the game, one has to infer
                                                                  playing characteristics of opponents solely by observing
                                                                  an opponent play. The sample sizes of these observations
                                                                  are often small or non-existent, which makes a Bayesian
                                                                  approach to poker a straightforward choice compared to a
                                                                  frequentist interpretation6 of the game. This reasoning is
                                                                  followed by Chen and Ankenman (2006, p. 38).

                                                                  A known approach to the problem of predicting op-
                                                                  ponents’ hole cards is to use Bayes’ theorem in a like
                                                                  manner (Chen and Ankenman 2006, p. 60-64; Van der
                                                                  Kleij 2010, p. 39; Korb et al. 1999). A corresponding
                                                                      4My Poker Coaching, How to become a professional poker
                                                                  player, https://www.mypokercoaching.com/become-professio
                                                                  nal-poker-player/, accessed on February 4, 2020
                                                                      5Best Poker Coaching, Create Good Habits with a Winning Pregame
                                                                  Routine, https://www.bestpokercoaching.com/create-good-h
                                                                  abits-winning-pregame-routine/, accessed on February 4, 2020
                                                                      6In probability theory, a distinction is often made between frequentist
                                                                  and Bayesian approaches; the former defining probabilities as represent-
                                                                  ing the occurrence of events in long run frequencies, the latter basing a
                                                                  probability on indications of the plausibility of an event. “A Bayesian is
                                                                  one who, vaguely expecting a horse, and catching a glimpse of a donkey,
                                                                  strongly believes he has seen a mule.” - Karl Pearson

                                                              6
approach is taken for the prediction of opponents’ actions                   In addition, when combining multiple variables into a
(Ponsen et al. 2008; Southey et al. 2012; Korb et al.                        Bayesian model, which is generally required when mod-
1999). This thesis introduces an empirical Bayesian                          elling a multi-player No-Limit Hold’em game, another
approach towards the modelling of opponents. The                             problem arises: the huge complexity of the game. A large
proposition made in this paper is uncommon—and proba-                        number of possible game states causes data sets to often
bly unprecedented—in the field of research on poker. The                      be too sparse for reliably representing the needed prior
                                                                                     ,
method allows for an accurate estimation of statistics on                    beliefs9 29. The choice of vector spaces also formulates a
opponents (e.g. a player’s degree of aggression) based on                    solution to this problem, as any previously observed poker
the player population. It demonstrates a bias in existing                    game state can be considered in a weighted comparison
work, such as in the mentioned Ponsen et al. (2008),                         with a newly encountered game state in which the same
and solves it using beta-binomial regression which is                        action sequence occurred. This model will still behave
explained in section 3.1.2. Although Bayesian statistical                    similarly as when a complete Bayesian model would
methods will be extensively used in the most fundamental                     have been considered, albeit still achieving to avoid the
aspects of the opponent models, this paper will differ from                   explained complications related to continuous bet and
the earlier mentioned work by using vector spaces for the                    stack sizes, and being less prone to small samples. This
representation of different game states and the prediction                    will be further discussed at the end of section 3.1.3.
of an opponent’s actions, hole cards and raise sizes. This
choice is inspired by the idea of conceptual spaces.                         Essential for calculating the expected value for branches
                                                                             of the to-be-constructed decision trees, as explained
Conceptual spaces allow for the representation of                            in section 3.1.4, is estimating the relative strength or
concepts in a geometric space, in which the dimensions                       equity of a starting hand. After discounting the two
represent characteristics (or features) of the concept. A                    hole cards initially dealt to the agent, (50
                                                                                                                       5
                                                                                                                          ) = 2, 118, 760
similarity function is used to measure the similarity of                     different board combinations exist. The strength of one
concepts within the space (Gärdenfors, 2004). An imple-                      single starting hand versus another hand can be precisely
mentation based on conceptual spaces is pre-computable                       calculated by concerning all possible run-outs of the
and handles bet sizing and stack sizing in a continuous                      board. The results of these calculations are readily
way—No-Limit Hold’em is often deemed ‘too complex’                           available10. In more thorough problems, such as when
for AIs to solve because of the abundance of possible game                   determining the strength of a starting hand versus a range
                                                         ,
states due to the allowance of bet sizes without limit7 8                    of hands, a hand’s equity can be approximated with
(Johanson, 2013)—whereas different implementations                            Monte Carlo simulations11 and are observed to be fastly
opt for pre-specified static bet sizes through abstractions                   converging12 (Metropolis and Ulam, 1949, p. 335-341).
(Moravčík et al. 2017, p. 3; Brown and Sandholm 2017;                        Many different implementations of hand evaluators are
Brown et al. 2018), including already cited work (Van der                    available on software development platforms13, many of
Kleij, 2010, p. 44). The implementation proposed in this                     which are based on mechanisms invented for efficient
thesis does not pre-specify any kind of bet sizing.                          computing14. Large lookup tables for pre-calculated
                                                                             evaluations, such as for the Two Plus Two evaluator
                                                                             (containing 32,487,834 entries) are freely available15.

   7Pokernews, Artificial Intelligence and Holdem, Part 3: No-Limit                9More formally called the prior probability of an event occurring.
Holdem, The Next Frontier, https://www.pokernews.com/strate                  It expresses one’s ‘belief’ of a random event happening when applying
gy/artificial-intelligence-hold-em-3-23218.htm, Accessed                     Bayesian statistical inference.
on November 15, 2019                                                             10PokerStove, preflop-matchups.txt.gz, http://web.archive.org/
   8Cardschat, Poker Bots Arent Powerful Enough to Solve No Limit Hol-       web/20110612052656/http://www.pokerstove.com/analysis/
dem (Yet), https://www.cardschat.com/news/poker-bots-are                     preflop-matchups.txt.gz, accessed on February 28th, 2020
nt-powerful-enough-solve-no-limit-holdem-yet-52909, Ac-                          11Monte Carlo simulations are a class of random sampling algorithms
cessed on November 15, 2019                                                  to approximate probabilities by considering subsets of a set of events.

                                                                         7
Considering that a hand’s perceived equity depends on                       3     Method
various factors, such as ‘post-flop playability,’ other ap-
proaches have been studied as well. Dalpasso and Lancia                     This section is composed of three subsections. The first
(2015) observe the concept of equity as a combination of                    subsection is concerned with the proposed methodology
features of a (hand on the) flop, such as ‘the flop contains                  and it effectively answers the main research question
one overcard.’ Although based on the limit variant of the                   stated in the introduction: how can an agent efficiently
game, groupings of starting hands have been proposed by,                    make use of the theory of Bayesian statistics and vector spaces
among others, Johanson et al. (2013) and Sklansky and                       to represent and compare game situations in order to make
Malmuth (1999, p. 14-15). The hand groupings stemming                       decisions in the early stages of a multi-player No-Limit Texas
from the latter mentioned work will be used in section                      Hold’em game? In section 3.2, an implementation of the
3.1.3 to ‘learn’ weights for the formerly introduced vector                 proposed methodology is suggested. Section 3.3 proceeds
spaces. As the scope of this thesis pertains only to the ‘pre-              to present the findings of this implementation and serves
flop’ phase of the game, a precise interpretation of equity                  to strengthen the usefulness of further investigating the,
is prefered over an interpretation depending on ‘post-flop                   in section 3.1, proposed methods. In appendix C, as an
play.’ In this thesis, Monte Carlo simulations will be used                 additional feature, a questionnaire and its results regarding
to estimate the equity of a hand versus (possibly multiple)                 the quality of the suggested implementation is attached.
opponents’ ranges of hands.
                                                                            In this thesis, a database of 6,552,060 hands and
                                                                            63,657 different players is consistently used. Each hand
                                                                            is played by six players. The choice of this particular
                                                                            dataset is motivated in section 3.2.1.

                                                                            Only the first stage of the game is considered: the
                                                                            pre-flop betting round. This is the first betting round of
                                                                            the game when no ‘community cards’ have been revealed.
                                                                            For convenience purposes this part of the game will be
                                                                            referred to by using the following terminology: poker
                                                                            game, game, poker hand or hand. All games are assumed
                                                                            to take place in a ‘rake-free environment:’ no commission
                                                                            is withheld, unlike usually the case in poker games.
                                                                            Although briefly explained in appendix B, it is necessary
                                                                            to clarify what is meant by a ‘hole card range.’ At the start
                                                                            of every hand, player are dealt two hidden cards, which
                                                                            are used to form combinations with five community cards
                                                                            visible to all players. These cards are a player’s ‘hole
                                                                            cards.’ A ‘hole card range’ is a group of cards which a
                                                                            player is assumed to hold in a certain situation – instead
                                                                            of predicting an opponent’s exact hole cards, a weighted
                                                                            spectrum of possible holdings is proposed.
                                                                              13GitHub, XPokerEval - A collection of poker hand evaluation source
                                                                            code compiled by James Devlin., https://github.com/tangentfo
                                                                            rks/XPokerEval, accessed on February 28th, 2020
                                                                              14Suffecool, Cactus Kev’s Poker Hand Evaluator, http://suffe.co
                                                                            ol/poker/evaluator.html, accessed on February 28th, 2020
   12Miscellaneous Remarks, Ideas, Trials, Poker 4: Monte Carlo Anal-         15GitHub, HandRanks.dat, https://github.com/christophsc
ysis, http://oscar6echo.blogspot.com/2012/09/poker-4-mon                    hmalhofer/poker/blob/master/XPokerEval/XPokerEval.TwoP
te-carlo-analysis.html, accessed on February 28th, 2020                     lusTwo/HandRanks.dat, accessed on February 28th, 2020

                                                                        8
ω ∈Ω                Ω     Ωβ = {ω ∣ β ∈ Bω }                            big blinds in order to preserve the ability to shift
                                                                          between, and take into account different stakes.
                            [sβ ∣ β = m ∧ om ∈ O]                 • An ordered action sequence Aω of I actions
                                                                    ⟨α1 , . . . , αI ⟩. During a game, this action sequence can
   ζ      γ          sβ     {Υ(oβ ) ∣ oβ ∈ Bω ∧ oβ ∈ O}             be expanded by an action [αi ∈ ϕ (ωh )] performed by
                                                                    a player [pn = {{γ , ζ , η }, β } ∧ pn ∈ Bω ].
                                            ′
                                          B
                                                                       – A poker situation or situation is reffered to by
                                                                         a game’s action sequence. If referring to a
   η      ∈ ϕ (ω )   αi+1       A         O          τ                   similar situation, an identical action sequence
                                                                         is assumed, regardless of the size of x when
           ∀pn ∈ Ψ(ωh )
                                                                         αi = raise x.
Figure 1: Dependency graph for variables in a poker game               – x denotes the value of the total bet including
ω . The dashed line illustrates the dynamics of players                  the amount before the bet’s increment. In this
influencing each other. Dotted lines point to logical ex-                 paper, raise sizes are also denoted as a percent-
pressions for relations.                                                 age for the total bet, with respect to the pot.
                                                                         For example: if the pot contains $0.15 and a
                                                                         player raises to $0.25, this raise is denoted as a
3.1     Proposed methodology                                             166.67%-raise.

This subsection has been further subdivided into four And in general defined are:
parts. The first part introduces a formal explanation of the • A function ϕ (ωh ) returning a set of currently legal
game and deals with general definitions which are used         actions a ⊂ {fold, call, check, raise x} where x is a dis-
in the three subsequent subsections to answer the research    crete amount between νω and γβ .
question. Section 3.1.2 addresses the first subquestion:
How can opponents be modelled by considering a population   • A function ψ (ωh ) returning a one-element set (or the
of players? While the two remaining sections focus on the     empty set) containing the player pn = {{γ , ζ , η }, β }
second subqestion: How can these opponent models be used      who is to generate the next action αi+1 . The ordered
to make decisions based on publicly available information?.   sequence of all acting players is denoted by Ψ(ωh )
                                                              and we denote the i-th acting player pi by Ψi .
3.1.1   General definitions                                        • A set Bω = {β1 . . . βN } ⊆ O of N players and a
                                                                                 ′
                                                                    nested set Bω = {p1 , . . . , pn } = {{γ , ζ , η }} × B where
A set of definitions and variables are introduced below.
                                                                    γβ denotes the player’s current stack size divided
The dependence of these variables is shown as a graphical
                                                                    by νω ; ζβ denotes the relative position on the ta-
model in figure 1.                                                              1   2
                                                                    ble; and ηβ , ηβ ∈ {2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A}×
A poker game will be denoted by ωh and is defined                                       1       2
                                                                    {♠, ♡, ♣, ♢}, ηβ ≠ ηβ denotes the player’s hole
as a set consisting of the following elements:                      cards.

  • An integer νω denoting the value of the big blind.                 – Since players can get raised, Ψ(ωh ) might con-
                                                                         tain the same β -element multiple times.
        – Players on a later position at the table have ac-            – When referring to a player’s hole cards, a two-
          cess to more information (the actions of earlier               letter notation (such as AK and T9 for variants
          positioned opponents) which influences action                   of A♣ K♠ and 10♣ 9♣ ) is often used, which may in-
          taken on such position (seat).                                 clude an additional o or s (see appendix B under
        – The player’s stack is divided by the number of                 ‘suited’). A ‘T’ refers to a ten ( 10♣ , 10♠ , 10♡ or 10♢ ).

                                                              9
• A set Ω = {ω1 . . . ωH } of H previously com-             3.1.2   Opponent modelling
     pleted games. Each completed poker game ωh =
                                                               One of the unknown variables defined in figure 1 is the
     {Aω , Bω , νω } is inferred by parsing hand histories.
                                                               strategy sβ or playing style of an opponent. In accordance
                                                               with section 3.1.1, it is assumed that a player’s strategy
   • A variable τ representing the factor of time.             depends on the previous games in which the player
                                                               participated. This relation is illustrated in figure 1.
        – A population changes its playing style over time     The agent will emulate and enhance this behaviour by
          since opponents are influenced by each other          additionally taking into account games in which the agent
          and by the increasing availability of outlines on    did not participate.
                                                       ,
          strategy and the mathematics of the game16 17.
                                                               A form of representing poker situations is needed
   • A set O of all possible opponents, which is referred      in order to compare similar game situations. In the next
     to as the population. Every possible opponent is          section, a multi-dimensional space will be introduced in
     considered as having strategy Υ(om ) = sm . Further,      which vectors live whose components consist of values
                       ′
     defined is a set O = {o1 . . . oM ∣ om ∈ Bωh ∧ ωh ∈ Ω}     describing the characteristics of these situations. The
     of all previously encountered opponents.                  described opponent models of each player participating
                                                               in the hand serve as some of the components of these
                                                               vectors. Since only the pre-flop part of the game is
                                                               considered, Equation 1 and Equation 2 are introduced for
                                                               describing sβ . These two metrics will be used solely to
                                                               characterize each player and to identify similar players,
                                                               by comparing these metrics in a geometric space.

                      count({ω ∣ ω ∈ Ω ∧ β ∈ Bω ∧ ∃αi (αi ∈ Aω ∧ αi ∈ {call, raise x} ∧ β ∈ Ψi (ω ))})
         VPIP(β ) =                                                                                    ⋅ 100          (1)
                                             count({ω ∣ ω ∈ Ω ∧ β ∈ Bω })
                         count({ω ∣ ω ∈ Ω ∧ β ∈ Bω ∧ ∃αi (αi ∈ Aω ∧ αi = raise x ∧ β ∈ Ψi (ω ))})
             PFR(β ) =                                                                            ⋅ 100               (2)
                                             count({ω ∣ ω ∈ Ω ∧ β ∈ Bω })

   16Reddit, How has NLHE strategy changed over the last 5 -     17TwoPlusTwo, Will Poker games continue to get tougher?,
10 years?, https://www.reddit.com/r/poker/comments/2ojlig      https://forumserver.twoplustwo.com/32/beginners-quest
/how_has_nlhe_strategy_changed_over_the_last_5_10/, Ac-        ions/will-poker-games-continue-get-tougher-1435373/,
cessed on November 20th, 2019                                  Accessed on November 20th, 2019

                                                              10
Equation 1 formalizes the concept of a VPIP-value – the
percentage of hands a player voluntarily puts money into
a pot when presented with the opportunity of doing so.
                                                                                      80
This value is an indicator of the tightness or looseness
of a player, i.e. how many hands a player likes to
                                                                                      60

                                                                    PFR/VPIP
play. Equation 2 is similar to the preceding equation,
except that in this equation only raises count towards the
percentage. The resulting value is the player’s pre-flop                               40
raise- or PFR-value and indicates the aggressiveness or
passiveness of a player pre-flop.                                                      20

In agreement with commonly held beliefs in the                                             0
poker community, different groups of poker players exist,                                         20        40           60     80
distinguished by their playing characteristics. The plots in                                                     VPIP
figure 2 hint the existence of at least three of the ‘classical’                                (a) Sample of 3,821 players (≥ 1000 hands)
groups of players which are believed to significantly
populate the overall player population: tight-aggressive,
                                                                                     100
tight-passive and loose-passive players. For a minimum
of 1,000 hand observations per player, the Mean Shift
procedure (Fukunaga and Hostetler, 1975) seems to                                     80

affirm this intuition as shown in figure 3a on page 13.
                                                                          PFR/VPIP

Thorndike’s famous Elbow method (Thorndike, 1953) as                                  60

well hints the existence of three clusters when a clustering
range of 1 to 10 is specified. This is visualized together                             40

with respective k-means clustering (with k = 3) in figure
3b and 3c (Lloyd, 1982). These findings confirm the                                     20
usefulness of characterizing players by the two introduced
metrics, as they convey sufficient information to classify
                                                                                       0
a playing style.                                                                                  20        40           60         80      100

                                                                                                                  VPIP
As oftentimes there are too little observations (or                                            (b) Sample of 20,989 players (≥ 100 hands)
no observations at all) on opponents to be encountered,
a frequentist approach to calculating an individual
opponent’s VPIP and PFR values is often inaccurate.               Figure 2: Visualisation of VPIP- and PFR/VPIP-values
Nevertheless, many profitable online poker players use             for players in dataset
software based on frequentist analyses to keep track of
                  , ,
these statistics18 19 20. Although some players opt to

   18BeatingBetting, How Valuable Are Poker HUDs?, https:
//www.beatingbetting.co.uk/poker/best-poker-hud/#How_
Valuable_Are_Poker_HUDs, accessed February 29th, 2020
  19CardsChat, How necessary is a HUD?, https://www.cardscha
t.com/f61/how-necessary-a-hud-234552/, accessed on February
29th, 2020
  20TwoPlusTwo, What % of players use a HUD?, https:
//forumserver.twoplustwo.com/32/beginners-questions
/what-players-use-hud-962957/, accessed on February 29th,
2020

                                                               11
,
tweak their software21 22; the only known software in               of probabilities of a probability (in this case a players’
the field to have directly implemented an alternative                VPIP or PFR/VPIP value) as shown in equation 3 to 6
acknowledges the shortcomings of a frequentist approach,            (Raïffa and Schlaifer 1961; Diaconis and Ylvisaker 1979,
but does not implement a full solution such as described            p. 274). Here, s denotes the number of successes (e.g.
below23. To solve the problem of inaccurate statistics,             a player voluntarily putting money into a pot), while
players using this kind of software generally wait until             f = n − s denotes the number of failures where n is the
                                               , ,
their statistics on players start to converge24 25 26, missing      total number of observations on a player. As P(θ ) is
valuable information in the meantime.                               beta distributed, and P(X∣θ ) is binomially distributed,
                                                                    P(θ ∣X) is also beta distributed. By integrating the prod-
Instead, in this paper these metrics are proposed to uct of their probability mass and density functions, the
be estimated using the population as prior knowledge. By probability density function for another beta distribution,
replacing the introduced PFR-metric with a derived met- with parameters α + s and β + f , is derived. This is the
ric abbreviated as PFR/VPIP and defined as VPFR         PIP
                                                            ⋅ 100, player-specific distribution based on observations on the
both metrics can be interpreted as probabilities27. Since analysed player. Note that α , β and σ in this section refer
the population can be modelled as a distribution of these to parameters of distributions rather than the definitions
player characteristics, the population essentially becomes in section 3.1.1.
a distribution of probabilities. As the beta distribution28
is a continuous probability distribution defined on
                                                                                               population
[0, 1], this distribution is a probability distribution of player specific                       Ì ÏÍ Î
probabilities. This makes it perfectly suitable as the                 Ì ÏÍ Î      P(X∣θ ) ⋅ P(θ )               P(X∣θ ) ⋅ P(θ )
Bayesian prior29. Since the player characteristics are                 P(θ ∣X) =                          =
                                                                                          P(X)               ∫ P(X∣θ ) ⋅ P(θ )d θ
probabilities themselves, the Bayesian likelihood is
Bernoulli distributed30. The theory of conjugate priors
                                                                                                                                    (3)
can be used to construct a player’s own Beta distribution
                                                                                                              α −1        β −1
                                                                                             s           f θ       (1−θ )
                                                                                       (ns)θ (1 − θ ) ⋅          B(α ,β )
   21TwoPlusTwo, Help - Using PT4 to analyse population tendencies,             =                                                   (4)
                                                                                                           θ α −1 (1−θ )β −1
https://forumserver.twoplustwo.com/185/heads-up-sng-s                              ∫ ((ns)θ s (1 − θ ) f ⋅      B(α ,β )
                                                                                                                              ) d θ
pin-gos/help-using-pt4-analyse-population-tendencie
s-1208561/, accessed on March 1st, 2020
   22TwoPlusTwo, , https://forumserver.twoplustwo.com/15/
                                                                                                                    s+α −1          f +β −1
poker-theory/wilson-score-interval-1023431/, accessed on                                                        θ         (1−θ )
March 1st, 2020
                                                                                                         (ns)            B(α ,β )
   23Poker Copilot, User Guide – Statistics or probabili-                                       =                                                          (5)
                                                                                                         (ns)θ s+α −1 (1−θ ) f +β −1
ties?, https://pokercopilot.com/userguide/6/en/topic/sta                                            ∫(               B(α ,β )
                                                                                                                                         ) dθ
tistics-or-probabilities, accessed on February 29th, 2020
   24Smart Poker Study, HUD Reliability: Number of Hands and Sam-
ple Sizes, https://www.smartpokerstudy.com/hud-reliability
                                                                                                        s+α −1               f +β −1
-number-of-hands-and-sample-sizes-226, accessed on March                                            θ        (1 − θ )
1st, 2020                                                                                       =                                        = Beta(α + s, β + f )
   25PokerStars School, HUD Stats             Youre Doing it Wrong!,                                     B(s + α , f + β )
https://www.pokerstarsschool.com/strategies/hud-sta                                                                                                        (6)
ts-doing-it-wrong/668/, accessed on March 1st, 2020
   26TwoPlusTwo, How Do I Use HUD Stats With Small Samples?,
https://forumserver.twoplustwo.com/32/beginners-quest                                 29Bayes’ theorem defines a probability by taking prior knowledge
ions/how-do-i-use-hud-stats-small-samples-1551067/,                               into account. A probability of an event A given evidence B is given by:
                                                                                               P(B∣A)P(A)
accessed on March 1st, 2020                                                       P(A ∣ B) = P(B) where P(B ∣ A) conveys the likelihood of evidence
   27If we wouldn’t have divided the value for a player’s PFR by their            B given that A indeed occurred, and P(A) conveys the prior probability
VPIP-value, then their PFR-value would always been capped at their                of A occurring at all.
VPIP-value.                                                                           30The probability density function of a Bernoulli distribution is
   28The probability density function of a Beta distribution is f (x; α , β ) =                k
                                                                                   f (k; p) = p (1 − p)
                                                                                                        1−k
                                                                                                            where k ∈ {0, 1} and 0 ≤ p ≤ 1. Here, p is
            α −1        β −1
   1
 B(α ,β )
          x      (1 − x) , where 0 ≤ x ≤ 1 and α , β > 0.                         the parameter of the Bernoulli distribution.

                                                                              12
A subsequent problem arises: the majority of players in
                                                                          a population are often players on which there are very
                                                                          few observations. The dataset used in this paper consists
           80
                          TAg
                          TAg                                             for 67,02% of players with less than 100 observations.
                                                                          Attempting to improve the level of certainty of a dataset
           60
PFR/VPIP

                                                                          by leaving out all players with a number of observations
                                                                          below a certain threshold, results in a bias. As pointed out
           40                                                             in section 2 of this thesis, some well-received papers on
                                   L/TPs                                  this subject still incorrectly implement this process. The
                                   L/TPs
           20                                                             reason for the biased Bayesian prior is straightforward:
                                                            LPs
                                                            LPs           professional players can be assumed to play differently
            0                                                             than ‘recreational players,’ but professional players also
                     20       40               60           80            play more often and thus have a higher chance of ending up
                                        VPIP
                                                                          (more frequently) in the dataset of observations on players.
                (a) Mean Shift clustering (3 clusters detected)
                                                                          By letting n denote the total number of played hands for
                                                                          a player in the dataset, I proceed to verify the statement
                                    k = 3, score = 439298.53
           1750000
                                                                          above by calculating the mean VPIP and PFR/VPIP for all
           1500000                                                        players for every possible n. So, the means for n = 55 are
           1250000
                                                                          the averaged metrics of every player who participated in a
           1000000
                                                                          total of 55 hands. To indicate convergence of a metric, an
                                                                          uncertainty value based on the Wilson Score Interval31
            750000
                                                                          is introduced and shown in equation 7 (Wilson, 1927).
            500000
                                                                          This uncertainty value is plotted against a logarithmic
            250000                                                        scale of n in figure 4 on page 14. By performing both
                     1    2   3     4      5        6   7        8   9    weighted and unweighted linear regression, the figure
                                          k                               illustrates a correlation between the VPIP and PFR/VPIP
             (b) Distortion score elbow for K-means clustering            metrics and the number of observations on players. These
                                                                          graphs comply with a generally known idea among the
                                                                          poker player community that recreational players (or fish
                                                                          in poker jargon) often play more hands and show less
           80                                                                           ,
                          TAg
                          TAg                                             aggression32 33.

           60
PFR/VPIP

                              T/LAg
                              T/LAg                                                                                         √
                                                                                                                                            z2
                                                                                                                 2              p̂⋅(1− p̂)+ 4⋅n
           40                                                                                            p̂ +   z
                                                                                                                2⋅n
                                                                                                                      −z⋅             n
                                                                              Uncertainty = 2 ⋅ ( p̂ −                          2
                                                                                                                                                  ) (7)
                                                                                                                       1 + zn
           20                       T/LPs
                                    T/LPs
                                                                             31The Wilson Score Interval is a method for showing binomial pro-
            0                                                             portion confidence intervals.
                     20       40               60           80               32BlackRain79, What Does VPIP Mean? An Extremely Sim-
                                        VPIP                              ple Explanation, https://www.blackrain79.com/2016/07/what
                                                                          -is-vpip.html, accessed on December 7th, 2019
                 (c) K-means clustering of Figure 2a (k = 3)
                                                                             33PokerVIP, Tips for Identifying the Recreational Players,
                                                                          https://www.pokervip.com/strategy-articles/texas-h
                                                                          old-em-no-limit-beginner/tips-for-identifying-the-rec
                Figure 3: Clusters of player characteristics              reational-players, accessed on December 7th, 2019

                                                                         13
In equation 3 to 6, a new beta distribution is introduced
of which the parameters depend not only on observations
on the population as a whole, but also on player-specific        1.0             Uncertainty = 2( p̂-Wilson(z = 0.95, n, p̂ = 0.257))

observations. As information on specific players gradu-                          Mean VPIP for players with n hands
                                                                                Trend (unweighted linear regressor)
ally accumulates, information on the population becomes         0.8
                                                                                Trend (weighted by number of players)

                                                                              VPIP as probability
less relevant for the estimation of player-specific statistics.
The more we observe a player’s play, the more we look           0.6
at that player’s playing history instead of to that of the
population. The less information we have on a player,
                                                                0.4
the more we use the population to fill the gap. If we
totally lack information on an opponent—if we have never
observed that particular opponent—the population is the         0.2

only data we look at. In order to deal with the problem
related to uncertainty, as explained above and illustrated      0.0

in figure 4, we let the original parameters (α , β ) depend            0          2                4               6                8

on ln(n). This way, a different pair of (α , β ) can be                     ln(n) for 0 < n ≤ 5000 (3,821 players)
obtained for every possible n. Now this pair of parameters (a) Average VPIP values among players seem to decrease as cer-
(α , β ) (which now only depends on the number of hands tainty increases. Value p̂ = 0.257 is chosen as it is the weighted
played) can be updated in accordance with equation 6 as average VPIP value over the whole sample.
player-specific observations come through.

In equation 8, µ is defined as the inverse logit                                                                       Uncertainty = 2 ( p̂-Wilson(z = 0.95, n, p̂ = 0.720))
function (or expit function)34 of a linear function of ln(n).                                                         Mean PFR/VPIP for players with n hands
                                                                                                        0.8
The parameters α and β can now be defined with regard                                                                  Trend (unweighted linear regressor)
                                                                              PFR/VPIP as probability

                                                                                                                      Trend (weighted by number of players)
to a mean and dispersion value (Pananos, 2020). By
defining σ as the dispersion value, α and β are written                                                  0.6

in terms of µ and σ in equation 9. By considering the
probability density function of the beta distribution; a,                                               0.4
b and σ can be estimated using maximum likelihood
estimation35 (Robinson, 2017, p. 56-64). The effect is
illustrated in figure 5 on page 15, which plots the beta                                                 0.2

distributions for four subsequent exponents of n. As
expected, the distribution shifts towards a lower value for                                             0.0

a player’s VPIP as the number of played hands increases.                                                      0        2               4                6                8
Finally, the expected value for a metric estimated using                                                          ln(n) for 0 < n ≤ 5000 (3,821 players)
this method is given in equation 10.
                                                                          (b) Average PFR/VPIP values among players seem to increase as
                                                                          certainty increases. This is partly due to decreased VPIP values
                                       1                                  as shown in subfigure a. Value p̂ = 0.720 is chosen as it is the
                        µ=                                         (8)    weighted average PFR/VPIP value over the whole sample.
                              1 + e−(a+b⋅ln(n))
                       ⇒ α = µ ⋅ σ , β = (1 − µ ) ⋅ σ (9)
                           α
       E[Beta(α , β )] =                              (10) Figure 4: Relation between number of observations and
                         α +β                              player metrics
   34The logit function maps probabilities (thus, values ranging from 0
to 1) to a value ranging to infinity. The inverse function, which we use
in this thesis, does the opposite: it maps any value to a probability.

                                                                         14
3.0
                                                                              The idea of a ‘situational space’ is introduced, in
                                 0
                               10     hands                                   which a player’s expected value for the VPIP and
                                  1
                   2.5
                               10     hands                                   PFR/VPIP metrics are merely used as ‘input values’
                                  2
                               10     hands                                   for constructing this geometric space. For every poker
                                  3
                               10     hands
                   2.0
                                                                              situation, a multi-dimensional space is generated in which
                                                                              vectors live whose components represent ‘characteristics’
 p(V PIP∣α , β )

                   1.5
                                                                              of the situation. Each vector, directing to a point in
                                                                              this space, refers to an identical historical situation (as
                   1.0
                                                                              explained in section 3.1.1, a poker situation is described
                                                                              to be identical if the same action sequence occurred)
                                                                              drawn from the dataset. Every player taking part in the
                   0.5
                                                                              situation (in other words: players who did not fold at
                                                                              the point of evaluation) produces a part of the attributes
                   0.0
                         0.0      0.2         0.4           0.6   0.8   1.0   of the situation, captured by the components of the
                                                    V PIP                     vectors living in the corresponding situational space. The
                                                                              following values are described for each active player:
Figure 5: Non-player-specific Beta distributions resulting
from performing beta-binomial linear regression on the
                                                                               1. The player’s expected VPIP-value, estimated by the
variable of the number of hands played by a player.
                                                                                  method presented in section 3.1.2 and equation 10;
                                                                               2. the player’s expected PFR/VPIP-value, estimated by
                                                                                  the method presented in section 3.1.2 and equation
3.1.3                    Situational space
                                                                                  10;
The, in the previous section, introduced VPIP and                              3. the player’s stack size at the beginning of the hand;
PFR/VPIP values can be directly used to set up a strategy                      4. whether the player was all-in (either 0 or 1);
against opponents by assuming a player’s hole card range                       5. the size of the raise made by the player (if applicable)
to consist of the top portion of hands sampled from                               as a percentage with respect to the pot size;
the distribution of all starting hands, where the player’s                     6. the size of a second raise (if applicable) as a percent-
metrics determines the size of this portion36. As it is                           age with respect to the pot size.
not impossible for a player to hold a hand outside of this
implied range (for example, players can have a liking for                   Each vector has an additional corresponding ‘label.’ This
a certain type of hand), I opt to diverge from this idea.                   label captures the hole cards of the analysed player (if
More importantly, players can play different cards in                        known); the action of the analysed player; and the size
different ways – a player could, for example, decide on a                    of the raise made by the analysed player (if applicable).
more conservative play while holding a weaker hand (e.g.                    An example could be the situation in which the player
η = 9♢ 8♢ ), and bet a lower amount compared to when the                    on the first position raised. If focussing on the player
same player would have been dealt a stronger hand (e.g.                     directly due to act after the raising player, the constructed
                                                                                                                  25
 ♠ ♣ ). This kind of behaviour among opponents can be
K K                                                                         situational space would span R since six players
profitably exploited by taking a more flexible approach,                      participate of which one specifies a raise size (in other
such as the method I propose in this thesis.                                words, a 25-dimensional space would be generated).
                                                                            The similarity of these situations can be measured by
   35In maximum likelihood estimation (or MLE) the most likely param- concerning the distance between the points defined by
eters for a probability distribution are estimated by either differentiating these vectors. Building on the previous example, if facing
or using iterative methods of trying different parameters.
   36The Pokerbank, How to use VPIP in poker, https://www.thep
                                                                            a raise from the first position and analysing the player in
okerbank.com/articles/software/vpip/, accessed on March 8th, the second position, all identical historical situations in
2020                                                                        the dataset are used to generate vectors with labels that

                                                                          15
convey information about the successive action taken by                     If a player is due to act after the agent, the labels can
the player in the second position. If we already know that                  be used to predict a future action by the opponent.
the player in the second position took a certain action (e.g.               Elaborating on the previous example, if the agent would
called the bet raised by the player in the first position), we               be seated in the first position and would have yet to decide
can consider all vectors with a label corresponding to that                 on which action to take, these situational spaces can be
taken action (in this example we would only draw vectors                    constructed as if the agent would have raised, in order to
reflecting historical situations in which the player in the                  determine the (1) likelihood of the player in the second
second position called after the player in the first position                position calling the raise, and (2) the estimated hole card
raised) and suggest a hole card range for the currently                     range with which this player would call. As will be
encountered opponent. Since the vectors geometrically                       explained in section 3.1.4, in order to make a decision
encompass the characteristics of historical situations with                 based on predictions for every participating opponent’s
the same action sequence, the closer two points are in                      actions, raise sizes and hole card ranges, the agent will
space, the more similar the situations they represent.                      generate situational spaces for all actions for every likely
Weight is given to the actually held hole cards with                        action sequence.
respect to the distance between the point representing the
historical situation in which these hole cards are known to                 Considering not all of a situation’s characteristics
be held, and the point representing the current situation in                are of equal importance, the weights for each individual
which the opponent’s hole cards are still unknown. This                     vector component (corresponding to a dimension in the
example is illustrated in figure 6 below.                                    space) should be estimated. For example, the player facing
                                                                            a raise from the first position is probably more interested
                                                                            in the VPIP-value of the raiser, than in the VPIP-value
                                                                            of the person to act next. These weights cannot be
     raise size (%/pot)
                               ⎡ ˆ ⎤                                        obtained by solving linear systems of equations as we
                               ⎢
                               ⎢.85 ⎥
                                    ⎥
                               ⎢
                               ⎢    ⎥
                                    ⎥} part of vector in R25                lack knowledge of any distance (or similarity) between
                               ⎢ ⎥
                               ⎢.61 ⎥
                               ⎢
                               ⎢    ⎥                                       two given situations. In order to solve this problem, two
                               ⎣.46⎥
                                we  ⎦                                       ‘substitutes’ for distance values are introduced. These
                                  igh
                                     ted                                    serve as ‘outcome values,’ with which the similarity
        ⎡
        ⎢
          ˆ ⎤
         .82 ⎥
                 A K                       Mi                      ⎡ ˆ ⎤
        ⎢
        ⎢    ⎥
             ⎥
                 ♠ ♡                          nk                   ⎢
                                                                   ⎢.18 ⎥
                                                                        ⎥   (or distance) between two situations can be roughly
        ⎢
        ⎢.31 ⎥
             ⎥
                                                ow
                                                  ski              ⎢
                                                                   ⎢    ⎥
                                                                        ⎥
        ⎢
        ⎢    ⎥
             ⎥                                          dis        ⎢
                                                                   ⎢.74 ⎥
                                                                        ⎥   approximated:
        ⎣.31⎥
        ⎢                                                          ⎢
                                                                   ⎢    ⎥
             ⎦                                                     ⎣.09⎥
                                                           tan
                                                              ce        ⎦      • Historical situations where the hole cards of the
                                                                   8 7
                                                                   ♡ ♣           player for which the situational space is analysed are
                                                                                 known are grouped together, according to Sklansky’s
VPIPraiser                  ⎡ ˆ ⎤ K♣ 6♢        PFR/VPIPraiser                    hand groups. These hand groups, which have been
                            ⎢
                            ⎢.48 ⎥
                                 ⎥
                            ⎢
                            ⎢.50⎥⎥                                               shortly mentioned in section 1, embody nine ranges
                            ⎢
                            ⎢    ⎥
                                 ⎥
                            ⎢
                            ⎢    ⎥
                            ⎣ ⎥
                             .03 ⎦                                               of hands in decreasing order with respect to hand
                                                                                 strength. These groups are used when estimating
Figure 6: Situational space for a faced raise from first                          weights for the purpose of determining hole cards
position. The hole card range for the faced raiser is ap-                        and raise sizes. These groups are divided as follows:
proximated by using the weighted Minkowski distance for                          (Sklansky and Malmuth, 1999, p. 14-15)
                                    st nd       th                                  – Group 1: AA, AKs, KK, QQ, JJ
comparing with past events. The 1 , 2 and 5 compo-
nents of the normalized vectors are shown. For illustrative                         – Group 2: AKo, AQs, AJs, KQs, TT
purposes, only 4 manually selected vectors out of 536,951                           – Group 3: AQo, ATs, KJs, QJs, JTs, 99
                                                                                    – Group 4: AJo, KQo, KTs, QTs, J9s, T9s, 98s, 88
(of which 79,578 with known hole cards) are shown. The
                                                                                    – Group 5: A9s-A2s, KJo, QJo, JTo, Q9s, T8s, 97s,
coloured vector represents the ‘live situation’ for which                             87s, 77, 76s, 66
hole cards should be predicted.                                                     – Group 6: ATo, KTo, QTo, J8s, 86s, 75s, 65s, 55, 54s

                                                                         16
– Group 7: K9s-K2s, J9o, T9o, 98o, 64s, 53s, 44, 43s,
          33, 22
        – Group 8: A9o, K9o, Q9o, J8o, J7s, T8o, 96s, 87o,                            mean within group   mean of means within groups
          85s, 76o, 74s, 65o, 54o, 42s, 32s                                            Ì      ÏÍ     Ì
                                                                                                     Î       ÏÍ        Î
                                                                                k   »
                                                                                    » 1
                                                                                           j              k      j                      »»2
                                                                       wi = ⋅ ∑ (»»»» ( ⋅ ∑ c ji ) − ( ⋅ ∑ ( ⋅ ∑ c ji ))                 »» )
        – Group 9: All other hands                                         1                  k        1     1      k
                                                                                                                                          »»
                                                                           k
                                                                              k=1
                                                                                    »
                                                                                    »  j
                                                                                          j=1
                                                                                                       k
                                                                                                         k=1
                                                                                                             j
                                                                                                                j=1
                                                                                                                                           »»
   • Categories grouped by the actual action taken are                                                                                  (11)
     used when determining weights for the purpose of                                 wi − min w
                                                                         ⇒ wi =      i
                                                                                                                                        (12)
     predicting a future opponent’s action: (1) check/fold,                         ∑i=1 wi − min    w
     (2) call, (3) raise x.
                                                             The similarity of two vectors (representing situations) us-
                                                             ing these weights is then calculated by dividing by the
                                                             weighted Minkowski distance38, as shown in equation 13.
Each group now contains vectors representing historical The sum of all the similarities for a specific element (e.g.
situations (note that a vector can now occur in both one of A K (hand); call (action); or 133.33 (raise size)) is divided
                                                              ♡ ♡
the groups categorized by the Sklansky hand groups listed by the sum of all similarities for elements of the same type,
above, and one of the groups divided by the action taken). to obtain the weight of that element being relevant to the
Now, we generate a matrix of which every row represents analysed opponent (e.g. holding A K ; calling; or when
                                                                                                   ♡ ♡
a vector in the group. This matrix is then ‘divided’ by (de- raising, raising with 133.33).
fined as taking the element-wise division of every value
for every row in the matrix, with every value of a denom-
inator vector) a vector holding the maximum encountered                                         1
values for the considered component. In other words:                      S(a, b) =      i
                                                                                                                     (13)
                                                                                    1 + ∑i=1 ∣ wi ⋅ (ai − bi ) ∣
the vector components are normalized by dividing their
individual values by the outcome of a max()-function ex-
ecuted column-wise37. If for a component the maximum
encountered value equals 0, the maximum value is set to The proposed method in this paper is a combination of
1 to avoid division by zero. For every group, the means two main ideas: empirical Bayesian statistics and vec-
are taken for each component over the normalized values. tor spaces. As explained in section 2, this approach is
The variance of these mean values corresponding to the chosen to diminish the negative effects of having too lit-
same component among groups is then taken as the weight tle data available. This effect is further reduced by not
for that component in that situation. This idea rests on the only concerning situations at the same point of evalua-
the intuition that the more a value varies among different tion. In other words: based on the example used on the
groups, the more that value helps to determine to which previous page, if we are constructing the vector space for
group it belongs. This is summarized in equation 11 and the raiser, we would also include situations in which the
12. In the last step, all weights are normalized so that player in the second position folded or re-raised, instead
∑i=1 wi = 1 where i in c ji corresponds to the number of of only considering situations in which that player called
   i                       k

dimensions in the situational space, k to the number of the faced raise. If a player participated in many hands,
groups and j to the number of vectors within group k.        the player will often have contributed to many different
                                                             points in a vector space generated for an often-occuring
                                                             situation. But as players are characterised by their VPIP
                                                             and PFR/VPIP-values, which serve as components for the

   37NumPy v.1.17 manual, numpy.ndarray.max, Return the maximum       38The Minkowski distance is a generalization of the Eucledian distance
along a given axis.https://docs.scipy.org/doc/numpy/referen        (a straight line between two points) and ‘taxicab distance’ (or ‘Manhat-
ce/generated/numpy.ndarray.max.html, accessed on March 9th,        ten distance’ – a stepwise distance calculated by summing the absolute
2020                                                               difference of discrete coordinates) used in normalized vector spaces.

                                                                  17
vectors used to compare situations, points reflecting situ-               the first position, players who did not fold before that
ations in which the same player played on the same posi-                 raise are due to act more than once. This means that
tion will automatically be positioned close to each other                the implemented algorithm should be recursive and
as their VPIP and PFR/VPIP-values are identical. This is                 causes the previously listed type of action sequence
compatible with ideas such as that a player’s range often                to be nestedly included within this type as well.
widens when being positioned later39, and that holdings
                                                                  When presented with a situation in which the agent has to
are usually stronger when a player raises from an early po-
                                                                  make a decision, a tree of all possible action sequences
sition compared to when that player would be seated in a
                                                                  and corresponding vector spaces will be generated. As
later position40. The more observations obtained on a spe-
                                                                  the raise-option could theoretically cause the tree to be
cific player, the more significant the historical actions of
                                                                  infinite, the number of allowed raises is ideally limited.
that particular player, and the heavier these are weighted
                                                                  Having observed too few instances of a situation is an
in predicting future actions for that player. This can be
                                                                  insurmountable obstacle which results in inadequate
seen as a geometric adaptation of prior information in a
                                                                  estimations, although to a lesser degree than when
Bayesian model. Furthermore, this method allows for pat-
                                                                  exact situations (including raise sizes and such) would
terns in an opponent’s playing style to be recognized and
                                                                  have been compared with each other. This problem is
exploited. Also, since all players are taken into account in
                                                                  inevitable and can be overcome by leaving these situations
the situational vectors, this model supports the idea that
                                                                  out of the equation. As this only applies to situations
opponents adapt to each other at the table, instead of only
                                                                  with a low number of observations, these observations
the agent adapting to the opponents. Another advantage of
                                                                  are inherently statistically improbable. Table 3 on page
this technique is the possibility of caching the matrices—
                                                                  23 shows the degree of relevance of this phenomenon to
together with their corresponding weights—in which the
                                                                  the used dataset in this thesis. As cutting out situations
vectors are contained, in order to speed up future com-
                                                                  could cause probabilities of actions occuring to not add
putation. In addition, the values of the matrix represent
                                                                  up to one, the probabilities should be normalized after a
explainable values which make it easier to understand why
                                                                  few more steps are introduced.
the algorithm would decide on a particular move.
                                                                  An example is considered wherein the agent is
3.1.4   Decision trees                                            seated in the fourth position, ‘BU’ or the ‘button,’ and
In order to explain the final concept which connects all           A = ⟨raise .20, fold, call⟩. If the maximum amount of
previously mentioned techniques, a distinction is made            raises is set to 1, all possible action sequences are
between two types of player action sequences:                     generated as shown in the list below. Bold text denotes
                                                                  the agent’s action – this style is used throughout the paper.
   • Action that happened before the agent is due to act.
     For the opponents involved in this type of action only         1.   A = ⟨raise .20, fold, call, call, call, call⟩
     the hole card ranges will be estimated using the vector        2.   A = ⟨raise .20, fold, call, call, call, fold⟩
     spaces discussed in section 3.1.3.                             3.   A = ⟨raise .20, fold, call, call, fold, call⟩
   • Action that happens after the agent is due to act. For         4.   A = ⟨raise .20, fold, call, call, fold, fold⟩
     these opponents the action and the hole card range             5.   A = ⟨raise .20, fold, call, fold, call, call⟩
     corresponding to that action is predicted using vector         6.   A = ⟨raise .20, fold, call, fold, call, fold⟩
     spaces. If the action is a raise, the raising size is also     7.   A = ⟨raise .20, fold, call, fold, fold, call⟩
     predicted. If a raise is made by a player not seated in        8.   A = ⟨raise .20, fold, call, fold, fold, fold⟩

  39CardsChat, Your Guide To Pre-Flop Calling Ranges, https:      The likelihood for every listed action sequence is calcu-
//www.cardschat.com/preflop-calling-hand-ranges.php, ac-          lated by consulting the similarity measures for the actions
cessed on March 9th, 2020
  40PartyPoker, How To Play Poker In ‘Early’ Position,
                                                                  derived from the situational spaces. If an opponent
https://www.partypoker.com/en/how-to-play/school/a                raises after the agent called or raised, the agent is due
dvanced/early-position, accessed on March 10th, 2020              to act multiple times. This needs to be handled in a

                                                               18
You can also read