LUDUS: An Optimization Framework to Balance Auto Battler Cards

Page created by Allen Rose

Arts & Entertainment

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

PRELIMINARY PREPRINT VERSION: DO NOT CITE
The AAAI Digital Library will contain the published
version some time after the conference.

L UDUS: An Optimization Framework to Balance Auto Battler Cards
Nathaniel Budijono∗1 , Phoebe Goldman∗2 , Jack Maloney∗3 ,
Joseph B. Mueller4,5 , Phillip Walker5 , Jack Ladwig5 , Richard G. Freedman5
1
University of Minnesota Twin Cities
2
New York University
3
University of Wisconsin-Madison
4
University of Minnesota
5
SIFT
budij001@umn.edu, phoebe.goldman@nyu.edu, jmaloney3@wisc.edu,
jmueller@sift.net, pwalker@sift.net, jladwig@sift.net, rfreedman@sift.net

Abstract of the Coast n.d.b), Pokémon (The Pokémon Company n.d.),
Auto battlers are a recent genre of online deck-building and Yu-Gi-Oh! (Konami n.d.). In collectible card games, a
games where players choose and arrange cards that then com- battle between two decks is called a game. Because it en-
pete against other players’ cards in fully-automated battles. compasses and affects multiple games, the deck-building
As in other deck-building games, such as trading card games, process is called the metagame.
designers must balance the cards to permit a wide variety of Deck-building games and metagames are often described
competitive strategies. We present L UDUS, a framework that as being either healthy or unhealthy. The health of a
combines automated playtesting with global search to opti- metagame depends on a variety of factors, of which we
mize parameters for each card that will assist designers in highlight diversity. Diverse metagames are those where
balancing new content. We develop a sampling-based approx-
imation to reduce the playtesting needed during optimization.
many different decks/lineups coexist with similar overall
To guide the global search, we define metrics characterizing win rates.
the health of the metagame and explore their impacts on the Metagame balance in deck-building games poses a chal-
results of the optimization process. Our research focuses on lenge for game designers because a relatively small num-
an auto battler game we designed for AI research, but our ap- ber of cards are combined to form a large number of pos-
proach is applicable to other auto battler games. sible decks, and these decks are paired into an even larger
number of possible match-ups. Magic: the Gathering relies
1 Introduction on human playtesting in advance of releasing a set (DeTora
Auto battlers are a new and wildly popular genre of online 2017), but human playtesting before release has repeatedly
game. Auto Chess released in January of 2019 (Drodo Stu- failed to identify card designs or sets that lead to unhealthy
dio 2019), and within that same month was regularly seeing metagames. Even when it does successfully identify and fix
70,000 concurrent players (Tack 2019). By June, Dota Un- unhealthy metagames, human playtesting requires signifi-
derlords and Teamfight Tactics were among the most popu- cant labor and time from a number of skilled players.
lar online games in the world (Grayson 2019). Since then, In addition, digital card games like Hearthstone collect
entries like Hearthstone: Battlegrounds (Blizzard Entertain- and analyze data after release from consumer play (Zook
ment 2019), Fire Emblem Heroes: Pawns of Loki (Nintendo 2019). While inexpensive, this approach is insufficient on its
Mobile 2020), and Storybook Brawl (Good Luck Games own because it requires a significant number of consumers
2021) have further refined the genre. to play in unhealthy metagames in order to identify prob-
In an auto battler, a number of players (typically eight) lematic designs.
compete to build the best lineup of cards through a number Healthy and diverse metagames are desirable because
of rounds. Each round consists of a deck-building phase and they lead to more varied, and therefore more enjoyable, bat-
a battle. In the deck-building phase, each player improves tles. In a homogenous metagame, any two battles are rela-
their lineup by selecting cards from a shared pool and ar- tively similar. This reduced novelty bores players, who may
ranging their order. In the battle phase, two players’ lineups then reduce their involvement with the game and purchase
compete automatically. Players who lose too many of these fewer products. Also, unsatisfied players often share their
battles are eliminated. Play proceeds in rounds until all but opinions on social media, discouraging new players from
one player are eliminated. The remaining player wins. joining the game.
Auto battlers are in many ways similar to deck-building For example, Magic: the Gathering’s set Throne of El-
collectible card games like Magic: the Gathering (Wizards draine was criticized for containing a number of overpow-
∗
These authors contributed equally. ered cards. Decks that contained these powerful cards re-
Copyright c 2022, Association for the Advancement of Artificial liably beat decks without them. High-level players quickly
Intelligence (www.aaai.org). All rights reserved. converged on a small number of deck designs that best uti-

lized the powerful Throne of Eldraine cards. On at least one expected to be common in the metagame. However, after our
occasion, a tournament was canceled due to lack of interest approach provides suggestions for grounded card instances,
(Triske 2019). Wizards of the Coast, the company respon- game designers can apply Xu et al.’s methods for a deeper
sible for Magic: the Gathering, banned a total of six cards analysis of the suggested card designs. If the designers have
from Throne of Eldraine in response (Wizards of the Coast some expected lineups based on the templates alone, then it
n.d.a; Duke 2019, 2020b,a). Even after the bans, high-profile should be possible to combine our methods. The informa-
professional players have complained on social media about tion available to the game designers would enable them to
Throne of Eldraine’s impact on the Magic: the Gathering explore not just which lineups are considered the best, but
metagame (Scott-Vargas 2020). also how the best lineups change for various card revisions.
In this paper, we introduce L UDUS, a framework to re- Many games that support competitive play over the in-
duce the cost of playtesting without exposing consumers to ternet collect data about what and how people are playing.
unhealthy metagames by automating the balancing process. Game designers can take advantage of various data science
Section 2 discusses other research related to auto battlers approaches to find trends in the logged data that inform them
and deck-building metagames. Section 3.1 develops an auto about how to balance the game (Nguyen, Chen, and Seif El-
battler game we designed to facilitate artificial intelligence Nasr 2015). In the virtual collectible card game Hearthstone,
(AI) research and presents an algorithm that efficiently and clusters of similar deck compositions implicitly describe
accurately approximates the average win rates of all the line- archetypes that are currently popular in the metagame. De-
ups in a metagame. Section 3.2 defines three metrics that signers can use this information to decide if some archetypes
evaluate the diversity of a metagame, and we theorize how are too common (implying they might be too powerful) and
game designers could define more precise metrics for their respond through updates to card descriptions, releasing new
own games. Section 3.3 explains how we use a genetic al- cards that counter the overused archetypes, or releasing new
gorithm to balance a metagame. Section 3.4 explores how cards that support the underused archetypes (Zook 2019).
game designers can use L UDUS and these metrics for in- This organizes human-operated playtesting at macro-scale
sights about their current card designs. Section 4 presents where designers can iterate on their game between updates.
experimental results evaluating the sampling-based approxi- Because human-operated playtesting is expensive
mation described in Section 3.1 and the genetic optimization resource-wise and rarely exhaustive enough to identify all
described in Section 3.3. Section 5 reviews the implications points of concern, automating the playtesting process can
of our experimental results. Section 6 considers possible ex- speed up the number of games played and discover edge
tensions to our work. cases humans might not consider. For deck-building games
where the players have agency and outcomes are nondeter-
2 Related Works ministic (shuffled decks, random outcomes of effects, etc.),
As the auto battler genre is still new, we were only able to the space of possible deck combinations and game states can
find one other work that specifically addresses the use of AI be immeasurable. Following in the footsteps of DeepMind’s
in their design. Xu et al. (2020) focused on autonomously work on AlphaGo (Silver et al. 2016), Stadia trained a
identifying the best lineups that players can construct given deep learning model through many games of self-play in
the available cards, game rules, and lineups that the game’s order to develop a function that could evaluate the quality
designers expect to be played the most. 1 . In a two-step pro- of a game state for a deck-building game (Kim and Wu
cess, their method first simulates play between randomly 2021). Their automated game-playing agent then competed
generated lineups and the designers’ expected lineups to cre- against itself with designer-constructed decks in order to
ate a training dataset for a neural network—the model learns compare the overall decks’ performance, accounting for
a mapping between a given lineup and its estimated win the nondeterministic outcomes through many games as
rate against the designers’ expected lineups. In the second statistical samples. A game designer may manually inspect
step, a genetic algorithm searches for lineups that optimize the outcomes of all the matches to determine whether any
the learned win rate function under various constraints that revisions to the cards are necessary.
define lineup construction rules throughout the auto battler Bhatt et al. (2018) investigated the deck-building aspect
game. Constraints account for drafting additional cards be- of Hearthstone with genetic algorithms evolving the decks to
tween rounds of play, which are not elements of our auto optimize defeating a specified opposing deck. Their game-
battler game (see Section 3.1 for a description). playing agents used a greedy, myopic search algorithm as an
Game designers may use the optimized lineups to evaluate experimental control while the evolving decks were the ex-
whether the cards are balanced and, if not, which cards need perimental cases. The resulting evolutionary paths yield in-
revision. Our method instead revises the cards directly, opti- sights into the diversity of the metagame. In contrast, Zook,
mizing the viability of the set of possible lineups that players Harrison, and Riedl (2019) altered the parameters of Monte
can construct. This alters the initial efforts of the game de- Carlo Tree Search to study how different types of players
signers to focus on the rules and some card templates with- performed with fixed decks in their own collectible card
out committing to specific grounded card instances. Because game. It revealed balance issues from the perspective of the
the templates are lifted representations of the actual cards, it game’s rules, such as a clear advantage to the player with the
is more difficult for designers to identify which lineups are first turn. To investigate more specific scenarios, Jaffe et al.
(2012) enabled designers to specify restrictions on player
1
Their specific auto battler uses ‘piece’ instead of ‘card.’ parameters and available cards. Their simulator ran multiple

games and summarized the results with respect to checking to itself and donates the remaining health points to its
various features of the win percentages and play logs. Chen player’s other cards evenly.
and Guy (2020) considered both simulating multiple games Armor Points The card starts with this many armor points.
and training a deep learning model to assess the metagame When the card would be damaged, it instead loses an
as they procedurally generated new cards using grammars. armor point. When the card has zero armor points, it is
damaged normally.
3 Methods Damage Split Percent When the card would be damaged,
To achieve the goals outlined in the introduction, L UDUS it instead receives this many percent of the damage to it-
contains an auto battler game with a number of special cards self and splits the remaining damage evenly to its player’s
with extra mechanics and a basic ‘vanilla’ card that only en- other cards.
gages in standard combat without extra mechanics. The de-
Middle Age Prior to entering combat, the card increases its
tails of these cards and their mechanics are explained below
attack by 1 if it has not participated in this many com-
in Section 3.1. The health and attack values of all cards can
bat phases. After this many combat phases, the card de-
be adjusted as well as some values pertaining to the extra
creases its attack by 1 before entering combat.
mechanics (such as how much damage to deal to an oppo-
nent’s card when a card dies). Multiple cards of the same Target Age When the card dies, if it participated in this
type that have different attack, health, or other values can be many combat phases, it deals this many damage points
considered in the same game. to each of the opponent’s cards. If the card participated
Beyond the auto battler itself, L UDUS also contains an in fewer than this many combat phases, it heals each of its
optimization framework to find configurations of available player’s other cards by health points equal to the number
cards that are balanced, fair, and elicit desired play expe- of combat phases in which the card participated.
riences. These criteria are obviously quite subjective; so Detonation Time When the card has been healed or dam-
game designers can easily plug new metrics into the exist- aged this many times, the card and its opponent’s card
ing framework as they develop new ways of capturing their both die.
goals and preferences describing the metagame. This paper
explores a few options as well. Other card mechanics without parameters include the
copying the attack of the opponent’s cards (morphing ene-
3.1 Playtesting Simulator mies) and swapping its health points and attack whenever its
attack is greater than its health points (survivalist).
To serve as a data source for quantitative analysis, we create
a playtesting simulator for an original auto battler game we Match Procedure Gameplay proceeds as follows in the
designed. This simulator is designed for research purposes, simulator:
allowing the user to create cards with new mechanics and
run tournaments between arbitrary lineups. 1. Each player in a tournament selects an ordered list of
All cards are equally available to all players, and there cards to be their lineup.
are no restrictions on the number of copies of a card al- 2. Prior to combat, arrange the lineup’s cards left-to-right.
lowed in players’ lineups or in the entire game. Some auto 3. Each combat phase pits the leftmost surviving card of
battlers have a drafting process in which cards are selected each player against each other. The cards take damage
or purchased from a pool, but drafting and/or deck-building equal to their opponent card’s attack. Cards with 0 or less
take different forms across different games (Blizzard Enter- health points die, and surviving cards are rearranged to be
tainment 2019; Good Luck Games 2021; Nintendo Mobile the rightmost card of their respective lineups.
2020). We do not explore variants of these processes in this
paper, but we will consider them in future work. 4. Repeat the combat phase until one or fewer players has a
surviving card or the maximum number of combat phases
Card Definition Cards have positive integer statistics in- has been reached. If both or neither players have at least
cluding health points, attack, and parameters pertaining to a one surviving card, the game is a draw. Otherwise, the
special mechanic. In our game, such special parameters in- player with a surviving card wins.
clude the following:
Win Rate Approximation We run a round-robin tour-
Explosion Damage When the card dies, it damages each of nament where each lineup plays one match against every
the opponent’s cards by this many health points. other lineup. Unfortunately, for n lineups, this results in
Heal Amount Prior to entering combat, the card restores n(n − 1)/2 = n2 /2 − n/2 matches. This quickly becomes
this many health points to its player’s rightmost card prohibitively large to simulate. We developed a sampling-
Attack Growth Per Hit When the card takes damage, its based approximation that runs a subset of the games in order
attack increases by this many points. to estimate the win rates of cards and lineups. It randomly
partitions the lineups into groups of uniform size and runs
Explosion Heal When the card dies, it heals each of its round-robin tournaments within these groups. For a group
player’s other cards by this many health points. tournament with k groups and n total lineups, this results in
Heal Donation Percent When the card would be healed, it an upper bound of k · (n/k)(n/k − 1)/2 = n2 /(2k) − n/2
instead restores this many percent of the health points matches in the case where n is divisible by k.

3.2 Metrics
Bruiser Attack Versus Grow on Damage Health
Quantitative measures of metagame health are needed to
guide card statistic optimization. We present metrics built
upon the output data of the playtesting simulator.
The playtesting simulation outputs a data vector v, the
lineup payoff vector. Each entry vi ∈ [−1, 1] represents the
average payoff of the lineup at index i during the last round

Attack
of playtesting in the optimization process. A payoff of 1 in-
dicates a win. A payoff of 0 means a tie, and a loss results in
a payoff of -1. The average of these payoffs over the course
of the simulated tournament for a lineup make up the entries
of v.
We then transform v into another vector of length n that
maps a card to the average win rate of the lineups in which
the card appears. This vector w, the card win rate vector, Health
has entries wj ∈ [0, 1], the average win rate of the lineups in
which the card at index j appears. We present three metrics
Figure 1: Heatmap showing how changes in the attack of
that evaluate the uniformity of this distribution of win rates
the bruiser card and the health of the grow-on-damage card,
among the cards.
while all other card parameters remain fixed, affect the stan-
Each of the following metrics operate on the card win rate
dard deviation metric.
vector.
Per-card Payoff
1X 2
3.3 Optimization
PCP = |payoff(wj )| (1)
n j We use the PyGAD python library (Gad 2021) for genetic
algorithms. This searches for the optimal values of spec-
Since average payoffs are equal to 0 when a lineup a card ified parameters, which may include health points, attack,
appears in wins as many matches as it loses, this is a metric and special parameters, with respect to the chosen metric.
to minimize if we want to punish overly powerful or terrible For our experiments, we use the standard deviation metric.
cards. The squared term disproportionately attacks outliers. We constrain the genetic algorithm to consider integer
Standard Deviation Metric values between 1 and 10, inclusive. While hyperparameter
v tuning may have yielded better results, we chose to run the
u   2
u X genetic algorithm with 8 solutions and 4 parents mating per
u1 1 X
SDM = t wj −  wj  (2) population for 32 generations in each experiment.
n j n j
3.4 Qualitative Analysis
Minimizing standard deviation of the win rates may avoid
Another way to understand the landscape of the metagame
cards with drastically higher or lower win rates compared to
is through more qualitative methods. We look at two differ-
other cards.
ent ways of plotting data about the game results that give
Entropy Metric insights into the health and stability of the metagame.
! The first plot sweeps over two variables, typically of the
w w
P j log P j
X
EM = − (3) same card, although not necessarily. This gives a visual rep-
j j wj j wj resentation of the metagame sensitivity under variations of
a configuration, which can lead to some interesting insights
When the entropy of the win rates is maximized, we ap-
about the relationships between cards.
proach a uniform distribution of win rates over the cards.
Take, for example, the plot in Figure 1 of the value of the
Other Metrics The L UDUS framework is flexible; re- standard deviation metric given various values for the attack
searchers and designers alike may develop their own met- of the bruiser card and the health of the grow-on-damage
rics corresponding to their own ideas of what constitutes a card. The bruiser card is a vanilla card with a high attack (al-
healthy metagame. For example, a game designer may know though variable in this situation) and one health point. The
from historical data that a certain distribution of win rates grow-on-damage card has one attack growth per hit. Initially
over cards or lineups correspond to a period of celebrated this card has one attack, and in this situation has a variable
parity in the metagame. The designer could conceivably use amount of health. If the grow on damage card is able to con-
the Wasserstein distance or Kullback-Leibler divergence be- sistently become a powerful threat, it can become too strong
tween this historical distribution and another distribution as and a necessity for high-level play, leading to an unbalanced
a metric. Minimizing this metric might maintain similar lev- metagame.
els of parity through future iterations of card releases. Met- In the figure, we can see that configurations in the upper
rics could also evaluate other measurable elements like av- left corner, where the attack of the bruiser is greater than the
erage turns per match. health of the grow on damage card, tend to be much more

Round Robin Win Rates Group Versus Round Robin Tournament Errors

Absolute Error (Win Rate)
Lineups

Group Size (Cards)

Win Rate
Figure 4: Absolute value of the difference in win rate be-
Figure 2: Distribution of win rates before optimization. tween a group tournament of varying size and a full round-
robin tournament of 1,728 possible lineups.
Round Robin Win Rates

3.5 Experiment Design
Our experiments share common phases.
1. Build lineups. We generate a set of cards and their asso-
ciated attack, health points, and special parameters. De-
Lineups
Lineups

pending on the experiment, a subset of these parameters
may vary between simulations during the optimization
process. From this set of cards, the lineups are permuta-
tions of 3 cards from the set, allowing for duplicates.
2. Run tournaments. This will be either be a group or a
round-robin tournament as described in Section 3.1.
3. Optimize. These tournaments are run within each itera-
Win Rate tion of our genetic algorithm to produce a metric estimat-
ing the best possible health of the metagame with respect
Figure 3: Distribution of win rates after optimization. to configurations of the card parameters.

4 Results
favorable than the configurations in the lower right corner, We explore applications of L UDUS to problems that game
where the opposite is true. This evidence supports the in- designers commonly face.
tuitive idea that the bruiser’s high attack is a good counter
to the grow on damage card because the bruiser card can 4.1 Group Versus Round-Robin Tournament
quickly remove the grow on damage card before it has had The first experiment investigates the efficacy of our
enough interactions to grow its damage to a large number. sampling-based approximation in Section 3.1. To do this, we
This kind of information can be quite useful to a game de- compare the win rates of a complete round-robin tournament
signer, who can see very clearly that the inclusion of a high- (every combination of lineups played) with the distribution
damage card can improve the metagame dramatically if the of win rates over a 16-fold tournament of randomly parti-
grow on damage card is found to be too powerful. tioned lineups into groups of various sizes, playing every
The second plot is a histogram of the lineup win rates af- combination of the lineups within those groups.
ter running a tournament amongst all possible lineups. The Figure 4 compares the win rates of each lineup in the
resulting distribution can have features that may be informa- group tournament versus the round-robin tournament, and
tive to game designers. For example, examine the difference we can see that at a group size of 256, the mean error be-
between the distributions in Figures 2 and 3 from tourna- comes negligible (about 0.0382). We use this group size in
ments using two configurations of the same card parameters. the other experiments rather than a round-robin of all 1,728
Given playtesting input from actual players and historical lineups.
data from other successful metagames, game designers can
interpret the changes new cards have brought and implement 4.2 Optimizing Cards without Special Mechanics
changes to the game or metrics for optimization that will The next experiment optimizes a set of four ‘vanilla’ cards
improve the experience of human players. with no extra mechanics. We expected that the ‘optimal’

solution in terms of the standard deviation metric would optimized them alongside the first set fixed at the solution
have four identical cards because the interchangeable cards found previously—this is ten cards with variable parameters
should have zero standard deviation in the win rate. Due to for only the second set of cards.
the limited number of generations in our genetic algorithm, The optimizer improved the win rate standard deviation
we recognized that there was no guarantee it would find this from 0.0591 to 0.0541. This is a comparatively smaller im-
solution. From a game design perspective, a less-optimal so- provement than the previous experiments, but the initial state
lution seems more interesting in this case as a game with was already in much better condition than any of the previ-
only a single card is obviously uninteresting. We wanted to ous experiments. This is interesting because it could indicate
see what ‘almost-optimal’ solutions appear for this setup. that adding new cards to an already well-balanced set may
This observation also illustrates a case where the standard not disturb the balance as much as one might expect.
deviation metric fails to fully capture the qualities we wish
to optimize in our game.
The results of this experiment illustrate a different failure
5 Discussion
case of our standard deviation metric that we did not pre- We can see that the L UDUS framework yielded some very
dict. The genetic algorithm quickly found the solution (5/3), useful results from a game design perspective. Given a
(5/4), (8/3), (8/1) where (a,b) indicates a card with a attack small, yet representative, set of cards, our methods were able
and b health points. This solution had the optimal zero stan- to find more balanced configurations for cards that should
dard deviation—why? Any pairing of these cards will result create a far more enjoyable auto battler game to play.
in both cards killing each other simultaneously. Thus, every We also implemented an approximation method that con-
game ends after one round of combat phases in a tie, and siderably reduces the computation time necessary to evalu-
there is no variance in win rates amongst the lineups. ate configurations during optimization, especially for large
card sets. This method randomly breaks the tournament into
4.3 Optimizing Only Special Mechanics smaller groups that only compete within themselves. Our
The third experiment optimizes only the special mechanic empirical results indicate that under most circumstances, the
parameters of twelve cards. We assigned each card an in- approximations are close enough to the complete tourna-
tuitive value for its attack and health, which a game de- ment to produce satisfactory optimization results.
signer might do for initial context. L UDUS then optimized We applied the L UDUS framework to some standard prob-
the parameters for the special mechanics of these partially- lems game designers face and found promising results. No-
designed cards. We were interested to see how it optimized tably, the algorithm successfully optimized cards under var-
under these constraints and how impactful the special me- ious constraints from partially-designed cards to expansions
chanics would be for the overall balance. Table 1 describes of previously optimized card sets.
the cards used in this experiment. The optimizer improved Beyond the immediate results of this paper, we con-
the win rate standard deviation from 0.0751 to 0.0589. tributed an auto battler game and its associated tools. Given
the recent nature of this genre, we believe this is an impor-
4.4 Optimizing All Parameters tant contribution that will aid future research in the area.
After fixing the attack and health to focus solely on the spe-
cial mechanics, the next experiment tunes the attack, health, 6 Future Work
and special mechanics all at once. To reduce the huge di-
The genre of auto battlers is very young, and our research
mensionality of this experiment, we selected only five of the
only explores a narrow slice of questions in this new area.
twelve cards to optimize. They are listed in Table 2.
We discuss problems that extend our current work.
The optimizer improved the win rate standard deviation
from 0.0704 to 0.0584. This minimum is approximately the
same as the one achieved in the previous experiment, modi- 6.1 Other Auto Battler Features
fying only the values of the special mechanics. One impor- Unlike our game, some auto battlers have players iteratively
tant observation we made is that the optimizer was still mak- build their lineup between battles by purchasing cards from
ing good progress in the final generations of both our exper- an array of options. While our experiments were concerned
iments. This indicates that these might not be minima, and with fixed lineups of size 3, L UDUS’s ability to quantify,
compute time for additional generations could continue to qualify, and optimize balance for other deck-building rules
improve the card designs. remains to be assessed. We previously only considered the
case where any card may be replaced with any other card to
4.5 Optimizing After a Set Rotation create a new lineup. However, the purchasing value of cards
Another common scenario designers of deck-building is another dimension of their design; future work should
games encounter is releasing a new set of cards that main- consider comparing lineups that players can build after the
tain the compatibility, fairness, and competitiveness with the same number of battles and purchasing phases.
previously released cards. To apply L UDUS to this problem, In many auto battlers, cards have multiple special me-
we took the set of five cards from Section 4.4 with their op- chanics that each have variable parameters. Our work only
timized solution as the first set of cards released. Next, we considers cards with a single mechanic, but this should be
selected five additional cards, described in table Table 3, and extended to optimize design for such cards.

Card            Attack       Health   Special Parameter (see Section 3.1)   Optimized Special parameter
         Explode On Death          2           1             Explosion Damage                            1
          Friendly Vampire         1           3                Heal Amount                              3
         Grow On Damage            0           5            Attack Growth Per Hit                        4
           Heal On Death           1           2              Explosion Health                           3
            Health Donor           1           4            Heal Donation Percent                        3
        Ignore First Damage        2           1                Armor Points                             1
           Morph Attack            0           3             Morphing Enemies                           N/A
             Pain Splitter         2           2            Damage Split Percent                         7
              Rampage              0           4                 Middle Age                              8
              Survivalist          2           2                 Survivalist                            N/A
              Threshold            2           2                 Target Age                              5
             Time Bomb             1           8               Detonation Time                           7

                    Table 1: Variable and Fixed Parameters for the Optimize Special Mechanics Experiment

               Card             Optimized Attack       Optimized Health     Optimized Special Mechanic (see Section 3.1)
           Survivalist                   9                       1                              N/A
          Morph Attack                   5                       5                              N/A
       Ignore First Damage               6                       8                               1
        Explode On Death                 4                       1                               3
             Vanilla                     7                       2                              N/A

                              Table 2: List of Cards in the First Set and Their Optimized Solution

               Card           Optimized Attack        Optimized Health     Optimized Special Mechanic (see Section 3.1)
        Friendly Vampire               7                     7                                  8
        Grow On Damage                 2                     6                                  9
         Heal On Death                 3                     9                                  1
            Rampage                    6                     7                                  5
             Vanilla                   7                     6                                 N/A

                             Table 3: List of Cards in the Second Set and Their Optimized Solution

6.2   Simulation Data                                                would have solved an unmet need of validating the results.
All metrics we present are based on the average win rates of
lineups collected from the simulator. This does not take into        6.4    Human Playtesting
account individual lineup-versus-lineup win rates, which are         Human playtesting is still required to determine the health
always 0 or 1 since our auto battler game is deterministic.          of the metagame as we have not verified to what extent
One could represent these relations via a domination graph           our metrics correspond to what humans deem as a healthy
with lineups as nodes and a directed edge to the lineup that         metagame. Designers with a historically healthy metagame
wins the head-to-head matchup (Goldman et al. 2021), game            can use those parameters to generate reference win rate dis-
matrices with each lineup as a strategy, and more. Further in-       tributions over lineups and cards using our L UDUS frame-
vestigation is required to determine whether perturbations of        work. These distributions can then shape new metrics and
card parameters causing a degradation in metagame health             imply their ideal values for parity. Human playtesting feed-
correspond to any changes in these representations. There            back could also motivate other metrics based on the game-
are also game design factors to consider besides win rates,          play experience, such as the number of turns. While this
such as game durations and intensity levels.                         work is an example of how designers can readily ap-
                                                                     ply the L UDUS framework, researching human playtesting-
6.3   Alternative Optimization Methods                               validated results marks an important avenue of future work.
Our choice of a genetic algorithm for optimization is some-
what arbitrary. Genetic algorithms can conveniently handle                             Acknowledgments
the familiar integer values of card parameters, unlike other         The authors would like the thank the anonymous reviewers
optimization methods. Gradient- and Hessian-based meth-              for their helpful feedback, Nathan Ringo for providing com-
ods are difficult to apply to this problem due to our parame-        putational resources to run the experiments, and everyone at
ters not being differentiable. Had they been applicable, they        SIFT for their support.

References                                Konami. n.d. Yu-Gi-Oh! Trading Card Game. https://www.
Bhatt, A.; Lee, S.; de Mesentier Silva, F.; Watson, C. W.;        yugioh-card.com/en/. Retrieved September 4, 2021.
Togelius, J.; and Hoover, A. K. 2018. Exploring the Hearth-       Nguyen, T.-H. D.; Chen, Z.; and Seif El-Nasr, M. 2015.
stone Deck Space. In Proceedings of the Thirteenth Inter-         Analytics-Based AI Techniques for a Better Gaming Expe-
national Conference on the Foundations of Digital Games,          rience. In Rabin, S., ed., Game AI Pro 2: Collected Wisdom
FDG ’18, 1–10. Malmö, Sweden: Association for Comput-            of Game AI Professionals, chapter 39, 481–500. New York:
ing Machinery. ISBN 9781450365710.                                A. K. Peters, Ltd. / CRC Press.
Blizzard Entertainment. 2019. Introducing Hearthstone             Nintendo Mobile. 2020. Fire Emblem Heroes - Loki
Battlegrounds.        https://playhearthstone.com/en-us/news/     Presents (Pawns of Loki) [Video]. YouTube. https://www.
23156373. Retrieved September 4, 2021.                            youtube.com/watch?v=Ldf3Np5lMhs.
Chen, T.; and Guy, S. 2020. Chaos Cards: Creating Novel           Scott-Vargas, L. 2020. Luis Scott-Vargas on Twitter: ”Is
Digital Card Games through Grammatical Content Gener-             Throne of Eldraine really legal for another year? It feels
ation and Meta-Based Card Evaluation. In Proceedings of           like it’s dominated Standard for years already...” / Twit-
the AAAI Conference on Artificial Intelligence and Inter-         ter. https://twitter.com/lsv/status/1339644691378282496?
active Digital Entertainment, volume 16, 196–202. Worch-          lang=en. Retrieved September 6, 2021.
ester, Massachusetts, USA.                                        Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre,
DeTora, M. 2017. Designing Hour of Devastation Cards              L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou,
to Meet FFL Goals. https://magic.wizards.com/en/articles/         I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe,
archive/play-design/designing-hour-devastation-cards-             D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.;
meet-ffl-goals-2017-07-07. Retrieved September 6, 2021.           Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D.
Drodo Studio. 2019. Auto Chess–Official Website. https:           2016. Mastering the Game of Go with Deep Neural Net-
//ac.dragonest.com/en. Retrieved September 4, 2021.               works and Tree Search. Nature, 529: 484–489.
Duke, I. 2019. November 18, 2019 Banned and Restricted            Tack, D. 2019. What Is Dota Auto Chess And Why Is
Announcement.            https://magic.wizards.com/en/articles/   Everyone Playing It? https://www.gameinformer.com/2019/
archive/news/november-18-2019-banned-and-restricted-              01/14/what-is-dota-auto-chess-and-why-is-everyone-
announcement. Retrieved September 6, 2021.                        playing-it. Retrieved September 8, 2021.
Duke, I. 2020a. August 3, 2020 Banned and Restricted An-          The Pokémon Company. n.d. Pokémon Trading Card Game
nouncement. https://magic.wizards.com/en/articles/archive/        | Pokemon.com. https://www.pokemon.com/us/pokemon-
news/august-8-2020-banned-and-restricted-announcement.            tcg/. Retrieved September 4, 2021.
Retrieved September 6, 2021.                                      Triske. 2019.      [Magic: The Gathering] Come All Ye
Duke, I. 2020b. June 1, 2020 Banned and Restricted An-            Christians and Learn of a Sinner: This Guy, Oko.
nouncement. https://magic.wizards.com/en/articles/archive/        https://www.reddit.com/r/HobbyDrama/comments/do8v2j/
news/june-1-2020-banned-and-restricted-announcement.              magic the gathering come all ye christians and/.         Re-
Retrieved September 6, 2021.                                      trieved September 4, 2021.
Gad, A. F. 2021. PyGAD: An Intuitive Genetic Algorithm            Wizards of the Coast. n.d.a.        Banned and Restricted
Python Library. arXiv:2106.06158.                                 Lists. https://magic.wizards.com/en/game-info/gameplay/
                                                                  rules-and-formats/banned-restricted. Retrieved September
Goldman, P.; Knutson, C. R.; Mahtab, R.; Maloney, J.;
                                                                  6, 2021.
Mueller, J. B.; and Freedman, R. G. 2021. Evaluating
Gin Rummy Hands Using Opponent Modeling and Myopic                Wizards of the Coast. n.d.b. Magic: The Gathering | Offi-
Meld Distance. Proceedings of the AAAI Conference on Ar-          cial Site for MTG News, Sets, and Events. https://magic.
tificial Intelligence, 35(17): 15510–15517.                       wizards.com/en. Retrieved September 4, 2021.
Good Luck Games. 2021. Storybook Brawl. https://                  Xu, J.; Chen, S.; Zhang, L.; and Wang, J. 2020. Lineup Min-
storybookbrawl.com/. Retrieved September 4, 2021.                 ing and Balance Analysis of Auto Battler. In Proceedings of
                                                                  the IEEE Sixth International Conference on Computer and
Grayson, N. 2019.          A Guide to Auto Chess, 2019’s
                                                                  Communications, 2300–2307. Chengdu, China.
Most Popular New Game Genre. https://kotaku.com/a-
guide-to-auto-chess-2019-s-most-popular-new-game-gen-             Zook, A. 2019. Better Games through Data: How Blizzard
1835820155. Retrieved September 4, 2021.                          Uses Data Science for Epic Experiences. Recording avail-
                                                                  able at https://youtu.be/ YSYVRdzUkE?t=2636. Last ac-
Jaffe, A.; Miller, A.; Andersen, E.; Liu, Y.-E.; Karlin, A.;
                                                                  cessed September 4, 2021.
and Popović, Z. 2012. Evaluating Competitive Game Bal-
ance with Restricted Play. In Proceedings of the Eighth           Zook, A.; Harrison, B.; and Riedl, M. O. 2019. Monte-Carlo
AAAI Conference on Artificial Intelligence and Interactive        Tree Search for Simulation-based Strategy Analysis. CoRR,
Digital Entertainment, AIIDE’12, 26—-31. Stanford, Cali-          abs/1908.01423: 1–9.
fornia, USA: AAAI Press.
Kim, J. H.; and Wu, R. 2021. Leveraging Machine Learning
for Game Development. https://ai.googleblog.com/2021/03/
leveraging-machine-learning-for-game.html.

You can also read