HTN Planning and Game State Management in Warcraft II

Noah Brickman, University of California at Santa Cruz, noah@superlight.net
Nishant Joshi, University of California at Santa Cruz, nishant@cse.ucsc.edu

ABSTRACT

     A Hierarchical Task Network (HTN) is a planning system that uses a hierarchy of primitive and compound tasks to define a planning domain. An HTN is created using the JSHOP2 system, which implements a high level planning system for Warcraft II build orders. The resulting plans are executed in the ABL/Wargus environment, a reactive planner coupled with a developer interface for the game Warcraft II. An ABL agent implements resource and unit tracking, handling the low level execution of the generated plan and remaining sensitive to changes in the environment by making real-time decisions.

1. INTRODUCTION

     There are a variety of ways that one may approach the problem of developing an AI for a real-time strategy (RTS) game. As with many AI design approaches for an RTS, the problem can be divided into both high and low level tasks. At the high level we have general game strategy: what buildings and units to build, the level of resource gathering, upgrade priorities, etc. At the low level are the specific instructions given to units: move a given unit to a specific position, build a building at a specific location, train a unit at a specific building, etc.
     One traditional approach to constructing the high level AI for an RTS domain has been to construct a finite state machine (FSM) which encapsulates various game states and the appropriate commands to effect the desired transitions between states. Such an approach has the advantage of being relatively simple to implement and understand. At any given time the game will be in one of the various 'states' encoded in the FSM. Various pre-planned scripted actions are carried out depending on the game state and the state of the FSM.
     The problem with this approach is that only those states hardcoded in the FSM are available to the AI. Only those actions pre-scripted by the developer can be executed by the AI. In this sense, the AI is rather limited in its ability to generate intelligent plans with respect to the game world. At first glance, the AI may be able to play effectively against a novice player, but it will be limited in its ability to generate novel sequences of commands for an arbitrary game state. This may result in predictable actions carried out by the AI, and a less compelling game experience for the player. A more 'intelligent' AI that performs real planning would make a more compelling opponent.
     Our project takes the approach of using a Hierarchical Task Network (HTN) to encode the game domain and the various commands and operations a player may have available to them. HTNs are a type of automated planning algorithm that encode a problem domain and then produce a plan to accomplish a given task from an initial world state. An HTN encodes a problem domain in a hierarchy of primitive and compound tasks. At the highest level a compound task could be 'build-town'. The 'build-town' task can be broken down into subtasks like 'build-castle' or 'build-barracks'. Each of these tasks has unique preconditions requiring the availability of resources and worker units. Eventually compound tasks are broken down to primitive operators which give specific commands to train an individual unit at a building or command a worker to build a specific type of building.
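     To make this hierarchy concrete, such a decomposition can be written as a SHOP2-style method (the syntax of the planner we adopt in Section 3). The sketch below is illustrative only: the task and predicate names are simplified stand-ins, not the exact ones used in our domain.

    ; Illustrative sketch: a compound task decomposed into
    ; subtasks, which are themselves methods or operators.
    ; Names are simplified, not verbatim from our domain.
    (:method (build-town)
       ; precondition: at least one free worker to start with
       ((unit peon free))
       ; subtask list, expanded in order
       ((build-farm) (build-barracks) (train-army))
    )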
     The HTN for this project encodes some basic rules about unit training, resource gathering, and building construction. It produces a plan consisting of a sequence of commands that the game engine should carry out.

    (:operator gather-resource
       :parameters (?w ?l)
       :precondition (:and (worker ?w)
                           (:not (worker-full ?w))
                           (worker-location ?l ?w)
                           (resource-at ?l))
       :effect (worker-full ?w))

          Figure 1: A UCPOP Operator

     A lot of game AI research is focused on making the characters and the narrative exhibit behavioral patterns that have meaning within the context of the game. We can imagine a bot in Unreal exhibiting a vengeful behavior and repeatedly pursuing the human player once attacked by her, or a defensive one who prefers to maintain a low profile.
     One tool for authoring such behaviors is ABL, a behavior description language. It provides the author with a useful abstraction, namely, authoring character AI in terms of goals and behavioral patterns. An agent written in ABL pursues a high-level goal which can be achieved by pursuing predefined behaviors. These behaviors in turn define sub-goals which have their own associated behaviors. "Pick up gun" would therefore be a sub-goal for the parent behavior "kill enemy".
     HTN planners like SHOP2 provide us with the ability to use a domain theory specified in terms of an initial state, tasks, and methods (with preconditions and effects), and to get as a result a plan that would achieve the goal given the initial state and the operators and methods available.
     We can therefore have a set of behaviors implemented in ABL and let the HTN treat them as action operators available for making state transitions. Planning is traditionally employed in the action operator space, but it is equally applicable in the behavior space, which is much closer to the way humans formulate plans.

          Figure 2: A soldier executing the 'patrol' task in F.E.A.R. [2]

     ABL can be used to create simple (no sub-goals) and complex behaviors based on the primitive actions of the domain. These behaviors can be realized in the game world by communicating with the game engine. This allows us to delegate the task of plan actuation to ABL, and worry about plan formulation using the HTN planner. Using this approach, the entire planner could be replaced with a different system, provided it outputs its plan in the format that ABL is expecting.
     However, ABL does more than just realize the plan in the game world; it is a reactive planning language which allows us to respond to changes in the game world in real time. Thus, using an HTN planner like SHOP2 with a reactive planner like ABL allows us to separate the high-level strategic planning from the real-time reactive planning that is necessary for successfully executing the plan in the game world.
2. RELATED WORK

     Automated planning algorithms have been used in other gaming environments. In the game F.E.A.R. [2], a first-person shooter released in 2005, the AI made use of an automated solver similar to the STRIPS system developed at Stanford in the early 1970s [1]. In a STRIPS-style planning system, an initial state and a goal state for the world are specified. A series of transformation rules are defined, each with a set of pre- and post-conditions. The transformation rules take world state predicates as variable arguments and, if the preconditions are satisfied, result in a transformed world state. The algorithm attempts to find a sequence of operations on the world state that will transform the initial state into the goal state. In F.E.A.R. this technique was used to plan actions and animation sequences that would result in realistic behavior of enemy agents in the world.
     Though this approach was useful in F.E.A.R., the STRIPS-style system is itself not sufficiently expressive to encapsulate the planning system designed for this project. An attempt was made to implement the RTS planning system using UCPOP (Fig. 1), a STRIPS-style planning system. In UCPOP, a series of operators are specified, each with variable arguments, preconditions, and world effects. Though some interesting results were achieved with this system, it was not sufficiently expressive to encapsulate the RTS planning domain we desired. Features like tracking time and resource inventories cannot be expressed using the UCPOP language. Because the algorithm plans using backward chaining, the evolution of such numerical features cannot be planned by the algorithm. Additionally, all the operators exist on the same level hierarchically, making it impossible to break a problem into high and low level tasks and operators.

    harvest gold
    train peon
    harvest lumber
    train peon
    harvest gold
    train peon
    harvest lumber
    train peon
    build farm
    build barracks
    build lumber_mill
    train grunt
    build blacksmith
    train grunt
    upgrade blacksmith weapons
    train grunt
    upgrade blacksmith shield

          Figure 3: The game commands generated by the JSHOP2 planner.

     Another planning system we studied was SquadSmart [3]. In SquadSmart an HTN (called a Hierarchical Transition Network) was used to plan actions for the squad as a whole. High level tasks like 'patrol' and 'defend' were broken down into individual unit assignments at a lower level. In this case it did not matter to the high level operators which squad member was assigned to a given task; that assignment was handled by lower level task operators that carried out the high level task goals. SquadSmart was an example of an HTN planning system applied to a first-person shooter environment (Unreal Tournament).
3. METHODS

     The first step in developing our planning system was the selection of a planner. For this project we chose JSHOP2, the Java-based implementation of the SHOP2 [4] (Simple Hierarchical Ordered Planner) planning system developed at the University of Maryland. A planning domain was developed to issue a sequence of commands that implement a build order to construct a basic town. In the SHOP2 planner, as implemented in JSHOP2, a developer writes methods, operators, and axioms using a range of logical expressions, variable assignments, and function calls. A SHOP2 problem consists of a high level task to accomplish.
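     As a rough sketch of how these pieces fit together, a JSHOP2 domain file collects axioms, operators, and methods under a single domain name. The fragment below is illustrative only; the domain, task, and predicate names are simplified placeholders rather than our actual domain file.

    ; Illustrative sketch of a small JSHOP2 domain file; the names
    ; are placeholders rather than our real domain definitions.
    (defdomain tiny-wargus (
       ; axiom: the town is 'started' once a farm has been built
       (:- (town-started) ((built farm)))
       ; operator: a primitive build action with a precondition,
       ; an empty delete list, and an add list
       (:operator (!build ?b)
          ((unit peon free))
          ()
          ((built ?b)))
       ; method: a compound task decomposed into primitive builds
       (:method (build-town)
          ((unit peon free))
          ((!build farm) (!build barracks)))
    ))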
    ; harvest resources
    (:operator
       ; head
          (!harvest ?res_type)
       ; pre
          ((unit peon free))
       ; delete list
          ((unit peon free))
       ; add list
          ((unit peon harvest ?res_type))
    )

          Figure 4: A JSHOP2 Operator

     SHOP2 methods are higher level tasks consisting of a set of preconditions and associated sub-task lists. The first precondition group for a given method that is found to be true has its associated task list executed. The task list for a method can contain other tasks or operators, which are the lowest level of method type. Operators take a set of variable parameters and have a precondition that must be satisfied by a set of grounded variables in order for the operator to be able to execute. Each operator has an add and a delete list. Logical predicates from the add list are added to the world state, and predicates from the delete list are removed from the world state. In this way, much like in the STRIPS automated planning system, a world state is transformed to accomplish the goal task.
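     To illustrate this precondition/task-list pairing, the sketch below shows a method with two branches; the first branch whose precondition holds is the one whose task list is expanded. The names are simplified for illustration and are not verbatim excerpts from our domain.

    ; Illustrative sketch of branch selection in a method; names
    ; are simplified, not taken directly from our domain.
    (:method (get-gold)
       ; branch 1: a free worker already exists, so just harvest
       ((unit peon free))
       ((!harvest gold))
       ; branch 2: otherwise train a worker first, then harvest
       ()
       ((!train peon) (!harvest gold))
    )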
     The goal task developed for this planning domain implements a build order for the game Warcraft II. We had available to us a version of Warcraft II implemented in the Stratagus [5] open-source gaming engine. The Stratagus implementation of Warcraft II, called Wargus, allows the AI that was released with the game to be replaced with a user-developed one. This allowed us to send our own commands to the game engine and query the engine for game state.
     A proxy implemented in Java is used to handle all communication between ABL and Wargus. This involves retrieving game state (maps, units, resources, etc.) and sending action commands (moving units, training units, harvesting resources, etc.) to the game engine.
     All of this information is stored inside Working Memory Elements (WMEs) which an agent can access when it is making decisions. Each WME has a corresponding sensor that is responsible for sensing information from the game world at fixed time intervals and storing it inside the WME.
     Thus, an agent defined in ABL uses its knowledge of the game world to make decisions, and acts accordingly. The sensors are responsible for keeping this knowledge current, which makes the agent reactive to changes in the game world.
     A JSHOP2 planning domain was developed to implement the 'build-town' task. Individual methods and operators were written to implement sub-tasks like 'build-building', 'train-unit', and 'research-upgrade'. The JSHOP2 problem specified the game state, generally 'unit peon' to indicate that the player has a single worker unit. The goal task was the 'build-town' task. The planner would then generate a plan (Figure 3) consisting of a sequence of grounded operators that represented commands to the Wargus engine. These commands consisted of orders to train units, gather resources, and build specific buildings in pursuit of the goal task of building a town.
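     A JSHOP2 problem ties such an initial state to a goal task list. A minimal sketch of the kind of problem we mean is shown below; the problem and domain names are illustrative placeholders rather than the project's actual files.

    ; Illustrative sketch of a JSHOP2 problem: an initial state
    ; with one free worker, and the goal task list. Names are
    ; placeholders, not the project's actual files.
    (defproblem problem-build-town tiny-wargus
       ((unit peon free))
       ((build-town))
    )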
     The generated sequence of game commands was then passed to the ABL interface to the Wargus engine. The planner output is parsed to extract the individual commands, which correspond to behaviors defined in the ABL agent. A command like "harvest gold" triggers an ABL behavior that finds a worker and assigns it to mining gold. These behaviors take care of the execution-time decision making necessary to successfully execute a plan. For example, a behavior that builds a barracks checks for the availability of sufficient resources, places a hold on the resources required (to ensure that the resources don't get used up by some other behavior), finds a worker to do the construction, searches the map for a good place to build the barracks, and then sends a command to the Wargus game engine to assign the selected worker to do the construction, releasing the hold on the resources.

          Figure 5: A town built with the JSHOP2 plan.

     An important thing to observe is that the model of the game available to the HTN is at a higher level of abstraction than the one available to the ABL agent, and so the actions that the planner outputs are not as accurately specified as their execution in the game world demands. This is another advantage of using this approach, since ABL can pursue these actions using multiple behaviors depending on the current state of the world. Each of these behaviors would have a different set of preconditions. The HTN, in turn, does not need to worry about low level operations such as selecting which of the various free workers to assign to a task, or which specific location to select for constructing a building. The HTN concerns itself with the high level planning, and the ABL interface manages the execution of the planner's commands.
4. RESULTS

     Simple plans for a basic build order problem were generated by the HTN planner. The planner was given the initial state of the game world, a domain theory in the form of methods, operators, and axioms, and a goal to pursue. It used these inputs to come up with a fully ordered sequence of actions which were then executed by the ABL agent.
     Originally it was planned to add features like time and resource tracking to the HTN. It was later determined that such features were counterproductive unless two-way communication with the game engine was also implemented; otherwise, the planner would make assumptions about the game state and the availability of resources that would not necessarily be correct. The HTN that was eventually implemented does not plan at such a low level, instead relying on the ABL interface to track game state. The planner focuses on producing a sequence of high level commands, much as one person might describe a game strategy to another, without planning with respect to time or resource availability.
     The execution phase was successful in that it handled the run-time decision making correctly to actually produce the behavior that the higher level planning was expected to produce. Thus a train command issued by the HTN planner translated to a host of tasks performed at execution time, such as checking for the availability of the required building, waiting for the required resources, etc.
     Every behavior had a set of preconditions that was checked before it was chosen for execution; these preconditions had to be defined carefully to ensure that the assumptions the HTN planner was making were correctly represented in the agent code.
     We faced a few problems while implementing the ABL behaviors for the plan actions; solving them made the agent more robust. One of these problems was that resources allocated for use in one behavior were being used up inadvertently by another, which sometimes caused the first behavior to fail. This was solved by 'holding' resources for an operation while it is underway. For example, while doing a build task, a hold was placed on the resources required for the specific building, and the hold was released when the building was complete. This was possible because ABL allows us to create Working Memory Elements that are not sensed, that is, they do not retrieve any data from the game world; instead, they serve as local memory for the agent. We used one such WME, called a ResourceHoldWME, to maintain a hold on resources that were required for some task.
     Another problem was the agent's inability to make correct decisions about the locations of buildings while deciding on their construction. It often ended up trying to build at an invalid location, which would result in the game engine's failure to actuate the command, and hence the behavior would fail without the agent even realizing it. Another WME, called a TerrainWME, was used in this case to search the map for a good location for the construction. Unlike the ResourceHoldWME, this WME has a sensor that is responsible for periodically updating it with information from the game engine.
     We are currently able to simulate basic build orders in the game world. Given a town with one villager and a town hall, we can get to a point where the town is significantly bigger, with most of the basic buildings (mills, farms, barracks, blacksmith), a small army (4 grunts), and the basic weapon and shield upgrades researched. Figure 5 shows the town as it looks once plan execution is complete. We tried out different planning problems, assuming a different initial state each time, which resulted in plans tailored to specific situations.
     The success that we achieved in this experimental setting has motivated us to further explore this approach (see future work) and also leads us to the conclusion that it can be successfully implemented in RTS games.
5. FUTURE WORK

     The JSHOP2/ABL/Wargus system developed here successfully develops and implements a build order plan. However, the overall flexibility of the system is rather limited, and it lacks some key features necessary to turn it into a robust AI capable of playing against human opponents. Several additional features should be added to the overall system to facilitate such game play. Currently, the HTN planner does not have two-way communication with the ABL agent. A complete AI using the HTN planner needs to be able to receive game state from the gaming engine through the agent and incorporate it into its planning approach, which is critical for re-planning in the event of plan failures.
     The planner also needs to be able to model the passage of time and the accumulation and consumption of resources. Ideally the planner should operate at the strategic level, not concerning itself with the minutiae of moving specific units around the gaming environment. Still, the planner needs some level of knowledge of the low level game state in order to make useful plans. Knowing how long a given command will take to execute and how many resources it will consume is important to the planner's ability to make efficient and effective plans. Tracking features like time and resource consumption will allow more detailed and optimized plans to be generated.
     Plan invalidation and re-planning are crucial to any good game AI implementation. The game should be able to invoke the planner multiple times with a different set of goals to plan for. The planner should be able to look at the current state of the game and come up with an optimal strategy for achieving the goals. To incorporate this re-planning model, it is important to have two-way communication between the HTN planner and the ABL agent, which is the most obvious next step here. One of the reasons the JSHOP2 system was chosen is that it is written in Java and allows external Java functions to be called in the plan operators. Though not implemented in this project, such function calls would allow the planner to talk directly to the ABL agent in order to execute a plan.
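     As a purely hypothetical sketch of what such an integration could look like, a call term in an operator precondition might query external Java code for live game state at planning time. The external function 'query-gold' below does not exist in our implementation and is shown only to illustrate the shape this approach might take.

    ; Hypothetical sketch only: 'query-gold' stands in for an
    ; external Java function that could report live game state
    ; to the planner; it is not implemented in this project.
    (:operator (!build-barracks)
       ; precondition: the external query reports enough gold
       ((assign ?gold (call query-gold))
        (call >= ?gold 700))
       ; delete list
       ()
       ; add list
       ((built barracks))
    )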

6. REFERENCES

[1] R. Fikes and N. Nilsson. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, 2:189-208.

[2] J. Orkin. 2006. Three States and a Plan: The AI of F.E.A.R. Game Developers Conference 2006.

[3] P. Gorniak and I. Davis. 2007. SquadSmart: Hierarchical Planning and Coordinated Plan Execution for Squads of Characters. AAAI Press. 14.

[4] D. Nau, T. Au, O. Ilghami, et al. 2003. SHOP2: An HTN Planning System. Journal of Artificial Intelligence Research 20 (2003) 379-404.

[5] Stratagus. http://www.stratagus.org/

[6] M. Mateas and A. Stern. In H. Prendinger and M. Ishizuka (Eds.), Life-like Characters: Tools, Affective Functions and Applications. Springer, 2004.

[7] ABL Documentation. http://mothership.cc.gt.atl.ga.us/abl/index.php/Main_Page