Improving Search by Utilizing State Information in OPTIC Planners Compilation to LP

Elad Denenberg1, Amanda Coles2, and Derek Long2
1 University of Haifa, Abba Khoushy Ave 199, Haifa, Israel
2 King's College London, 30 Aldwych, London, United Kingdom

arXiv:2106.07924v1 [cs.AI] 15 Jun 2021
Abstract

Automated planners are computer tools that allow autonomous agents to make strategies and decisions by determining a set of actions for the agent to take, which will carry a system from a given initial state to the desired goal state. Many planners are domain-independent, allowing their deployment in a variety of domains. Such is the broad family of OPTIC planners. These planners perform Forward Search and call a Linear Programming (LP) solver multiple times at every state to check for consistency and to set bounds on the numeric variables. These checks can be computationally costly, especially in real-life applications. This paper suggests a method for identifying information about the specific state being evaluated, allowing the formulation of the equations to facilitate better solver selection and faster LP solving. The usefulness of the method is demonstrated in six domains and is shown to enhance performance significantly.

1 Introduction

Automated Planning (often called AI Planning) is concerned with formulating a sequence of actions that transforms a system from a given initial state into a desired goal state. One strength of AI Planning is domain-independence: a single general planner can plan in a wide range of different application domains. Examples of domains in which Planning was used include space (Chien et al. 2000), battery usage (Fox, Long, and Magazzeni 2011), and software penetration testing (Obes, Sarraute, and Richarte 2013). To facilitate application in realistic problems, planners need to reason with expressive models of the world. Such models can be temporal: finding a plan with timestamped actions, taking into account action durations and concurrency; and numeric: considering variables that change discretely, or continuously, over time (Fox and Long 2003).

To solve expressive problems that contain temporal constraints, the planner requires a scheduler: a technique for assigning values to the actions' timestamps that results in a valid plan. For example, the temporal planners SAPA (Do and Kambhampati 2001) and Temporal Fast Downward (Bell et al. 2010) utilize the decision epoch mechanism, while Crikey 3 (Bemporad, Ferrari-trecate, and Morari 2000) uses a Simple Temporal Network (STN). To solve hybrid problems that contain temporal as well as discrete and continuous change, the planner requires more complex approaches. For instance, SMTPlan formulates the problem as SAT (Cashmore et al. 2016), ENHSP uses interval relaxation (Scala et al. 2016), DiNo discretizes time (Piotrowski et al. 2016), and qtScoty uses convex optimization (Fernández-González, Karpas, and Williams 2017).

This work focuses on a family of planners that uses Linear Programming (LP) solvers to schedule the plan. This family includes COLIN (Coles et al. 2012), POPF (Coles et al. 2010), and OPTIC (Benton, Coles, and Coles 2012). Planners from this family were used in a variety of real-world applications, including robotics (Cashmore et al. 2015), Autonomous Underwater Vehicle (AUV) control (Cashmore et al. 2014), Micro Aerial Vehicle (MAV) control (Bernardini, Fox, and Long 2014), and space applications (Coles et al. 2019; Denenberg and Coles 2018). These planners perform forward state-space search starting from the initial state. At each search state, an LP solver is used once to determine whether a consistent schedule for the plan exists. If no consistent schedule exists, the search branch can be pruned. If the state is consistent, the LP is then used several more times to bound the numeric variables, thus tightening the space of applicable actions in this state and narrowing the search space ahead. Recent work (Denenberg, Coles, and Long 2019) has shown that solving LPs multiple times at every state may cause the search process to become slow and ineffective.

The contribution of this work is twofold:

1. We propose a better translation from the search state to the LP. Our new compilation allows a better selection of the optimization tools required for the consistency check of the problem, possibly eliminating the necessity of an LP solver.

2. We propose methods for using information from a given state to compute the current variable bounds in the search, thus calling the solver fewer times and improving the performance of any planner from the OPTIC family.

The rest of the article is ordered as follows: Section 2 describes the problem and the current way OPTIC planners solve it. Section 3 presents the suggested methodology. Section 4 presents the performance of the suggested improvements in a variety of different domains.
[Figure 1: The flying observer — two legs (Leg1, Leg2) containing observations Obs1–Obs4, with the start distances of the first and last observations marked.]

[Figure 2: Distance Requirement — the distance flown over time, with the precondition level above which observing can start.]
2 Background

2.1 Problem Definition

A temporal planning problem with discrete and linear continuous numeric effects is a tuple:

⟨P, V, I, G⟩    (1)

where P is a set of propositions and V is a set of numeric variables. A state S is defined as a set of value assignments to the variables in P and V. I is such a set, representing the initial state of the system. G is the goal: a conjunction of propositions in P, and linear numeric conditions over the variables in V, of the form w_1·v_1 + w_2·v_2 + ... + w_i·v_i {<, ≤, =, ≥, >} c, where w_1, ..., w_i and c ∈ R are constants. An action changes the values and carries the system from one state to another. An action is defined as a tuple as well:

⟨d, pre⊢, eff⊢, pre↔, eff↔, pre⊣, eff⊣⟩    (2)

where d is the duration of the action, constrained by a conjunction of numeric conditions. pre⊢ and pre⊣ are conjunctions of preconditions (facts and numeric conditions) that must be true at the start and end of the action, pre↔ are invariant conditions (preconditions that must hold throughout the action's duration), and eff⊢ and eff⊣ are instantaneous effects that occur at the start and end of the action. Such effects may add or delete propositions p ∈ P (eff+, eff−) or update a numeric variable v_i ∈ V according to a linear instantaneous change:

u {+=, =, -=} w_1·v_1 + w_2·v_2 + ... + w_i·v_i + c    (3)

where u, v_i ∈ V are numeric variables, and c, w_i ∈ R are weights. eff↔ is a conjunction of continuous effects that act upon numeric variables throughout the action's duration. In this work, we assume all continuous change is linear and of the form:

dv/dt {+=, =, -=} c    (4)

where c ∈ R is a constant.

The planner is required to find a set of actions in A and their schedule that would carry the system from the initial state to the goal state.
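To make the tuple definitions above concrete, the following sketch shows one possible in-memory representation of a problem and of a durative action. It is only an illustration of Eqs. (1)-(4); the class and field names are ours and do not reflect OPTIC's internal data structures.

from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

@dataclass
class LinearCondition:
    # w1*v1 + ... + wi*vi  {<, <=, =, >=, >}  c   over the numeric variables in V
    weights: Dict[str, float]   # variable name -> weight
    op: str                     # one of '<', '<=', '=', '>=', '>'
    constant: float             # c

@dataclass
class NumericEffect:
    # instantaneous:  u {+=, =, -=} w1*v1 + ... + c       (Eq. 3)
    # continuous:     du/dt {+=, =, -=} c, weights empty  (Eq. 4)
    variable: str
    op: str                     # '+=', '=', or '-='
    weights: Dict[str, float]
    constant: float

@dataclass
class DurativeAction:
    # <d, pre_start, eff_start, pre_inv, eff_cont, pre_end, eff_end>   (Eq. 2)
    duration: List[LinearCondition]
    pre_start: FrozenSet[str]
    num_pre_start: List[LinearCondition]
    eff_start: Tuple[FrozenSet[str], FrozenSet[str], List[NumericEffect]]  # add, delete, numeric
    pre_inv: FrozenSet[str]
    num_pre_inv: List[LinearCondition]
    eff_cont: List[NumericEffect]
    pre_end: FrozenSet[str]
    num_pre_end: List[LinearCondition]
    eff_end: Tuple[FrozenSet[str], FrozenSet[str], List[NumericEffect]]

@dataclass
class PlanningProblem:
    # <P, V, I, G>   (Eq. 1), plus the action set A
    propositions: FrozenSet[str]
    variables: FrozenSet[str]
    initial_props: FrozenSet[str]
    initial_values: Dict[str, float]
    goal_props: FrozenSet[str]
    goal_conditions: List[LinearCondition]
    actions: List[DurativeAction]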
2.2 Running Example

(Denenberg, Coles, and Long 2019) first introduced this example; it is an anonymized model of a real-life problem. In this domain, named the flying observer, the planner is required to plan an Unmanned Aerial Vehicle (UAV) observation mission. The UAV is required to fly legs over a defined stretch of land containing objects to be observed. Each leg is of different length. Each observation has a different duration and requires a different type of equipment. A target-start distance defines the area within the leg in which the observation must take place. The observation can only occur once the UAV has flown more than the target-start distance of that leg (flown_l ≥ target-start_o). A continuous numeric effect of the fly_l action updates the distance flown so far in a leg: dflown_l/dt = Vel_l, where Vel_l is the flight velocity.

Fig. 1 illustrates an instance of this domain: in this instance, two legs are defined (marked in solid blue lines). In each leg, two observations are required (marked in red, pattern-filled lines). All observations have a target-start distance defined, but for clarity, only the starting distances of the first and last observations are shown.

In order to perform an observation, a defined piece of equipment needs to be calibrated and configured for a specific observation. Once the observation is done, the equipment needs to be released to become available for future observations. The domain comprises the following actions:

take-off_l (dur=5; pre⊢={on-ground, first-leg_l}; eff⊢={¬on-ground, flown_l=0}; eff⊣={flying_l}),
set-course_l1,l2 (dur=1; pre⊢={done_l1, next_l1,l2}; eff⊢={¬done_l1}; eff⊣={flying_l2, flown_l2=0}),
fly_l (dur=distance_l/speed_l; pre⊢={flying_l}; pre↔={flown_l ≤ distance_l}; eff⊣={done_l, ¬flying_l}; eff↔={dflown_l/dt += 1}),
configure_o,e (dur=1; pre⊢={available_e, optionfor_o,e}; eff⊢={¬available_e}; eff⊣={configuredfor_o, pending_o,e}),
observe_l,o (dur=time-for_o; pre⊢={configuredfor_o, contains_l,o, awaiting_o, target-start_o ≤ flown_l}; pre↔={flying_l}; eff⊢={¬awaiting_o}; eff⊣={observed_o}),
release_o,e (dur=1; pre⊢={pending_o,e}; eff⊢={¬configuredfor_o, ¬pending_o,e}; eff⊣={available_e}).

The target-distance precondition and the temporal constraints force the observe actions to fit within the fly action. The meaning of the precondition is illustrated in Fig. 2: the blue line is a depiction of the distance change as the UAV flies over the leg. The dashed red line is the precondition, signifying the distance required for the start of the observation. When the distance reaches the value required in the precondition, the observation can start. Notice that this problem is, in fact, temporal: the numeric constraint can easily be converted to a temporal one, and its manifestation in the temporal state can be seen in Fig. 3.
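As a small worked illustration of the continuous effect dflown_l/dt = Vel_l and the target-start precondition, consider the following sketch. The numbers are invented for illustration and do not come from the paper's benchmark instances.

# Illustrative numbers only: a 40-unit leg flown at velocity 2, with an
# observation whose target-start distance is 10.
vel_l = 2.0            # Vel_l : d(flown_l)/dt while fly_l executes
distance_l = 40.0      # leg length; the fly_l invariant keeps flown_l <= distance_l
target_start_o = 10.0  # observe_o start precondition: flown_l >= target_start_o

def flown(t_since_fly_start: float) -> float:
    # flown_l under the single continuous effect d(flown_l)/dt = Vel_l
    return vel_l * t_since_fly_start

earliest_obs_start = target_start_o / vel_l   # 5.0: earliest offset at which observe_o may start
latest_fly_end = distance_l / vel_l           # 20.0: when the fly_l invariant becomes tight
print(flown(5.0), earliest_obs_start, latest_fly_end)   # 10.0 5.0 20.0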
[Figure 3: Durative Meaning of Distance Requirement — (a) Possible Plan, (b) Impossible Plan: the configure and observe actions must fit within the fly action of Leg 1 so that the distance requirement is met.]

2.3 OPTIC and Forward Search

The OPTIC family of planners is based on the methodology of converting a state to an LP described in the COLIN paper (Coles et al. 2012). Here we survey that methodology.

To find a path from the initial state to the goal, OPTIC performs Forward Search. Starting from the initial state, OPTIC branches over applicable actions, exploring partially-ordered but un-time-stamped sequences of instantaneous actions. Durative actions are converted to a pair of instantaneous snap-actions. Snap-actions mark the start (A⊢) and end (A⊣) of a durative action A. A⊢ has preconditions pre⊢A and effects eff⊢A; A⊣ is analogous. We define the set Ainst to contain all instantaneous actions in A, including snap-actions.

A state S in the search can be thought of as a set containing: propositions (S.p ⊆ P) that are true in S, and upper (S.max(v)) and lower (S.min(v)) bounds on the value each variable in V can hold in S. In the initial state, all variables have max(v) = min(v) = vI, the value of v specified in the initial state; max(v) and min(v) will only differ from each other in following states if a durative action with a continuous effect has acted on the variable.

An action is deemed applicable if all its propositional invariants are satisfied by S.p and if all numerical invariants can be satisfied by some value between S.max(v) and S.min(v). The planner compiles a list of all open states (named the openlist). Search proceeds by popping the first state from the openlist: in our work, we use WA* (W=5), sorting the openlist by g(S) + 5·h(S), using the temporal-numeric RPG heuristic of COLIN (Coles et al. 2012).

All successors S′ of S are generated by adding or deleting all propositions in eff+_a and eff−_a respectively, and applying all discrete numeric effects to both max(v) and min(v) for all v ∈ V affected by effnum_a. This guarantees that S′ is propositionally consistent.

OPTIC then transforms all the temporal constraints into the following form:

Lb ≤ t_j − t_i ≤ Ub    (5)

where Lb, Ub ∈ R are the lower and upper bounds of a time interval. OPTIC also adds the necessary ordering constraints to the plan. The action that has just been applied is ordered after the following: the last actions to add each of its preconditions, actions whose preconditions it deletes, and actions with numeric effects on variables it updates or refers to in preconditions/effects. All ordering constraints are of the form t_j − t_i ≥ ǫ, where t_j and t_i are the times at which the new and existing action must occur, respectively, and ǫ is a small positive constant. The temporal and ordering constraints, formulated as Eq. (5), constitute an STN, and the planner uses a Simple Temporal Problem (STP) solver to check for temporal consistency. If the STP can solve all equations (i.e., assign values to all time-steps such that the equations are valid), then the STP was able to find a schedule, and the state is temporally consistent.

If S′ is propositionally and temporally consistent, the planner will compile the problem into an LP and then check for numerical consistency. Each step i of the plan is given an LP variable t_i. Each variable v ∈ V is given three LP variables for each step i: v_i, v'_i, and δv_i. v_i denotes the value of v just before applying the action in step i, and v'_i is the value right after the action's application. The planner applies Eq. (3) to the affected variable thus:

v'_i = v_i + w_1·u_1,i + w_2·u_2,i + ... + w_n·u_n,i + c    (6)

where v, u ∈ V are numerical variables, u_n,i is the value of the nth numerical variable at step i, and the w_n are weights.

δv_i is the sum of all changes currently acting on v. Recall that, in this work, each continuous effect is defined by a constant. When a new action is applied at step prev, the contribution of its effect is added to δv_prev; when an effect ends at step i, the value is removed from δv_i:

δv_i = δv_{i−1} + c_A  if A_i = A⊢,    δv_i = δv_{i−1} − c_A  if A_i = A⊣    (7)

The value v_i, denoting the value just before the application of the action at step i, can be computed thus:

v_i = v_prev + δv_prev·(t_i − t_prev)    (8)

where i is the current step index and prev is the index of the last step at which the value of v was computed. Note that if the calculation of the value of v is required for an invariant at step j, the next time v is computed the time interval will be t_i − t_j, regardless of whether δv has changed or whether the action acts on v.
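The following sketch illustrates the step-wise compilation of Eqs. (5)-(8) for a single variable with constant-rate continuous effects. It is a schematic illustration only, not OPTIC's implementation; the step encoding and function name are ours.

# A schematic sketch (not OPTIC's code) of the step-wise compilation of Eqs. (6)-(8)
# for a single variable v whose only effects are constant instantaneous changes and
# constant-rate continuous effects.
def compile_steps(steps, v0):
    # steps: list of dicts with optional keys
    #   'inst' : constant added to v instantaneously at this step (Eq. 6, constant case)
    #   'rate' : +c when a continuous effect on v starts here, -c when one ends (Eq. 7)
    # Returns LP-style constraint strings over t_0..t_n and v_0..v_n.
    constraints = ["t_0 >= 0"]
    delta, prev = 0.0, 0          # delta = current total rate; prev = last step where v was computed
    value_expr = f"{v0}"          # symbolic expression for v at step prev
    for i, step in enumerate(steps):
        if i > 0:
            constraints.append(f"t_{i} - t_{i-1} >= eps")          # ordering (Eq. 5 with Lb = eps)
        # Eq. (8): value just before the action of step i
        constraints.append(f"v_{i} = {value_expr} + {delta}*(t_{i} - t_{prev})")
        # Eq. (6), restricted to a constant instantaneous change
        inst = step.get("inst", 0.0)
        constraints.append(f"v'_{i} = v_{i} + {inst}")
        # Eq. (7): update the active rate when an effect starts or ends at this step
        delta += step.get("rate", 0.0)
        value_expr, prev = f"v'_{i}", i
    return constraints

# take-off; fly starts (rate +1); observe starts; observe ends -- cf. Table 1
for c in compile_steps([{}, {"rate": +1.0}, {}, {}], v0=0.0):
    print(c)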
If the STN or LP finds that the state S′ is inconsistent (i.e., there is no solution: no schedule that would enable achieving the state), it is pruned, and the search will not advance down that branch. If the state is consistent, the LP solver is called two more times for each variable v, to maximize and minimize it and thus compute the new max(v) and min(v), and the state is then inserted into the openlist, provided h(S′) ≠ ∞, i.e., the standard temporal/numeric relaxed planning graph heuristic of COLIN (Coles et al. 2012) does not indicate that S′ is a dead-end.

Note that when no numerical change is present, it is sufficient to use the STN to prove consistency. When continuous changes or numeric constraints are present, the LP solver is required for the proof of consistency.

In addition to the consistency check, the LP is also used to compute the bounds max(v) and min(v) of S′. If all continuous effects on v have ended, then the last defined value v_i can be maximized and minimized to compute the bounds. If a continuous effect on v has started but not yet ended, another time variable, denoted t_now, is added to the LP, representing the latest timestamp. For each variable with an active effect, a variable v_now is added. t_now is ordered after all other time steps, and v_now is calculated using Eq. (8). The LP solver then minimizes and maximizes v_now to find the possible bounds.

(Denenberg, Coles, and Long 2019) showed that, though it was previously thought that calling the LP solver is beneficial both for state consistency and for the variable bound update, in large real-life domains the calls to the LP solver may slow the search down. The premise was that the LP problems solved are small, and therefore the calls to the off-the-shelf solver would not be computationally expensive. However, it was shown that in some real-life applications this was not true: when the state contained many actions and multiple variables, the LP grew large and its solving became slow.

Table 1 demonstrates the process of converting a state into an LP. The table shows the LP equations for the partial plan: take-off, fly⊢l0, observe⊢o1,l0, observe⊣o1,l0 (for conciseness we assume that take-off is instantaneous and no configure actions are required).

Step  Action          Variable       Constraints                                              Comment
0     TakeOff         t_0            t_0 ≥ 0
1     Fly⊢l0          t_1            t_1 − t_0 ≥ ǫ                                            Step 1 after Step 0
                      flown_l0,1     flown_l0,1 = 0                                           Initial assignment
                      flown'_l0,1    flown'_l0,1 = flown_l0,1                                 Value after action
                                     flown'_l0,1 ≤ distance_l0                                Invariant
2     Observe⊢o1,l0   t_2            t_2 − t_1 ≥ ǫ                                            Step 2 after Step 1
                      flown_l0,2     flown_l0,2 = flown'_l0,1 + 1·(t_2 − t_1)                 Value before action
                                     flown_l0,2 ≥ target-start_o1                             Start precondition
                                     flown_l0,2 ≤ distance_l0                                 Invariant
                      flown'_l0,2    flown'_l0,2 = flown_l0,2                                 Value after action
                                     flown'_l0,2 ≥ target-start_o1                            Start precondition
                                     flown'_l0,2 ≤ distance_l0                                Invariant
3     Observe⊣o1,l0   t_3            t_3 − t_2 ≥ ǫ                                            Step 3 after Step 2
                                     t_3 − t_2 ≤ time-for_o1                                  Action duration
                      flown_l0,3     flown_l0,3 = flown'_l0,2 + 1·(t_3 − t_2)                 Value before action
                                     flown_l0,3 ≤ distance_l0                                 Invariant
                      flown'_l0,3    flown'_l0,3 = flown_l0,3                                 Value after action
                                     flown'_l0,3 ≤ distance_l0                                Invariant
4     now             t_now          t_now − t_3 ≥ ǫ, t_now − t_2 ≥ ǫ, t_now − t_1 ≥ ǫ        After all steps
                      flown_l0,now   flown_l0,now = flown'_l0,3 + 1·(t_now − t_3)             Value now

Table 1: LP Equations of a Partial Plan

The first action receives a single time variable t_0. The second action, Fly⊢l0, receives a time variable t_1, which is ordered after t_0, and the value of flown is computed. The value before the fly action is the initial assignment, which is 0. Since there is no instantaneous effect at the action's start, the value just after the application of the action is the same. The invariants on the value are enforced just after the beginning of the fly action. The same process is repeated for the next action, Observe⊢o1,l0: assigning a time variable for the action, calculating flown before and after the application of the action, and enforcing invariants. Note the ordering constraints, formulated as temporal constraints, in all actions. Also, the flown variable is computed at each step, and its value depends on the previous step.

The next section will describe a method for changing Eq. (8) in a way that allows calling the LP solver fewer times, and for compiling certain problems containing numerical constraints and change as an STP.

3 Informed Selection of Solver for Consistency and Update

The OPTIC methodology described in the previous section was developed to accommodate the general case in which hybrid planning is to be done, covering all possible state types. The planner uses the general representation both in the consistency check and in the variable update. It was shown in (Denenberg, Coles, and Long 2019) that the general approach could, at times, lead to slow solving.

In this section, we propose two new methods for identifying two specific cases that frequently arise in states in real-life problems. In such cases, the use of an LP solver can be made redundant, facilitating faster solving. In other cases, information from the current state may be injected into the problem definition to allow for faster solving.

The first method examines the latest added action that carried the system from state S to state S′. The second involves a conversion of specific numeric constraints and effects into STN form. Finally, we describe how both these processes can facilitate a more effective update of variable bounds.

3.1 Observing the Latest Action

As stated previously, OPTIC planners first attempt to prove consistency with an STP. Then, if continuous numeric effects and numeric constraints are present in the current state, the planner compiles the problem as an LP. OPTIC planners treat each state in the most general way: in the general case, every action may render the new state inconsistent. However, using knowledge about previous states, some instances in which the LP solver can be avoided may be found.

Consider the state reached by TakeOff⊢, TakeOff⊣, Fly⊢l0, configure⊢o1,e2: the partial plan contains continuous numeric effects on the variable flown_l0. Therefore, to prove this partial plan consistent, OPTIC requires an LP solver. However, since the planner is performing forward search, to reach this state the planner must have been in a previous state, which it found consistent: TakeOff⊢, TakeOff⊣, Fly⊢l0. The configure action that was added does not require the value of flown_l0, and its effect is propositional.

Assume the state S is propositionally, temporally, and numerically consistent. The new state S′, which is reached from S by the addition of action A ∈ Ainst, cannot be rendered numerically inconsistent if A does not contain any numerical effects or constraints. Furthermore, since OPTIC only examines states S′ that are generated to be propositionally consistent, the state only has to be tested for temporal consistency.

Thus, if an added action A contains only propositional or temporal constraints and effects, the state S′ can be deemed consistent by using the STP, and an LP is not required.

Note that if this test has determined that S′ is consistent, there is no need for a numerical variable bound update, as those bounds do not change.
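The test of this subsection can be summarised by the following sketch. It is an illustrative rendering of the decision rule, not the OPTIC-II source; the attribute names (numeric_effects, continuous_effects, numeric_conditions) and the stp_consistent/lp_consistent callables are placeholders.

# A minimal sketch of the Section 3.1 test: when the snap-action A that produced S'
# touches no numeric variable, an STP check of the ordering/temporal constraints is
# sufficient and the LP (and the bound update) can be skipped.
def needs_lp(snap_action) -> bool:
    # snap_action is assumed to expose its numeric parts as (possibly empty) collections
    return bool(snap_action.numeric_effects
                or snap_action.continuous_effects
                or snap_action.numeric_conditions)

def check_successor(snap_action, stp_consistent, lp_consistent):
    # stp_consistent / lp_consistent: callables standing in for the STP and LP checks
    if not stp_consistent():           # temporal/ordering constraints are always checked first
        return False                   # prune: no schedule exists
    if not needs_lp(snap_action):      # Section 3.1: numeric bounds of S' equal those of S
        return True                    # consistent; no LP call, no bound update
    return lp_consistent()             # general case: fall back to the LP of Section 2.3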
3.2 Reformulation of LP

During the search, OPTIC compiles the state into an LP, as depicted in the previous section. This transformation is done in a step-wise manner, meaning each step is transformed into a set of equations, and each step builds on the previous one. No consideration is taken as to what effect a step has on a variable: as long as the value of the variable is required, its value will be computed, and the next step will use said computed value. The notation t_prev denotes the previous step at which the value of variable v was calculated, and the next step i > prev that computes the value of v will use the value stored in v_prev.

Here we suggest making a distinction between two types of steps that affect the value of v: a step containing a start or end of a continuous numeric effect, and a step that does not. The latter is a step that contains numerical constraints on v but does not change the value of δv. To distinguish between the two, we propose two notations: eff for steps that start or end an effect, and const for steps containing only constraints.

Using the new notation, t_peff is the last time point at which an effect on v started or ended, and v_peff is the value of v calculated at that time point. Then Eq. (8) is written thus:

v_i = v_peff + δv_peff·(t_i − t_peff)    (9)

The conversion of the equations is demonstrated on the previous example in Table 2. Notice the difference from Table 1: the effect acting on the variable flown_l0 started in step 1, and therefore the computation of flown_l0,2 and flown_l0,3 is always done with respect to step 1.

Step  Action          Variable       Constraints                                              Comment
0     TakeOff         t_0            t_0 ≥ 0
1     Fly⊢l0          t_1            t_1 − t_0 ≥ ǫ                                            Step 1 after Step 0
                      flown_l0,1     flown_l0,1 = 0                                           Initial assignment
                      flown'_l0,1    flown'_l0,1 = flown_l0,1                                 Value after action
                                     flown'_l0,1 ≤ distance_l0                                Invariant
2     Observe⊢o1,l0   t_2            t_2 − t_1 ≥ ǫ                                            Step 2 after Step 1
                      flown_l0,2     flown_l0,2 = flown'_l0,1 + 1·(t_2 − t_1)                 Value before action
                                     flown_l0,2 ≥ target-start_o1                             Start precondition
                                     flown_l0,2 ≤ distance_l0                                 Invariant
                      flown'_l0,2    flown'_l0,2 = flown_l0,2                                 Value after action
                                     flown'_l0,2 ≥ target-start_o1                            Start precondition
                                     flown'_l0,2 ≤ distance_l0                                Invariant
3     Observe⊣o1,l0   t_3            t_3 − t_2 ≥ ǫ                                            Step 3 after Step 2
                                     t_3 − t_2 ≤ time-for_o1                                  Action duration
                      flown_l0,3     flown_l0,3 = flown'_l0,1 + 1·(t_3 − t_1)                 Value before action
                                     flown_l0,3 ≤ distance_l0                                 Invariant
                      flown'_l0,3    flown'_l0,3 = flown_l0,3                                 Value after action
                                     flown'_l0,3 ≤ distance_l0                                Invariant
4     now             t_now          t_now − t_3 ≥ ǫ, t_now − t_2 ≥ ǫ, t_now − t_1 ≥ ǫ        After all steps
                      flown_l0,now   flown_l0,now = flown'_l0,1 + 1·(t_now − t_1)             Value now

Table 2: New LP Equations of a Partial Plan

If no continuous numeric effects have been acting on v before the last action at t_peff, then the value of v_peff is a constant. This is seen in our example: the value of flown'_l0,1 is the same as the initial assignment. This conversion is extremely useful when all constraints are of the form:

v ≤ C    (10)

where v ∈ V and C ∈ R. This constraint is written as a less-than-or-equal-to constraint; without loss of generality, this includes all constraints that have a single variable on one side and a constant on the other. If v_peff is constant, then using Eq. (9) when enforcing Eq. (10) at step i, we can write:

t_i − t_peff ≤ (C − v_peff) / δv_peff    (11)

Notice that at step i, all the variables on the right-hand side of Eq. (11) are known and constant. Therefore, as long as v_peff is constant, Eq. (11) can be formulated for each step i as a constraint of the form of Eq. (5): the numerical constraint in Eq. (10) is converted to a temporal constraint. If all constraints in the state can be converted thus, then the problem is, in fact, an STN, and the LP solver is not required.

The example given above can be converted in such a way. The constraint flown_l0,2 ≥ target-start_o1 can be converted to t_2 − t_1 ≥ (target-start_o1 − flown_l0,1)/1. All numeric constraints in this partial plan can be converted in the same way. This means that even though continuous numerical effects and numerical constraints are present, this problem is temporal, and can be solved with an STP.

If all numerical constraints were converted to temporal ones, the planner could determine consistency using the STN. However, updating the bounds is still required. Observing Eq. (9), we note that since v_peff is constant, the maximum and minimum of v_i depend only on the interval

T_i = t_i − t_peff    (12)

The minimal size of the interval is zero. The maximum may be drawn from the state: if another constraint exists that limits t_i, or if t_peff is the start of the action beginning the effect and this is the only continuous numeric effect present, the maximal interval is the duration of that action. If no such value can be derived from the problem, then the interval is set to infinity. If δv_i > 0, then the minimum value for v_i is attained when T_i = 0 and is min(v_i) = v_peff, and the maximum is max(v_i) = v_peff + δv_peff·T_i. The case in which δv_i < 0 is analogous. This update method does not require an external solver and is, therefore, very fast.
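A small sketch of the reformulation and of the solver-free bound update of Eqs. (11)-(12) follows. It is illustrative only; the function names, arguments, and the numbers in the example are ours rather than OPTIC-II's.

def numeric_to_temporal(C, v_peff, delta_v):
    # Threshold B such that  v_i {<=,>=} C  becomes  t_i - t_peff {<=,>=} B  (Eq. 11),
    # assuming v_peff is a known constant and delta_v > 0.
    return (C - v_peff) / delta_v

def bounds_without_lp(v_peff, delta_v, T_max):
    # Eq. (12): with T_i ranging over [0, T_max] and delta_v > 0,
    # the reachable values of v_i are [v_peff, v_peff + delta_v * T_max].
    return v_peff, v_peff + delta_v * T_max

# Running example (rate 1 as in Table 1): flown_l0 = 0 when fly_l0 starts, so the
# precondition flown_l0 >= target-start_o1 (= 10 here, an invented value) becomes
# t_2 - t_1 >= 10, and the bounds of flown_l0 follow without an LP call.
print(numeric_to_temporal(C=10.0, v_peff=0.0, delta_v=1.0))    # 10.0
print(bounds_without_lp(v_peff=0.0, delta_v=1.0, T_max=40.0))  # (0.0, 40.0)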
3.3 Efficient Variable Update

(Denenberg, Coles, and Long 2019) have shown that the variable update is often the most computationally expensive task, as it requires several calls to the LP solver, depending on the number of numerical variables in the partial plan. The previous sections have detailed several cases in which the variable update can be avoided or done without the use of an LP. Here we attempt to facilitate faster LP solving in the update phase in case it is still required.

The LP solver may use one of several optimization methods; however, the selection of the method, as well as the method's speed, depends on whether the feasible space is bounded and in which direction. We would like to supply the solver with information about variable boundedness. We cannot use the bounds from the previous state S, as those might have been changed by action A.

Therefore, we again examine the snap-action A added in the last step, which carried the system from the previous state S to the current state S′. Recall that state S contains bounds on v (S.max(v) and S.min(v)). We wish to determine whether the action A is capable of expanding the limits of v, causing the interval [min(v), max(v)] to grow; whether A causes the bounds to contract; or whether the interval retains its size but shifts.

If an instantaneous numeric effect exists, then the bounds on the latest defined v_i (the step at which all continuous effects have ended, or v_now) can be shifted due to an increase or decrease. An assignment would make the latest value a constant (max(v) = min(v) = the assigned value). If snap-action A contains a continuous numeric effect on v, then the bounds of v may expand; therefore, when solving the LP to update the bounds of v, the bounds of v_now are defined as [−∞, ∞]. If snap-action A contains neither a continuous numeric effect on v nor an instantaneous effect on v, then the bounds can only contract; therefore, the bounds from S are passed to the LP solver, leading to a smaller search space and a faster update.
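The bound-seeding rule of this subsection can be sketched as follows. This is one plausible reading of the rule above (in particular, the handling of a constant instantaneous increase or decrease as a shifted interval); the function and argument names are illustrative and are not OPTIC-II's API.

import math

def seed_bounds(prev_min, prev_max, assigns, inst_deltas, has_cont_effect, v):
    # assigns: {variable: value} assignments made by the snap-action A
    # inst_deltas: {variable: constant} instantaneous increases/decreases made by A
    # has_cont_effect: True if A starts or ends a continuous effect on v
    if v in assigns:                       # assignment: the latest value becomes a constant
        return assigns[v], assigns[v]
    if has_cont_effect:                    # bounds may expand: leave the LP unconstrained
        return -math.inf, math.inf
    if v in inst_deltas:                   # constant increase/decrease: the interval shifts
        return prev_min + inst_deltas[v], prev_max + inst_deltas[v]
    return prev_min, prev_max              # otherwise bounds can only contract: reuse S's bounds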
4 Evaluation

In this section, we examine the performance of the proposed changes in six different domains, which stem from four physical-world examples. The domains brought here were chosen to demonstrate both the strengths and the weaknesses of the contribution: the first four illustrate the family of cases in which the contribution is meaningful, shedding light not only on the benefits of our suggestion but also on ways to better model a problem for any OPTIC-family planner. The other two domains are brought to demonstrate cases in which the contribution is not useful, in an attempt to assess the price of using it.

As this work's contribution is in improving the OPTIC family of planners, we compare our suggestions with the performance of the latest implementation of OPTIC; we apply our improvements to the same code base and name the result OPTIC with Injected Information (OPTIC-II).

The results, as well as results of additional tests on various IPC domains, are available as extra data (currently provided as supplementary material; it will be published publicly online when the paper is accepted).

All tests were performed on an Intel i7-8550U CPU @ 1.80GHz×8 with 8GB RAM. The PDDL files of the domains, the respective problems, and the result data will be published on a public website. Runtime results are presented in Table 4. "X" in the table means the runtime was over 1000s. Since the domains and processes are deterministic, no statistical analysis is required, as results may only vary due to CPU noise.

4.1 Domains

Flying Observer  We demonstrate OPTIC-II on a set of previously published domains. The first is the flying observer that (Denenberg, Coles, and Long 2019) presented as a domain stemming from industry. Two distinct models of this physical domain were tested: the first is identical to the running problem; the second variant of this domain has a requirement that the observer is flying before configuring or releasing pieces of equipment.

The problems in the first variant all require a single observation to be made in each leg. The instances are described in Table 3.

Instance  Observations  Legs  Observations Required in Goal
1         10            28    4
2         15            38    6
3         20            48    8
4         25            58    10
5         30            68    12
6         40            78    14
7         40            88    16
8         40            88    18
...       ...           ...   ...
17        40            88    36

Table 3: Single Observation Per-Leg Instances

In all instances of the second variant, six observations are defined for each leg, where the first and last legs share an observation that cannot be performed in the first leg. The fact that the observation cannot be used in the first leg may lead the planner down a branch which would not be useful. The first instance contains two legs, the second three, and so on. (Denenberg, Coles, and Long 2019) showed that these problems are challenging for the planner.

Factory Floor Quality Assurance (QA)  This domain describes QA sampling planning on a factory floor. A machine produces parts at a specific rate; at some point, several parts are taken for sampling. The number of produced parts is limited for storage reasons. This domain is similar to the previous domain (the flown distance is analogous to produced parts); only here we limit the total number of parts that can be produced, adding a global numerical invariant condition. In this domain, too, we have two variants: one allowing the calibration of measuring machinery before the beginning of the manufacturing process, and one that does not. As in the previous domain, the second variant is used for instances that require multiple samples of the same part.

Single Rover IPC Domain and Linear Generator  The Single Rover domain is a standard domain taken from IPC 3, and (Coles et al. 2012) used it to demonstrate the hybrid planning mechanism.

The Linear Generator is yet another standard domain that was widely used in previous papers. It describes a generator consuming fuel to generate energy. The generator may be refueled from auxiliary tanks. All actions affect the main-tank fuel quantity; since all actions contain numerical change and constraints, OPTIC-II was expected to show little to no improvement in solving problems from this domain.

4.2 Results

Implementing the changes suggested in this work requires additional tests before building an LP. These tests and checks come with a computational price. However, as can be seen in Table 4, that price is not high. In the Linear Generator domain and the Single Rover domain, none of the changes are useful: in most states, the information from the applied action cannot reduce the computations, and the problem is not convertible to a temporal one. This is because, in many states, there are often two continuous linear effects operating on the same variable. The changes not being useful means that all checks will be false when running OPTIC-II, and the LP solver will be used just as in OPTIC. The results show that OPTIC-II indeed performs a bit slower when used in these domains.
Flying Observer — No limit on configure
Instance      1      2      3       4       5       6      7     8      9      10     11     12     13      14      15
OPTIC        0.52   6.58   90.15  172.82  268.59  808.65   X   33.15  55.62  66.42  82.5   91.19  456.68  831.02    X
Sec 3.3      0.55   5.19   76.23  164.19  257.38  795.11   X   31.22  52.16  63.58  79.71  89.87  445.21  825.16    X
Sec 3.1      0.19   2.91   37.11   85.15  149.76  578.61   X    9.32  13.63  19.84  26.14  26.59  192.26  441.77  924.58
Sec 3.1+3.2  0.17   1.83   31.51   77.38  141.17  560.44   X    8.30  11.83  15.88  20.91  22.07  163.18  387.70  854.78
Sec 3.1+3.3  0.21   2.31   35.51   84.23  149.12  580.07   X    9.34  13.70  19.81  22.89  26.53  190.98  434.36  924.04
OPTIC-II     0.15   1.88   32.91   79.63  140.55  557.89   X    8.29  11.07  14.91  21.82  24.50  167.22  391.78  857.92

Flying Observer — Configure only when flying
Instance      1      2       3       4       5       6       7      8
OPTIC        6.99   35.96  105.62  213.33  237.86  538.46     X     X
Sec 3.3      6.31   34.09   99.76  198.00  224.11  504.43     X     X
Sec 3.1      1.69    7.91   27.62   58.88   68.15  442.05  332.87   X
Sec 3.1+3.2  1.17    6.60   22.58   49.40   56.87   93.32  236.55   X
Sec 3.1+3.3  1.64    7.81   27.35   57.95   65.76  441.35  336.51   X
OPTIC-II     1.23    5.98   20.40   49.15   56.91   91.02  241.99   X

Factory Floor QA — No limit on calibrate
Instance      1      2      3       4       5       6      7     8      9     10     11      12     13
OPTIC        0.47   6.34   44.70  293.68  547.98  828.47   X   30.90  63.54   X   594.82  906.96    X
Sec 3.3      0.45   6.89   53.80  302.40  535.98  807.05   X   26.95  48.61   X   526.68  814.11    X
Sec 3.1      0.33   7.09   53.38  275.61  478.96  733.08   X   28.64  59.74   X   570.86  872.45    X
Sec 3.1+3.2  0.33   5.80   43.69  254.10  463.60  730.79   X   24.50  54.89   X   550.04  874.37    X
Sec 3.1+3.3  0.35   6.74   47.87  256.16  451.83  714.85   X   26.82  56.17   X   538.03  831.99    X
OPTIC-II     0.33   6.92   47.19  253.12  465.30  722.89   X   26.65  56.25   X   548.36  843.43    X

Factory Floor QA — Calibrate only when manufacturing
Instance      1      2       3       4       5       6       7      8
OPTIC        6.35   36.64  122.41  267.91  314.17  452.16  880.83   X
Sec 3.3      5.83   34.54  118.00  254.39  294.76  424.92  827.24   X
Sec 3.1      5.36   30.56   92.71  205.55  232.52  351.25  726.83   X
Sec 3.1+3.2  5.18   29.23   90.89  201.99  234.87  347.51  725.29   X
Sec 3.1+3.3  4.45   24.65   85.27  192.05  219.80  325.14  690.79   X
OPTIC-II     4.19   22.36   78.29  191.23  215.33  327.49  687.34   X

Single Rover
Instance     1     2     3     4     5     6     7     8      9     10      11    12     13      14    15    16     17
OPTIC       0.08  0.04  0.09  0.18  0.35  0.88  7.80  12.03  0.29  374.94  0.22  0.06  225.89  2.98  0.66  36.23   X
OPTIC-II    0.07  0.04  0.10  0.18  0.35  0.96  8.13  13.09  0.23  375.69  0.23  0.06  221.19  3.04  0.65  36.68   X

Linear Generator
Tanks        10    20    30     40      50      60     70     80      90    100
OPTIC       0.61  3.28  12.14  130.89  554.31  48.14  91.27  160.68  300.70   X
OPTIC-II    0.63  4.18  15.36  127.18  559.67  47.71  91.58  159.89  301.27   X

Table 4: Results (runtimes in seconds). "X" indicates a runtime over 1000s.

   In the four domains from the Flying Observer and QA, OPTIC-II performs far better than OPTIC. We present the results of all three suggested changes, the contribution of each change separately, and possible combinations, to better understand the results.

Efficient Variable Update   In Section 3.3, we suggest using information about the current action to update the variables' bounds efficiently. This improves LP solving in the variable-update stage of the search. We therefore expect this change to be more prominent when the planner handles large plans that result in larger LP problems, and when a large number of variables needs to be updated.
   In Table 4, the line named "Sec. 3.3" presents OPTIC's performance when only this change is present, in the four domains taken from (Denenberg, Coles, and Long 2019). It can be seen that the contribution of this change to performance is more prominent in the larger instances, where the plan is indeed quite long and contains many variables.
   Though this change's contribution is not visible in simple academic domains, it helps the planner scale to large problems such as those that arise in real-life domains and is, therefore, useful.
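   To make the idea concrete, the following sketch shows the gist of the change. It is written in Python rather than the planner's C++, and the state representation (a dictionary of bounds, a set of variables touched by the latest action, and a solve_bound_lp callback) is a hypothetical simplification used only for illustration, not OPTIC's actual data structures.

    def update_bounds(parent_bounds, touched_vars, solve_bound_lp):
        # Variables untouched by the latest action keep the bounds already
        # computed for the parent state; only the touched variables trigger
        # fresh calls to the LP solver.
        new_bounds = dict(parent_bounds)
        for var in touched_vars:
            new_bounds[var] = (solve_bound_lp(var, "min"),
                               solve_bound_lp(var, "max"))
        return new_bounds

   The point of the sketch is simply that the number of LP calls grows with the size of the latest action's numeric effects rather than with the total number of variables in the plan.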
Observing the Latest Action   In Section 3.1, we suggested using information about the current action to decide whether an STP can be used to prove temporal consistency even though a numeric change is present in the state. This change may lower the number of LPs that are solved during the search and, therefore, speed up the search. The runtime results of OPTIC running only this improvement are dubbed "Sec. 3.1" in the table.
   In the first four domains, we see a significant reduction in the number of LPs solved during the search², which results in a faster search. In the last two domains, the reduction is minimal, and so the tests slightly slow the search process.

   ² The number of states proved consistent with an LP, out of the visited states, is given in the extra data and will be published online.
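   The check that enables this fallback is itself inexpensive. The sketch below illustrates its flavour under a simplified, hypothetical state representation (sets of variable names); it is not the exact condition implemented in the planner, which is the one described in Section 3.1.

    def stp_suffices(latest_numeric_reads, latest_numeric_writes, active_numeric_vars):
        # If the latest action neither reads nor writes a variable that is
        # still changing numerically in this state, the new constraints are
        # purely temporal, so a cheap STP consistency check can stand in
        # for the LP.
        touched = set(latest_numeric_reads) | set(latest_numeric_writes)
        return touched.isdisjoint(active_numeric_vars)

    # Example: the latest action only reads 'battery', which is no longer
    # changing, so the STP check suffices.
    # stp_suffices({'battery'}, set(), {'fuel'})  ->  True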
Reformulating the LP   The last contribution we examine is the one described in Section 3.2. For implementation reasons, this change could only be applied together with the previous change. It was useful in both Flying Observer domains: when applied to these domains, all problems were converted to temporal ones, and no LPs were solved at all. The result was a significant improvement in planning speed.
   In the QA domains, only some of the states were converted from numeric to temporal, and therefore the change was less prominent. The planner was able to identify that the requirement on the total amount of produced parts was the variable that prevented the conversion. If the domain expert believes the total-amount limit cannot be reached, they can remove that constraint from the domain and allow much faster planning. Thus, using this method, we can both improve performance and perform knowledge engineering, presenting the model expert with possible ways to improve the planning process.
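   For intuition, once a state's constraints are purely temporal they form a Simple Temporal Network, whose consistency can be decided by a negative-cycle check on its distance graph instead of an LP call. The following is a generic textbook version of that check (plain Bellman-Ford), given only as an illustration and not as the planner's implementation.

    def stn_consistent(num_points, edges):
        # edges: list of (u, v, w) constraints meaning  t_v - t_u <= w.
        # The STN is consistent iff its distance graph has no negative
        # cycle, which Bellman-Ford detects without any LP machinery.
        dist = [0.0] * num_points
        for _ in range(num_points - 1):
            updated = False
            for u, v, w in edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    updated = True
            if not updated:
                break
        # Any remaining violated constraint certifies a negative cycle.
        return all(dist[u] + w >= dist[v] for u, v, w in edges)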
                                5   Conclusions

This work presented three methods for improving the search in the OPTIC family of planners: injecting state information into the consistency check, injecting state information into the variable-bound update, and reformulating the LP as an STN. These suggested changes to the planner can improve LP solving time or, at times, help avoid using an LP solver altogether. The changes were shown to be relatively cheap and useful in many cases, and they apply to a broad and popular family of planners.
   Future work includes additional tests and profiling. Exploiting state information in other forward-search planners can also be examined.
                                   References
Bell, P. C.; Delvenne, J.-C.; Jungers, R. M.; and Blondel, V. D. 2010. The continuous Skolem-Pisot problem. Theoretical Computer Science 411(40-42): 3625–3634.
Bemporad, A.; Ferrari-Trecate, G.; and Morari, M. 2000. Observability and controllability of piecewise affine and hybrid systems. IEEE Transactions on Automatic Control 45: 1864–1876.
Benton, J.; Coles, A.; and Coles, A. 2012. Temporal Planning with Preferences and Time-Dependent Continuous Costs. In Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling (ICAPS), 2–10.
Bernardini, S.; Fox, M.; and Long, D. 2014. Planning the behaviour of low-cost quadcopters for surveillance missions. In Twenty-Fourth International Conference on Automated Planning and Scheduling (ICAPS), 445–453.
Cashmore, M.; Fox, M.; Larkworthy, T.; Long, D.; and Magazzeni, D. 2014. AUV mission control via temporal planning. In 2014 IEEE International Conference on Robotics and Automation (ICRA), 6535–6541.
Cashmore, M.; Fox, M.; Long, D.; and Magazzeni, D. 2016. A Compilation of the Full PDDL+ Language into SMT. In Twenty-Sixth International Conference on Automated Planning and Scheduling (ICAPS).
Cashmore, M.; Fox, M.; Long, D.; Magazzeni, D.; Ridder, B.; Carrera, A.; Palomeras, N.; Hurtós, N.; and Carreras, M. 2015. ROSPlan: Planning in the Robot Operating System. In Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling (ICAPS), 333–341.
Chien, S.; Rabideau, S.; Knight, R.; Sherwood, R.; Engelhardt, B.; Mutz, D.; Estlin, T.; Smith, B.; Fisher, F.; Barrett, T.; Stebbins, G.; and Tran, D. 2000. ASPEN - Automated Planning and Scheduling for Space Mission Operations. In Space Ops.
Coles, A.; Coles, A.; Fox, M.; and Long, D. 2010. Forward-chaining partial-order planning. In Twentieth International Conference on Automated Planning and Scheduling (ICAPS).
Coles, A.; Coles, A.; Fox, M.; and Long, D. 2012. COLIN: Planning with Continuous Linear Numeric Change. Journal of Artificial Intelligence Research 44: 1–96.
Coles, A.; Coles, A.; Martinez, M.; Savas, E.; Keller, T.; Pommerening, F.; and Helmert, M. 2019. On-board planning for robotic space missions using temporal PDDL. In 11th International Workshop on Planning and Scheduling for Space (IWPSS).
Denenberg, E.; and Coles, A. 2018. Automated planning in non-linear domains for aerospace applications. In 58th Israel Annual Conference on Aerospace Sciences.
Denenberg, E.; Coles, A.; and Long, D. 2019. Evaluating the Cost of Employing LPs and STPs in planning: lessons learned from large real-life domains. In Scheduling and Planning Applications Workshop (SPARK), 2019 ICAPS Workshop.
Do, M. B.; and Kambhampati, S. 2001. SAPA: a Domain-Independent Heuristic Metric Temporal Planner. In European Conference on Planning (ECP).
Fernández-González, E.; Karpas, E.; and Williams, B. C. 2017. Mixed Discrete-Continuous Planning with Convex Optimization. In AAAI Conference on Artificial Intelligence.
Fox, M.; and Long, D. 2003. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research 20: 61–124.
Fox, M.; Long, D.; and Magazzeni, D. 2011. Automatic Construction of Efficient Multiple Battery Usage Policies. In The Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2620–2625.
Obes, J. L.; Sarraute, C.; and Richarte, G. 2013. Attack Planning in the Real World. The Computing Research Repository (CoRR) abs/1306.4044.
Piotrowski, W.; Fox, M.; Long, D.; Magazzeni, D.; and Mercorio, F. 2016. Heuristic Planning for PDDL+ Domains. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI), 3213–3219.
Scala, E.; Haslum, P.; Thiebaux, S.; and Ramirez, M. 2016. Interval-Based Relaxation for General Numeric Planning. In Proceedings of the Twenty-Second European Conference on Artificial Intelligence (ECAI), 655–663.