Learning Task-Specific Dynamics to Improve Whole-Body Control

Andrej Gams1, Sean A. Mason2, Aleš Ude1, Stefan Schaal2,3 and Ludovic Righetti3,4

arXiv:1803.01978v2 [cs.RO] 8 Mar 2018

Abstract— In task-based inverse dynamics control, reference accelerations used to follow a desired plan can be broken down into feedforward and feedback trajectories. The feedback term accounts for tracking errors that are caused by inaccurate dynamic models or external disturbances. On underactuated, free-floating robots, such as humanoids, high feedback terms can be used to improve tracking accuracy; however, this can lead to very stiff behavior or poor tracking accuracy due to limited control bandwidth. In this paper, we show how to reduce the required contribution of the feedback controller by incorporating learned task-space reference accelerations. Thus, we i) improve the execution of the given specific task, and ii) offer the means to reduce feedback gains, providing for greater compliance of the system. With a systematic approach we also reduce heuristic tuning of the model parameters and feedback gains, often present in real-world experiments. In contrast to learning task-specific joint torques, which might produce a similar effect but can lead to poor generalization, our approach directly learns the task-space dynamics of the center of mass of a humanoid robot. Simulated and real-world results on the lower part of the Sarcos Hermes humanoid robot demonstrate the applicability of the approach.

Fig. 1. Simulated and real-world lower part of the Sarcos Hermes humanoid robot used in the experiments.

I. INTRODUCTION

Unmodeled dynamics (i.e., friction, link flexibilities, unmodeled actuator dynamics or approximate model parameters) can have severe effects on tracking performance, which can be problematic for balancing legged robots. Indeed, models are often difficult to obtain and/or incorrect, and while parameter identification can improve their quality [1], it does not take into account unmodeled dynamic effects. The lack of accurate models has led to the use of combined feedforward and feedback control, which preserves stability and robustness to disturbances. The greater the error of the model, the greater the feedback gains necessary to ensure robust task achievement. This comes at the cost of a significant increase in stiffness and damping requirements.

Different methods of acquiring dynamic models [2] and exploiting them in control have been proposed [3]. Optimization-based approaches, such as hierarchical inverse dynamics [4]–[6], have gained popularity in recent years for the control of legged robots. However, these approaches rely on a dynamics model and often necessitate high task-space feedback gains to ensure good tracking performance on real robots which do not have accurate dynamic models.

The difficulty of acquiring dynamic models of robots and tasks can be partially offset by iterative learning of the control signals, which relies on one of the main characteristics of robots – repeatability of actions given the same input signals. Iterative learning control (ILC) [7] has been extensively applied in robotics, including for learning task-specific joint control torques [8].

A. Problem Statement

In this paper we investigate task-specific learning of dynamics for improved task execution and compliance in the scope of optimization-based inverse dynamics control. Therefore, we want:
• to reduce the required contribution of the feedback term in the control,
• to consequently improve task-space tracking and compliance, and
• to act in the system's task space, typically the center of mass (CoM) dynamics.

Acting in task space as opposed to joint space potentially offers the means for application beyond the scope of the learned dynamics, i.e., through generalization.

The intended application of the proposed algorithm is improved control of dynamic tasks on humanoid robots. We performed our experiments on the lower part of the Sarcos Hermes humanoid robot, depicted in Fig. 1.

*This work was supported by the Slovenian Research Agency project BI-US/16-17-063.
1 Humanoid and Cognitive Robotics Lab, Dept. of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia. name.surname@ijs.si
2 Computational Learning and Motor Control Lab, University of Southern California, Los Angeles, California, USA.
3 Autonomous Motion Department, Max Planck Institute for Intelligent Systems, Tuebingen, Germany.
4 Tandon School of Engineering, New York University, New York, USA.
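The core idea of the paper — record the task-space feedback contribution during one execution and add it as a feedforward term in the next repetition of the same task (detailed in Section IV) — can be illustrated with a minimal one-dimensional sketch. The toy dynamics, the gains, and the constant model-error term below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def run_iteration(x_ref, dx_ref, ddx_ref, ff, Px, Dx, dt, bias):
    """One repetition of a 1-D tracking task with PD feedback plus a
    stored feedforward correction ff; bias models unmodeled dynamics
    (a constant acceleration error). Returns positions and feedback log."""
    x, dx = x_ref[0], dx_ref[0]
    xs, fb_log = [], []
    for k in range(len(x_ref)):
        fb = Px * (x_ref[k] - x) + Dx * (dx_ref[k] - dx)  # feedback term
        ddx_des = ddx_ref[k] + ff[k] + fb                 # reference ff + learned ff + fb
        ddx = ddx_des - bias                              # plant sees the model error
        dx += ddx * dt
        x += dx * dt
        xs.append(x)
        fb_log.append(fb)
    return np.array(xs), np.array(fb_log)

# toy squatting-like sinusoidal reference
dt = 0.01
t = np.arange(0.0, 4.0, dt)
x_ref = 0.05 * np.sin(2 * np.pi * 0.5 * t)
dx_ref = np.gradient(x_ref, dt)
ddx_ref = np.gradient(dx_ref, dt)

ff = np.zeros_like(t)                          # iteration i-1: no learned term
x1, fb1 = run_iteration(x_ref, dx_ref, ddx_ref, ff, 50.0, 10.0, dt, bias=0.3)
# iteration i: replay the recorded feedback as feedforward
x2, fb2 = run_iteration(x_ref, dx_ref, ddx_ref, fb1, 50.0, 10.0, dt, bias=0.3)

print(np.abs(x_ref - x1).mean(), np.abs(x_ref - x2).mean())  # error shrinks
```

Replaying the recorded feedback as feedforward cancels most of the repeatable model error, which is what allows the feedback gains to be reduced in later iterations.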
This paper is organized as follows: Section II provides a short literature overview. Section III gives a short recap of the QP inverse dynamics control. Section IV explains the proposed algorithm of applying learned torques in quadratic-program-based inverse dynamics control. Section V depicts simulated and real-world results. A discussion is given in Section VI, followed by conclusions and future prospects of this work.

II. RELATED WORK

Two bodies of work are relevant to this research: 1) learning and exploiting control torques for improved task execution, and 2) optimization-based control of floating-base systems.

Learning of control signals to iteratively improve task execution using the aforementioned ILC [7] has been extensively used in robotics [9]. Feedback-error learning, where the feedback motor command is used as an error signal to train a neural network which then generates a feedforward motor command, was proposed by Kawato [10]. The idea was extended to learning of contact interaction. In their work, Pastor et al. [11] combined learning of forces with dynamical systems to improve force tracking in interaction tasks. Similarly, in [12], dynamic movement primitives (DMPs) [13] were used in combination with learning of interaction forces.

Learning to improve contact interaction was also applied directly to actuation torques. In [14], measured interaction forces were used to calculate the joint torques from the measured arm pose; these were applied in a feedforward manner in control. Similarly, joint torques along the kinematic trajectory were learned and encoded as DMPs in [15] and used to increase the accuracy of the next execution of the in-contact task. This approach to improving trajectory execution was also applied to full-sized humanoid robots, for example in [16], where a particular trajectory was run, and the joint torques from that trial were used as the feedforward term on the next trial. These methods can go beyond mere repetition of the same task. In [8], the authors show how learned joint-torque signals, encoded in parametric form as compliant movement primitives (CMPs), can be generalized for new task parameters (weight, speed, etc.). However, because the approach is rooted in joint torques, the generalization is somewhat limited to variations of the same task.

Optimization-based inverse dynamics control has become a very popular method to control legged robots [4], [5], [17], [18], as it allows to directly specify control objectives for multiple tasks, while ensuring priorities between tasks and the satisfaction of important constraints (actuation limits, friction cones, etc.). The redundancy of a complex robot such as a humanoid can therefore be optimally exploited. It is also possible to compute optimal feedback gains in a receding horizon manner directly in task space by leveraging the task-reduced dynamics [4], [6], [19]. Such methods have remarkable capabilities, but are often limited by modeling errors, which necessitate significantly increased feedback gains and thus lead to very stiff behaviors. Some approaches, as far back as [20], have proposed to learn task-specific dynamic models that can then be used to synthesize control laws. Iterative methods to compute locally optimal control policies have, for example, recently been used with such learned models [21]. Iterative repetitions of the process are then used to collect additional data and re-learn a new policy [21], [22]. In these approaches, the learned models and resulting control policies operate on the whole robot. It is therefore not clear how other tasks and additional constraints can be incorporated without impairing the resulting behaviors. In this paper, we learn task-specific feedforward models which take into account the error in dynamic models during task execution and combine them with a QP-based inverse dynamics controller. This allows us to significantly improve task tracking while creating more compliant behaviors.

III. CONTROL

In this section we briefly introduce our QP-based inverse dynamics, originally proposed in [4].

A. QP-Based Hierarchical Inverse Dynamics

The floating-base dynamics of a legged humanoid robot are given by

    M(q)q̈ + h(q, q̇) = Sᵀτ + Jcᵀλ,    (1)

with the vector of positions and orientations of the robot in space and its joint configuration q ∈ SE(3) × Rⁿ, the mass-inertia matrix M ∈ R^((n+6)×(n+6)), the nonlinear terms h ∈ R^(n+6), the actuation matrix S = [0(n×6) I(n×n)] and the end-effector contact Jacobian Jc ∈ R^(6m×n), where n is the number of the robot's degrees of freedom, τ are the actuation torques and λ are the contact forces.

Given that a humanoid robot is a floating-base system, its dynamics can be decomposed into actuated and unactuated (floating-base) parts, respectively

    Ma(q)q̈ + ha(q, q̇) = τ + Jc,aᵀλ,    (2)
    Mu(q)q̈ + hu(q, q̇) = Jc,uᵀλ.    (3)

Equation (2) shows that the actuation torques τ are linearly determined by q̈ and λ. Therefore τ is a redundant variable, the optimization needs to be performed only over q̈ and λ, and dynamic consistency is reduced to enforcing Eq. (3). Kinematic contact constraints ensuring that the parts of the robot in contact with the environment do not move,

    Jc q̈ + J̇c q̇ = 0,    (4)

are additional equality constraints of the optimization, while foot center of pressure (CoP), friction forces, resultant normal torques, joint torques and joint accelerations are inequality constraints. The cost to be minimized is

    min(q̈,λ)  ||ẍ − ẍdes||²_Wx + ||Pnull q̈ − q̈des||²_Wq + ||λ − λdes||²_Wλ + ||τ − τdes||²_Wτ.    (5)

Here, W denotes weighting matrices, Pnull projects into the null space of the Cartesian task constraints, and x ∈ R^(6m) are
the m stacked Cartesian end-effector poses. Joint and Cartesian end-effectors are related through

    ẍ = Jt q̈ + J̇t q̇,    (6)

where Jt ∈ R^(6mt×(n+6)) is the Jacobian for the mt unconstrained end-effectors and Jc for the mc constrained ones.

Desired end-effector accelerations are computed through

    ẍdes = ẍref + Px (xref − x) + Dx (ẋref − ẋ).    (7)

Matrices Px and Dx represent the stiffness and damping gains of the PD controller, respectively. Desired joint accelerations are specified by

    q̈des = Pq (qref − q) − Dq q̇.

For more details on the solver, see [4].

IV. TASK-SPECIFIC DYNAMICS

With QP-based inverse dynamics control we can control the behavior of the robot. To improve the tracking we introduce an additional feedforward term.

The results of using QP-based inverse dynamics control depend on the accuracy of the model. This can be seen in (7), where ẍref is the feedforward part and Px (xref − x) + Dx (ẋref − ẋ) is the feedback part. If the model were perfect, the contribution of the feedback part would amount to 0. However, it is not, and the feedback part accounts for the discrepancy. We propose recording this feedback contribution and adding it in the next repetition of the exact same task (desired motion). Thus, we get

    ẍdes,i = ẍref + ẍfb,i−1 + Px (xref − x) + Dx (ẋref − ẋ),    (8)

where ẍref + ẍfb,i−1 is the updated feedforward term and

    ẍfb,i−1 = Px (xref − xi−1) + Dx (ẋref − ẋi−1).    (9)

We only used the feedback from one previous iteration (i = 2).

Unlike in [16] or [8], the feedforward part is added in the task space of the robot, and not in its joint space. It is also combined with QP-based optimization control.

Using the recorded (learned) feedback signal in the next iteration of the same task provides us with an improved feedforward signal, which is task-specific. However, we can also build up a database of such signals for different task variations. Thus, the proposed algorithm can significantly correct the discrepancy between the model and the real system. Furthermore, we can use the database to generate an appropriate signal for previously untested tasks and task variations using statistical generalization [8].

A. Encoding the Feedforward Signal

The recorded (learned) signal can be encoded in any form. The focus in this paper is on periodic tasks. We chose to encode the signal as a linear combination of radial basis functions (RBFs) appropriate for periodic tasks. Thus, the signal encoding is compact and the signal itself is inherently filtered. Most importantly, as discussed in [8], this representation allows for computationally light¹ generalization using Gaussian Process Regression (GPR) [23]. A linear combination of RBFs as a function of the phase φ is given by

    ẍfb,i−1(φ) = (Σ_{j=1}^{L} wj Γj(φ) / Σ_{j=1}^{L} Γj(φ)) r,    (10)

where Γj denotes the j-th basis function, given by

    Γj(φ) = exp(hj (cos(φ − cj) − 1)),    (11)

wj is the weight of the j-th basis function, L is the number of basis functions, r is the amplitude gain, cj are the centers of the basis functions and hj > 0 their widths. The periodic phase φ is determined by the phase oscillator

    φ̇ = Ω,    (12)

where Ω is the frequency of oscillations.

B. Generalization

In the manner of generalizing joint-space feedforward torques [8], we can also generalize the learned task-space CoM accelerations. Having chosen the RBF encoding, we can generalize between the weights, for example using GPR. The goal of generalization is to provide us with a function

    F_Db : κ ↦ [w]    (13)

that provides the output in the form of a vector of RBF weights w, given the database of trained feedforward terms Db and the input, i.e., the query κ. We refer the reader to [8], [23] for details on GPR.

While this kind of generalization is analogous to the one in [8], it acts in task space, and potential generalization across tasks might offer more possibilities.

V. EXPERIMENTAL RESULTS

Experiments were performed on the lower part of the hydraulically actuated, torque-controlled Sarcos humanoid robot. It has in total 17 degrees of freedom (DoFs), with seven in each leg and three in the torso. The system is depicted in Fig. 1. We used the SL simulation and control environment [24] for evaluation in simulation.

To demonstrate the applicability of the approach we chose the task of periodic squatting with a predefined frequency. We compared the position of the CoM without and with the added feedforward term.

¹ The calculation of the hyperparameters for GPR is computationally expensive, but it is performed offline. Simple matrix multiplication is performed online.
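The periodic RBF encoding of Eqs. (10)–(12) can be sketched as follows. The least-squares fit of the weights and the width heuristic are our assumptions; the paper does not specify the fitting procedure:

```python
import numpy as np

def rbf_basis(phi, centers, h):
    """Periodic basis functions Γ_j(φ) = exp(h (cos(φ − c_j) − 1)), Eq. (11)."""
    phi = np.atleast_1d(phi)
    return np.exp(h * (np.cos(phi[:, None] - centers[None, :]) - 1.0))

def encode(phi, signal, L=25, h=None, r=1.0):
    """Fit the weights w of the normalized mixture of Eq. (10) by least squares."""
    centers = np.linspace(0.0, 2.0 * np.pi, L, endpoint=False)
    h = 2.5 * L if h is None else h              # width heuristic (assumption)
    G = rbf_basis(phi, centers, h)
    Psi = G / G.sum(axis=1, keepdims=True) * r   # normalized, scaled basis
    w, *_ = np.linalg.lstsq(Psi, signal, rcond=None)
    return w, centers, h

def decode(phi, w, centers, h, r=1.0):
    """Evaluate Eq. (10) at phase φ."""
    G = rbf_basis(phi, centers, h)
    return (G @ w) / G.sum(axis=1) * r

# encode one period of a recorded (toy) feedback acceleration signal
Omega = 2.0 * np.pi * 0.25                  # squatting at 0.25 Hz
t = np.arange(0.0, 4.0, 0.01)
phi = Omega * t                             # integrated phase oscillator, Eq. (12)
ddx_fb = 0.3 + 0.2 * np.sin(phi) - 0.1 * np.cos(2.0 * phi)
w, c, h = encode(phi, ddx_fb)
print(np.abs(decode(phi, w, c, h) - ddx_fb).max())  # small reconstruction error
```

Because the representation is a smooth function of the phase, the stored signal is inherently filtered and can be replayed at any phase value, not only at the recorded samples.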
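The generalization function of Eq. (13), mapping a task parameter κ to a vector of RBF weights, could be realized with GPR as in [8], [23]. The sketch below uses a plain squared-exponential kernel with assumed hyperparameters and a toy weight database; it is not the authors' trained model:

```python
import numpy as np

def gpr_weights(kappa_train, W_train, kappa_query, ell=2.0, sigma_n=1e-6):
    """GP regression from task parameter kappa to an RBF weight vector,
    Eq. (13). Each weight dimension is regressed independently with a
    shared squared-exponential kernel; ell and sigma_n are assumptions."""
    kt = np.asarray(kappa_train, dtype=float)
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(kt, kt) + sigma_n * np.eye(len(kt))   # kernel matrix with jitter
    alpha = np.linalg.solve(K, np.asarray(W_train, dtype=float))
    kq = np.atleast_1d(np.asarray(kappa_query, dtype=float))
    return k(kq, kt) @ alpha                    # posterior mean, one row per query

# toy database: weight vectors varying smoothly with squat amplitude kappa
kappas = [2.0, 4.0, 6.0, 8.0]                   # amplitudes in cm
W = np.stack([0.01 * kap * np.ones(25) for kap in kappas])
w_query = gpr_weights(kappas, W, 5.0)           # generalize to kappa = 5 cm
print(w_query.shape)                            # (1, 25)
```

As noted in the paper's footnote, the expensive hyperparameter optimization happens offline; the online query reduces to the matrix products above.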
Fig. 2. Center of mass position during a squatting experiment without (i−1) and with (i) the learned feedforward term, and with the feedforward term but with lower gains (i′). The top plot shows the CoM position in the vertical z axis for all experiments in the world coordinate frame. The black dashed line depicts the start of the experiment, the green dashed line shows the desired motion, the blue line shows the results without the added feedforward term (i−1), the red line shows the results with the added feedforward term (i), and the ocher line shows the results with the added feedforward term but with 5 times reduced P feedback gains (i′). The bottom plot shows the error of CoMz tracking for all three cases.

A. Simulation

In the first experiment, we performed squatting at a frequency of 0.25 Hz. Squatting was defined as sinusoidal motion of the center of mass up and down for 6 cm; this range is close to the maximal motion the robot can perform without hitting joint limits.

In simulation, the model used to compute the inverse dynamics control is perfect; however, we set the task-space gains of the optimization such that it did not reach perfect tracking. Fig. 6 shows a sequence of pictures illustrating the squatting task.

Fig. 2 shows in the top plot the desired and actual vertical motion of the CoM for three cases: without the added term (i−1), with the added term (i), and with the added term with reduced P′ = 0.2P gains (i′). Improvement of tracking is clearly visible in the top plot for both cases using the added feedforward term. The tracking error is depicted in the lower plot. Here we can see that the addition of the feedforward term reduces the tracking error by an order of magnitude, even with reduced P gains. Before the start of the squatting (depicted by the black dashed line), the learned feedforward term was not added to the system. With low gains (i′), this clearly results in a higher error of the CoMz tracking, but in higher compliance as well. The higher compliance is directly related to the error of tracking before the feedforward term is added, because the error of the model can be interpreted as an external force.

The plot in Fig. 3 shows the amplitude of the feedback part of the controller, again without (i−1), with (i), and with the additional feedforward term but with reduced feedback gains (i′). We can again see the difference in the needed feedback correction. The plot also shows the encoded feedforward signal (purple), which very closely matches the original feedback signal. The matching could be improved, for example, with a higher number of basis functions. In the experiments we used L = 25 basis functions; the number was chosen empirically.

Fig. 3. The contribution of the feedback term in QP-based inverse dynamics control when squatting as defined in the experiment. Without an additional feedforward term in blue (i−1), with the additional term in red (i), with the term but with reduced gains in ocher (i′), and the encoded feedforward signal in purple.

B. Generalization

We performed a basic generalization experiment over a variation of the task, to show that the approach can be used with generated feedforward signals. We used GPR to generalize the feedforward term for a squatting amplitude of κ = 5 cm. The database consisted of feedforward terms for different squatting amplitudes, i.e., task parameters κ = 2, 4, 6, 8 cm². Fig. 4 shows the results, where we can see that the generalized feedforward term allowed for very similar, low CoMz tracking errors as the recorded feedforward term for κ = 5 cm.

² The database is too small. Typically it would consist of at least 10 entries, but in this toy example they would be too close together.

C. Real World Results

We performed the same experiment on the real robot. To increase the dynamics of the task, we tested the approach at two different frequencies: 0.25 Hz and 0.5 Hz. The error of CoMz tracking for both cases is depicted in Fig. 5. We can see in the plots that the error is again significantly reduced for both cases. Fig. 6 shows a series of still photos depicting the real system performing one squat.

VI. DISCUSSION

With regard to our problem statement, we can say that the proposed approach thoroughly resolves it. The results show a reduced contribution of the feedback term; a clear improvement of the tracking performance and the possibility to reduce the feedback gains and thus increase the compliance; and the approach is applied in the task space of the system. In the following we briefly discuss these points.

We first discuss the improvement of the tracking performance and the reduction of the feedback term contribution. The question whether the system behavior is the same with the additional task-specific feedforward term and
the system has been recognized as one of the key elements
                                                                                for real-world deployment of robots in unstructured environ-
                                                                                ments, as it provides robustness for unplanned disturbances
                                                                                [17].
Our approach reduces the contribution of the feedback term in a manner similar to an improved dynamic model. As shown in the literature (e.g. in [1]), a dynamic model never completely describes the behavior of a complex system. On a real system, such as the lower part of the Hermes bipedal robot used in this work, this difference is easily explained by the influence of hydraulic hoses that were not modelled. The difference between our approach and improving the actual model is that our approach is task-specific: the learned feedback torques for squatting are by default only applicable to squatting. As already outlined in Section III, building up a database is a rather straightforward solution. This has not only been applied to the model (for lack of a better word), but
also to control policies as a result of optimization. In [26], a database of such control policies is used to warm-start the optimization. A more advanced solution than just building up a database is to use the database to generate feedforward terms for previously untrained situations. Different methods can be applied, for example statistical learning such as GPR, which was used in a similar manner for joint torques in [8]. A broad enough database could potentially be used with deep neural networks, with the weights of the CMP as the output. A similar approach was applied to kinematic trajectories, which were generated from raw images in [27]. The database would have to be rather substantial, though.

The real advantage of learning and applying the feedforward terms in task space is the similarity they share across different tasks. For example, during walking the CoM position moves from one side to the other, which is (from the CoM point of view) the same as simply shifting the weight from one side to the other. The advantage of the proposed approach is quite obvious: feedforward terms for tasks that are easier to implement successfully can be used in a feedforward manner for more complex tasks. In the given case, feedforward terms for shifting the weight from one side to the other can be used for walking. However, this cross-task generalization remains an open research question.

Fig. 4. Center of mass position during a squatting experiment without (i−1) and with (i) the learned feedforward term, and with the feedforward term generated from the database (i″). The top plot shows the CoM position along the vertical z axis for all three experiments in the world coordinate frame. The black dashed line depicts the start of the experiment, the green dashed line shows the desired motion, the blue line shows the results for i−1, the red line for i, and the ocher line for the generalized feedforward term i″. The bottom plot shows the error of CoMz tracking for all three cases.
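The database-plus-generalization idea discussed in this section (a set of learned feedforward terms indexed by a task parameter, queried with statistical learning such as GPR for untrained situations) can be sketched as follows. Everything here is synthetic and illustrative: the "database" entries, the five-dimensional weight vectors, and the kernel parameters are assumptions, and this is not the implementation from [8].

```python
import numpy as np

# Hypothetical database: feedforward-term weight vectors (e.g. CMP weights)
# learned for a few squat frequencies. Here they are generated synthetically
# as a smooth function of frequency, purely for illustration.
freqs = np.array([0.2, 0.3, 0.4, 0.6])  # trained squat frequencies [Hz]

def fake_weights(f):
    """Stand-in for weights obtained by iterative learning at frequency f."""
    basis = np.arange(1, 6)
    return np.sin(basis * f) / basis

W = np.stack([fake_weights(f) for f in freqs])  # database: one row per task

def gp_predict(x_query, X, Y, length=0.2, noise=1e-6):
    """GP mean prediction with an RBF kernel; each output dimension is
    regressed independently on the scalar task parameter."""
    def rbf(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = rbf(X, X) + noise * np.eye(len(X))
    k_star = rbf(np.atleast_1d(x_query), X)   # shape (1, n_samples)
    return k_star @ np.linalg.solve(K, Y)     # predicted weight vector(s)

# Query the database at an untrained frequency, 0.5 Hz.
w_new = gp_predict(0.5, freqs, W)
print(w_new.round(3))
```

Since the learned weights vary smoothly with the task parameter, the GP mean interpolates the database entries and yields a usable feedforward term for the unseen task; a richer database would be needed for higher-dimensional task parameterizations.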
Fig. 5. Real-world error of CoMz tracking with and without the added term for the squatting experiment at two different squatting frequencies, 0.25 Hz in the top and 0.5 Hz in the bottom. In both plots the results without the added term are in blue, and with the added feedforward term in red.

…low feedback gains, or without the additional task-specific feedforward term and high gains, has previously been studied by Goldsmith [25]. The author shows that an equivalent feedback can always be constructed from the ILC parameters with no additional plant knowledge, whether or not the ILC includes current-cycle feedback. Note that in (8), current-cycle feedback is present. If the ILC converges to zero error, the equivalent feedback is a high-gain controller. Herein lies the main advantage of using ILC: low feedback gains can be used, resulting in increased compliance.

VII. CONCLUSION & FUTURE WORK

In this paper we showed that basic iterative learning can be applied to task-space accelerations in order to improve the task execution of a complex, free-floating robot system controlled with QP-based inverse dynamics control. Our approach fully resolved the specified problem statement. Furthermore, it has the potential to substantially improve the behavior of model-based control methods through the application of learned signals across different tasks. The latter, however, remains a task for future work.

REFERENCES

[1] M. Mistry, S. Schaal, and K. Yamane, "Inertial parameter estimation of floating base humanoid systems using partial force sensing," in 2009 9th IEEE-RAS International Conference on Humanoid Robots, pp. 492–497, Dec 2009.
Fig. 6. Still images of the lower part of the Sarcos Hermes robot performing one squat in simulation (top) and on the real robot (bottom). See also the
accompanying video for the real-world experiment.

[2] D. Nguyen-Tuong and J. Peters, "Model learning for robot control: a survey," Cognitive Processing, vol. 12, pp. 319–340, Nov 2011.
[3] D. W. Franklin and D. M. Wolpert, "Computational mechanisms of sensorimotor control," Neuron, vol. 72, no. 3, pp. 425–442, 2011.
[4] A. Herzog, N. Rotella, S. Mason, F. Grimminger, S. Schaal, and L. Righetti, "Momentum control with hierarchical inverse dynamics on a torque-controlled humanoid," Autonomous Robots, vol. 40, pp. 473–491, Mar 2016.
[5] A. Escande, N. Mansard, and P.-B. Wieber, "Hierarchical quadratic programming: Fast online humanoid-robot motion generation," The International Journal of Robotics Research, vol. 33, pp. 1006–1028, June 2014.
[6] S. Kuindersma, F. Permenter, and R. Tedrake, "An efficiently solvable quadratic program for stabilizing dynamic locomotion," in 2014 IEEE Conference on Robotics and Automation, pp. 2589–2594, IEEE, 2014.
[7] D. A. Bristow, M. Tharayil, and A. G. Alleyne, "A survey of iterative learning control," IEEE Control Systems, vol. 26, pp. 96–114, June 2006.
[8] M. Deniša, A. Gams, A. Ude, and T. Petrič, "Learning compliant movement primitives through demonstration and statistical generalization," IEEE/ASME Transactions on Mechatronics, vol. 21, pp. 2581–2594, Oct 2016.
[9] M. Norrlöf, Iterative Learning Control – Analysis, Design, and Experiments. PhD thesis, Linköpings Universitet, 2000.
[10] M. Kawato, "Feedback-error-learning neural network for supervised motor learning," in Advanced Neural Computers (R. Eckmiller, ed.), pp. 365–372, Amsterdam: North-Holland, 1990.
[11] P. Pastor, M. Kalakrishnan, L. Righetti, and S. Schaal, "Towards associative skill memories," in 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pp. 309–315, 2012.
[12] A. Gams, B. Nemec, A. J. Ijspeert, and A. Ude, "Coupling movement primitives: Interaction with the environment and bimanual tasks," IEEE Transactions on Robotics, vol. 30, pp. 816–830, Aug 2014.
[13] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, "Dynamical movement primitives: Learning attractor models for motor behaviors," Neural Computation, vol. 25, pp. 328–373, 2013.
[14] R. Calandra, S. Ivaldi, M. Deisenroth, and J. Peters, "Learning torque control in presence of contacts using tactile sensing from robot skin," in IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pp. 690–695, Nov 2015.
[15] F. Steinmetz, A. Montebelli, and V. Kyrki, "Simultaneous kinesthetic teaching of positional and force requirements for sequential in-contact tasks," in IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pp. 202–209, Nov 2015.
[16] N. S. Pollard, J. K. Hodgins, M. J. Riley, and C. G. Atkeson, "Adapting human motion for the control of a humanoid robot," in Proceedings 2002 IEEE International Conference on Robotics and Automation, vol. 2, pp. 1390–1397, 2002.
[17] B. J. Stephens and C. G. Atkeson, "Dynamic balance force control for compliant humanoid robots," in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1248–1255, Oct 2010.
[18] M. DeDonato, V. Dimitrov, R. Du, R. Giovacchini, K. Knoedler, X. Long, F. Polido, M. A. Gennert, T. Padır, S. Feng, H. Moriguchi, E. Whitman, X. Xinjilefu, and C. G. Atkeson, "Human-in-the-loop control of a humanoid robot for disaster response: A report from the DARPA Robotics Challenge Trials," Journal of Field Robotics, vol. 32, no. 2, pp. 275–292, 2015.
[19] A. Herzog, N. Rotella, S. Schaal, and L. Righetti, "Trajectory generation for multi-contact momentum control," in IEEE-RAS International Conference on Humanoid Robots (Humanoids), pp. 874–880, 2015.
[20] S. Schaal and C. Atkeson, "Robot juggling: implementation of memory-based learning," IEEE Control Systems, vol. 14, pp. 57–71, 1994.
[21] S. Levine and P. Abbeel, "Learning neural network policies with guided policy search under unknown dynamics," in Advances in Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds.), pp. 1071–1079, Curran Associates, Inc., 2014.
[22] M. P. Deisenroth and C. E. Rasmussen, "PILCO: A model-based and data-efficient approach to policy search," in Proceedings of the 28th International Conference on Machine Learning, ICML'11, pp. 465–472, Omnipress, 2011.
[23] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning. Cambridge, MA, USA: MIT Press, 2006.
[24] S. Schaal, "The SL simulation and real-time control software package," tech. rep., CLMC Lab, Los Angeles, CA, 2009.
[25] P. B. Goldsmith, "On the equivalence of causal LTI iterative learning control and feedback control," Automatica, vol. 38, no. 4, pp. 703–708, 2002.
[26] N. Mansard, A. Del Prete, M. Geisert, S. Tonneau, and O. Stasse, "Using a memory of motion to efficiently warm-start a nonlinear predictive controller," in 2018 IEEE International Conference on Robotics and Automation, to appear, 2018.
[27] R. Pahič, A. Gams, A. Ude, and J. Morimoto, "Deep encoder-decoder networks for mapping raw images to movement primitives," in 2018 IEEE International Conference on Robotics and Automation, to appear, 2018.