Autonomous Obstacle Avoidance by Learning Policies for Reference Modification

Benjamin Evans1, Herman A. Engelbrecht1 and Hendrik W. Jordaan1

   Abstract— The key problem for autonomous robots is how to navigate through complex, obstacle-filled environments without colliding with obstacles. The navigation problem is split into generating a global reference path to the goal and then using a local planner to track the reference path and avoid obstacles by generating velocity and steering references for the robot control system to execute.
   This paper presents a hybrid local planning architecture that combines a classic path following algorithm with a deep reinforcement learning agent. The novel "modification planner" uses the path follower to track the global plan and the deep reinforcement learning agent to modify the references generated by the path follower to avoid obstacles. Importantly, our architecture does not require an updated obstacle map and uses only 10 laser range finders to avoid obstacles. The modification planner is evaluated in the context of F1/10th autonomous racing and compared to a mapless navigation baseline, the Follow the Gap Method and an optimisation based planner. The results show that the modification planner achieves faster average times than the baseline mapless navigation planner, with a 94% success rate that is similar to the baseline.

Fig. 1. Examples of Autonomous Navigation: The green dotted line is the global plan, the blue blocks are the obstacles and the red line is the path taken by the vehicle. Path following algorithms are able to perfectly follow the global plan, yet crash into unknown obstacles. Reinforcement learning agents are able to avoid obstacles, yet suffer from being inefficient. The lower image shows the desired behaviour, which we create in the proposed modification planner.

                           I. INTRODUCTION

   The general problem of robot navigation is to plan and execute a trajectory that moves a robot from a starting location to a goal location without colliding with any obstacles. Successful approaches to robot navigation have implemented a hierarchical approach by splitting the problem into global planning, generating a high-level plan before the mission begins, and local planning, following the global plan while avoiding obstacles [1]. Local planning is a difficult problem to solve because it has the dual objective of (1) trying to follow a predetermined path and (2) being able to react appropriately to uncertainty in the environment, namely smoothly avoiding obstacles.
   In this paper, we address the problem of local planning without an updated obstacle map, which involves using the global plan and onboard sensor readings to generate velocity and steering commands that follow the global reference while avoiding obstacles. We present our novel approach, a hybrid local planner that integrates a path follower and a deep reinforcement learning (RL) agent that modifies the steering reference generated by the path follower, to accomplish the dual objective of tracking a reference path while also smoothly avoiding obstacles. We call our approach a reference modification planner. The advantage of our architecture is that it can learn to avoid obstacles more smoothly than a mapless navigation planner and does not require an updated obstacle map, as optimisation based planners do. We evaluate our novel approach in the context of F1/10th autonomous racing and compare it to current approaches, which include a mapless navigation planner, the Follow the Gap Method, and an oracle optimisation based planner.
   The rest of the paper is summarised as follows. We first discuss previous attempts to solve the local planning problem in Section II. This is followed by a theoretical overview of our proposed reference modification planner in Section III. In Section IV we explain the components that we use in building our solution, and in Section V the learning formulation for the Reinforcement Learning (RL) agent is explained. Lastly, in Sections VI and VII we detail the evaluation methodology and present our results.

   1 Electrical and Electronic Engineering Department, Stellenbosch University, Stellenbosch, 7600, South Africa. {19811799, hebrecht, wjordaan}@sun.ac.za

                          II. RELATED WORK

   Different methods of local planning, following a global plan while avoiding obstacles, have been developed. We study reactive methods [2], [3], classical planning techniques [4], [5], learning based approaches [6], [7] and hybrid systems [8], [9].
   The earliest obstacle avoidance algorithms were reactive algorithms (such as the "Bug" algorithm) which generate steering references based on the surrounding environment at each step [2]. One of the most recently proposed reactive obstacle avoidance algorithms is the "Follow the Gap Method" (FGM) [3]. The FGM uses a dense LiDAR scan to navigate away from the nearest obstacle into free space. Reactive methods do not have a model of the vehicle or the ability to plan over multiple timesteps and thus they cannot guarantee optimality.
   Classical local planning methods aim to improve on reactive methods by storing a map of the environment which can be used to plan a trajectory multiple time-steps in advance. Classical methods typically build a map using a Simultaneous Localisation and Mapping (SLAM) algorithm [10], and then use the map to plan and optimise a collision-free trajectory [4]. The trajectory is then followed using a path tracking algorithm such as Pure Pursuit [5]. Classical planning and path tracking methods can generate and execute near-optimal trajectories; however, because map building is computationally expensive and slow, these solutions are not feasible for fast-moving, dynamic environments such as autonomous racing [11].
   Due to the hardware requirements and constraints of classical planning solutions, machine learning (ML) methods have been applied to the problem of local planning for autonomous navigation [12]. Neural networks, trained with reinforcement learning (RL) [6] and imitation learning (IL) [7], have been used to associate sensor readings (camera images and LiDAR scans) directly with navigation commands. While using an RL agent as a local planner does not require the expensive maintenance of a map, RL agents exhibit poor long term performance and have been shown to require a high-level planner in certain circumstances [13].
   In response to the problem of neural networks exhibiting poor long-term performance in complex environments, hybrid architectures have been introduced which combine high-level planners with neural networks. Paths generated with A* search have been used in conjunction with a neural network as a local planner and shown to be able to execute long missions [8], [9]. The above methods completely replace the local planner with an RL agent. A recent survey notes that using machine learning (ML) for specific subsystems tends to lead towards better overall performance [12]. We aim to improve upon the performance of these solutions by using an RL agent to learn the local planning subsystem of obstacle avoidance. We aim to improve the performance of the vehicle by giving the agent less influence over the navigation. Our solution does not require an updated obstacle map, as classical model-based planning solutions do, and can harness the adaptability of learning techniques to form a planner which is able to smoothly avoid obstacles while maintaining a global trajectory.

                        III. THEORETICAL DESIGN

   We address the problem of generating navigation references for autonomous vehicles that result in a reference trajectory being followed and obstacles being smoothly avoided, as shown in Figure 1. The solution should track a reference trajectory and deviate as necessary to safely and efficiently avoid the obstacle. The desired behaviour is analogous to driving a car, seeing an obstacle in the road, swerving around it and then returning to the original lane.

Fig. 2. Proposed Local Planner Architecture: A path follower to track the reference path is combined in parallel with an RL agent that modifies the path follower references.

   We present a hybrid local planner that features a pure pursuit path follower in parallel with an RL agent. Figure 2 shows how we combine the path follower and an RL agent in parallel such that the agent can modify the path follower references to avoid obstacles. This system is designed so that the path follower maintains the reference path and the agent can deal with the uncertainty of obstacles.
   The path follower uses the current location of the vehicle and a list of way-points to calculate the required steering and velocity commands to follow the global plan. An RL agent is added in parallel to avoid obstacles by modifying the steering angle calculated by the path follower. We train the agent by punishing it for crashing and rewarding it for progress along the global plan. These two components in the reward signal represent the dual objectives of achieving efficient, collision-free paths.
   Where previous attempts have used RL agents to replace the local planner and learn to navigate, we use an RL agent to learn to navigate as a subsystem within our local planner. This is done with the aim of better utilising the global path and avoiding obstacles more smoothly. The aim is that the agent will learn to swerve around obstacles where needed without modifying the behaviour of the vehicle when no obstacles are present.
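   To make the parallel composition in Figure 2 concrete, the sketch below outlines one planning step of the modification planner. The path_follower and rl_agent interfaces, the attribute names and the state layout are illustrative assumptions rather than the authors' implementation; the maximum steering angle follows the value used later in the simulation (Table I).

```python
import numpy as np

class ModificationPlanner:
    """Sketch of one planning step of the hybrid modification planner."""

    def __init__(self, path_follower, rl_agent, max_steer=0.4):
        self.path_follower = path_follower  # tracks the global plan
        self.rl_agent = rl_agent            # learns to modify the steering reference
        self.max_steer = max_steer          # rad (assumed, see Table I)

    def plan(self, state, scan):
        # The path follower generates steering and velocity references
        # from the global plan and the current vehicle state.
        delta_pp, velocity = self.path_follower.act(state)

        # The agent observes the scan and the path follower reference and
        # returns a modification, scaled from [-1, 1] to the steering range.
        obs = np.concatenate(([state.velocity, state.steering, delta_pp], scan))
        delta_nn = float(self.rl_agent.act(obs)) * self.max_steer

        # Combine the references and limit them to the physical steering range.
        delta_ref = np.clip(delta_pp + delta_nn, -self.max_steer, self.max_steer)
        return delta_ref, velocity
```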
                 IV. PRELIMINARIES: NAVIGATION STACK

   We design our system in the context of a standard perception, planning and control stack, as shown in Figure 3. Before the mission begins, the global planner uses a map of the environment to generate a reference path of way-points that connect the start and goal locations. The map does not contain any dynamic obstacles. During the mission, the local planner and control system operate continuously to move the vehicle to the goal location.

Fig. 3. Planning, Perception, and Control Navigation Stack

   We use a pure pursuit path follower which navigates towards a way-point situated at a fixed look-ahead distance in front of the vehicle. Figure 4 shows how the vehicle and way-points are modelled so that the steering angle can be calculated. Using the notation from Figure 4, the desired steering angle is calculated according to,

                  δpp = arctan( 2L sin(α) / ld ),                (1)

where L is the length of the vehicle body, ld is the look-ahead distance to the way-point being pursued and α is the angle between the vehicle's current heading and the heading to the way-point.

Fig. 4. Pure Pursuit Path Follower: R is the turning radius, g is the goal point, α is the angle to the goal and δ is the steering angle which is calculated.
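   As a sketch of Equation (1), the function below computes the pure pursuit steering angle from the geometry of Figure 4; the argument names are illustrative.

```python
import numpy as np

def pure_pursuit_steering(position, heading, goal_point, length, lookahead):
    """Pure pursuit steering angle, Eq. (1); names are illustrative.

    position:   (x, y) of the vehicle
    heading:    vehicle heading angle in radians
    goal_point: (x, y) of the way-point at the look-ahead distance
    length:     L, length of the vehicle body
    lookahead:  l_d, look-ahead distance to the pursued way-point
    """
    # alpha: angle between the vehicle's heading and the heading to the way-point
    heading_to_goal = np.arctan2(goal_point[1] - position[1],
                                 goal_point[0] - position[0])
    alpha = heading_to_goal - heading
    alpha = (alpha + np.pi) % (2.0 * np.pi) - np.pi   # wrap to [-pi, pi]

    # Eq. (1): delta_pp = arctan(2 L sin(alpha) / l_d)
    return np.arctan(2.0 * length * np.sin(alpha) / lookahead)
```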
   The velocity is calculated as the fastest the vehicle can safely travel while turning at the current steering angle. The velocity is calculated according to the maximum amount of lateral friction the vehicle can withstand, relative to the force exerted by the car in turning at the current steering angle. We compute the velocity as,

                  V = fs √( b g / ( tan(|δref|) / l ) ),                (2)

where b is the coefficient of friction, g is the gravitational acceleration, l is the wheelbase length, and fs is a safety factor. The velocity is limited to the vehicle's maximum velocity.
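   The sketch below implements Equation (2) under the stated assumptions (the vehicle mass cancels out of the friction-limit derivation, since the lateral force b·m·g must exceed m·V²/R with turning radius R = l/tan(|δ|)); the default parameter values are illustrative.

```python
import numpy as np

def safe_velocity(delta_ref, wheelbase, friction_coeff=1.0,
                  safety_factor=0.9, max_velocity=7.0, gravity=9.81):
    """Friction-limited velocity for a steering reference, Eq. (2).

    The bicycle-model turning radius is R = l / tan(|delta|); requiring the
    lateral acceleration V^2 / R to stay below b*g gives V <= sqrt(b*g*R).
    Defaults are illustrative (the 7 m/s maximum speed follows Table I).
    """
    delta = abs(delta_ref)
    if delta < 1e-6:
        return max_velocity               # straight ahead: no lateral limit
    turning_radius = wheelbase / np.tan(delta)
    velocity = safety_factor * np.sqrt(friction_coeff * gravity * turning_radius)
    return min(velocity, max_velocity)    # limited to the vehicle's maximum velocity
```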
   The navigation stack uses a low-level proportional control system to command the onboard vehicle actuators to follow the local planning references. The control system calculates acceleration and steering-velocity commands which can be implemented on hardware.

        V. LEARNING POLICIES FOR REFERENCE MODIFICATION

   The RL agent comprises a neural network that is trained from experience to maximise a reward signal [14]. The neural network is used to represent a policy that maps states to actions. For each training step, the network receives a state and selects an action that is implemented in the environment, as shown in Figure 5. The simulator then returns the updated state and a reward to the agent, and the process is repeated.

Fig. 5. Reinforcement Learning Agent Interacting With Environment

   We train the agent to modify the pure pursuit steering reference δpp to prevent collisions. The state which the network receives consists of a vector containing the current velocity Vt and steering angle δt of the vehicle, the reference δpp calculated by the path follower, and the current range finder scan [r1, ..., r10]. The network outputs an action which is used to modify the steering reference calculated by the path follower. The path follower (pp) and neural network (nn) steering commands are combined as,

                           δref = δpp + δnn.                (3)

The reference steering value is limited to be within the maximum steering range and then sent to the control system to be executed. The inputs and outputs of the network are in the range [−1, 1] and are scaled according to the maximum values.

Fig. 6. Reinforcement Learning Agent Architecture: The state consists of 10 sparse laser range finders and the pure pursuit steering reference. A neural network with two fully connected layers calculates an action, which is used to modify the path follower steering reference.
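   A minimal PyTorch sketch of the policy network in Figure 6 is shown below. The input dimension follows the state described above (10 range readings plus Vt, δt and δpp), and the hidden layer width follows the 200 neurons reported later in Section VI-C; the choice of ReLU activations and the scaling constants are assumptions.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Sketch of the actor in Figure 6: state -> steering modification in [-1, 1]."""

    def __init__(self, state_dim=13, hidden_dim=200):
        super().__init__()
        # state: 10 range finder readings + [delta_pp, delta_t, V_t], scaled to [-1, 1]
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Tanh(),   # action in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

# The [-1, 1] output is scaled by the maximum steering angle before being added
# to the pure pursuit reference (Eq. 3) and clipped to the steering range.
max_steer = 0.4                      # rad, assumed from Table I
policy = PolicyNetwork()
state = torch.zeros(1, 13)           # placeholder normalised state vector
delta_nn = policy(state).item() * max_steer
```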
   The aim in training the network is that the vehicle does not crash and that it tracks the reference path closely. Therefore, we use a reward signal which severely punishes crashing and rewards progress along the reference path. We write our reward signal as,

        r(st, at) =  rcrash = −1                          if crash,
                     rcomplete = 1                        if lap complete,        (4)
                     rtracking = (st+1 − st) / stotal     otherwise,

where s is the progress along the track at a specific step. The sum of the tracking rewards for completing the episode will always be 1, since the sum of the progress differences will be equal to the track length. The tracking reward must be a potential function of the terminal reward so that the agent is rewarded in proportion to the progress towards the ideal terminal state.
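   A direct sketch of the reward signal in Equation (4) follows; the episode-status flags and progress values are assumptions about the simulator interface.

```python
def reward(progress_t, progress_t1, track_length, crashed, lap_complete):
    """Reward signal from Eq. (4).

    progress_t, progress_t1: progress along the track before and after the step
    track_length:            total track length s_total
    crashed, lap_complete:   episode status flags (assumed simulator outputs)
    """
    if crashed:
        return -1.0                                   # r_crash
    if lap_complete:
        return 1.0                                    # r_complete
    return (progress_t1 - progress_t) / track_length  # r_tracking, sums to 1 over a lap
```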
                   VI. EVALUATION METHODOLOGY

   The ability of the modification planner to follow a reference path and avoid obstacles efficiently is evaluated in the context of F1/10th autonomous racing. We aim to determine the performance of the modification planner and compare it to current baseline solutions. We measure performance in terms of the following metrics: (1) the time to navigate to a target and (2) the success rate, which is the percentage of episodes successfully completed without crashing.

Fig. 7. F1/10th Vehicle Simulator with 10 Laser Range Finder Beams

A. Simulation Environment

   We train and test our local planner architectures using a custom-built, F1/10th autonomous racing simulator. The simulator models a non-holonomic 1/10th scale autonomous race car with a LiDAR sensor, as shown in Figure 7. At each time-step, the simulator receives an action, updates the state according to the transition dynamics and then returns the new state of the vehicle. A version of our test code is available online1.

Fig. 8. Schematic of Planning Loop in Simulation Environment

   Figure 8 shows how the simulator interfaces with the local planner that is being evaluated. The local planner selects an action consisting of velocity and steering angle commands which are passed to the simulator and used as the references for the control system. The feedback control system calculates acceleration and steering-velocity commands which are the input to the kinematic bicycle model equations used to model the car. As in practice, the simulated control system runs at a higher frequency of 100 Hz compared to the local planner, which runs at 10 Hz. Table I notes several important parameters used in our simulation.

   1 Available at: https://github.com/BDEvan5/ReferenceModification

      Simulation Parameter                 Value
      Simulation dynamics time step        0.01 s
      Local planning time step             0.1 s
      Max range finder value               10 m
      Range finder noise σs                0.01
      Max vehicle speed                    7 m/s
      Max steering angle                   0.4 rad
      Pure Pursuit look-ahead distance     0.8 m

                              TABLE I
                 PARAMETERS USED IN THE SIMULATION

   The new vehicle state is used to simulate a LiDAR scan of 10 sparse range finders on the map, and the new state and the scan are returned to the local planner. To simulate the noise in range finder readings, Gaussian distributed process noise, N(0, σs), is added to the measurements.
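   The sketch below illustrates the timing structure of the planning loop in Figure 8 and the noisy range finder model described above; the dynamics, controller and scan functions are illustrative placeholders, not the simulator's actual interface.

```python
import numpy as np

DT_DYNAMICS = 0.01     # 100 Hz control / dynamics update (Table I)
DT_PLANNING = 0.1      # 10 Hz local planning update (Table I)
SCAN_NOISE_STD = 0.01  # sigma_s (Table I)

def planning_step(state, planner, controller, dynamics, simulate_scan,
                  rng=np.random.default_rng()):
    """One 10 Hz planning step containing ten 100 Hz control/dynamics steps."""
    scan = simulate_scan(state)                                  # 10 sparse range finders
    scan = scan + rng.normal(0.0, SCAN_NOISE_STD, scan.shape)    # N(0, sigma_s) noise

    steering_ref, velocity_ref = planner(state, scan)            # local planner references

    for _ in range(int(round(DT_PLANNING / DT_DYNAMICS))):
        # Feedback control produces acceleration and steering-velocity commands,
        # which drive the kinematic bicycle model forward by one dynamics step.
        accel, steer_vel = controller(state, steering_ref, velocity_ref)
        state = dynamics(state, accel, steer_vel, DT_DYNAMICS)
    return state
```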

B. Baseline Solutions

   We compare the modification planner to three common baseline local planners. The first baseline is the "Follow the Gap Method" (FGM) [3]. The FGM identifies a bubble around the nearest obstacle and then navigates away from the bubble into free space. In the implementation of this method, we adapt the simulator to produce 1080 beams.
   The second baseline is an optimisation based, oracle planner which has full access to a map of the environment and the location of the obstacles before the episode begins. The obstacle map is used to generate an optimal path around the obstacles. The optimal plan is followed using the pure pursuit algorithm. This solution is treated as the best performance which could be achieved by any local planner.
   Lastly, we attempt to replicate the mapless navigation planner presented by Tai et al. [6] and adapt it to a non-holonomic vehicle. The planner presented by Tai uses a neural network that receives 10 laser range finder readings and the relative position of the target. The network is trained using reinforcement learning and rewarded according to the progress made towards the goal.
   For all of our baseline implementations, we use the same method of calculating the speed based on the steering angle to ensure that the results are comparable.

C. Reinforcement Learning

   The RL agent in the modification planner is trained using the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning algorithm with the original training hyper-parameters [15]. Neural networks with two hidden layers of 200 neurons are trained in mini-batches of 100 transitions. The training lasts for 200,000 steps, which is generally just below 4000 episodes. Figure 9 shows the average reward per episode for the training of the network.
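   For reference, a skeleton of the training procedure described above is shown below; the agent interface (act, replay buffer, train) is an illustrative assumption about a generic TD3 implementation, not the authors' code.

```python
TOTAL_STEPS = 200_000   # training length from Section VI-C
BATCH_SIZE = 100        # mini-batch size from Section VI-C

def train_agent(env, agent):
    """Generic off-policy training loop; the TD3 internals are assumed to live in `agent`."""
    episode_rewards, ep_reward = [], 0.0
    state = env.reset()
    for step in range(TOTAL_STEPS):
        action = agent.act(state)                        # policy output with exploration noise
        next_state, reward, done = env.step(action)
        agent.buffer.add(state, action, reward, next_state, done)
        agent.train(batch_size=BATCH_SIZE)               # one TD3 update per environment step
        ep_reward += reward
        state = next_state
        if done:
            episode_rewards.append(ep_reward)            # curve shown in Figure 9
            state, ep_reward = env.reset(), 0.0
    return episode_rewards
```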
Fig. 9. Modification Planner Training Graph (average reward per episode vs. episode number)

                     VII. EVALUATION RESULTS

A. Random Forests

   The modification planner is evaluated in a random forest which consists of a straight race track with randomly located obstacles. The random forest is a straight track (20 m x 2 m) where 4 obstacles (0.5 m x 0.5 m) are randomly placed along the track every episode. Table II shows the results from running 100 test episodes and taking the average time to navigate to the goal and the success rate.

      Planner                  No Obstacles (s)   Obstacles (s)   Success
      Follow the Gap Method          3.73              4.38         99%
      Oracle                         3.72              3.87        100%
      Navigation Planner             5.08              6.18         97%
      Modification Planner           3.82              4.63         94%

                              TABLE II
                  RESULTS FROM 20 M RANDOM FOREST

   In the tests with no obstacles, the modification planner is able to track the optimal path, a straight line through the forest, and achieve a very similar time to the optimisation based oracle planner (within 3% of the oracle planner). The similar times to the optimisation based planner when no obstacles are present show that the RL agent learns not to modify the steering command unnecessarily. The navigation planner achieves a significantly slower average time of 5.08 seconds in this test compared to the 3.72 seconds which the other planners are able to achieve.
   In the tests with obstacles present, the optimisation based oracle planner outperforms the other planners as expected with an average time of 3.87 seconds. The FGM planner achieves a mediocre time of 4.38 seconds; however, this is still faster than any of the learning-based planners. The modification planner achieves an average time of 4.63 seconds for the forest filled with obstacles, which is faster than the navigation planner's time of 6.18 seconds. The decrease in time shows that the modification planner is able to find smoother, faster trajectories than the mapless navigation solution. Figure 10 shows a comparison of the trajectories taken by the navigation and modification planners and how the modification planner learns to take smoother trajectories.

Fig. 10. Example Trajectories Executed by the Modification and Navigation Planners

   The modification planner achieves a slightly lower success rate of 94% compared to the navigation planner's 97%. These high success rates show that both planners learn to avoid a significant majority of obstacles.
   The influence of the network in modifying the path follower references is shown in Figure 11, which graphs the output of the neural network as the vehicle moves along the path. The neural network modifies the path enough to avoid the obstacles while roughly maintaining the reference path.

Fig. 11. Example Trajectory from the Modification Planner with Neural Network Output (network output vs. time)

B. Repeatability

   To measure the repeatability of the learned policies, we train and evaluate 10 agents on the random forest described above. The average and standard deviation statistics are presented in Table III.

      Statistic                    Average   Standard Deviation
      No Obstacle Average Time      3.85 s        ± 0.10
      Obstacle Average Time         4.63 s        ± 0.17
      Obstacle Success Rate         92.2%         ± 5.3

                             TABLE III
                RESULTS FROM REPEATABILITY EVALUATION

   The standard deviation in the times achieved by the different agents is small, which shows that the method of training agents to achieve similar times is stable and highly repeatable. The standard deviation of the success rate is large, 5%, which shows that there is a non-trivial difference in how agents learn to avoid obstacles.
C. Discussion

   The results show that the modification planner can learn policies that improve upon map-less navigation planners. Specifically, the modification planner avoids obstacles more smoothly, which results in the vehicle being able to travel at higher speeds. This shows that the modification planner achieves the behaviour that it was designed for, allowing non-holonomic vehicles to avoid obstacles smoothly at high speeds. The success rates which the planners achieve show that while the modification planner is able to learn to avoid a significant number of obstacles, it still underperforms the map-less navigation planner's ability to consistently avoid obstacles.
   It is suggested that the reason for the lower rate of completion is that the complexity of the learning problem for the modification planner is higher than for the map-less navigation baseline. The transition dynamics (movement from one state to the next) for the modification planner are dependent on combining the action, which the agent has selected, with the pure pursuit reference. The result of this combination is that the agent has less influence on how the vehicle moves. Having less influence improves the quality of the path taken, which results in the modification planner achieving faster times. A problem that arises from having less influence is that the planner achieves a lower rate of completion.
   This problem of guaranteeing vehicle safety is a current topic of research and could be solved in future work by exploring different training methods such as imitation learning. Another possible solution is to combine the RL agent with a safety system that can stop the vehicle before it crashes.
   A recent survey [12] suggested that future work in mobile navigation should continue into using ML to learn specific subsystems of the navigation pipeline. In future, we plan to investigate how RL agents can be used in different architectures to learn obstacle avoidance behaviours by reducing the complexity of the navigation problem to simpler components.

                      VIII. CONCLUSION

   In this paper, we addressed the problem of local planning for autonomous navigation. Local planners are responsible for tracking a reference trajectory and avoiding obstacles. We presented a novel local planning architecture that uses a path following algorithm in conjunction with an RL agent to smoothly avoid obstacles while maintaining the global plan. Our architecture is evaluated in the context of F1/10th autonomous racing and shown to achieve comparable racing times to optimisation based and reactive methods, yet using fewer inputs. Specifically, our modification local planner requires only 10 sparse laser range finder readings and is able to smoothly avoid obstacles. Additionally, we show that our planner is able to avoid obstacles more smoothly than previous solutions from map-less navigation.

                          REFERENCES

[1] P. Falcone, F. Borrelli, H. E. Tseng, J. Asgari, and D. Hrovat, "A hierarchical model predictive control framework for autonomous ground vehicles," in 2008 American Control Conference. IEEE, 2008, pp. 3719–3724.
[2] V. J. Lumelsky and A. A. Stepanov, "Path-planning strategies for a point mobile automaton moving amidst unknown obstacles of arbitrary shape," Algorithmica, vol. 2, no. 1-4, pp. 403–430, 1987.
[3] V. Sezer and M. Gokasan, "A novel obstacle avoidance algorithm: 'Follow the gap method'," Robotics and Autonomous Systems, vol. 60, no. 9, pp. 1123–1134, 2012.
[4] L. Heng, L. Meier, P. Tanskanen, F. Fraundorfer, and M. Pollefeys, "Autonomous obstacle avoidance and maneuvering on a vision-guided MAV using on-board processing," in 2011 IEEE International Conference on Robotics and Automation. IEEE, 2011, pp. 2472–2477.
[5] R. C. Coulter, "Implementation of the pure pursuit path tracking algorithm," Carnegie Mellon University, Robotics Institute, Pittsburgh, PA, Tech. Rep., 1992.
[6] L. Tai, G. Paolo, and M. Liu, "Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 31–36.
[7] G. Kahn, T. Zhang, S. Levine, and P. Abbeel, "PLATO: Policy learning using adaptive trajectory optimization," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 3342–3349.
[8] W. Gao, D. Hsu, W. S. Lee, S. Shen, and K. Subramanian, "Intention-Net: Integrating planning and deep learning for goal-directed autonomous navigation," in Conference on Robot Learning. PMLR, 2017, pp. 185–194.
[9] M. Wei, S. Wang, J. Zheng, and D. Chen, "UGV navigation optimization aided by reinforcement learning-based path tracking," IEEE Access, vol. 6, pp. 57814–57825, 2018.
[10] H. Durrant-Whyte and T. Bailey, "Simultaneous localization and mapping: Part I," IEEE Robotics & Automation Magazine, vol. 13, no. 2, pp. 99–110, 2006.
[11] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, "Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age," IEEE Transactions on Robotics, vol. 32, no. 6, pp. 1309–1332, 2016.
[12] X. Xiao, B. Liu, G. Warnell, and P. Stone, "Motion control for mobile robot navigation using machine learning: A survey," arXiv preprint arXiv:2011.13112, 2020.
[13] F. de Villiers and W. Brink, "Learning fine-grained control for mapless navigation," in 2020 International SAUPEC/RobMech/PRASA Conference. IEEE, 2020, pp. 1–6.
[14] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[15] S. Fujimoto, H. van Hoof, and D. Meger, "Addressing function approximation error in actor-critic methods," in International Conference on Machine Learning. PMLR, 2018, pp. 1587–1596.