EP4078320B1 - Planning in mobile robots - Google Patents
Planning in mobile robots Download PDFInfo
- Publication number
- EP4078320B1 EP4078320B1 EP21703178.0A EP21703178A EP4078320B1 EP 4078320 B1 EP4078320 B1 EP 4078320B1 EP 21703178 A EP21703178 A EP 21703178A EP 4078320 B1 EP4078320 B1 EP 4078320B1
- Authority
- EP
- European Patent Office
- Prior art keywords
- optimizer
- mobile robot
- stage
- trajectory
- computed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/60—Intended control result
- G05D1/644—Optimisation of travel parameters, e.g. of energy consumption, journey time or distance
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/08—Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
- B60W30/09—Taking automatic action to avoid collision, e.g. braking and steering
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
- B60W60/0015—Planning or execution of driving tasks specially adapted for safety
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2520/00—Input parameters relating to overall vehicle dynamics
- B60W2520/10—Longitudinal speed
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/20—Static objects
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/40—Dynamic objects, e.g. animals, windblown objects
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/34—Route searching; Route guidance
- G01C21/3446—Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags or using precalculated routes
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D2105/00—Specific applications of the controlled vehicles
- G05D2105/20—Specific applications of the controlled vehicles for transportation
- G05D2105/22—Specific applications of the controlled vehicles for transportation of humans
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D2107/00—Specific environments of the controlled vehicles
- G05D2107/10—Outdoor regulated spaces
- G05D2107/13—Spaces reserved for vehicle traffic, e.g. roads, regulated airspace or regulated waters
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D2109/00—Types of controlled vehicles
- G05D2109/10—Land vehicles
Definitions
- the present invention pertains to planning in autonomous vehicles and other mobile robots.
- An autonomous vehicle also known as a self-driving vehicle, refers to a vehicle which has a sensor system for monitoring its external environment and a control system that is capable of making and implementing driving decisions automatically using those sensors. This includes in particular the ability to automatically adapt the vehicle's speed and direction of travel based on perception inputs from the sensor system.
- a fully-autonomous or "driverless" vehicle has sufficient decision-making capability to operate without any input from a human driver.
- autonomous vehicle as used herein also applies to semi-autonomous vehicles, which have more limited autonomous decision-making capability and therefore still require a degree of oversight from a human driver.
- Other mobile robots are being developed, for example for carrying freight supplies in internal and external industrial zones. Such mobile robots would have no people on board and belong to a class of mobile robot termed UAV (unmanned autonomous vehicle).
- UAV unmanned autonomous vehicle
- Autonomous air mobile robots (drones) are also being developed.
- a receding horizon planner is formulated as Nonlinear Model Predictive Control (NMPC), subject to a set of hard constraints, namely (1) respecting a transition model of the system, including kinematic and dynamic constraints (2) maintaining the vehicle within the limits of the road and (3) avoiding other traffic participants in the sense of guaranteeing a probability of collision below p ⁇ .
- NMPC Nonlinear Model Predictive Control
- a cost function penalizes deviation from a current acceleration and steering angle specified by a human driver - a form of "soft constraint" or "objective” according to the terminology used herein.
- Salvado Joao et al. “Motion planning and goal assignment for robot fleets using trajectory optimisation” describes a scheme for solving the overall fleet management problem in the case of fleets navigating in an environment with obstacles. The method is based on a two phase approach, whereby the first phase solves for fleet-wide boolean decision variables via Mixed Integer Quadratic programming, and the second phase solves for real-valued variables to obtain an optimised set of trajectories for the fleet.
- Hult Robert et al. proposes a two-stage approximation algorithm to solve the problem of coordinating automated vehicles at intersections.
- the procedure (a) first solved a Mixed Integer Quadratic Program (MIQP) to compute an approximate solution to the order in which the vehicles cross the intersection; then (b), solves a Nonlinear Program (NLP) for the optimal state and control trajectories.
- MIQP Mixed Integer Quadratic Program
- NLP Nonlinear Program
- the techniques of [1] formulate planning as a non-linear constrained optimization problem, and seek to solve that problem using a receding horizon formulation (the solution being a set of control inputs that optimize the cost function). It is possible to extend this framework to more complex cost functions, for example with explicit comfort objectives, or to include more sophisticated vehicle (or, more generally, mobile robot) models.
- a challenge with this approach is that convergence to an optimal set of control inputs is both slow and uncertain, particularly as the complexity of the cost function and/or the mobile robot model increases, i.e. the optimizer may take a long time to converge to a solution, or may never successfully converge.
- Another challenge is that non-linear constrained optimization solvers tend to be local in nature and thus have a tendency to converge to local optima which may be far from the globally optimal solution. This can significantly impact mobile robot performance.
- the present invention also formulates mobile robot planning as a constrained optimization problem. Given a scenario with a desired goal, the problem is to find a series of control actions ("policy") that substantially (exactly or approximately) globally optimises a defined cost function for the scenario and the desired goal.
- policy a series of control actions
- the present invention addresses the local optima problems that arise in the context of planning based on constrained optimization. This is a form of trajectory planning (synthesis) based on constrained optimization.
- a core objective herein is that of reducing instances of convergence to locally but non-globally optimal solutions within a constrained optimization planning framework, thereby improving the robustness of the planning process and the quality of the trajectories it produces.
- Mere local optima are a particular problem in optimization-based planning, because they typically correspond to low quality trajectories.
- a first stage is formulated as an optimization problem that is similar to, but simpler than the more complex planning problem that ultimately needs to be solved.
- the first stage may use a linear cost function and linear robot dynamics model.
- Such a problem is generally less susceptible to local optima.
- the solution of the first stage is then used to initialize the second stage, in which the "full" planning problem is solved.
- the solution of the first stage will be more likely to be close to a globally optimal solution to the full planning problem - up to up to some acceptable level of error introduced by the simplification assumptions of the first stage - and therefore reduces the tendency of the second stage to converge to local optima far from the global solution.
- a first aspect of the invention provides a computer-implemented method of determining control actions for controlling a mobile robot, the method comprising:
- a less-complex cost function and dynamics model can be used in the first stage, whilst still providing an effective initialization to the second stage.
- the term complexity refers to the form of the cost function and the model, in the space of variables over which they are defined.
- a robot dynamics model is a predictive component that predicts how the mobile robot will actually move in the scenario in response to a given sequence of control actions, i.e. it models the mobile robot's response to control actions.
- a higher-complexity model as used in the second stage, can model that response more realistically.
- the lower-complexity model is free to use highly-simplifying assumptions about the behaviour of the mobile robot but these may be relatively unrealistic.
- the first predicted trajectory may not even be fully dynamically realisable.
- a higher-complexity cost function and model, as used in the second stage, can provide superior trajectories, which may be of sufficient quality that they can be used as a basis for effective planning and/or control decisions.
- higher-quality trajectories will be obtained when convergence to an approximately globally optimal solution (i.e. at or near a global optima of the full cost function) is achieved.
- the complexity of the full cost function and model increases, achieving such convergence becomes increasingly dependent on the quality of the initialization.
- the simplifying assumptions applied in the first stage make it inherently less susceptible to the problem of non-local optima, i.e. the ability of the first optimizer to converge to an approximately globally optimal solution is far less dependent on any initialization of the first optimization phase.
- the output of the simplified first stage is unlikely to be of sufficient quality to use as a basis for such planning decisions directly, and the trajectories it produces may not even be full dynamically realisable (depending on the simplifying assumptions that are made in the first stage).
- the initialization data of the first stage can still facilitate faster and more reliable convergence to an approximately globally optimal solution in the second stage, which will correspond to a dynamically realisable trajectory.
- the present invention thus benefits from the high-quality output of the more complex second stage, whilst avoiding (or at least mitigating) the issues of local optima convergence that would otherwise come with it, through the provision of an effective initialization to the second stage.
- the described embodiments consider a two-stage constrained optimization.
- the first constrained optimization stage that is applied to determine the initialization data could, itself, be a multi-stage optimization.
- two or more preliminary cost functions may be optimized in the first stage, with at least one of the preliminary cost functions being optimized in order to initialize another of the preliminary cost functions, before ultimately determining the initialization data to the above-mentioned second constrained optimization stage.
- the computed trajectory may be determined, based on an initial mobile robot state, as a series of subsequent mobile robot states.
- Each mobile robot state of the first computed trajectory may be determined by applying the full robot dynamics model to at least the previous mobile robot state of the first computed trajectory and a corresponding control action of the first series of control actions.
- Each mobile robot state of the second computed trajectory may be determined by applying the full robot dynamics model to at least the previous mobile robot state of the second computed trajectory and a corresponding control action of the second series of control actions.
- the preliminary robot dynamics model may be linearly dependent on at least the previous mobile robot state of the first computed trajectory and the corresponding control action of the first series of control actions, and the full robot model may be non-linearly dependent on at least one of the previous mobile robot state of the second computed trajectory and the corresponding control action of the second series of control actions.
- the preliminary cost function may be linearly dependent on the mobile robot states of the first computed trajectory, and the full cost function may be non-linearly dependent on the mobile robot states of the second computed trajectory.
- the preliminary cost function may be linearly dependent on the control actions of the first series, and the full cost function may be non-linearly dependent on the control actions of the second series.
- the first optimizer may be a mixed integer linear programming (MILP) optimizer
- the second optimizer may be a non-linear programming (NLP) optimizer.
- MILP mixed integer linear programming
- NLP non-linear programming
- the hard constraints of the first stage may comprise one or more mixed integer collision avoidance constraints for one or more static or moving obstacles in the scenario and/or one or more mixed integer permitted area constraints for keeping the mobile robot within a permitted area.
- the hard constraints of the second stage may comprise one or more similar collision avoidance and/or permitted area constraints formulated in terms of non-integer variables.
- the first optimizer may apply a receding horizon approximation to iteratively optimize component costs of the preliminary cost function, and thereby determine the first series of control actions, and the second optimizer may not use any receding horizon approximation and may instead optimize the full loss function as a whole.
- the goal may be defined relative to a reference path, and each cost function may encourage achievement of the goal by penalizing at least one of lateral deviation from the deference path, and longitudinal deviation from a reference location on the reference path.
- Each of the computed trajectories may be represented in a frame of reference defined by the reference path.
- the preliminary cost function may be linearly dependent on the above lateral and/or longitudinal deviation, and the full cost function is non-linearly dependent thereon.
- Both cost functions may penalize deviation from a target speed.
- the method may be implemented in a planner of a mobile robot and comprise the step of using control data of at least one of: the second computed trajectory, and the second series of control actions to control motion of the mobile robot.
- Embodiments and optional implementations of the invention address the problem of speed though the use of function approximation.
- the first and second optimizations in real-time may be occasions when running the first and second optimizations in real-time is not feasible.
- one or both of the optimizers may be replaced, in a real-time context, with a function approximator training to approximate the first and/or second optimization stage as applicable.
- the method may be performed repeatedly for different scenarios so as to generate a first training set comprising inputs to the first optimizer and corresponding outputs computed by the first optimizer, and the training set may be used to train a first function approximator to approximate the first optimizer (training method 1).
- the method may be performed repeatedly for different scenarios so as to generate a second training set comprising inputs to the second optimizer and corresponding outputs computed by the second optimizer, and the training set may be used to train a second function approximator to approximate the second optimizer (training method 2).
- the method may comprise the step of configuring a runtime stack of mobile robot to implement one of the following combinations:
- the method may be performed repeatedly for different scenarios so as to generate a training set comprising inputs to the first optimizer and corresponding outputs computed by the second optimizer, and the training set may be used to train a single function approximator to approximate both of the first and the second optimizers (training method 3).
- the method may comprise the step of configuring a runtime stack of mobile robot to implement the single function approximator.
- the runtime stack may be configured to implement one of combinations (i) and (ii), or the single function approximator, and the method may comprise an additional step of configuring the runtime stack with a verification component configured to verify an output of the second function approximator or the single function approximator.
- a further embodiment of the invention provides a computer-implemented method of determining control actions for controlling a mobile robot, the method comprising:
- a further aspect herein provides a computer system comprising one or more optimization components configured to implement or approximate: a first optimization stage as defined in any of the above aspects or embodiments; and a second optimization stage as defined in any of the above aspects or embodiments.
- the one or more optimization components may comprise a first optimization component configured to implement or approximate the first optimization stage, and a second optimization component configured to implement or approximate the second optimization stage, using initialization data provided by the first optimization component.
- the first optimization component may take the form of a first function approximator trained in accordance with training method 1 above and/or the second optimization component may take the form of a second function approximator trained in accordance with training method 2 above to approximate the second optimization stage.
- the second optimization component may take the form of a second function approximator, and the computer system may additionally comprise a verification component configured to verify at least one of a second trajectory and a second series of control actions computed by the second function approximator.
- the one or more optimization components may comprise a single optimization component, trained in accordance with training method 3 above, to approximate both of the first and second optimization stages.
- the computer system may additionally comprise a verification component configured to verify at least one of a second trajectory and a second series of control actions computed by the single function approximator.
- the or each function approximator may have a neural network architecture.
- the computer system may be embodied in an autonomous vehicle or other mobile robot, wherein the computer system may be further configured to control the motion of the mobile robot via one or more actuators of the mobile robot using control data provided by the second optimization component.
- a further aspect herein provides a computer program for programming one or more computers to implement any of the methods disclosed herein.
- Embodiments are described in the context of an autonomous vehicle. However, the description applies equally to other forms of autonomous mobile robot.
- Figure 1 shows a highly schematic block diagram of a runtime stack 100 for an autonomous vehicle (AV).
- the run time stack 100 is shown to comprise a perception stack 102, a prediction stack 104, a planner 106 and a controller 108.
- the perception stack 102 receives sensor outputs from an on-board sensor system 110 of the AV.
- the on-board sensor system 110 can take different forms but generally comprises a variety of sensors such as image capture devices (cameras/optical sensors), LiDAR and/or RADAR unit(s), satellite-positioning sensor(s) (GPS etc.), motion sensor(s) (accelerometers, gyroscopes etc.) etc., which collectively provide rich sensor data from which it is possible to extract detailed information about the surrounding environment and the state of the AV and any external actors (vehicles, pedestrians, cyclists etc.) within that environment.
- image capture devices cameras/optical sensors
- LiDAR and/or RADAR unit(s) satellite-positioning sensor(s) (GPS etc.)
- GPS etc. satellite-positioning sensor(s)
- motion sensor(s) accelerelerometers, gyroscopes etc.
- the sensor outputs typically comprise sensor data of multiple sensor modalities such as stereo images from one or more stereo optical sensors, LiDAR, RADAR etc.
- Stereo imaging may be used to collect dense depth data, with LiDAR/RADAR etc. proving potentially more accurate but less dense depth data. More generally, depth data collection from multiple sensor modalities may be combined in a way that respects their respective levels (e.g. using Bayesian or non-Bayesian processing or some other statistical process etc.). Multiple stereo pairs of optical sensors may be located around the vehicle e.g. to provide full 360° depth perception. This provides a much richer source of information than is used in conventional cruise control systems.
- the perception stack 102 comprises multiple perception components which co-operate to interpret the sensor outputs and thereby provide perception outputs to the prediction stack 104.
- the perception outputs from the perception stack 102 are used by the prediction stack 104 to predict future behaviour of the external actors.
- Predictions computed by the prediction stack 104 are provided to the planner 106, which uses the predictions to make autonomous driving decisions to be executed by the AV in a way that takes into account the predicted behaviour of the external actors.
- the planner 106 implements the techniques described below to plan trajectories for the AV and determine control actions for realizing such trajectories.
- a core function of the planner 106 is to determine a series of control actions for controlling the AV to implement a desired goal in a given scenario.
- a scenario is determined using the perception stack 102 but can also incorporate predictions about other actors generated by the prediction stack 104.
- a scenario is represented as a set of scenario description parameters used by the planner 106.
- a typical scenario would define a drivable area and would also capture predicted movements of any obstacles within the drivable area (such as other vehicles) along with a goal.
- a goal would be defined within the scenario, and a trajectory would then need to be planned for that goal within that scenario.
- obstacles are represented probabilistically in a way that reflects the level of uncertainty in their perception within the perception stack 102.
- the goal could for example be to enter a roundabout and leave it at a desired exit; to overtake a vehicle in front; or to stay in a current lane at a target speed (lane following).
- the goal may, for example, be determined by an autonomous route planner (not shown).
- the controller 108 executes the decisions taken by the planner 106 by providing suitable control signals to on-board actuators 112 such as motors of the AV.
- the controller 108 controls the actuators in order to control the autonomous vehicle to follow a trajectory computed by the planner 106.
- the planner 106 plans over acceleration (magnitude) and steering angle control actions simultaneously, which are mapped to a corresponding trajectory by modelling the response of the vehicle to those control actions. This allows constraints to be imposed both on the control actions (such as limiting acceleration and steering angle) and the trajectory (such as collision-avoidance constraints), and ensures that the final trajectories produced are dynamically realisable.
- the planner 106 will determine an optimal trajectory and a corresponding sequence of control actions that would result in the optimal trajectory according to whatever vehicle dynamics model is being applied.
- the control actions determined by the planner 106 will not necessarily be in a form that can be applied directly by the controller 108 (they may or may not be).
- control data is used herein to mean any trajectory information derived from one of both of the planned trajectory and the corresponding series of control actions that can be used by the controller 108 to realize the planner's chosen trajectory.
- the controller 108 may take the trajectory computed by the planner 106 and determine its own control strategy for realizing that trajectory, rather than operating on the planner's determined control actions directly (in that event, the controller's control strategy will generally mirror the control actions determined by the planner, but need not do so exactly).
- the planner 106 will continue to update the planned trajectory as a scenario develops. Hence, a trajectory determined at any time may never be realized in full, because it will have been updated before then to account for changes in the scenario that were not predicted perfectly.
- the present disclosure addresses the planning problem of determining an optimal series of control actions (referred to herein as a "policy") for a given scenario and a given goal as a constrained optimization problem with two optimization stages.
- a polyicy an optimal series of control actions
- the first optimization stage solves a Mixed Integer Linear Programming (MILP) problem where obstacles lead to hard constraints for the MILP.
- MILP Mixed Integer Linear Programming
- the outcome of this seeds (initializes) a subsequent nonlinear optimisation of the trajectory for dynamics and comfort constraints, in the second optimization stage.
- the second optimization stage solves a Non-Linear Programming (NLP) problem, initializing using the results of the MILP optimization stage.
- NLP Non-Linear Programming
- the MILP stage uses a mixed-integer formulation of collision avoidance and drivable area constraints as described later.
- the NLP stage is described first, in order to set out the final planning problem that ultimately needs to be solved (Problem 2 below), and to provide some context for the subsequent description of how best to provide an effective initialization (seed) from the MILP stage.
- a vehicle whose motion is being planned is referred to as the ego vehicle.
- the ego state at time 0 is given by X 0 , and a function of the planner 106 is to determine the ego states over the next N steps,
- Other traffic participants such as vehicles, pedestrians and cyclists, are indexed by i ⁇ ⁇ 1, ..., n ⁇ and their pose at time k is modelled using a Gaussian distribution having a mean and covariance ⁇ k i .
- the set of the mean poses of all traffic participants over time is O 0 : N 1 : n , and similarly for the covariances ⁇ 0 : N 1 : n , and both are given as inputs to the planning problem.
- the global coordinate frame is transformed to a reference path-based representation under an invertible transform as described in Sec. II-B.
- This representation significantly simplifies the problem of path tracking.
- a goal of the ego vehicle is defined as following a differentiable and bounded two-dimensional reference path in the global coordinate frame, , parameterized by the distance from the start of the path ( ).
- the reference path is a path which the ego vehicle is generally intending to follow, at a set target speed. However, deviation from the reference path and target speed, whilst discouraged, are permitted provided that no hard constraints (such as collision avoidance constraints) are breached.
- the reference path can be determined using knowledge of the road layout, which may use predetermined map data (such as an HD map of the driving area), information from the perception stack 104, or a combination of both. For complex layouts in particular (such as complex junctions or roundabouts), the reference path could be learned by monitoring the behaviour of other drivers in the area over time.
- Figure 2A illustrates the process of going from an input ( X 0 , O 0 : N 1 : n , ⁇ 0 : N 1 : n ) to the desired output X 1:N .
- the output of the planner 106 is an intended trajectory x 0: N in the reference path-based representation, which in turn is transformed back to the global coordinate frame by applying the inverse transformation to obtain X 0: N in the global coordinate frame. Further details of the transformation are described below.
- the invertible transform operates over three types of input: (1) poses, (2) velocities and (3) covariance matrices. Each of the individual operations is described next.
- Figure 2B shows a visual representation of the transform between the world frame of reference and the reference path-based frame, and , the inverse transform.
- the ego vehicle is modelled as a rigid body occupying an area S e ⁇ R 2 relative to its center, and the area occupied by the ego vehicle at state x k is given by
- the area each traffic participant occupies is defined as with probability larger than p ⁇ .
- a cost function J x 0 : N u 0 : N ⁇ 1 o 0 : N 1 : n ⁇ 0 : N 1 : n defined over the positions and controls of the ego vehicle and states and uncertainties of other traffic participants, a "policy synthesis" problem can be formulated.
- a "policy” in the present context refers to a time-series of control actions, which is mapped to a corresponding vehicle trajectory using a vehicle dynamics model.
- the hard constraints comprise (1) vehicle model constraints, including kinematic ones on the transitions of the model and on the controls it allows (Sec. III-1); and (2) collision avoidance constraints with the purpose of avoiding collisions with the boundaries of the road (which can include, e.g. construction areas or parked vehicles) as well as other traffic participants (Sec. III-2).
- Soft constraints make up the terms of the cost function (i.e. are embodied in the cost function itself) and are presented in Sec. III-3, whereas hard constraints constrain the optimization of the cost function. Sec. III-4 describes the full problem.
- a limit is imposed on the allowed ranges of controls by enforcing
- speed is constrained as 0 ⁇ v min ⁇ v k ⁇ v max , to guarantee forward motion within a set speed limit v max .
- Collision Avoidance The collision avoidance problem is divided into two different types of avoidance: the limits of the road (which may include other features such as construction zones or parked vehicles) and other traffic participants. For both, a representation of the area of the ego-vehicle is defined as ( x k ) for state x k . A simplified representation is used wherein the ego vehicle boundary is represented solely by its corners.
- the driveable surface is described by limits, b l ( ⁇ ) , b r ( ⁇ ) , which are continuous and differentiable functions of ⁇ defining the left and right borders, respectively, with ⁇ ⁇ : b l ( ⁇ ) ⁇ b r ( ⁇ ) .
- the weights associated with the soft constraints determine their relative importance.
- Soft constraints are defined over different objectives:
- NLP formulation The optimization problem is formulated using the constraints and cost J .
- this problem is a general (equality and inequality constraints) non-linear, non-convex, constrained optimization problem. While it is appealing to solve the problem directly, or using a receding horizon formulation as in [1], there are two major challenges to this approach:
- the tuple ( x 0 , O 0 : n 1 : N , ⁇ 0 : n 1 : N ) is an example of a set of scenario description parameters, describing a dynamic scenario and a starting state of the ego vehicle within it.
- the dynamics of the scenario is captured in terms of the motion components of x 0 , O 0 : n 1 : N and ⁇ 0 : n 1 : N , i.e. the speed and heading dimensions within the NLP formulation.
- Figure 3 shows a two-tier optimization architecture: a first stage 302 corresponds to solving a receding horizon formulation of the linearized version of the problem; in a second stage 304, the problem is solved in a non-linear non-convex constrained optimization fashion using the solution of the first stage as an initialization (e.g. a warm-start initialization).
- an initialization e.g. a warm-start initialization
- the two stage optimisation of Figure 3 may be referred to as 2s-OPT.
- a motivation behind the architecture is to avoid the local optima convergence in the non-linear, non-convex constrained optimization by providing an initial solution that is closer to the global optimum, which also should lead to faster and more reliable convergence.
- the first stage 302 solves a linearized version of Problem 2 in a finite, receding horizon manner using a Mixed Integer Linear Programming (MILP) formulation (details presented in Sec. IV-A).
- MILP Mixed Integer Linear Programming
- the second stage 304 uses the output of the MILP optimizer as an initial solution and solves the full Problem 2, as set out above. Given the similar representations of the linearized and non-linear problems, it is expected that this initialization improves convergence, speed and the quality of the final solution.
- Problem 2 is re-formulated as a Mixed Integer Linear Programming optimization problem, with mixed integer linear constraints and a multi-objective cost function.
- This representation is still based on the reference path , and is the same as the representation used in the second stage 304 in terms of the spatial dimensions.
- the second stage 304 uses an angular representation of motion (i.e. velocity and acceleration are represented in terms of magnitude and angle components)
- the first stage uses a linear representation of velocity and acceleration.
- Similar constraints are imposed as in the non-linear model of the second stage 304, in particular input bounds constraints, a x min ⁇ a x k ⁇ a x max and a y min ⁇ a y k ⁇ a y max ; jerk related constraints, a k + 1 x ⁇ a k x ⁇ ⁇ a max x ⁇ t and a k + 1 y ⁇ a k y ⁇ ⁇ a max y ⁇ t ; and velocity constraints v x min ⁇ v x k ⁇ v x max and v y min ⁇ v y k ⁇ v y max , with v x min ⁇ 0 to guarantee forwards motion.
- d is a function of the size of the ego vehicle with respect to its point estimate.
- d w 2 + l 2 / 2 , i.e. reduce the driveable surface to where ⁇ is the Minkowski-sum operator.
- the ellipses L a k i b k i defined in (9) are inscribed with axis-aligned rectangles, which are then augmented to consider the point estimate of the ego vehicle's pose.
- x k , min i , x k , max i , y k , min i , y k , max i can be computed in closed form from L a k i b k i , d x and d y .
- the collision avoidance constraint is the logical implication: x k , min i ⁇ x ⁇ x k , max i ⁇ y ⁇ y k , min i ⁇ y ⁇ y k , max i which can be understood as "if the ego position is aligned with the vehicle along x, then it must be outside the vehicle's borders in y".
- the cost function of the MILP stage 302 should be similar to the cost function from the non-linear stage 304 to minimize the gap between the optimum obtained in both stages.
- the optimal trajectory computed in the first stage 302 should approximate the final optimal trajectory computed in the second stage 304, and therefore provide a useful initialization.
- MILP Problem Definition With constraints and cost function the planning problem of the first stage 302 is formulated as a MILP problem with a receding horizon of K steps.
- a full trajectory x ⁇ 0: N of length N +1 is obtained at the first stage 302, to initialize the second stage 304.
- the receding horizon approximation iteratively breaks the problem down into a set of simpler optimization problems, each defined over K ⁇ N steps.
- a "cost component" of the full cost function is defined as the sum of the individual costs over only K steps, as set out below.
- the known initial vehicle state x 0 is used as a starting point, with subsequent states determined by applying the vehicle dynamics model F ⁇ t in an iterative fashion.
- the state x m-1 determined in the previous planning step m - 1 provides a starting point for that iteration, with subsequent states computed from that starting point in the same way.
- the state vectors of both the ego vehicles and the other actors, and control vectors are not restricted to integer values, or otherwise discretised to any fixed grid. Not discretizing over a grid is one of the advantages of the method, as it allows for smoother trajectories.
- the integer variables in the MILP stage are used to enforce the constraints related to collision avoidance and driveable surface only.
- the trajectory x 0: N computed in the MILP optimization stage 302 is an example is a "seed trajectory", as that term is used herein, whereas the trajectory x 0: N computed in the NLP optimization stage 304 (derived, in part, from the seed trajectory x 0: N ), is an example of a "final” trajectory.
- the trajectory is "final” within the specific context of the two-stage optimization.
- the term “final” does not necessarily imply finality within the context of the runtime 100 stack as a whole.
- the final trajectory x 0: N may or may not be subject to further processing, modification etc. in other parts of the runtime stack 100 before it is used as a basis for generating control signals within the controller 108.
- the linear cost function of Equation (22) and the linear dynamics model of Equations (15) and (16) are examples of a "preliminary” cost function and model respectively
- the non-linear cost function of Equation (11) and the non-linear dynamics model of Equation (5) are examples a “final” or, equivalently, “full” cost function and model respectively, where the terms “final” and “full” are, again, only used in the specific context of the two-stage optimization, and does not imply absolute finality.
- control actions u 0: N -1 of the MILP stage 302 are alternatively or additionally used to seed the NLP stage 304 (and they may be referred to as seed control actions in that event); similarly, as noted above, data of one or both of the final trajectory x 0: N and the final series of control actions u 0: N -1 may be provided to the controller 108 for the purpose of generating "actual" control signals for controlling the autonomous vehicle.
- the scenario description parameters ( x 0 , O 0 : n 1 : N , ⁇ 0 : n 1 : N ) have a non-linear form (because they are formulated in terms of heading ⁇ k )
- the scenario description parameters ( x 0 , O 0 : n 1 : N , ⁇ 0 : n 1 : N ) have a linear form (because motion is formulated in terms of components v k x , v k y instead of speed and heading v k , ⁇ k ), with x 0 denoting the linearized form of the ego vehicle state specifically.
- x 0: N and x 0: N are used as shorthand to represent complete trajectories, including the (known) current state x 0 .
- the additional states actually determined in the relevant optimization stage 302, 304 are x 1: N and x 1: N , as depicted in Figure 3 .
- An overtaking goal is defined by way of a suitably distant reference location (not shown) on the reference path , ahead of the forward vehicle 402.
- the effect of the progress constraints is to encourage the ego vehicle 400 to reach that reference location as soon as it can, subject to the other constraints, whilst the effect of the collision avoidance constraints is to prevent the ego vehicle 402 from pulling out until the oncoming vehicle stops being a collision risk.
- Figure 5 illustrates some of the principles underlying the MILP formulation of the first stage 302.
- the ego vehicle 400 remains axially aligned (i.e. aligned with the reference path ) at all times.
- This reflects the simpler, linear dynamics model that is used in the MILP stage - as can be seen from Equations (15) and (16) above, the simpler dynamics model does not consider rotation of the ego vehicle 200 (recall that the MILP stage uses a linear representation of velocity and acceleration).
- the output of the first stage 302 is a more "restricted" form of trajectory defined in terms of linear position and velocity but not orientation/heading of the ego vehicle 200.
- the linearized trajectory may not be fully dynamically realisable, but nevertheless serves as a useful initialization to the second stage 304.
- a solution to Problem 3 can be obtained using Branch and Bound, a divide and conquer algorithm first introduced and applied to mixed integer linear programming by Land and Doig in [10]. This solution is proven to be the globally optimal one [7], [11].
- modern solvers e.g. Gurobi or CPLEX
- the receding horizon formulation of the problem introduced for the sake of computational tractability, generates suboptimality by definition [12], [13]. Due to these factors, no strong theoretical guarantee can be given regarding the solution of the MILP stage.
- a solution that is close to the global optimum at each receding horizon step acts as a proxy towards a final solution to be obtained in the second stage 304 that is close to the global optimum and, in turn, initializing the NLP stage using this solution is expected to improve the quality of the solution at that second stage 304.
- the speed increase provided by the two-stage approach will admit direct application on an autonomous vehicle in real-time.
- a function approximator 304 can be trained to approximate this function of the NLP optimization stage 304.
- the trained function approximator 604 (which may be referred to as the NLP approximator 602) can, in turn, be implemented on-board the vehicle 400, rather than attempting to perform the full non-linear optimization of the second stage 304 on-board the vehicle in real-time.
- Each training example 601 of the NLP training set is made up of an input, which in turn comprises a particular set of scenario description parameters (x 0 , O 0 : n 1 : N , ⁇ 0 : n 1 : N ) together with a corresponding seed trajectory x 0:N (i.e. as computed in the MILP stage 302 for the corresponding scenario description parameters), and a corresponding ground truth, which in this case is a corresponding final trajectory x 0: N as computed in the NLP stage 304 for the corresponding input.
- scenario description parameters x 0 , O 0 : n 1 : N , ⁇ 0 : n 1 : N
- a corresponding seed trajectory x 0:N i.e. as computed in the MILP stage 302 for the corresponding scenario description parameters
- a corresponding ground truth which in this case is a corresponding final trajectory x 0: N as computed in the NLP stage 304 for the corresponding input.
- the NLP approximator 900 computes a corresponding final trajectory x ⁇ 0: N which will approximate a full NLP solution x 0: N that would be obtained from the full NLP optimization ( x 0: N may be referred to as the exact final trajectory in this context).
- a trajectory verification component 702 is provided, whose function is to verify the approximate final trajectory x ⁇ 0: N against the hard NLP constraints (i.e. the constraints as set out above in relation to Problem 2), or some practical variant of the hard NLP constraints (e.g.
- the NLP approximator 712 could be trained "conservatively" on somewhat more stringent hard constraints than the constraints the approximate final trajectory x ⁇ 0: N is actually required to satisfy, in order to reduce instances in which the approximate final trajectory x ⁇ 0: N fails the actual hard constraints imposed by the trajectory verification component 712). That is to say, the trajectory verification component 712 verifies the approximate final trajectory x ⁇ 0: N against a set of trajectory verification constraints, which may or may not be identical to the NLP constraints of Problem 2.
- the approximate final trajectory x ⁇ 0: N satisfies the trajectory verification constraints, it can be passed to the controller 108 to implement.
- the approximate final trajectory fails to satisfy at least one of the trajectory verification constraints, then it can modified, either at the planning level (by the planner 106) or control level (by the controller 108) so that it does. Assuming the NLP approximator 702 has been adequately trained such modifications, if required, should generally be relatively small.
- the different scenarios could, for example, be simulated scenarios, generated in a simulator.
- FIG. 7 shows how a similar approach may be adopted to approximate the MILP stage 302 online.
- a MILP training set 700 is created through repeated applications of the MILP optimizer on different scenarios, and each training example 701 of the MILP training set 700 is made up of an input in the form of particular set of scenario description parameters ( x 0 , O 0 : n 1 : N , ⁇ 0 : n 1 : N ) (mirroring the input to the MILP stage 302) and a corresponding ground truth, which is now the corresponding seed trajectory x 0: N (i.e. as obtained for the corresponding input parameters in the MILP stage 302).
- the seed trajectory x ⁇ 0:N forms part of the input of each training example 601 of the NLP training set 600 of Figure 6 (with the full trajectory x 0 providing the ground truth), in the MILP training set 700 of Figure 7 , the seed trajectory x 0: N now provides the ground truth.
- a function approximator 702 is trained on the MILP training set 700 to approximate the MILP stage 602.
- Such a trained MILP approximator 702 can then be used in online processing to compute an approximate seed trajectory x ⁇ ⁇ 0 : N given a set of online scenario description parameters (x 0 , O 0 : n 1 : N , ⁇ 0 : n 1 : N ), and the approximate seed trajectory x ⁇ ⁇ 0 : N can, in turn, be used to initialize an online instantiation of the NLP stage 304.
- the full NLP stage 304 is implemented, and the trajectory verification component 712 is therefore not required.
- x ⁇ 0: N and x ⁇ ⁇ 0 : N are used as shorthand to represent complete trajectories, including the current state x 0 .
- the states actually determined by the relevant function approximator are x ⁇ 0: N and x ⁇ ⁇ 1 : N .
- Certain figures show x ⁇ 0: N and x ⁇ ⁇ 0 : N as an output of a function approximator. It will be appreciated that this is a shorthand decision of the computed states x ⁇ 1: N / x ⁇ ⁇ 1 : N (as applicable), together with the known initial state x 0 provided as an input to the system. This also applies to Figures 8 and 9 (see below).
- Figure 8 shows an example of an implementation in which the MILP stage 302 and the NLP stage 304 are both approximated.
- the MILP approximator 702 of Figure 7 is used to seed the NLP approximator 604 of Figure 6 , i.e. an approximate seed trajectory x ⁇ ⁇ 0 : N forms an input to the NLP function approximator 604, which in turn is used in deriving an approximate full trajectory x ⁇ 0: N .
- the trajectory verification component 712 is used, as in Figure 6 , to verify the approximate full trajectory x ⁇ 0: N .
- Figure 9 shows another example implementation, in which a single function approximator 900 is trained to approximate both optimization stages 302, 304.
- the function approximator 900 takes, as input, a set of scenario description parameters, and directly computes an approximate full trajectory x ⁇ 0: N .
- the single approximator 900 would be trained on training examples whose inputs are scenario description parameters and whose ground truths are provided by the corresponding full NLP trajectories.
- a trajectory verification component 732 is similarly used in this case to verify the output of the single function approximator 900.
- the term "approximate" is used herein in two somewhat distinct senses.
- the MILP / NLP function approximator 302 / 304 is said to approximate the seed trajectory x 0: N / final trajectory x 0: N as applicable (i.e.
- the output of the trained MILP approximator 702, x ⁇ ⁇ 0 : N is said to be an approximation of the exact seed trajectory x 0: N that would be derivable via the full MILP optimization; likewise, the output of the trained NLP approximator 704, ⁇ x 0: N , is said to be an approximation of the exact final trajectory x 0: N that would be obtained via the full NLP optimization).
- the seed trajectory x ⁇ 0 : N / x ⁇ ⁇ 0 : N (whether exact or approximate in the sense of the preceding paragraph) is also said to approximate the full trajectory x 0: N / x ⁇ 0: N (whether exact or approximate in the sense of the preceding paragraph).
- the exact or approximate seed trajectory x ⁇ 0 : N / x ⁇ ⁇ 0 : N may be a relatively coarse approximation of the exact or approximate final trajectory x 0 : N / x ⁇ 0 : N .
- a relatively coarse approximation in this context is perfectly acceptable, because even a coarse seed trajectory (i.e.
- Equation (22) approximates the non-linear cost function of Equation (11) and the linear dynamics model of Equations (15) and (16) approximates the non-linear dynamics model of Equation (5).
- each approximator 602, 704 could alternatively or additionally be trained to determine an approximate series of seed control actions u ⁇ ⁇ 0 : N ⁇ 1 (approximating those of the MILP stage 302) and/or an approximate series of final control actions ⁇ 0: N -1 (approximating those of the NLP stage 304 - and which could in turn form (part of) or be used to derive the input to controller 108).
- each approximator 602, 704 could alternatively or additionally be trained to determine an approximate series of seed control actions u ⁇ ⁇ 0 : N ⁇ 1 (approximating those of the MILP stage 302) and/or an approximate series of final control actions ⁇ 0: N -1 (approximating those of the NLP stage 304 - and which could in turn form (part of) or be used to derive the input to controller 108).
- the same applies to the single approximator of Figure 9 which could alternatively or additionally compute ⁇ 0: N -1 directly from the scenario description parameters
- the above techniques be implemented in an "onboard” or “offboard” context.
- One example of an offboard context would be the above training performed in an offboard computer system.
- the above techniques can also be implemented as part of a simulated runtime stack in order to test the performance of the runtime stack in a simulator. Simulation is an increasingly crucial component of safety and other performance testing for autonomous vehicles in particular.
- the function approximator(s) can be tested in the simulator, in order to assess their safety and performance before they are deployed on an actual vehicle.
- Nonlinear non-convex constrained optimization methods are not guaranteed to converge even if feasible solutions exist. They may, therefore, be deployed with the implementation of a fallback mechanism (e.g. emergency braking), which is not necessarily guaranteed to meet the same safety constraints.
- a goal in this context is to improve the convergence and solution quality of existing nonlinear constrained optimization methods, thus improving the safety of the overall system by reducing the reliance on the potentially unsafe fallback mechanism, while yielding lower-cost plans. This is achieved via an informed initialization step.
- the present techniques are not limited to real-time applications.
- the described techniques for improving convergence and achieving higher quality solutions might compromise runtime efficiency, depending on the implementation, but such implementations may nevertheless be suitable for applications that can tolerate longer planning times in return for higher-quality plans - such as high-fidelity simulation or urban driving.
- a computer system comprises one or more computers that may be programmable or non-programmable.
- a computer comprises one or more processors which carry out the functionality of the aforementioned functional components.
- a processor can take the form of a general-purpose processor such as a CPU (Central Processing unit) or accelerator (e.g. GPU) etc. or more specialized form of hardware processor such as an FPGA (Field Programmable Gate Array) or ASIC (Application-Specific Integrated Circuit).
- CPU Central Processing unit
- accelerator e.g. GPU
- FPGA Field Programmable Gate Array
- ASIC Application-Specific Integrated Circuit
- a processor may be programmable (e.g. an instruction-based general-purpose processor, FPGA etc.) or non-programmable (e.g. an ASIC).
- a computer system may be implemented “onboard” a mobile robot (such as an AV) or “offboard” for example in a simulator context.
- an offboard implementation of the full multi-stage optimization could be implemented for the purpose of training on-board components, but also for other purposes, such as safety testing or other performance testing, verification etc.
- Performance testing could involve implementing any of the above trained function approximators 702, 604, 900 and (where applicable) the verification component 712 in an offboard computer system as part of the performance testing, applied to simulated scenarios.
- the multi-phase optimization may be implemented in full, or one or more (including all) optimization stages may be replaced with one or more function approximators.
- Figure 10 shows, in the top half, an overtaking scenario somewhat analogous to that of Figures 4 and 5 .
- the top scenario of Figure 10 shows a scenario with overtaking of static vehicles 402 with an oncoming vehicle 404 in residential driving.
- FIG. 10 shows an alternative, junction scenario, which uses like reference numerals to denote the ego vehicle 400 and an external actor 402. This is an unprotected right turn in a junction.
- Figures 11 and 12 show two qualitatively distinct driving situations.
- the ego-vehicle overtakes parked vehicles on a two-lane road while taking into account an oncoming vehicle on the other lands that is also overtaking a static vehicle in its own lane (the reference path) for overtaking, and returning to it as soon as possible, while always maintaining a smooth acceleration profile.
- the second example in Figure 12 shows an unprotected right turn into a main road.
- Figure 12(d) shows the ego-vehicle closely reference path (right turn into the main road) while smoothly responding to unforeseen behaviors of the dynamic vehicles.
- Figure 13 shows examples of the four different simulation scenarios we consider for experiments 2) and 3): (SO) static overtaking of parked vehicles on both side of the road; (SO+OV) static overtaking with a dynamic vehicle in the oncoming lane; (DO) dynamic overtaking of a slow moving vehicle; and (DO+OV) dynamic overtaking with an oncoming vehicle on the other lane.
- SO static overtaking of parked vehicles on both side of the road
- SO+OV static overtaking with a dynamic vehicle in the oncoming lane
- DO dynamic overtaking of a slow moving vehicle
- DO+OV dynamic overtaking with an oncoming vehicle on the other lane.
- Table D-I presents the percentage of solved examples within the allocated time (25s) per initialization method and scenario. As it can be observed, our method either performs similarly or outperforms other methods in all the scenarios considered.
- Figure 14 we analyze the problems that were solved using our initialization when compared to the other initialization methods. We observe that, across scenarios ZEROS and C.DEC lead to the highest percentage of solved problems (after ours) and same solutions, but also the highest percentage of different solutions.
- Table II shows the average % difference in cost that the alternatives achieve compared to ours.
- Table II presents the average % difference in solving time.
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Human Computer Interaction (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Description
- The present invention pertains to planning in autonomous vehicles and other mobile robots.
- A rapidly emerging technology is autonomous vehicles (AVs) that can navigate by themselves on urban roads. Such vehicles must not only perform complex manoeuvres among people and other vehicles, but they must often do so while guaranteeing stringent constraints on the probability of adverse events occurring, such as collision with these other agents in the environments. An autonomous vehicle, also known as a self-driving vehicle, refers to a vehicle which has a sensor system for monitoring its external environment and a control system that is capable of making and implementing driving decisions automatically using those sensors. This includes in particular the ability to automatically adapt the vehicle's speed and direction of travel based on perception inputs from the sensor system. A fully-autonomous or "driverless" vehicle has sufficient decision-making capability to operate without any input from a human driver. However, the term autonomous vehicle as used herein also applies to semi-autonomous vehicles, which have more limited autonomous decision-making capability and therefore still require a degree of oversight from a human driver. Other mobile robots are being developed, for example for carrying freight supplies in internal and external industrial zones. Such mobile robots would have no people on board and belong to a class of mobile robot termed UAV (unmanned autonomous vehicle). Autonomous air mobile robots (drones) are also being developed.
- Recent work in the field has considered planning formulated as a constrained non-linear optimization problem, though mainly in the context of assistive (not fully-autonomous) driving. W. Schwarting, J. Alonso-Mora, L. Paull, S. Karaman, and D. Rus, "Safe nonlinear trajectory generation for parallel autonomy with a dynamic vehicle model," IEEE Transactions on Intelligent Transportation Systems, no. 99, pp. 1-15, 2017, denoted by [1] herein, discloses a "Parallel Autonomy" framework which allows deviation from human driver inputs to maintain safety, but seeks to minimize extreme intervention. A receding horizon planner is formulated as Nonlinear Model Predictive Control (NMPC), subject to a set of hard constraints, namely (1) respecting a transition model of the system, including kinematic and dynamic constraints (2) maintaining the vehicle within the limits of the road and (3) avoiding other traffic participants in the sense of guaranteeing a probability of collision below pε . A cost function penalizes deviation from a current acceleration and steering angle specified by a human driver - a form of "soft constraint" or "objective" according to the terminology used herein.
- Salvado Joao et al. "Motion planning and goal assignment for robot fleets using trajectory optimisation" describes a scheme for solving the overall fleet management problem in the case of fleets navigating in an environment with obstacles. The method is based on a two phase approach, whereby the first phase solves for fleet-wide boolean decision variables via Mixed Integer Quadratic programming, and the second phase solves for real-valued variables to obtain an optimised set of trajectories for the fleet.
- Hult Robert et al. "An MIQP-based heuristic for optimal coordination of vehicles at intersections" proposes a two-stage approximation algorithm to solve the problem of coordinating automated vehicles at intersections. The procedure (a) first solved a Mixed Integer Quadratic Program (MIQP) to compute an approximate solution to the order in which the vehicles cross the intersection; then (b), solves a Nonlinear Program (NLP) for the optimal state and control trajectories.
- Most Thomas "Approximation of complex nonlinear functions by means of neural networks" investigates the applicability of neural networks for the approximation of several nonlinear problems. The obtained results are compared to those from classical response surface approaches.
- The techniques of [1] formulate planning as a non-linear constrained optimization problem, and seek to solve that problem using a receding horizon formulation (the solution being a set of control inputs that optimize the cost function). It is possible to extend this framework to more complex cost functions, for example with explicit comfort objectives, or to include more sophisticated vehicle (or, more generally, mobile robot) models. However, a challenge with this approach is that convergence to an optimal set of control inputs is both slow and uncertain, particularly as the complexity of the cost function and/or the mobile robot model increases, i.e. the optimizer may take a long time to converge to a solution, or may never successfully converge. Another challenge is that non-linear constrained optimization solvers tend to be local in nature and thus have a tendency to converge to local optima which may be far from the globally optimal solution. This can significantly impact mobile robot performance.
- The present invention also formulates mobile robot planning as a constrained optimization problem. Given a scenario with a desired goal, the problem is to find a series of control actions ("policy") that substantially (exactly or approximately) globally optimises a defined cost function for the scenario and the desired goal. Among other benefits, the present invention addresses the local optima problems that arise in the context of planning based on constrained optimization. This is a form of trajectory planning (synthesis) based on constrained optimization.
- A core objective herein is that of reducing instances of convergence to locally but non-globally optimal solutions within a constrained optimization planning framework, thereby improving the robustness of the planning process and the quality of the trajectories it produces. Mere local optima are a particular problem in optimization-based planning, because they typically correspond to low quality trajectories.
- To this end, the present invention uses a multi-stage constrained optimisation. A first stage is formulated as an optimization problem that is similar to, but simpler than the more complex planning problem that ultimately needs to be solved. For example, the first stage may use a linear cost function and linear robot dynamics model. Such a problem is generally less susceptible to local optima. The solution of the first stage is then used to initialize the second stage, in which the "full" planning problem is solved. The solution of the first stage will be more likely to be close to a globally optimal solution to the full planning problem - up to up to some acceptable level of error introduced by the simplification assumptions of the first stage - and therefore reduces the tendency of the second stage to converge to local optima far from the global solution.
- A first aspect of the invention provides a computer-implemented method of determining control actions for controlling a mobile robot, the method comprising:
- receiving a set of scenario description parameters describing a scenario and a desired goal for the mobile robot therein;
- in a first constrained optimization stage, applying a first optimizer to determine a first series of control actions that substantially globally optimizes a preliminary cost function for the scenario, wherein the desired goal is fixed during the first constrained optimization stage, and wherein the preliminary cost function is based on a first computed trajectory of the mobile robot, as computed by applying a preliminary robot dynamics model to the first series of control actions, and
- in a second constrained optimization stage, applying a second optimizer to determine a second series of control actions that substantially globally optimizes a full cost function for the scenario, wherein the desired goal is fixed during the second constrained optimisation stage, and wherein the full cost function is based on a second computed trajectory of the mobile robot, as computed by applying a full robot dynamics model to the second series of control actions; wherein initialization data comprising at least one of the first computed trajectory and the first series of control actions is used to initialize the second optimizer for determining the second series of control actions, and wherein the preliminary robot dynamic model approximates the full robot dynamics model, the cost functions embody similar objectives to each encourage achievement of the desired goal, and both are optimized with respect to similar hard constraints, such that the initialization data guides the second optimizer to the substantially globally-optimal second series of control actions.
- A less-complex cost function and dynamics model can be used in the first stage, whilst still providing an effective initialization to the second stage. In the present context, the term complexity refers to the form of the cost function and the model, in the space of variables over which they are defined.
- A robot dynamics model is a predictive component that predicts how the mobile robot will actually move in the scenario in response to a given sequence of control actions, i.e. it models the mobile robot's response to control actions. A higher-complexity model, as used in the second stage, can model that response more realistically. The lower-complexity model is free to use highly-simplifying assumptions about the behaviour of the mobile robot but these may be relatively unrealistic. Depending on the simplifying assumptions applied in the first stage, the first predicted trajectory may not even be fully dynamically realisable.
- A higher-complexity cost function and model, as used in the second stage, can provide superior trajectories, which may be of sufficient quality that they can be used as a basis for effective planning and/or control decisions. However, generally speaking, higher-quality trajectories will be obtained when convergence to an approximately globally optimal solution (i.e. at or near a global optima of the full cost function) is achieved. As the complexity of the full cost function and model increases, achieving such convergence becomes increasingly dependent on the quality of the initialization.
- By contrast, the simplifying assumptions applied in the first stage make it inherently less susceptible to the problem of non-local optima, i.e. the ability of the first optimizer to converge to an approximately globally optimal solution is far less dependent on any initialization of the first optimization phase. The output of the simplified first stage is unlikely to be of sufficient quality to use as a basis for such planning decisions directly, and the trajectories it produces may not even be full dynamically realisable (depending on the simplifying assumptions that are made in the first stage). Nevertheless, provided the solution of the first stage is reasonably close to the globally optimal solution of the second stage, the initialization data of the first stage can still facilitate faster and more reliable convergence to an approximately globally optimal solution in the second stage, which will correspond to a dynamically realisable trajectory.
- The present invention thus benefits from the high-quality output of the more complex second stage, whilst avoiding (or at least mitigating) the issues of local optima convergence that would otherwise come with it, through the provision of an effective initialization to the second stage.
- The described embodiments consider a two-stage constrained optimization. However, other embodiments may use more than two stages. In that case, the first constrained optimization stage that is applied to determine the initialization data could, itself, be a multi-stage optimization. In that case, two or more preliminary cost functions may be optimized in the first stage, with at least one of the preliminary cost functions being optimized in order to initialize another of the preliminary cost functions, before ultimately determining the initialization data to the above-mentioned second constrained optimization stage.
- In embodiments, the computed trajectory may be determined, based on an initial mobile robot state, as a series of subsequent mobile robot states. Each mobile robot state of the first computed trajectory may be determined by applying the full robot dynamics model to at least the previous mobile robot state of the first computed trajectory and a corresponding control action of the first series of control actions. Each mobile robot state of the second computed trajectory may be determined by applying the full robot dynamics model to at least the previous mobile robot state of the second computed trajectory and a corresponding control action of the second series of control actions.
- The preliminary robot dynamics model may be linearly dependent on at least the previous mobile robot state of the first computed trajectory and the corresponding control action of the first series of control actions, and the full robot model may be non-linearly dependent on at least one of the previous mobile robot state of the second computed trajectory and the corresponding control action of the second series of control actions.
- The preliminary cost function may be linearly dependent on the mobile robot states of the first computed trajectory, and the full cost function may be non-linearly dependent on the mobile robot states of the second computed trajectory.
- The preliminary cost function may be linearly dependent on the control actions of the first series, and the full cost function may be non-linearly dependent on the control actions of the second series.
- The first optimizer may be a mixed integer linear programming (MILP) optimizer, and the second optimizer may be a non-linear programming (NLP) optimizer.
- The hard constraints of the first stage may comprise one or more mixed integer collision avoidance constraints for one or more static or moving obstacles in the scenario and/or one or more mixed integer permitted area constraints for keeping the mobile robot within a permitted area. The hard constraints of the second stage may comprise one or more similar collision avoidance and/or permitted area constraints formulated in terms of non-integer variables.
- The first optimizer may apply a receding horizon approximation to iteratively optimize component costs of the preliminary cost function, and thereby determine the first series of control actions, and the second optimizer may not use any receding horizon approximation and may instead optimize the full loss function as a whole.
- The goal may be defined relative to a reference path, and each cost function may encourage achievement of the goal by penalizing at least one of lateral deviation from the deference path, and longitudinal deviation from a reference location on the reference path.
- Each of the computed trajectories may be represented in a frame of reference defined by the reference path.
- The preliminary cost function may be linearly dependent on the above lateral and/or longitudinal deviation, and the full cost function is non-linearly dependent thereon.
- Both cost functions may penalize deviation from a target speed.
- The method may be implemented in a planner of a mobile robot and comprise the step of using control data of at least one of: the second computed trajectory, and the second series of control actions to control motion of the mobile robot.
- Embodiments and optional implementations of the invention address the problem of speed though the use of function approximation. Depending on the resources of the available hardware platform, there may be occasions when running the first and second optimizations in real-time is not feasible. As an optional optimization, one or both of the optimizers may be replaced, in a real-time context, with a function approximator training to approximate the first and/or second optimization stage as applicable.
- For example, the method may be performed repeatedly for different scenarios so as to generate a first training set comprising inputs to the first optimizer and corresponding outputs computed by the first optimizer, and the training set may be used to train a first function approximator to approximate the first optimizer (training method 1).
- Alternatively or additionally, the method may be performed repeatedly for different scenarios so as to generate a second training set comprising inputs to the second optimizer and corresponding outputs computed by the second optimizer, and the training set may be used to train a second function approximator to approximate the second optimizer (training method 2).
- The method may comprise the step of configuring a runtime stack of mobile robot to implement one of the following combinations:
- (i) the first optimiser and the second function approximator, wherein the second function approximator cooperates with the first optimiser within the run-time stack to approximate the second optimization stage;
- (ii) the first function approximator and the second optimizer, wherein the first function approximator approximates the first optimization stage for initializing the second optimizer; and
- (iii) the first function approximator and the second function approximator, which cooperate to approximate both optimization stages.
- Alternatively, the method may be performed repeatedly for different scenarios so as to generate a training set comprising inputs to the first optimizer and corresponding outputs computed by the second optimizer, and the training set may be used to train a single function approximator to approximate both of the first and the second optimizers (training method 3).
- The method may comprise the step of configuring a runtime stack of mobile robot to implement the single function approximator.
- The runtime stack may be configured to implement one of combinations (i) and (ii), or the single function approximator, and the method may comprise an additional step of configuring the runtime stack with a verification component configured to verify an output of the second function approximator or the single function approximator.
- A further embodiment of the invention provides a computer-implemented method of determining control actions for controlling a mobile robot, the method comprising:
- receiving a set of scenario description parameters describing a scenario;
- in a first constrained optimization stage, applying a first optimizer to determine a first series of control actions that substantially optimize a lower-complexity cost function for the scenario, wherein the desired goal is fixed during the first constrained optimization stage, and wherein the lower-complexity cost function is dependent on a first computed trajectory of the mobile robot, as computed by applying a lower-complexity robot dynamics model to the first series of control actions, and
in a second constrained optimization stage, applying a second optimizer to determine a second series of control actions that substantially optimize a higher-complexity cost function for the scenario, wherein the desired goal is fixed during the second constrained optimisation stage, and wherein the higher-complexity cost function is dependent on a second computed trajectory of the mobile robot, as computed by applying a higher-complexity robot dynamics model to the second series of control actions; - wherein the lower-complexity robot dynamic model approximates the higher-complexity robot dynamics model, wherein the cost functions embody similar objectives to each encourage achievement of a desired goal in the scenario, and both are optimized with respect to similar hard constraints, such that the first computed trajectory approximates the second computed trajectory, and wherein initialization data comprising at least one of the first computed trajectory and the first series of control actions is used to initialize the second optimizer for determining the second series of control actions.
- A further aspect herein provides a computer system comprising one or more optimization components configured to implement or approximate: a first optimization stage as defined in any of the above aspects or embodiments; and a second optimization stage as defined in any of the above aspects or embodiments.
- The one or more optimization components may comprise a first optimization component configured to implement or approximate the first optimization stage, and a second optimization component configured to implement or approximate the second optimization stage, using initialization data provided by the first optimization component.
- The first optimization component may take the form of a first function approximator trained in accordance with
training method 1 above and/or the second optimization component may take the form of a second function approximator trained in accordance withtraining method 2 above to approximate the second optimization stage. - The second optimization component may take the form of a second function approximator, and the computer system may additionally comprise a verification component configured to verify at least one of a second trajectory and a second series of control actions computed by the second function approximator.
- The one or more optimization components may comprise a single optimization component, trained in accordance with
training method 3 above, to approximate both of the first and second optimization stages. - The computer system may additionally comprise a verification component configured to verify at least one of a second trajectory and a second series of control actions computed by the single function approximator.
- The or each function approximator may have a neural network architecture.
- The computer system may be embodied in an autonomous vehicle or other mobile robot, wherein the computer system may be further configured to control the motion of the mobile robot via one or more actuators of the mobile robot using control data provided by the second optimization component.
- A further aspect herein provides a computer program for programming one or more computers to implement any of the methods disclosed herein.
- To assist understanding of the present disclosure and to show how embodiments of the present invention may be put into effect, reference is made by way of example to the accompanying drawings, in which:
-
Figure 1 shows a schematic block diagram of an autonomous vehicle runtime stack; -
Figure 2A shows mathematical transformations to and from a reference-path based frame of reference; -
Figure 2B geometrically depicts a representation from a global frame of reference into a reference path-based frame of reference; -
Figure 3 shows a two-stage constrained optimization architecture; -
Figure 4 schematically depicts an example trajectory; -
Figure 5 shows a comparison of example trajectories in first and second optimization stages respectively; -
Figure 6 shows how a function approximator may be trained to approximate a NLP constrained optimization stage; -
Figure 7 shows how a function approximator may be trained to approximate a MILP constrained optimization stage; -
Figure 8 shows an online planner comprising first and second function optimizers which have been trained to approximate a two-stage constrained optimization process; -
Figure 9 shows a single function approximator trained to approximate a multi-stage constrained optimization process; -
Figure 10 shows further examples of driving scenarios; -
Figure 11 shows an example of a plan computed for a first example driving scenario; -
Figure 12 shows an example of a plan computed for a second example driving scenario; -
Figure 13 shows an example of a residential driving scenario; and -
Figure 14 shows a plot summarizing a set of results that demonstrate the efficacy of initializations generated using the present techniques on different scenarios. - Embodiments are described in the context of an autonomous vehicle. However, the description applies equally to other forms of autonomous mobile robot.
-
Figure 1 shows a highly schematic block diagram of aruntime stack 100 for an autonomous vehicle (AV). Therun time stack 100 is shown to comprise aperception stack 102, aprediction stack 104, aplanner 106 and acontroller 108. - The
perception stack 102 receives sensor outputs from an on-board sensor system 110 of the AV. - The on-
board sensor system 110 can take different forms but generally comprises a variety of sensors such as image capture devices (cameras/optical sensors), LiDAR and/or RADAR unit(s), satellite-positioning sensor(s) (GPS etc.), motion sensor(s) (accelerometers, gyroscopes etc.) etc., which collectively provide rich sensor data from which it is possible to extract detailed information about the surrounding environment and the state of the AV and any external actors (vehicles, pedestrians, cyclists etc.) within that environment. - Hence, the sensor outputs typically comprise sensor data of multiple sensor modalities such as stereo images from one or more stereo optical sensors, LiDAR, RADAR etc.
- Stereo imaging may be used to collect dense depth data, with LiDAR/RADAR etc. proving potentially more accurate but less dense depth data. More generally, depth data collection from multiple sensor modalities may be combined in a way that respects their respective levels (e.g. using Bayesian or non-Bayesian processing or some other statistical process etc.). Multiple stereo pairs of optical sensors may be located around the vehicle e.g. to provide full 360° depth perception. This provides a much richer source of information than is used in conventional cruise control systems.
- The
perception stack 102 comprises multiple perception components which co-operate to interpret the sensor outputs and thereby provide perception outputs to theprediction stack 104. - The perception outputs from the
perception stack 102 are used by theprediction stack 104 to predict future behaviour of the external actors. - Predictions computed by the
prediction stack 104 are provided to theplanner 106, which uses the predictions to make autonomous driving decisions to be executed by the AV in a way that takes into account the predicted behaviour of the external actors. - The
planner 106 implements the techniques described below to plan trajectories for the AV and determine control actions for realizing such trajectories. In particular, a core function of theplanner 106 is to determine a series of control actions for controlling the AV to implement a desired goal in a given scenario. In a real-time planning context, a scenario is determined using theperception stack 102 but can also incorporate predictions about other actors generated by theprediction stack 104. A scenario is represented as a set of scenario description parameters used by theplanner 106. A typical scenario would define a drivable area and would also capture predicted movements of any obstacles within the drivable area (such as other vehicles) along with a goal. A goal would be defined within the scenario, and a trajectory would then need to be planned for that goal within that scenario. In the following, obstacles are represented probabilistically in a way that reflects the level of uncertainty in their perception within theperception stack 102. - The goal could for example be to enter a roundabout and leave it at a desired exit; to overtake a vehicle in front; or to stay in a current lane at a target speed (lane following). The goal may, for example, be determined by an autonomous route planner (not shown).
- The
controller 108 executes the decisions taken by theplanner 106 by providing suitable control signals to on-board actuators 112 such as motors of the AV. In particular, thecontroller 108 controls the actuators in order to control the autonomous vehicle to follow a trajectory computed by theplanner 106. - As described in further detail below, the
planner 106 plans over acceleration (magnitude) and steering angle control actions simultaneously, which are mapped to a corresponding trajectory by modelling the response of the vehicle to those control actions. This allows constraints to be imposed both on the control actions (such as limiting acceleration and steering angle) and the trajectory (such as collision-avoidance constraints), and ensures that the final trajectories produced are dynamically realisable. Theplanner 106 will determine an optimal trajectory and a corresponding sequence of control actions that would result in the optimal trajectory according to whatever vehicle dynamics model is being applied. The control actions determined by theplanner 106 will not necessarily be in a form that can be applied directly by the controller 108 (they may or may not be). Ultimately, the role of theplanner 106 is to plan a trajectory and the role of thecontroller 108 is to implement that trajectory. The term "control data" is used herein to mean any trajectory information derived from one of both of the planned trajectory and the corresponding series of control actions that can be used by thecontroller 108 to realize the planner's chosen trajectory. For example, thecontroller 108 may take the trajectory computed by theplanner 106 and determine its own control strategy for realizing that trajectory, rather than operating on the planner's determined control actions directly (in that event, the controller's control strategy will generally mirror the control actions determined by the planner, but need not do so exactly). - The
planner 106 will continue to update the planned trajectory as a scenario develops. Hence, a trajectory determined at any time may never be realized in full, because it will have been updated before then to account for changes in the scenario that were not predicted perfectly. - These functions of the
planner 106 will now be described in detail. As set out above, the present disclosure addresses the planning problem of determining an optimal series of control actions (referred to herein as a "policy") for a given scenario and a given goal as a constrained optimization problem with two optimization stages. - In the described embodiments, the first optimization stage solves a Mixed Integer Linear Programming (MILP) problem where obstacles lead to hard constraints for the MILP. The outcome of this seeds (initializes) a subsequent nonlinear optimisation of the trajectory for dynamics and comfort constraints, in the second optimization stage. The second optimization stage solves a Non-Linear Programming (NLP) problem, initializing using the results of the MILP optimization stage. The MILP stage uses a mixed-integer formulation of collision avoidance and drivable area constraints as described later.
- Although performed second, the NLP stage is described first, in order to set out the final planning problem that ultimately needs to be solved (
Problem 2 below), and to provide some context for the subsequent description of how best to provide an effective initialization (seed) from the MILP stage. - Before describing either stage in detail, the description sets out a general framework applicable to both stages.
- Following notation from [1], an index k ≡ tk, is assumed where tk = t0 + kΔt with t0 being the current time and Δt a fixed timestep. A trajectory is defined over the next N steps covering a temporal horizon of τ = NΔt. Vectors are in bold. A shorthand notation ri:e = {ri, ..., re } is used for any variable r.
- A vehicle whose motion is being planned is referred to as the ego vehicle. The ego vehicle's state at time k is given by X k =(Xk, Yk, Vk, Φk ) ∈ , where (Xk, Yk ) is the position of the vehicle, Vk is the speed and Φk is its heading in a global coordinate frame . The ego state at
time 0 is given by X 0, and a function of theplanner 106 is to determine the ego states over the next N steps, - Other traffic participants, such as vehicles, pedestrians and cyclists, are indexed by i ∈ {1, ..., n} and their pose at time k is modelled using a Gaussian distribution having a mean
and covariance . The set of the mean poses of all traffic participants over time is , and similarly for the covariances , and both are given as inputs to the planning problem. -
- A goal of the ego vehicle is defined as following a differentiable and bounded two-dimensional reference path in the global coordinate frame, , parameterized by the distance from the start of the path (). Tangential and normal vectors of the reference path in the global coordinate frame can be obtained at any point λ along the path as
respectively. - The reference path is a path which the ego vehicle is generally intending to follow, at a set target speed. However, deviation from the reference path and target speed, whilst discouraged, are permitted provided that no hard constraints (such as collision avoidance constraints) are breached. The reference path can be determined using knowledge of the road layout, which may use predetermined map data (such as an HD map of the driving area), information from the
perception stack 104, or a combination of both. For complex layouts in particular (such as complex junctions or roundabouts), the reference path could be learned by monitoring the behaviour of other drivers in the area over time. -
Figure 2A illustrates the process of going from an input (X 0, ) to the desired output X 1:N. Given an input (X 0, ) (upper case) the invertible transformation is applied to yield (x 0, ) (lower case) in the reference path-based representation, which theplanner 106 then uses to plan. The output of theplanner 106 is an intended trajectory x 0:N in the reference path-based representation, which in turn is transformed back to the global coordinate frame by applying the inverse transformation to obtain X 0:N in the global coordinate frame. Further details of the transformation are described below. -
- 1) Pose transform: maps poses (X, Y, Φ) in the global coordinate frame to poses (x, y, φ) in the reference path frame as shown in
Figure 2B .- x = [X Y] is the distance of the projection of [X Y] to from the beginning of the path, defined as the solution to the following optimization problem:
Due to the nature of the optimization, no closed-form solution can be obtained for x. -
, where n x is the normal vector of the reference path at λ = x as in (1), and . - φ = ∠t x - Φ, where
- x = [X Y] is the distance of the projection of [X Y] to from the beginning of the path, defined as the solution to the following optimization problem:
-
-
- 2) Velocity transform: since is defined spatially, speeds are invariant to it: v = (V) = V.
- 3) Covariance transform: considering a traffic participant with pose O and covariance Γ, such that (O) = [x y φ] T, the transformed covariance matrix in the reference path coordinate frame is given by:
where t x is the tangent of evaluated at λ = x, and R E SO(2) is a rotation matrix. - Following the reasoning presented in Sec. II-A, the planning problem is defined in the reference path coordinate frame.
- The state of the ego vehicle at time k is given by x k = (xk ,yk,vk ,φk ) ∈ where (xk ,yk ) is the 2D position of the vehicle, vk is its speed and φk is its heading. The evolution of the vehicle state is given by a discrete general dynamical system:
where f Δt is a discrete non-linear function with time parameter Δt, and u k = (ak ,δk ) ∈ are an acceleration and steering angle, respectively, applied at time k. The ego vehicle is modelled as a rigid body occupying an area relative to its center, and the area occupied by the ego vehicle at state x k is given by -
- A driveable surface area
is defined as the area in which it is safe for the ego vehicle to drive, with the unsafe area . With a cost function defined over the positions and controls of the ego vehicle and states and uncertainties of other traffic participants, a "policy synthesis" problem can be formulated. - As indicated above, a "policy" in the present context refers to a time-series of control actions, which is mapped to a corresponding vehicle trajectory using a vehicle dynamics model.
-
- This section describes a specific solution to
Problem 1, posing it as a Non-Linear Programming (NLP) problem with a set of hard constraints and a multi-objective cost function. The hard constraints comprise (1) vehicle model constraints, including kinematic ones on the transitions of the model and on the controls it allows (Sec. III-1); and (2) collision avoidance constraints with the purpose of avoiding collisions with the boundaries of the road (which can include, e.g. construction areas or parked vehicles) as well as other traffic participants (Sec. III-2). Soft constraints make up the terms of the cost function (i.e. are embodied in the cost function itself) and are presented in Sec. III-3, whereas hard constraints constrain the optimization of the cost function. Sec. III-4 describes the full problem. - 1) Vehicle Model: A "kinematic bicycle" model is applied, which is a simplified car model focused on the center of the vehicle. Assuming the ego vehicle to be a rectangle with an inter-axel distance of L, this model can be discretized under Δt as:
- While the kinematic bicycle model is not strictly limited by any specific values of maximum steering and acceleration, a limit is imposed on the allowed ranges of controls by enforcing | δk | ≤ δ max and a min ≤ ak ≤ a max, as well as limiting jerk |a k+1 - ak | ≤ ȧ max and angular jerk |δ k+1 - δk | ≤ δ̇max. These maintain the model within the operational domain, and also help to ensure that the resulting trajectories respect the rules of the road and passenger comfort. Finally, speed is constrained as 0 ≤ v min ≤ vk ≤ v max, to guarantee forward motion within a set speed limit v max.
- The use of the same positive and negative jerk magnitude mirrors has been observed to work in practice, however the framework can be extended to use different jerk bounds.
- 2) Collision Avoidance: The collision avoidance problem is divided into two different types of avoidance: the limits of the road (which may include other features such as construction zones or parked vehicles) and other traffic participants. For both, a representation of the area of the ego-vehicle is defined as (x k ) for state x k. A simplified representation is used wherein the ego vehicle boundary is represented solely by its corners. For a rectangular vehicle of width w and length l, the positions of the corners can be defined as a function of the state x k as:
where z ∈ Z = {[11],[-11],[-1-1],[1-1]}, R(φk ) ∈ SO(2) is the rotation matrix corresponding to the ego-vehicle's heading, and ∘ is the Hadamard product. Following this definition, the maintenance of the ego-vehicle within the driveable surface, corresponding to the constraint defined as (xk ) ∩ = Ø, can be reduced to the constraint - The driveable surface is described by limits, bl (λ), br (λ), which are continuous and differentiable functions of λ defining the left and right borders, respectively, with ∀λ: bl (λ) < br (λ). These boundaries impose a constraint on the allowed lateral deviation at position x, such that keeping the ego-vehicle within the driveable surface, (xk ) ∩ Bout = Ø, can be reduced to the set of constraints
with defined as in (6). -
-
- 3) Cost function: The multi-objective cost function is defined for a set of soft constraints over the states and controls of the ego, as well as the positions and uncertainty of the other traffic participants. If a soft constraint ∈ is defined via a function
, and a weight , the cost function can be defined as: where: - The weights associated with the soft constraints determine their relative importance. Soft constraints are defined over different objectives:
- Progress over a longitudinal goal xg : θx = (x - xg )2, a target speed vg : θv = (v - vg )2 , or for lane tracking under the reference path representation, θy = y 2 .
- Comfort as a minimization of the norm of acceleration, θa = a 2, and steering, θδ = δ 2.
- Calculated Risk, defined based on the uncertainty of the other vehicles assuming
-
-
- Due to the non-linearity of J, fΔt, bl, br and gi,z , this problem is a general (equality and inequality constraints) non-linear, non-convex, constrained optimization problem. While it is appealing to solve the problem directly, or using a receding horizon formulation as in [1], there are two major challenges to this approach:
- Slow and uncertain convergence: solvers for these types of problems are generally slow for large instances, such as
Problem 2, particularly when initialization is not carefully considered. While advances in efficient primal-dual interior point solvers have mitigated this issue to a certain degree, the convergence to a solution is uncertain. - Local optima: non-linear constrained optimization solvers tend to be local in nature, finding solutions close to the initial point and, possibly, far from the global optimum.
- In the above, the tuple (x 0,
) is an example of a set of scenario description parameters, describing a dynamic scenario and a starting state of the ego vehicle within it. The dynamics of the scenario is captured in terms of the motion components of x 0, and , i.e. the speed and heading dimensions within the NLP formulation. - To mitigate these NLP issues, a framework depicted schematically in
Figure 3 is used. -
Figure 3 shows a two-tier optimization architecture: afirst stage 302 corresponds to solving a receding horizon formulation of the linearized version of the problem; in asecond stage 304, the problem is solved in a non-linear non-convex constrained optimization fashion using the solution of the first stage as an initialization (e.g. a warm-start initialization). - The two stage optimisation of
Figure 3 may be referred to as 2s-OPT. - A motivation behind the architecture is to avoid the local optima convergence in the non-linear, non-convex constrained optimization by providing an initial solution that is closer to the global optimum, which also should lead to faster and more reliable convergence.
- In the two-tier optimization, the
first stage 302 solves a linearized version ofProblem 2 in a finite, receding horizon manner using a Mixed Integer Linear Programming (MILP) formulation (details presented in Sec. IV-A). This gives guarantees on the optimality for each stage of the receding horizon problem (see Sec. IV-A5), and, thus, acts as a proxy towards reaching the global optimum of the linearized problem. Thesecond stage 304 uses the output of the MILP optimizer as an initial solution and solves thefull Problem 2, as set out above. Given the similar representations of the linearized and non-linear problems, it is expected that this initialization improves convergence, speed and the quality of the final solution. - In this stage,
Problem 2 is re-formulated as a Mixed Integer Linear Programming optimization problem, with mixed integer linear constraints and a multi-objective cost function. - To do so, a linear vehicle model with kinematic feasibility constraints is considered (Sec. IV-A1). An approach to collision avoidance is formulated which maintains the mixed integer linearity of the model (Sec. IV-A2). In Sec. IV-A3 an interpretation of the soft constraints within the MILP cost function is provided. The full MILP problem in Sec. IV-A4 and discuss the optimality of the solution in Sec. IV-A5.
- The following sections make use of the non-linear operators | • |, max(•) and min(•), which can be enforced in MILP by considering auxiliary binary variables under the "big-M" formulation [8], [9]. For example, for a constraint C = max(A,B), assume b 6 {0,1} to be a binary variable and
to be a sufficiently large value. It is then possible to define the corresponding mixed integer constraints: - Similar definitions can be obtained for the operators | • | and min.
- I) Linear Vehicle Model and Kinematic Feasibility: The kinematic bicycle model presented in Sec. Ill-1 is non-linear, but can be linearized around a point using a series expansion. The problem with this approach lies on the fact that this approximation is only valid around the point, yielding higher errors as the distance to the point increases. To avoid this issue, a different vehicle model with nonholonomic constraints is adopted for the
first stage 302. -
- This representation is still based on the reference path , and is the same as the representation used in the
second stage 304 in terms of the spatial dimensions. However, whereas thesecond stage 304 uses an angular representation of motion (i.e. velocity and acceleration are represented in terms of magnitude and angle components), the first stage uses a linear representation of velocity and acceleration. -
- This nonholonomic model is highly simplified when compared to the kinematic bicycle model in (5). To introduce kinematic feasibility, the following constraint is imposed:
for a given constant . Assuming forward motion, that is , this constraint dictates that any movement in the y direction requires motion in the x direction as well. - Similar constraints are imposed as in the non-linear model of the
second stage 304, in particular input bounds constraints, ax min ≤ ax k ≤ ax max and ay min ≤ ay k ≤ ay max ; jerk related constraints, and ; and velocity constraints vx min ≤ vx k ≤ vx max and vy min ≤ vy k ≤ vy max, with vx min ≥ 0 to guarantee forwards motion. - 2) Collision Avoidance: Similarly to the non-linear formulation, collision avoidance is split into avoiding the limits of the road and avoiding other traffic participants. However, a key difference is that the state
x does not explicitly model the orientation of the vehicle, so linearly approximating the calculation of the corners of the ego vehicle would induce very high errors in the model. Thus, the ego vehicle is considered to be a point p k = [xk, yk ] and deal with the area it occupies in the definition of the road limits and other traffic participants. For the driveable space, piecewise-linear functions and are defined for the left and right road boundaries respectively, such that To take the size of the ego vehicle into account, the constraint is formulated as: - where d :
is a function of the size of the ego vehicle with respect to its point estimate. In the most conservative case, assuming a rectangular ego vehicle of width w and length l, it is possible to define , i.e. reduce the driveable surface to where ⊕ is the Minkowski-sum operator. For practical purposes, d = w/2 is considered, which is exact when φ = 0. - For traffic participants, the ellipses
defined in (9) are inscribed with axis-aligned rectangles, which are then augmented to consider the point estimate of the ego vehicle's pose. With dx and dy being functions of the size of the ego vehicle with respect to its center in the x and y direction, respectively, rectangles are defined with the limits: -
- Then, the collision avoidance constraint is the logical implication:
which can be understood as "if the ego position is aligned with the vehicle along x, then it must be outside the vehicle's borders in y". The big-M formulation [8], [9] is used to obtain the following mixed integer constraint: for a sufficiently large . - 3) Mixed Integer Linear Cost Function: For best performance, the cost function of the
MILP stage 302 should be similar to the cost function from thenon-linear stage 304 to minimize the gap between the optimum obtained in both stages. - If the cost functions are similar, and subject to similar constraints, then the optimal trajectory computed in the
first stage 302 should approximate the final optimal trajectory computed in thesecond stage 304, and therefore provide a useful initialization. - The MILP problem is defined in a receding horizon formulation, in which the cost over each state (individual cost) is also defined independently in time, with:
for a set of soft constraints C, where each constraint is defined by a function (x ,u , o1:n , Σ1:n ) and its weight The weights determine the relative importance of each constraint and can be fine-tuned to optimize performance. - Similarly to the
non-linear stage 304, soft constraints are defined over different objectives: - Progress over a longitudinal goal xg, Θ x, = |x - xg |; over a target speed vg, Θ v = |v - vg |;and over lane tracking, Θ y = |y|.
- Comfort as a minimization of the norm of lateral acceleration, Θ a , = |ay |.
- Calculated Risk using the big-M formulation, with w + and l+ being the values to the added to the width and length of each Ri based on different risk levels, defined for i ∈ {1,...,n}:
with the auxiliary constraints: such that minimizing risk corresponds to minimizing the function . -
- In the end, a full trajectory x̅ 0:N of length N+1 is obtained at the
first stage 302, to initialize thesecond stage 304. However, the receding horizon approximation iteratively breaks the problem down into a set of simpler optimization problems, each defined over K < N steps. In each iteration a "cost component" of the full cost function is defined as the sum of the individual costs over only K steps, as set out below. -
- For planning step m = 0, the known initial vehicle state
x 0 is used as a starting point, with subsequent states determined by applying the vehicle dynamics model F Δt in an iterative fashion. For each subsequent planning step, the statex m-1 determined in the previous planning step m - 1 provides a starting point for that iteration, with subsequent states computed from that starting point in the same way. - In both the MILP and the NLP stages, optimization occurs in the continuous space; no spatial/grid discretization occurs at either of the stages. In particular, in the MILP stage, the state vectors of both the ego vehicles and the other actors, and control vectors are not restricted to integer values, or otherwise discretised to any fixed grid. Not discretizing over a grid is one of the advantages of the method, as it allows for smoother trajectories.
- The integer variables in the MILP stage are used to enforce the constraints related to collision avoidance and driveable surface only.
- The trajectory
x 0:N computed in theMILP optimization stage 302 is an example is a "seed trajectory", as that term is used herein, whereas the trajectory x 0:N computed in the NLP optimization stage 304 (derived, in part, from the seed trajectoryx 0:N ), is an example of a "final" trajectory. The trajectory is "final" within the specific context of the two-stage optimization. The term "final" does not necessarily imply finality within the context of the runtime 100 stack as a whole. The final trajectory x 0:N may or may not be subject to further processing, modification etc. in other parts of theruntime stack 100 before it is used as a basis for generating control signals within thecontroller 108. - Consistent with the above terminology, the linear cost function of Equation (22) and the linear dynamics model of Equations (15) and (16) are examples of a "preliminary" cost function and model respectively, whilst the non-linear cost function of Equation (11) and the non-linear dynamics model of Equation (5) are examples a "final" or, equivalently, "full" cost function and model respectively, where the terms "final" and "full" are, again, only used in the specific context of the two-stage optimization, and does not imply absolute finality.
- It is noted that, which the above paragraph considers seed and final trajectories, it could be that the control actions
u 0:N-1 of theMILP stage 302 are alternatively or additionally used to seed the NLP stage 304 (and they may be referred to as seed control actions in that event); similarly, as noted above, data of one or both of the final trajectory x 0:N and the final series of control actionsu 0:N-1 may be provided to thecontroller 108 for the purpose of generating "actual" control signals for controlling the autonomous vehicle. - Whereas in the
NLP stage 304, the scenario description parameters (x 0, ) have a non-linear form (because they are formulated in terms of heading φk ), in theMILP stage 302, the scenario description parameters (x 0, ) have a linear form (because motion is formulated in terms of components instead of speed and heading vk ,φk ), withx 0 denoting the linearized form of the ego vehicle state specifically. - Regarding notation,
x 0:N and x 0:N are used as shorthand to represent complete trajectories, including the (known) current state x 0. The additional states actually determined in the 302, 304 arerelevant optimization stage x 1:N and x 1:N , as depicted inFigure 3 . -
Figure 4 shows, by way of example, one example of a possible trajectory over eight time steps (k=0 to 7). This corresponds to an overtaking maneuver, in which anego vehicle 400 desired to overtake aforward vehicle 402 in the presence of anoncoming vehicle 404. Both the forward vehicle and the on-coming vehicle are traffic participants (obstacles), modelled in the manner above. - It will be appreciated that the duration Δt between time steps in
Figure 4 has been chosen for the purpose of illustration, in order to show the whole manoeuvre, and would not necessarily reflect the duration that would be applied in real-life. - An overtaking goal is defined by way of a suitably distant reference location (not shown) on the reference path , ahead of the
forward vehicle 402. The effect of the progress constraints is to encourage theego vehicle 400 to reach that reference location as soon as it can, subject to the other constraints, whilst the effect of the collision avoidance constraints is to prevent theego vehicle 402 from pulling out until the oncoming vehicle stops being a collision risk. -
Figure 5 shows how such a problem might be formulated in the two-stage approach. Shading is used to show theego vehicle 400 and theforward vehicle 402 at different timesteps k = 0, ... ,7 (For the sake of simplicity, the scenario ofFigure 5 is slightly different than ofFigure 4 , in that it omits the oncoming vehicle 404). - On the left-hand side,
Figure 5 illustrates some of the principles underlying the MILP formulation of thefirst stage 302. - As can be seen on the left-hand side of
Figure 5 , theego vehicle 400 remains axially aligned (i.e. aligned with the reference path ) at all times. This reflects the simpler, linear dynamics model that is used in the MILP stage - as can be seen from Equations (15) and (16) above, the simpler dynamics model does not consider rotation of the ego vehicle 200 (recall that the MILP stage uses a linear representation of velocity and acceleration). Hence, the output of thefirst stage 302 is a more "restricted" form of trajectory defined in terms of linear position and velocity but not orientation/heading of theego vehicle 200. The linearized trajectory may not be fully dynamically realisable, but nevertheless serves as a useful initialization to thesecond stage 304. - By contrast, as shown on the right-hand side of
Figure 5 , the form of the trajectory planned in thesecond stage 304 is not restricted in the same way thefirst stage 302. This is because the full problem takes into account steering angle and heading of theego vehicle 200. That is, the solution toProblem 2 is free to consider motion of theego vehicle 400 that does not meet the discretisation restrictions applied in thefirst stage 302. - 5) On the optimality of the MILP formulation: The goal of the formulation of
Problem 3 is to obtain an optimal solution which can be used as initialization toProblem 2 to minimize the problems of slow convergence and local optima that arise from solving the NLP directly. In this section, the optimality of the solution toProblem 3 is discussed. - A solution to
Problem 3 can be obtained using Branch and Bound, a divide and conquer algorithm first introduced and applied to mixed integer linear programming by Land and Doig in [10]. This solution is proven to be the globally optimal one [7], [11]. In practice, modern solvers (e.g. Gurobi or CPLEX) may fail to find the global solution due to rounding errors and built-in tolerances [11]. On top of this, the receding horizon formulation of the problem, introduced for the sake of computational tractability, generates suboptimality by definition [12], [13]. Due to these factors, no strong theoretical guarantee can be given regarding the solution of the MILP stage. However, despite the lack of theoretical guarantees at this level, a solution that is close to the global optimum at each receding horizon step acts as a proxy towards a final solution to be obtained in thesecond stage 304 that is close to the global optimum and, in turn, initializing the NLP stage using this solution is expected to improve the quality of the solution at thatsecond stage 304. - In some contexts, it may be that the speed increase provided by the two-stage approach will admit direct application on an autonomous vehicle in real-time.
- However, to provide an additional speed increase, which may be needed to ensure real-time performance in a resource-constrained environment, it is also possible to implement an approximation of one or both of the first and
302, 304, using a trained function approximator, such as a neural network.second stages - Either one or both of the first and
302, 304 could be replaced with a function approximator in a practical implementation.second stages - By way of example, the following section describes, with reference to
Figure 6 , an example of how the non-linearsecond stage 304 might be with a trained function approximator. -
Figure 6 shows how the two-stage optimization can be run repeatedly, in an offline context (e.g. in an off-board training computer system), for different scenarios, for the purpose of generating an NLP training set 600. In this case, each input in the NLP training set 600 corresponds to an initialization output generated by thefirst stage optimizer 302 on a given scenario, and a ground truth label for that inputs is given by the output of thesecond stage optimization 304 on that training scenario. Within this framework, theNLP optimization stage 304 is treated as a function that takes the initialization output of thefirst stage 302, and transforms it to a desired control output (corresponding the optimal trajectory solving Problem 2). With a sufficiently large NLP training set 600, afunction approximator 304 can be trained to approximate this function of theNLP optimization stage 304. The trained function approximator 604 (which may be referred to as the NLP approximator 602) can, in turn, be implemented on-board thevehicle 400, rather than attempting to perform the full non-linear optimization of thesecond stage 304 on-board the vehicle in real-time. - Each training example 601 of the NLP training set is made up of an input, which in turn comprises a particular set of scenario description parameters (x0,
) together with a corresponding seed trajectoryx 0:N (i.e. as computed in theMILP stage 302 for the corresponding scenario description parameters), and a corresponding ground truth, which in this case is a corresponding final trajectoryx 0:N as computed in theNLP stage 304 for the corresponding input. - Once trained, given such an input, i.e. a set of scenario description parameters (x 0,
) and a seed trajectoryx 0:N (as computed in an online instantiation of theMILP phase 302 in this example), the NLP approximator 900 computes a corresponding final trajectory x̃ 0:N which will approximate a full NLP solution x 0:N that would be obtained from the full NLP optimization (x 0:N may be referred to as the exact final trajectory in this context). Because theNLP stage 304 is now approximated, it will not necessarily be guaranteed that the approximate final trajectory x̃ 0:N will meet the hard NLP constraints ofProblem 2 above (i.e. the hard constraints on which the NLP approximator 604 has been trained). Therefore, atrajectory verification component 702 is provided, whose function is to verify the approximate final trajectory x̃ 0:N against the hard NLP constraints (i.e. the constraints as set out above in relation to Problem 2), or some practical variant of the hard NLP constraints (e.g. the NLP approximator 712 could be trained "conservatively" on somewhat more stringent hard constraints than the constraints the approximate final trajectory x̃ 0:N is actually required to satisfy, in order to reduce instances in which the approximate final trajectory x̃ 0:N fails the actual hard constraints imposed by the trajectory verification component 712). That is to say, thetrajectory verification component 712 verifies the approximate final trajectory x̃ 0:N against a set of trajectory verification constraints, which may or may not be identical to the NLP constraints ofProblem 2. - If the approximate final trajectory x̃ 0:N satisfies the trajectory verification constraints, it can be passed to the
controller 108 to implement. In the case that the approximate final trajectory fails to satisfy at least one of the trajectory verification constraints, then it can modified, either at the planning level (by the planner 106) or control level (by the controller 108) so that it does. Assuming the NLP approximator 702 has been adequately trained such modifications, if required, should generally be relatively small. - The different scenarios could, for example, be simulated scenarios, generated in a simulator.
-
Figure 7 shows how a similar approach may be adopted to approximate theMILP stage 302 online. The same principles apply, however in this case, a MILP training set 700 is created through repeated applications of the MILP optimizer on different scenarios, and each training example 701 of the MILP training set 700 is made up of an input in the form of particular set of scenario description parameters (x 0, ) (mirroring the input to the MILP stage 302) and a corresponding ground truth, which is now the corresponding seed trajectoryx 0:N (i.e. as obtained for the corresponding input parameters in the MILP stage 302). - That is to say, whereas the seed trajectory x̅0:N forms part of the input of each training example 601 of the NLP training set 600 of
Figure 6 (with the full trajectory x 0 providing the ground truth), in the MILP training set 700 ofFigure 7 , the seed trajectoryx 0:N now provides the ground truth. - The same principles generally apply, but now a
function approximator 702 is trained on the MILP training set 700 to approximate the MILP stage 602. Such a trainedMILP approximator 702 can then be used in online processing to compute an approximate seed trajectory given a set of online scenario description parameters (x0, ), and the approximate seed trajectory can, in turn, be used to initialize an online instantiation of theNLP stage 304. In this example, thefull NLP stage 304 is implemented, and thetrajectory verification component 712 is therefore not required. - Regarding notation, x̃ 0:N and
are used as shorthand to represent complete trajectories, including the current state x0. The states actually determined by the relevant function approximator are x̃ 0:N and . Certain figures show x̃ 0:N and as an output of a function approximator. It will be appreciated that this is a shorthand decision of the computed states x̃ 1:N / (as applicable), together with the known initial state x 0 provided as an input to the system. This also applies toFigures 8 and9 (see below). -
Figure 8 shows an example of an implementation in which theMILP stage 302 and theNLP stage 304 are both approximated. In this case, the MILP approximator 702 ofFigure 7 is used to seed the NLP approximator 604 ofFigure 6 , i.e. an approximate seed trajectory forms an input to theNLP function approximator 604, which in turn is used in deriving an approximate full trajectory x̃ 0:N . Thetrajectory verification component 712 is used, as inFigure 6 , to verify the approximate full trajectory x̃ 0:N . -
Figure 9 shows another example implementation, in which asingle function approximator 900 is trained to approximate both optimization stages 302, 304. In this case, thefunction approximator 900 takes, as input, a set of scenario description parameters, and directly computes an approximate full trajectory x̃ 0:N . Thesingle approximator 900 would be trained on training examples whose inputs are scenario description parameters and whose ground truths are provided by the corresponding full NLP trajectories. A trajectory verification component 732 is similarly used in this case to verify the output of thesingle function approximator 900. - Note that the term "approximate" is used herein in two somewhat distinct senses. When a function approximator is used to approximate the
MILP stage 302 and/or theNLP stage 304, the MILP /NLP function approximator 302 / 304 is said to approximate the seed trajectoryx 0:N / final trajectory x 0:N as applicable (i.e. the output of the trainedMILP approximator 702, , is said to be an approximation of the exact seed trajectoryx 0:N that would be derivable via the full MILP optimization; likewise, the output of the trained NLP approximator 704, ~x 0:N , is said to be an approximation of the exact final trajectory x 0:N that would be obtained via the full NLP optimization). - The seed trajectory
(whether exact or approximate in the sense of the preceding paragraph) is also said to approximate the full trajectory x 0:N / x̃ 0:N (whether exact or approximate in the sense of the preceding paragraph). The exact or approximate seed trajectory may be a relatively coarse approximation of the exact or approximate final trajectory . A relatively coarse approximation in this context is perfectly acceptable, because even a coarse seed trajectory (i.e.x 0:N or ) can have the desired effect of guiding the NLP optimization toward globally a optimal solution (and away from mere local optima). In this same sense, the linear cost function of Equation (22) approximates the non-linear cost function of Equation (11) and the linear dynamics model of Equations (15) and (16) approximates the non-linear dynamics model of Equation (5). - Note that, whilst the description of
Figures 6 to 9 assumes that the various components operates on trajectoriesx 0:N or , they could, alternatively or additionally, operate on the control actions computed at the various optimization/approximation stages, i.e. each approximator 602, 704 could alternatively or additionally be trained to determine an approximate series of seed control actions (approximating those of the MILP stage 302) and/or an approximate series of final control actions ũ 0:N-1 (approximating those of the NLP stage 304 - and which could in turn form (part of) or be used to derive the input to controller 108). The same applies to the single approximator ofFigure 9 , which could alternatively or additionally compute ũ 0:N-1 directly from the scenario description parameters. - The above techniques be implemented in an "onboard" or "offboard" context. One example of an offboard context would be the above training performed in an offboard computer system. The above techniques can also be implemented as part of a simulated runtime stack in order to test the performance of the runtime stack in a simulator. Simulation is an increasingly crucial component of safety and other performance testing for autonomous vehicles in particular.
- Note, where a function approximation approach is used, the function approximator(s) can be tested in the simulator, in order to assess their safety and performance before they are deployed on an actual vehicle.
- Nonlinear non-convex constrained optimization methods are not guaranteed to converge even if feasible solutions exist. They may, therefore, be deployed with the implementation of a fallback mechanism (e.g. emergency braking), which is not necessarily guaranteed to meet the same safety constraints. A goal in this context is to improve the convergence and solution quality of existing nonlinear constrained optimization methods, thus improving the safety of the overall system by reducing the reliance on the potentially unsafe fallback mechanism, while yielding lower-cost plans. This is achieved via an informed initialization step.
- As noted, the present techniques are not limited to real-time applications. The described techniques for improving convergence and achieving higher quality solutions might compromise runtime efficiency, depending on the implementation, but such implementations may nevertheless be suitable for applications that can tolerate longer planning times in return for higher-quality plans - such as high-fidelity simulation or urban driving.
- References herein to components, functions, modules and the like, including the optimizers of the first and
302, 304, thesecond stages 702, 604 and 900, and thefunction approximators verification component 712, denote functional components of a computer system which may be implemented at the hardware level in various ways. A computer system comprises one or more computers that may be programmable or non-programmable. A computer comprises one or more processors which carry out the functionality of the aforementioned functional components. A processor can take the form of a general-purpose processor such as a CPU (Central Processing unit) or accelerator (e.g. GPU) etc. or more specialized form of hardware processor such as an FPGA (Field Programmable Gate Array) or ASIC (Application-Specific Integrated Circuit). That is, a processor may be programmable (e.g. an instruction-based general-purpose processor, FPGA etc.) or non-programmable (e.g. an ASIC). A computer system may be implemented "onboard" a mobile robot (such as an AV) or "offboard" for example in a simulator context. - For example, an offboard implementation of the full multi-stage optimization could be implemented for the purpose of training on-board components, but also for other purposes, such as safety testing or other performance testing, verification etc. Performance testing could involve implementing any of the above trained
702, 604, 900 and (where applicable) thefunction approximators verification component 712 in an offboard computer system as part of the performance testing, applied to simulated scenarios. - In an on-board implementation, the multi-phase optimization may be implemented in full, or one or more (including all) optimization stages may be replaced with one or more function approximators.
- Reference is made in the above to the following:
- [1] W. Schwarting, J. Alonso-Mora, L. Paull, S. Karaman, and D. Rus, "Safe nonlinear trajectory generation for parallel autonomy with a dynamic vehicle model," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 99, 2017.
- [7] I.E. Grossmann, V. Voudouris, O. Ghattas, et al., "Mixed-integer linear programming reformuilations [sic] for some nonlinear discrete design optimization problems," Technical Report, Carnegie Mellon University, 1991.
- [8] H.P. Williams, Model building in mathematical programming. John Wiley & Sons, 2018
- [9] J. Omer and J.-L. Farges, "Hybridization of nonlinear and mixed-integer linear programming for aircraft separation with trajectory recovery," IEEE Transactions on Intelligent Transportation Systems, vol 14, no. 3, pp. 1218-1230, 2013.
- [10] A. H. Land and A. G. Doig, "An automatic method for solving discrete programming problems," in 50 Years of Integer Programming 1958-2008, pp. 105-132, Springer, 2010.
- [11] A. Neumaier, "Complete search in continuous global optimization and constraint satisfaction," Acta numerica, vol. 13, pp. 271-369, 2004.
- [12] X. Geng and Y. Xi, "Suboptimality analysis of receding horizon predictive control with terminal constraints," IFAC Proceedings Volumes, vol. 32, no. 2, pp. 2984-2988, 1990.
- [13] L. Grune and A. Rantzer, "On the infinite horizon performance of receding horizon controllers," IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2100-2111, 2008.
- The methodology described above has been subject to empirical testing to further demonstrate is efficacy.
- Reference is made to
Figure 10 , which shows, in the top half, an overtaking scenario somewhat analogous to that ofFigures 4 and5 . Specifically, the top scenario ofFigure 10 shows a scenario with overtaking ofstatic vehicles 402 with anoncoming vehicle 404 in residential driving. - The bottom half of
Figure 10 shows an alternative, junction scenario, which uses like reference numerals to denote theego vehicle 400 and anexternal actor 402. This is an unprotected right turn in a junction. - In this section we will show that:
- 1) our method is general and can be applied to a diversity of driving situations
- 2) the MILP stage provides the NLP stage with a better initialization when compared to simpler heuristics, leading to higher convergence rates, faster solving times and more optimal solutions;
- 3) our method leads to solutions that outperform a Nonlinear Model Predictive Control (NMPC) approach similar to the one presented in [E-25] in progress and comfort metrics.
- To that end, we implement the first stage (MILP; Problem 3) using Gurobi 8.1 [E-15], and the second stage (NLP; Problem 2) using IPOPT [29]. Both solvers have a timeout of 25s, after which the optimization is stopped. We use N = 40 and Δt = 0.2s for a trajectory horizon of 8s (Other parameters are listed in Appendix A). Without loss of generality, we assume left-hand traffic where drivers are expected to be on the left hand side of the road, that a route planner is available to generate a reference path the satisfies the planning goal in the local coordinate frame, and consider a constant velocity model for the prediction of dynamic agents. In the simulator, the behavior of other dynamic vehicles is based on the Intelligent Driver Model [E-27].
- VI.1 Generality:
Figures 11 and12 show two qualitatively distinct driving situations. InFigure 11 the ego-vehicle overtakes parked vehicles on a two-lane road while taking into account an oncoming vehicle on the other lands that is also overtaking a static vehicle in its own lane (the reference path) for overtaking, and returning to it as soon as possible, while always maintaining a smooth acceleration profile. The second example inFigure 12 shows an unprotected right turn into a main road.Figure 12(d) shows the ego-vehicle closely reference path (right turn into the main road) while smoothly responding to unforeseen behaviors of the dynamic vehicles. -
Figure 13 shows examples of the four different simulation scenarios we consider for experiments 2) and 3): (SO) static overtaking of parked vehicles on both side of the road; (SO+OV) static overtaking with a dynamic vehicle in the oncoming lane; (DO) dynamic overtaking of a slow moving vehicle; and (DO+OV) dynamic overtaking with an oncoming vehicle on the other lane. For the experiments, we generate datasets of examples per scenario by randomizing over relevant parameters (e.g. initial speeds and numbers of vehicles). More details are presented in Supplementary Material - Appendix B. - VI. 2 Initialization Comparison: We consider four heuristic initializations as alternatives to our MILP stage: ZEROS, in which all states and controls are initialized to zero; C.VEL, where the ego-vehicle is assumed to maintain its speed throughout the solution; C.ACC, where the ego-vehicle maintains a constant acceleration of 1 ms-2 until the maximum speed is achieved; and C.DEC where the ego-vehicle maintains a constant acceleration of -1 ms-2 until it stops.
- A dataset of 1000 examples per scenario was procedurally generated, solved using our method as well as suing only the NLP stage for the same constrains initialized by each of the four heuristics. Table D-I presents the percentage of solved examples within the allocated time (25s) per initialization method and scenario. As it can be observed, our method either performs similarly or outperforms other methods in all the scenarios considered. In
Figure 14 we analyze the problems that were solved using our initialization when compared to the other initialization methods. We observe that, across scenarios ZEROS and C.DEC lead to the highest percentage of solved problems (after ours) and same solutions, but also the highest percentage of different solutions. To analyze this further, for problems where a different solution is reached (darker bars inFigure 14 ) Table II shows the average % difference in cost that the alternatives achieve compared to ours. For problems with similar solutions (lighter bars inFigure 14 ), Table II presents the average % difference in solving time. - We see that, in general, other initialization methods obtain higher costs on average than ours. While C.ACC is able to achieve a lower average cost in some scenarios, it leads to a substantial number of unsolved examples, particularly for DO and DO+OV (see
Figure 14 ). In all scenarios where the same solutions is reached by an alternative method, it takes longer to solve on average. A noteworthy case is DO+OV initialized with C.ACC which takes 306.32% longer on average to reach the same solution as ours. These results validate our claim that we achieve higher convergence rates (as shown by Table I), faster solving times and more optimal solutions (as evidenced byFigure 14 and Table II). - VI.3 NMPC Baseline Comparison: We compare our method to an NMPC baseline which optimizes
Problem 2 with the same soft and hard constraint but in a receding-horizon fashion similar to [25]. This baseline uses the same interior-point optimization method as our NLP stage to solve each of the horizon steps. To evaluate the quality of the methods, we introduce the following metrics that compare our solution, , to the baseline's, : - Progress: our overall progress compared to the baseline's
- Reference path deviation: our average deviation from the reference path compared to the baseline's
- Target speed matching: for a given target speed vg, comparison of our average matching with the baseline's
- Jerk: comparison of our average jerk value with the baseline's
- We generate 1000 examples per scenario and solve them using our method and the NMPC baseline. The top of Table III presents the percentage of the examples that are solved by both methods. In all scenarios, our method solves more examples than the baseline. For the problems that are solved by both we show the metrics per scenario in the bottom of Table III. We achieve significantly higher progress and better velocity matching across scenarios, and similar or slightly smaller jerk values and deviation from the reference path. These results validate our claim that our method is better in progress and comfort metrics than the baseline that has a similar formulation to [E-25].
TABLE I: Initialization Comparison: percentage of solved examples per initialization method (rows) and scenario (columns). SO SO+OV DO DO+OV ZEROS 98.57% 95.55% 96.97% 93.48% C.VEL 96.85% 85.68% 42.42% 32.40% C.ACC 89.23% 64.43% 5.61% 1.97% C.DEC 98.58% 95.66% 88.79% 92.44% Ours 98.47% 95.77% 99.40% 98.14% TABLE III: NMPC Baseline Comparison: Top - percentage of solved problems per scenario. Bottom - average results for difference in progress ΔP, deviation from reference ΔD, velocity matching ΔuM, and jerk Δ|ȧ| (positive and higher is better). SO SO+OV DO DO+OV % Solved NMPC 96.15% 79.45% 89.17% 85.74% Ours 98.38% 96.50% 99.60% 98.65% Metrics ΔP (m) 7.12 8.37 4.63 8.04 ΔD (m) -0.03 0.06 0.10 0.03 0.81 0.79 0.25 0.49 0.01 0.01 0.01 0.0 TABLE II: Initialization Comparison: in cases where the optimum is different (left), the average percent difference in the objective function value (cost) between the alternative method and our solution is show (positive and higher is better for our method), with the respective number of examples in parenthesis; in cases in which the optimum is the same (right), the average difference in solving time as a percent of the one of our method is shown (positive and higher is better for our method), with the number of examples in parenthesis. Init. % Change in Cost (# situations) % Change in Solving Time (# situations) SO SO+OV DO DO+OV SO SO+OV DO DO+OV ZEROS +85.39(56) +68.53(126) +52.36(356) +25.96(403) +42.8(902) +47.62(728) +154.42(607) +104.44(483) C.VEL +60.24(44) +47.69(112) +29.55(64) +1.35(126) +12.53(903) +16.39(652) +70.01(358) +94.71(187) C.ACC +6.16(20) -10.77(100) +0.01(2) -7.67(5) +12.93(856) +19.16(478) +231.79(54) +306.32(14) C.DEC +85.01(55) +67.39(123) +36.79(322) +24.51(396) + 7.93(90:3) +3.48(782) +70.2:3(5G5) + 17.61(483) - The parameters used in the solving of
2 and 3 in the context of Sec. VI are defined in Table A1Problem TABLE A1: Parameters used in the MILP and NLP optimization stages Parameter Stage Value Parameter Stage Value L NLP 4.8 MILP -3 δ max NLP 0.45 MILP 3 α min NLP -3 MILP -0.5 α max NLP 3 MILP 0.5 ȧ max NLP 0.5 MILP 0.5 δ̇ max NLP 0.18 MILP 0.1 ȧ max NLP 0.5 MILP 0 v min NLP 0 MILP 3 v max NLP 10 MILP -1 ωx NLP 0.1 MILP 1 ωv NLP 2.5 Ω x MILP 0.9 ωy NLP 0.05 Ω v MILP 0.5 ωa NLP 1.0 Ω y MILP 0.05 ωδ NLP 2.0 Ω a MILP 0.4 ρ MILP 1.5 M MILP 104 d MILP 0.9 - For the randomly generated scenarios presented in Sec. V, we assume the ego vehicle has length 4.8m and width 1.9m. the example used in this work were preceduirally generated by uniform sampling the parameters of the scenarios, following the ranges defined in Tables B1, B2, B3, B4, and B5.
TABLE B1: Common parameters Parameter Min Max Number of lanes 2 2 Lane width (m) 3.5 4.3 Ego initial x (m) 0 0 initial y (m) bl (x) + 0.55 * 1.9 br (x) - 0.55 * 1.9 initial v (ms -1) 0 9.5 initial φ (rad) -π/12 +π/12 TABLE B2: Parameters of scenario so Parameter Min Max Number of static vehicles 2 6 Static vehicle x (m) 0 80 y (m) bl (x) br (x) width (m) 1.7 2.5 length (m) 4.0 8.0 TABLE B3: Parameters of scenario SO+OV Parameter Min Max Number of static vehicles 2 6 Static vehicle x (m) 0 80 y (m) bl (x) 0 width (m) 1.7 2.5 length (m) 4.0 8.0 Oncoming vehicle initial x (m) 20 80 initial y (m) br (x)/2 br (x)/2 initial v (ms -1) 1.0 8.5 width (m) 1.7 2.5 length (m) 4.0 8.0 TABLE B4: Parameters of scenario DO Parameter Min Max Dynamic vehicle initial x (m) 20 80 initial y (m) bl (x)/2 bl (x)/2 initial v (ms -1) 0.5 3.5 width (m) 1.7 2.5 length (m) 4.0 8.0 TABLE B5: Parameters of scenario DO+OV Parameter Min Max Oncoming vehicle initial x (m) 20 80 initial y (m) br (x)/2 br (x)/2 initial v (ms -1) 1.0 8.5 width (m) 1.7 2.5 length (m) 4.0 8.0 Dynamic vehicle initial x (m) 20 80 initial y (m) bl (x)/2 bl (x)/2 initial v (ms -1) 0.5 3.5 width (m) 1.7 2.5 length (m) 4.0 8.0
Claims (16)
- A computer-implemented method of determining control actions for controlling a mobile robot, the method comprising:receiving a set of scenario description parameters describing a scenario and a desired goal for the mobile robot therein;in a first constrained optimization stage, applying a first optimizer (302) to determine a first series of control actions that substantially globally optimizes a preliminary cost function for the scenario, wherein the desired goal is fixed during the first constrained optimization stage, and wherein the preliminary cost function is based on a first computed trajectory of the mobile robot, as computed by applying a preliminary robot dynamics model to the first series of control actions, andin a second constrained optimization stage, applying a second optimizer (304) to determine a second series of control actions that substantially globally optimizes a full cost function for the scenario, wherein the desired goal is fixed during the second constrained optimisation stage, and wherein the full cost function is based on a second computed trajectory of the mobile robot, as computed by applying a full robot dynamics model to the second series of control actions;wherein initialization data comprising at least one of the first computed trajectory and the first series of control actions is used to initialize the second optimizer (304) for determining the second series of control actions, and wherein the preliminary robot dynamic model approximates the full robot dynamics model, the cost functions embody similar objectives to each encourage achievement of the desired goal, and both are optimized with respect to similar hard constraints, such that the initialization data guides the second optimizer (304) to the substantially globally-optimal second series of control actions.
- The method of claim 1, wherein each computed trajectory is determined, based on an initial mobile robot state, as a series of subsequent mobile robot states;wherein each mobile robot state of the first computed trajectory is determined by applying the preliminary robot dynamics model to at least the previous mobile robot state of the first computed trajectory and a corresponding control action of the first series of control actions;wherein each mobile robot state of the second computed trajectory is determined by applying the full robot dynamics model to at least the previous mobile robot state of the second computed trajectory and a corresponding control action of the second series of control actions.
- The method of claim 2, wherein the preliminary robot dynamics model is linearly dependent on at least the previous mobile robot state of the first computed trajectory and the corresponding control action of the first series of control actions, and the full robot model is non-linearly dependent on at least one of the previous mobile robot state of the second computed trajectory and the corresponding control action of the second series of control actions.
- The method of any preceding claim, wherein the first optimizer (302) is a mixed integer linear programming (MILP) optimizer, and the second optimizer (304) is a non-linear programming (NLP) optimizer;
wherein the first optimizer (302) optionally applies a receding horizon approximation to iteratively optimize component costs of the preliminary cost function, and thereby determine the first series of control actions, and wherein the second optimizer (304) optionally does not use any receding horizon approximation and instead optimizes the full cost function as a whole. - The method of claim 4, wherein the hard constraints of the first stage (302) comprise one or more mixed integer collision avoidance constraints for one or more static or moving obstacles in the scenario and/or one or more mixed integer permitted area constraints for keeping the mobile robot within a permitted area, and wherein the hard constraints of the second stage (304) comprise one or more similar collision avoidance and/or permitted area constraints formulated in terms of non-integer variables.
- The method of any preceding claim, wherein the goal is defined relative to a reference path, and each cost function encourages achievement of the goal by penalizing at least one of lateral deviation from the reference path, and longitudinal deviation from a reference location on the reference path;wherein each of the computed trajectories is optionally represented in a frame of reference defined by the reference path;wherein the preliminary cost function is optionally linearly dependent on said lateral and/or longitudinal deviation, and the full cost function is optionally non-linearly dependent thereon;wherein both cost functions optionally penalize deviation from a target speed.
- The method of any preceding claim, wherein the method is implemented in a planner (106) of a mobile robot and comprises the step of using control data of at least one of: the second computed trajectory, and the second series of control actions to control motion of the mobile robot.
- The method of any of claim 1 to 6, wherein the method is performed repeatedly for different scenarios so as to generate a first training set (700) comprising inputs to the first optimizer (302) and corresponding outputs computed by the first optimizer (302), and the training set is used to train a first function approximator to approximate the first optimizer (302) and;
wherein the method is optionally performed repeatedly for different scenarios so as to generate a second training set (700) comprising inputs to the second optimizer (304) and corresponding outputs computed by the second optimizer (304), and the training set is used to train a second function approximator to approximate the second optimizer (304). - The method of any of claims 1 to 6, wherein the method is performed repeatedly for different scenarios so as to generate a training set comprising inputs to the first optimizer (302) and corresponding outputs computed by the second optimizer (304), and the training set is used to train a single function approximator to approximate both of the first (302) and the second optimizers (304), the method optionally comprising the step of configuring a runtime stack (100) of mobile robot to implement the single function approximator.
- The method of claim 8, comprising the step of configuring a runtime stack (100) of mobile robot to implement one of the following combinations:(i) the first optimiser (302) and the second function approximator, wherein the second function approximator cooperates with the first optimiser (302) within the run-time stack (100) to approximate the second optimization stage (304);(ii) the first function approximator and the second optimizer (304), wherein the first function approximator approximates the first optimization stage (302) for initializing the second optimizer (304); and(iii) the first function approximator and the second function approximator, which cooperate to approximate both optimization stages.
- The method of claim 10, wherein the runtime stack (100) is configured to implement one of combinations (i) and (ii), or the single function approximator, and the method comprises an additional step of configuring the runtime stack (100) with a verification component (702) configured to verify an output of the second function approximator or the single function approximator.
- The method of any of claims 8 to 11, wherein the or each function approximator has a neural network architecture.
- A computer system comprising
one or more optimization components configured to implement the methods of claims 1 to 12. - The computer system of claim 13 , embodied in a mobile robot, wherein the computer system is further configured to control the motion of the mobile robot via one or more actuators (112) of the mobile robot using control data provided by the second optimization component (304).
- The computer system of claim 14, wherein the mobile robot is an autonomous vehicle.
- A computer program for programming one or more computers to implement the method of any of claims 1 to 12.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GBGB2001202.7A GB202001202D0 (en) | 2020-01-28 | 2020-01-28 | Planning in mobile robots |
| GBGB2001200.1A GB202001200D0 (en) | 2020-01-28 | 2020-01-28 | Planning in mobile robots |
| GBGB2001277.9A GB202001277D0 (en) | 2020-01-30 | 2020-01-30 | Planning in mobile robots |
| PCT/EP2021/052040 WO2021152050A1 (en) | 2020-01-28 | 2021-01-28 | Planning in mobile robots |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP4078320A1 EP4078320A1 (en) | 2022-10-26 |
| EP4078320B1 true EP4078320B1 (en) | 2024-08-14 |
Family
ID=74550621
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP21703178.0A Active EP4078320B1 (en) | 2020-01-28 | 2021-01-28 | Planning in mobile robots |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12351164B2 (en) |
| EP (1) | EP4078320B1 (en) |
| WO (1) | WO2021152050A1 (en) |
Families Citing this family (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| FR3074348B1 (en) * | 2017-11-24 | 2020-09-11 | Dassault Aviat | AIRCRAFT MISSION CALCULATION SYSTEM INCLUDING A MISSION PLATE |
| US12187269B2 (en) * | 2020-09-30 | 2025-01-07 | Toyota Motor Engineering & Manufacturing North America, Inc. | Optical sense-compute solution for real-time navigation involving multiple vehicles |
| US12179795B2 (en) * | 2021-05-24 | 2024-12-31 | Nvidia Corporation | Using arrival times and safety procedures in motion planning trajectories for autonomous vehicles |
| US12060060B1 (en) * | 2021-08-31 | 2024-08-13 | Zoox, Inc. | Vehicle control using trajectory clustering and reactive prediction |
| CN116481532B (en) * | 2022-10-26 | 2025-08-15 | 复旦大学 | Monomer unmanned aerial vehicle autonomous motion planning method based on imitation learning |
| GB202302098D0 (en) | 2023-02-14 | 2023-03-29 | Five Ai Ltd | Planning in mobile robots |
| GB202302097D0 (en) | 2023-02-14 | 2023-03-29 | Five Ai Ltd | Planning in mobile robots |
| GB202308283D0 (en) | 2023-06-02 | 2023-07-19 | Five Ai Ltd | Motion planning in mobile robots |
| GB202308270D0 (en) | 2023-06-02 | 2023-07-19 | Five Ai Ltd | Trajectory evaluation for mobile robots |
| CN116985122B (en) * | 2023-07-16 | 2025-10-21 | 西北工业大学 | Method and system for deploying human-machine collaborative space tethered robots based on parameter optimization |
| CN118226875B (en) * | 2024-05-23 | 2024-08-06 | 山东科技大学 | Safety track tracking control method for underwater unmanned ship under comprehensive constraint |
| CN119596695B (en) * | 2024-11-29 | 2025-11-18 | 大连理工大学 | A Constraint-Simplified Trajectory Tracking Method for Omnidirectional All-Drive Robots |
| CN119942001B (en) * | 2025-04-07 | 2025-07-18 | 深圳市富越机电设备有限公司 | Intelligent overturning control method and device based on machine vision |
| CN120245021B (en) * | 2025-06-09 | 2025-08-01 | 天津工业大学 | Distributed anti-collision control method and system for incomplete multi-robot system |
Family Cites Families (28)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9315178B1 (en) | 2012-04-13 | 2016-04-19 | Google Inc. | Model checking for autonomous vehicles |
| US9645577B1 (en) | 2016-03-23 | 2017-05-09 | nuTonomy Inc. | Facilitating vehicle driving and self-driving |
| US10146224B2 (en) | 2016-11-09 | 2018-12-04 | GM Global Technology Operations LLC | Processor-implemented systems and methods for automated driving |
| US20180292830A1 (en) | 2017-04-06 | 2018-10-11 | Uber Technologies, Inc. | Automatic Tuning of Autonomous Vehicle Cost Functions Based on Human Driving Data |
| US10678247B2 (en) | 2017-08-28 | 2020-06-09 | GM Global Technology Operations LLC | Method and apparatus for monitoring of an autonomous vehicle |
| US10754339B2 (en) | 2017-09-11 | 2020-08-25 | Baidu Usa Llc | Dynamic programming and quadratic programming based decision and planning for autonomous driving vehicles |
| US20190220016A1 (en) | 2018-01-15 | 2019-07-18 | Uber Technologies, Inc. | Discrete Decision Architecture for Motion Planning System of an Autonomous Vehicle |
| US10884422B2 (en) | 2018-04-16 | 2021-01-05 | Baidu Usa Llc | Method for generating trajectories for autonomous driving vehicles (ADVS) |
| US11378961B2 (en) | 2018-04-17 | 2022-07-05 | Baidu Usa Llc | Method for generating prediction trajectories of obstacles for autonomous driving vehicles |
| CN112203918B (en) | 2018-05-31 | 2023-11-21 | 北美日产公司 | Trajectory planning |
| KR20210074366A (en) | 2018-10-16 | 2021-06-21 | 파이브 에이아이 리미티드 | Autonomous vehicle planning and forecasting |
| WO2020079698A1 (en) | 2018-10-19 | 2020-04-23 | A.D Knight Ltd. | Adas systems functionality testing |
| US11106212B2 (en) | 2019-03-26 | 2021-08-31 | Baidu Usa Llc | Path planning for complex scenes with self-adjusting path length for autonomous driving vehicles |
| US11353878B2 (en) | 2019-03-26 | 2022-06-07 | Baidu Usa Llc | Soft-boundary based path optimization for complex scenes for autonomous driving vehicles |
| EP3726497B1 (en) | 2019-04-15 | 2025-04-09 | Zenuity AB | Autonomous decisions in traffic situations with planning control |
| EP4031428B1 (en) | 2019-09-18 | 2024-11-13 | C.R.F. Società Consortile per Azioni | Model-based design of trajectory planning and control for automated motor-vehicles in a dynamic environment |
| US12291233B2 (en) | 2019-09-27 | 2025-05-06 | Honda Motor Co., Ltd. | System and method for providing accurate trajectory following for automated vehicles in dynamic environments |
| US11390300B2 (en) | 2019-10-18 | 2022-07-19 | Uatc, Llc | Method for using lateral motion to optimize trajectories for autonomous vehicles |
| EP3812954B1 (en) | 2019-10-21 | 2025-09-03 | Zenuity AB | Performance monitoring and evaluation of a vehicle adas or autonomous driving feature |
| US11753023B2 (en) | 2020-01-19 | 2023-09-12 | Mitsubishi Electric Research Laboratories, Inc. | Adaptive control of autonomous or semi-autonomous vehicle |
| US20230089978A1 (en) | 2020-01-28 | 2023-03-23 | Five AI Limited | Planning in mobile robots |
| US11714971B2 (en) | 2020-01-31 | 2023-08-01 | Nissan North America, Inc. | Explainability of autonomous vehicle decision making |
| US11650590B2 (en) * | 2020-03-26 | 2023-05-16 | Mitsubishi Electric Research Laboratories, Inc. | Adaptive optimization of decision making for vehicle control |
| EP3929050B1 (en) | 2020-06-22 | 2025-11-12 | Zenuity AB | Assessment of a vehicle control system |
| US11458991B2 (en) | 2020-06-29 | 2022-10-04 | Woven Planet North America, Inc | Systems and methods for optimizing trajectory planner based on human driving behaviors |
| FR3112746B1 (en) | 2020-07-23 | 2022-11-11 | Renault Sas | method for determining a trajectory of a motor vehicle |
| US11958498B2 (en) | 2020-08-24 | 2024-04-16 | Toyota Research Institute, Inc. | Data-driven warm start selection for optimization-based trajectory planning |
| US20220121213A1 (en) * | 2020-10-21 | 2022-04-21 | Automotive Research & Testing Center | Hybrid planning method in autonomous vehicle and system thereof |
-
2021
- 2021-01-28 WO PCT/EP2021/052040 patent/WO2021152050A1/en not_active Ceased
- 2021-01-28 EP EP21703178.0A patent/EP4078320B1/en active Active
- 2021-01-28 US US17/796,209 patent/US12351164B2/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| EP4078320A1 (en) | 2022-10-26 |
| US20230081921A1 (en) | 2023-03-16 |
| US12351164B2 (en) | 2025-07-08 |
| WO2021152050A1 (en) | 2021-08-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP4078320B1 (en) | Planning in mobile robots | |
| EP4081876B1 (en) | Planning in mobile robots | |
| Wei et al. | A behavioral planning framework for autonomous driving | |
| Sun et al. | A fast integrated planning and control framework for autonomous driving via imitation learning | |
| Eiras et al. | A two-stage optimization-based motion planner for safe urban driving | |
| Batkovic et al. | Real-time constrained trajectory planning and vehicle control for proactive autonomous driving with road users | |
| Wang et al. | Predictive maneuver planning for an autonomous vehicle in public highway traffic | |
| Lefevre et al. | Autonomous car following: A learning-based approach | |
| CN114846425A (en) | Prediction and planning of mobile robots | |
| Gupta et al. | Interaction-aware trajectory planning for autonomous vehicles with analytic integration of neural networks into model predictive control | |
| Chen et al. | Deep reinforcement learning in autonomous car path planning and control: A survey | |
| Jiang et al. | Obstacle avoidance of autonomous vehicles with CQP-based model predictive control | |
| Claussmann et al. | A path planner for autonomous driving on highways using a human mimicry approach with binary decision diagrams | |
| Nair et al. | Predictive control for autonomous driving with uncertain, multimodal predictions | |
| Lattarulo et al. | A linear model predictive planning approach for overtaking manoeuvres under possible collision circumstances | |
| Wang et al. | Safe reinforcement learning for automated vehicles via online reachability analysis | |
| DeCastro et al. | Counterexample-guided safety contracts for autonomous driving | |
| Gritschneder et al. | Adaptive learning based on guided exploration for decision making at roundabouts | |
| CN112429004A (en) | Automatic lane changing control method for vehicle | |
| WO2024202373A1 (en) | System and method for controlling motion of an ego vehicle | |
| Sarvesh et al. | Reshaping local path planner | |
| US12428025B2 (en) | Multi-profile quadratic programming (MPQP) for optimal gap selection and speed planning of autonomous driving | |
| Liu et al. | Synergizing Decision Making and Trajectory Planning Using Two-Stage Optimization for Autonomous Vehicles | |
| Chen et al. | Drive with quantified risk: A probabilistic approach for occlusion-aware trajectory planning | |
| Gao et al. | Balancing accuracy and efficiency: Fast motion planning based on nonlinear model predictive control |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
| 17P | Request for examination filed |
Effective date: 20220722 |
|
| AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
| DAV | Request for validation of the european patent (deleted) | ||
| DAX | Request for extension of the european patent (deleted) | ||
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
| 17Q | First examination report despatched |
Effective date: 20230628 |
|
| RAP3 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: FIVE AI LIMITED |
|
| REG | Reference to a national code |
Ref country code: DE Free format text: PREVIOUS MAIN CLASS: G05D0001020000 Ref country code: DE Ref legal event code: R079 Ref document number: 602021017189 Country of ref document: DE Free format text: PREVIOUS MAIN CLASS: G05D0001020000 Ipc: G05D0001644000 |
|
| GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
| RIC1 | Information provided on ipc code assigned before grant |
Ipc: G05D 107/13 20240101ALN20240212BHEP Ipc: G05D 105/22 20240101ALN20240212BHEP Ipc: G05D 109/10 20240101ALN20240212BHEP Ipc: G05D 1/644 20240101AFI20240212BHEP |
|
| INTG | Intention to grant announced |
Effective date: 20240306 |
|
| GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
| GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
| AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
| REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
| REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
| REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602021017189 Country of ref document: DE |
|
| REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
| REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG9D |
|
| REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241114 |
|
| REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 1713897 Country of ref document: AT Kind code of ref document: T Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241115 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241216 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241214 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241114 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241114 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241216 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241114 Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241214 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20241115 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20250106 Year of fee payment: 5 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20250102 Year of fee payment: 5 |
|
| REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602021017189 Country of ref document: DE |
|
| PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
| 26N | No opposition filed |
Effective date: 20250515 |
|
| REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240814 Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20250128 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20250131 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20250131 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20250131 |
|
| REG | Reference to a national code |
Ref country code: BE Ref legal event code: MM Effective date: 20250131 |
|
| REG | Reference to a national code |
Ref country code: DE Ref legal event code: R082 Ref document number: 602021017189 Country of ref document: DE Representative=s name: WESTPHAL, MUSSGNUG & PARTNER PATENTANWAELTE MI, DE |