WO2023219178A1 - System and method for controlling motion of one or more devices along a calculated optimal trajectory - Google Patents
- Publication number
- WO2023219178A1 (PCT/JP2023/018849)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- trajectory
- constraints
- initial
- constraint
- motion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
- G01C21/206—Instruments for performing navigational calculations specially adapted for indoor navigation
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0005—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots with arrangements to save energy
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0088—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0217—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with energy consumption, time reduction or distance reduction criteria
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/16—Anti-collision systems
- G08G1/161—Decentralised systems, e.g. inter-vehicle communication
Definitions
- the present disclosure relates generally to motion planning of devices, and more particularly to a system and a method for controlling motion of one or more devices from an initial state to a target state in an environment having obstacles.
- Optimal motion planning of devices, in a workspace with obstacles, to achieve specific positions while avoiding collisions with the obstacles and between themselves has multiple applications, such as wheeled ground robots, aircraft on ground, and flying drones.
- a task in such motion planning problems comprises constraints on a motion trajectory of the devices.
- the constraints can encode requirements on a target position of each of the devices, avoidance of collision with each other and other static/moving obstacles in the workspace.
- aerial drones can be tasked to reach certain positions to inspect a structure while avoiding collisions with other structures and between themselves.
- An aircraft can be tasked to move on airport grounds to reach its assigned runway or gate while avoiding other aircraft, carts, and structures.
- Mobile robots on a factory floor can be tasked to reach their work positions or positions where they need to unload cargo while avoiding work areas, human workers, and other robots.
- motion planning problem is a non-convex optimization problem.
- the motion planning problem refers to determination of a motion trajectory of the device for performing the task.
- solving the non-convex optimization problem requires a significant amount of computation and memory resources.
- Motion planners are used to solve the non-convex optimization problem.
- An example of a non-convex optimization-based motion planner is using mixed-integer programming (MIP) approach.
- the MIP approach involves solving a constrained optimization problem, where a cost function is minimized or maximized subject to some constraints, some optimization variables are real-valued (continuous variables), and others are integer or binary-valued (discrete-variables).
- a special class of the MIP approach is mixed-integer linear/quadratic programming (MILP/MIQP).
- the MILP/MIQP is known to terminate in finite time, and hence bounds on time to compute a solution can be derived.
- the computation is still complex (NP-Hard) and may incur significant computation cost.
- the state of the device may include one or a combination of a position of the device, an orientation of the device, and a velocity of the device.
- the environment includes obstacles.
- the device may be an autonomous vehicle, such as a mobile robot, an aerial vehicle, a water surface vehicle, or an underwater vehicle.
- the performance of the task requires satisfaction of constraints on motion of the device. In other words, the performance of the task is subject to the constraints.
- constraints include reaching a target position, avoiding collision with the obstacles, avoiding collision with other devices, moving the device such that a state of the device (e.g., position) is within a pre-defined set of admissible states of the device, and the like.
- a drone can be tasked to reach certain positions to inspect a structure while avoiding collisions with parts of the structure.
- motion planning problem is a non-convex optimization problem.
- the motion planning problem refers to determination of the motion trajectory of the device for performing the task.
- solving the non-convex optimization problem requires a significant amount of computation and memory resources.
- Some embodiments are based on the recognition that the set of admissible states, i.e., the set of device configurations/states at which all the constraints are satisfied, is typically non-convex.
- Some embodiments are based on understanding that various learned functions, such as neural networks, can be trained offline by a training system having sufficient computational requirements and executed online by a controller having limited computational capabilities.
- Some embodiments are based on a recognition that given an initial trajectory that satisfies the constraints, it is possible to optimize the initial trajectory to also ensure satisfaction of the constraints. Moreover, such an optimization would be convex. In other words, having the initial trajectory can transform the non-convex optimization problem subject to the hard constraints into a convex optimization problem subject to the hard constraints when the initial trajectory is feasible.
- the initial trajectory generated by the learned function trained subject to the soft constraints does not guarantee feasibility and, in general, cannot guarantee such transformation.
- some embodiments are based on the realization that the transformation from the non-convex optimization problem into the convex optimization problem is due to a possibility of convexification around the initial trajectory providing a union of a time-varying collection of convex regions including the target state.
- Each convex region is generated by considering a part of the initial trajectory and facets of the obstacles.
- the initial trajectory does not have to be feasible.
- the initial trajectory may be infeasible as long as the convex regions include the initial state and the target state.
- Some embodiments are based on intuition, supported by numerous tests and experiments, that a learned function trained to minimize a rate of violation of the constraints generates initial trajectories suitable for such transformation.
- An example of such training is reinforcement learning (RL), which is an area of machine learning concerned with how intelligent devices ought to take actions in the environment in order to maximize a cumulative reward.
- some embodiments are based on recognition that stochastic uncertainty may be present in dynamics of the device due to nonlinearities, actuation mismatch, and unmodelled phenomena, and in the positions of the devices and the obstacles due to sensing limitations.
- the constraints are replaced by probabilistic safety constraints (also referred to as chance constraints).
- the optimization-based safety filter is configured to enforce the probabilistic safety constraints as hard constraints to ensure safety with a high probability.
- the optimization-based safety filter enforces the probabilistic safety constraints based on convex chance approximations of the probabilistic safety constraints.
- one embodiment discloses a controller for controlling a motion of a device from an initial state to a target state in an environment having obstacles forming constraints on the motion of the device.
- the controller comprises a processor; and a memory having instructions stored thereon that, when executed by the processor, cause the controller to: execute a learned function trained with machine learning to generate a feasible or infeasible trajectory connecting the initial state of the device with the target state of the device while penalizing an extent of violation of at least some of the constraints to produce an initial trajectory; solve a convex optimization problem subject to the constraints to produce an optimal trajectory that minimizes deviation from the initial trajectory; and control the motion of the device according to the optimal trajectory.
- another embodiment discloses a method for controlling a motion of a device from an initial state to a target state in an environment having obstacles forming constraints on the motion of the device.
- the method comprises executing a learned function trained with machine learning to generate a feasible or infeasible trajectory connecting the initial state of the device with the target state of the device while penalizing an extent of violation of at least some of the constraints to produce an initial trajectory; solving a convex optimization problem subject to the constraints to produce an optimal trajectory that minimizes deviation from the initial trajectory; and controlling the motion of the device according to the optimal trajectory.
- yet another embodiment discloses a non-transitory computer-readable storage medium embodied thereon a program executable by a processor for performing a method for controlling a motion of a device from an initial state to a target state in an environment having obstacles forming constraints on the motion of the device.
- the method comprises executing a learned function trained with machine learning to generate a feasible or infeasible trajectory connecting the initial state of the device with the target state of the device while penalizing an extent of violation of at least some of the constraints to produce an initial trajectory; solving a convex optimization problem subject to the constraints to produce an optimal trajectory that minimizes deviation from the initial trajectory; and controlling the motion of the device according to the optimal trajectory.
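- Across the controller, method, and medium embodiments above, the workflow is the same three steps: generate an initial trajectory with the learned function, refine it by solving the convex problem, and control the device along the result. The following minimal Python sketch illustrates that workflow; learned_function, safety_filter, and track are hypothetical stub names, not names from the disclosure, and their bodies are placeholders for the components described in the remainder of this disclosure.

```python
import numpy as np

def learned_function(x0, x_goal):
    # Stub for the trained policy: here, a straight line from x0 to
    # x_goal, which may be infeasible (it can cross obstacles).
    return np.linspace(x0, x_goal, num=20)

def safety_filter(initial_traj):
    # Stub for the convex optimization: a real implementation solves a
    # QP minimizing deviation from initial_traj subject to convexified
    # collision and dynamics constraints.
    return initial_traj

def track(optimal_traj):
    # Stub for the tracking controller that converts the optimal
    # trajectory into actuator commands for the device.
    for reference_state in optimal_traj:
        pass

x0, x_goal = np.array([0.0, 0.0]), np.array([5.0, 4.0])
initial_traj = learned_function(x0, x_goal)   # step 1: learned function
optimal_traj = safety_filter(initial_traj)    # step 2: convex problem
track(optimal_traj)                           # step 3: control the motion
```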
- FIG. 1A illustrates an example environment where a device is controlled, according to an embodiment of the present disclosure.
- FIG. 1B illustrates a block diagram of a controller for controlling a motion of the device from an initial state to a target state in an environment, according to an embodiment of the present disclosure.
- FIG. 1C illustrates an example initial trajectory, according to an embodiment of the present disclosure.
- FIG. 1D illustrates a non-convex collision-free region determined by executing an optimization-based safety filter, according to an embodiment of the present disclosure.
- FIG. 1E illustrates an approach for generating convex regions defining the non-convex collision-free region, according to an embodiment of the present disclosure.
- FIG. 1F illustrates multiple halfspaces generated by the optimization-based safety filter as the device navigates in the environment, according to an embodiment of the present disclosure.
- FIG. 2A illustrates an environment including multiple devices, wherein each device has a respective target state, according to an embodiment of the present disclosure.
- FIG. 2B illustrates a schematic for motion planning of the multiple devices, according to an embodiment of the present disclosure.
- FIG. 2C illustrates a schematic for the motion planning of the multiple devices, according to an alternate embodiment of the present disclosure.
- FIG. 3 illustrates a schematic of the Markov Decision Process (MDP) used in reinforcement learning, according to some embodiments of the present disclosure.
- FIG. 4 illustrates a schematic of a deep neural network used as the prediction model (i.e., the learned function), according to some embodiments of the present disclosure.
- FIG. 5A illustrates a convex approximation of a constraint that requires the device to stay within a pre-determined environment polytope, according to some embodiments of the present disclosure.
- FIG. 5B illustrates a convex approximation of an obstacle avoidance constraint, according to some embodiments of the present disclosure.
- FIG. 5C illustrates a convex approximation of a device-device collision avoidance constraint, according to some embodiments of the present disclosure.
- FIG. 5D shows a block diagram for formulation of a constraint on state x_t of the device when the state x_t is stochastic and non-stochastic, according to some embodiments of the present disclosure.
- FIG. 6 shows an example of a two-dimensional projection of a control-invariant set corresponding to a constraint set, according to an embodiment of the present disclosure.
- FIG. 7 shows a block diagram of a method for controlling the motion of the device from the initial state to the target state in the environment having obstacles, according to an embodiment of the present disclosure.
- FIG. 8 is a schematic illustrating a computing device for implementing the methods and systems/controllers of the present disclosure.
- FIG. 1A illustrates an example environment 100 where a device 101 is controlled, according to an embodiment of the present disclosure. It is an objective of some embodiments to control a motion of the device 101 from an initial state 103 to a target state 105 in the environment 100. It is also an objective of some embodiments to determine a motion trajectory of the device 101 to perform a task, for example, reaching the target state 105.
- the state of the device 101 may include one or a combination of a position of the device 101, an orientation of the device 101, and a velocity of the device 101.
- the environment 100 includes obstacles 107 and 109.
- the device 101 may be an autonomous vehicle, such as a mobile robot, an aerial vehicle, a water surface vehicle, or an underwater vehicle.
- the performance of the task requires satisfaction of constraints on the motion of the device 101. In other words, the performance of the task is subject to the constraints.
- constraints include reaching the target state 105, avoiding collision with the obstacles 107 and 109 (i.e., a device-obstacle collision avoidance constraint), avoiding collision with other devices (i.e., an inter-device collision avoidance constraint), moving the device 101 such that the state of the device 101 (e.g., position) is within a pre-defined set of admissible states of the device 101 (i.e., a keep-in constraint), and the like.
- a drone can be tasked to reach certain positions to inspect a structure while avoiding collisions with parts of the structure.
- some embodiments provide a controller 111 for controlling the motion of the device 101 from the initial state 103 to the target state 105 in the environment 100.
- FIG. 1B shows a block diagram of the controller 111, according to an embodiment of the present disclosure.
- the controller 111 includes a processor 113 and a memory 115.
- the processor 113 may be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations.
- the processor 113 is an embedded processing unit (EPU).
- the memory 115 may include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory systems. Additionally, in some embodiments, the memory 115 may be implemented using a hard drive, an optical drive, a thumb drive, an array of drives, or any combinations thereof.
- the controller 111 is communicatively coupled to the device 101. For instance, the controller 111 may be remotely connected to the device 101, via a network. In an alternate embodiment, the controller 111 is embedded in the device 101.
- motion planning problem is a non-convex optimization problem.
- the motion planning problem refers to determination of the motion trajectory of the device 101 for performing the task.
- solving the non-convex optimization problem requires a significant amount of computation and memory resources.
- Some embodiments are based on an understanding that various learned functions associated with machine learning, such as neural networks, can be trained offline by a training system having sufficient computational requirements and executed online by a controller having limited computational capabilities.
- the machine learning is related to computational statistics, which is probabilistic in nature.
- the learned functions such as neural networks are not equipped to manage hard constraints but designed to satisfy the constraints with some probabilities.
- a hard constraint refers to a constraint that “must” be satisfied at all times.
- the probabilistic notion of constraints is analogous to, or a direct replication of, the notion of soft constraints, where violation of the constraints is penalized but not prohibited.
- Some embodiments are based on a recognition that given an initial trajectory that satisfies the constraints, it is possible to optimize the initial trajectory to also ensure the satisfaction of constraints. Moreover, such an optimization would be convex. In other words, having the initial trajectory can transform the non-convex optimization problem subject to the hard constraints into a convex optimization problem subject to the hard constraints when the initial trajectory is feasible.
- the initial trajectory generated by the learned function trained subject to the soft constraints does not guarantee feasibility and, in general, cannot guarantee such transformation.
- some embodiments are based on the realization that the transformation from the non-convex optimization problem into the convex optimization problem is due to a possibility of convexification around the initial trajectory providing a union of a time-varying collection of convex regions including the target state 105.
- the initial trajectory does not have to be feasible.
- the initial trajectory may be infeasible as long as the convex regions include the initial state 103 and the target state 105.
- FIG. 1C illustrates an example initial trajectory 117, according to an embodiment of the present disclosure.
- the processor 113 (shown in FIG. 1B) is configured to execute the learned function to produce the initial trajectory 117 connecting the initial state 103 with the target state 105 while penalizing an extent of violation of at least some of the constraints.
- the learned function may be stored in the memory 115 (shown in FIG. 1B) of the controller 111.
- the learned function is a neural network trained using a reinforcement learning based on a reward function. The reward function penalizes the extent of violation of the constraints.
- the reward function returns higher values when the initial trajectory 117 satisfies the constraints as compared to when the constraints are violated (i.e., the reward function penalizes upon violation of the constraints).
- an algorithm such as a proximal policy optimization is used to determine an initial trajectory that maximizes the reward function.
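- As an illustration, the following sketch shows one possible reward of this kind for a single position: a goal-distance term plus a penalty on the depth of penetration into circular obstacles. The weights w_goal and w_col and the penalty form are assumptions for illustration, not values from the disclosure.

```python
import numpy as np

def reward(pos, target, obstacles, w_goal=1.0, w_col=10.0):
    """Illustrative reward: higher when closer to the target, penalized
    by the depth of penetration into any circular obstacle, so the
    extent of constraint violation is penalized but not prohibited."""
    r = -w_goal * np.linalg.norm(pos - target)
    for center, radius in obstacles:
        penetration = max(0.0, radius - np.linalg.norm(pos - center))
        r -= w_col * penetration
    return r

obstacles = [(np.array([2.0, 2.0]), 1.0)]
print(reward(np.array([2.5, 2.0]), np.array([5.0, 4.0]), obstacles))
```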
- the initial trajectory 117 is an infeasible trajectory as it overlaps with the obstacle 107. Further, given the initial trajectory 117, the processor 113 executes an optimization-based safety filter to determine a non-convex collision-free region.
- FIG. 1D shows a non-convex collision-free region 119 determined by executing an optimization-based safety filter, according to an embodiment of the present disclosure.
- the optimization-based safety filter may be stored in the memory 115 (shown in FIG. 1B) of the controller 111.
- the processor 113 (shown in FIG. 1B) solves the convex optimization problem subject to the hard constraints to produce an optimal trajectory 121 that minimizes deviation from the initial trajectory 117 and lies within the non-convex collision-free region 119.
- the processor 113 controls the device 101 according to the optimal trajectory 121. For example, based on the optimal trajectory 121, the controller 111 determines control inputs to actuators of the device 101 to track the optimal trajectory 121.
- since the optimal trajectory 121 lies within the non-convex collision-free region 119, the device 101, when controlled according to the optimal trajectory 121, does not overlap with the obstacles 107 and 109. For example, even when the device 101 makes tight corners 123 and 125 when executing the optimal trajectory 121, the device 101 does not overlap with the obstacle 107.
- the non-convex collision-free region 119 is the union of time-varying collection of convex regions. Each convex region is generated by considering a part of the initial trajectory 117 and facets of the obstacles 107 and 109.
- FIG. IE illustrates an approach for generating convex regions defining the non-convex collision-free region 119, according to an embodiment of the present disclosure.
- the convex regions are generated by generating halfspaces. Specifically, as the device 101 starts at the initial state 103, the optimization-based safety filter projects a current location of the device 101 (i.e., the initial state 103) onto the obstacles 107 and 109, resulting in supporting halfspaces 127 and 129. Intersection of the halfspaces 127 and 129 forms a convex region 131 that includes the device 101 at the initial state 103.
- the processor 113 determines a trajectory that deviates minimally from the initial trajectory 117 and maintains the device 101 in the convex region 131.
- the supporting halfspaces are continuously updated based on the current state of the device 101.
- when the device 101 reaches a position 133, the supporting halfspace for the obstacle 107 changes from 129 to 135, and the convex region changes from 131 to 137.
- when the device 101 reaches a position 139, the supporting halfspace for the obstacle 107 changes from 135 to 141, and the convex region changes from 137 to 143.
- FIG. 1F illustrates all the halfspaces 127, 129, 135, 141, 145, 147, and 149 generated by the optimization-based safety filter as the device 101 navigates in the environment 100, according to an embodiment of the present disclosure.
- the intersections of the halfspaces 127, 129, 135, 141, 145, 147, and 149 form the convex regions 131, 137, and 143, which collectively define the non-convex collision-free region 119.
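- The supporting-halfspace construction can be sketched for a circular obstacle (circular obstacles with radii γ_j are assumed later in this disclosure); the function name and the sign convention a @ x <= b below are illustrative.

```python
import numpy as np

def supporting_halfspace(p, center, radius):
    """Supporting halfspace a @ x <= b of a circular obstacle that
    keeps x on the device's side of the tangent line at the projection
    of the device position p onto the obstacle boundary."""
    n = (p - center) / np.linalg.norm(p - center)  # outward normal
    boundary_pt = center + radius * n              # projection onto circle
    return -n, -n @ boundary_pt                    # a, b with a @ x <= b

p = np.array([0.0, 0.0])                           # device position
a, b = supporting_halfspace(p, np.array([3.0, 1.0]), 1.0)
print(a @ p <= b)                                  # True: p is on safe side
```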
- some embodiments are based on realization that such a framework of motion planning (which is based on the learned function and the optimization-based safety filter) can be extended for motion planning of multiple devices, wherein each device has a respective target state.
- FIG. 2A shows an environment 200 including multiple devices, wherein each device has a respective target state, according to an embodiment of the present disclosure.
- the devices 201a, 201b, 201c, and 201d are required to reach their assigned target states 203a, 203b, 203c, and 203d.
- the environment 200 includes obstacles 205a, 205b, 205c, 205d, 205e, and 205f. It is an object of some embodiments to determine an optimal trajectory for each device that connects each device with their respective target states while avoiding collision with each other and with the obstacles 205a, 205b, 205c, 205d, 205e, and 205f.
- the controller 111 is executed on a remote centralized server and is communicatively coupled to each device.
- the learned function is formulated for each device to produce an initial trajectory for each device, separately.
- the convex optimization problem is not solved separately for each device; instead, the convex optimization problem is transformed into a joint-optimization problem that simultaneously determines an optimal trajectory for each device based on the respective initial trajectory.
- the controller 111 includes a number of the learned functions equal to a number of devices, and an optimization-based safety filter that simultaneously determines the optimal motion trajectory for each device by solving the joint-optimization problem.
- Such a motion planning of the multiple devices is advantageous because a single-device learned function is easier to train, requires less computational resources, and demonstrates superior performance as compared to training the neural network (or a model) that provides the initial trajectories for all devices simultaneously.
- the motion planning of the multiple devices is explained below in FIG. 2B.
- FIG. 2B shows a schematic for the motion planning of the multiple devices, according to an embodiment of the present disclosure.
- Learned functions 207a-207d produce initial trajectories 209a-209d, respectively.
- the initial trajectories 209a-209d are associated with the devices 201a-201d, respectively.
- the initial trajectories 209a-209d are applied to an optimization-based safety filter 211.
- the optimization-based safety filter 211 solves the joint-optimization problem to simultaneously determine optimal trajectories 213a-213d.
- the optimal trajectories 213a-213d are associated with the devices 201a-201d, respectively.
- the controller 111 controls each device based on their respective optimal trajectory. In particular, the controller 111 controls each device based on a pre-determined number of states of their respective optimal trajectory.
- when the devices 201a-201d are controlled based on their respective optimal trajectory, the devices 201a-201d attain new states 215a-215d.
- the new states 215a-215d are fed back 217 to the learned functions 207a-207d, respectively. Further, the learned functions 207a-207d produce new initial trajectories based on the new states 215a-215d, and the above-mentioned process is repeated.
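- The joint-optimization step can be sketched as follows, assuming cvxpy, illustrative initial trajectories, and an inter-device separation constraint linearized around the initial trajectories (a convex surrogate consistent with the supporting-halfspace idea above):

```python
import cvxpy as cp
import numpy as np

T, n_dev, dim, d_min = 10, 4, 2, 0.5   # horizon, devices, dim, separation
rng = np.random.default_rng(0)
# Stand-ins for the per-device initial trajectories 209a-209d.
init = [np.linspace(rng.uniform(0, 1, dim), rng.uniform(4, 5, dim), T)
        for _ in range(n_dev)]

X = [cp.Variable((T, dim)) for _ in range(n_dev)]  # joint decision variables
cost = sum(cp.sum_squares(X[i] - init[i]) for i in range(n_dev))
cons = []
for t in range(T):
    for i in range(n_dev):
        for j in range(i + 1, n_dev):
            # Convex surrogate of ||x_i - x_j|| >= d_min, linearized
            # around the initial trajectories (a separating halfspace).
            g = init[i][t] - init[j][t]
            g = g / np.linalg.norm(g)
            cons.append(g @ (X[i][t] - X[j][t]) >= d_min)
cp.Problem(cp.Minimize(cost), cons).solve()
optimal = [Xi.value for Xi in X]       # optimal trajectories 213a-213d
```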
- FIG. 2C shows a schematic for the motion planning of the multiple devices, according to an alternate embodiment of the present disclosure.
- Each of the devices 201a-201d includes a controller (similar to the controller 111).
- the device 201a includes a controller 219 including a learned function 219a and an optimization-based safety filter 219b.
- the learned function 219a is configured to produce an initial trajectory for the device 201a.
- the optimization-based safety filter 219b is configured to solve a convex optimization problem subject to the constraints to produce an optimal trajectory that minimizes deviation from the initial trajectory of the device 201a.
- the controller 219 controls a motion of the device 201a based on the optimal trajectory produced for the device 201a.
- the device 201b includes a controller 221 including a learned function 221a and an optimization-based safety filter 221b.
- the device 201c includes a controller 223 including a learned function 223a and an optimization-based safety filter 223b.
- the device 201d includes a controller 225 including a learned function 225a and an optimization-based safety filter 225b.
- the learned function of each device produces an initial trajectory for their respective device and subsequently the optimization-based safety filter of each device produces an optimal trajectory for the corresponding device.
- K is a stabilizing gain matrix that renders the closed-loop system stable, a matrix F satisfies C(I - (A + BK))^-1 BF = Identity(d), and r(k) is a reference position command.
- c_j, 1 ≤ j ≤ N_o, denotes the nominal mean positions of the obstacles for all N_o obstacles.
- the obstacles are assumed to be circles with radii γ_j, 1 ≤ j ≤ N_o, respectively.
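- A minimal sketch of this reference-tracking closed loop, assuming an illustrative 1-D double-integrator device and an assumed stabilizing gain K, with F computed from the condition C(I - (A + BK))^-1 BF = Identity:

```python
import numpy as np

# Illustrative 1-D double-integrator device: x = [position, velocity].
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])
K = np.array([[-2.0, -2.5]])           # assumed stabilizing gain

# F from C (I - (A + B K))^-1 B F = Identity, so the closed loop
# tracks the reference position r(k) with zero steady-state error.
F = np.linalg.inv(C @ np.linalg.inv(np.eye(2) - (A + B @ K)) @ B)

x = np.zeros((2, 1))
r = np.array([[1.0]])                  # reference position command r(k)
for k in range(200):
    u = K @ x + F @ r                  # u(k) = K x(k) + F r(k)
    x = A @ x + B @ u                  # closed-loop dynamics (1b)
print((C @ x).item())                  # ~1.0: position converged to r
```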
- stochastic dynamics is also considered for the device 101, where a process noise w(k) and a measurement noise η(k) are stochastic disturbances affecting the dynamics (1a) of the device 101.
- in some embodiments, the motion planning problem is formulated as a deterministic Markov Decision Process (MDP).
- Observation space: an observation vector o(k) is defined as a concatenated vector containing the state x(k) and displacements of the device's current position to the target position and to the N_o static obstacles, where c_j denotes the nominal mean positions of the obstacles for all 1 ≤ j ≤ N_o.
- the position of the device p(k) is a linear transformation of the observation vector o(k).
- Action space: an action a(k) determines the reference position as a bounded perturbation a(k) to the target position q, such that r(k) = q + a(k).
- Step function: a next state x(k + 1) is obtained using the closed-loop stabilized dynamics (1b).
- the stochastic dynamics (1c) can also be used to set up a similar step function.
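- A minimal sketch of this MDP as a step-function environment (the class and attribute names are assumptions; the matrices are the illustrative ones from the closed-loop sketch above):

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])
K = np.array([[-2.0, -2.5]])                     # assumed stabilizing gain
F = np.linalg.inv(C @ np.linalg.inv(np.eye(2) - (A + B @ K)) @ B)

class DeviceEnv:
    """Illustrative step-function environment around the closed-loop
    dynamics (1b); class and attribute names are assumptions."""

    def __init__(self, A, B, K, F, q, a_max=1.0):
        self.A, self.B, self.K, self.F = A, B, K, F
        self.q, self.a_max = q, a_max            # target, action bound
        self.x = np.zeros(A.shape[0])            # internal state x(k)
        self.k = 0                               # clock

    def observe(self):
        # Observation: state plus displacement to the target position
        # (displacements to obstacles would be concatenated as well).
        return np.concatenate([self.x, self.q - self.x[:len(self.q)]])

    def step(self, a):
        r = self.q + np.clip(a, -self.a_max, self.a_max)  # r(k) = q + a(k)
        u = self.K @ self.x + self.F @ r                  # tracking control
        self.x = self.A @ self.x + self.B @ u             # dynamics (1b)
        self.k += 1                                       # clock: k -> k + 1
        return self.observe()

env = DeviceEnv(A, B, K, F, q=np.array([1.0]))
obs = env.step(np.array([0.2]))                  # one MDP transition
```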
- FIG. 3 shows a schematic of the MDP, according to some embodiments of the present disclosure.
- a policy 301 converts an observation 303 (denoted above as o(k) ) into an action 305 (denoted above as a(k)).
- a reward function 307 is a function of the observation 303 and the action 305, as defined in (3).
- a world 309 consists of a system 311, a measurement model 313, and a clock 315.
- the system 311 uses the step function defined based on the closed-loop dynamics (1b), a current internal state x(k), a current action a(k), and a current reference control r(k) to arrive at a next internal state x(k + 1).
- the measurement model 313 converts a state into an appropriate observation vector.
- the clock 315 increments time k to k + 1.
- Some embodiments of the present disclosure use function approximators to characterize the policy 301. Consequently, a suitable prediction model that provides an initial trajectory (e.g., the initial trajectory 117) can be constructed.
- the prediction model corresponds to the learned function.
- the prediction model may be deterministic or stochastic and may rely on neural architectures or kernel representations. Examples of deterministic prediction models include multi-layer perceptron, convolutional neural networks, kernel regression and support vector machines, and the like. Examples of stochastic prediction models include Bayesian neural networks, neural processes, Gaussian processes, Kriging interpolation, and the like.
- FIG. 4 shows a schematic of a deep neural network 400 used as the prediction model (i.e., the learned function), according to some embodiments of the present disclosure.
- an input signal 401 is provided to the deep neural network 400 through an input layer 403.
- the input layer 403 is a feature extraction layer and may be linear or comprise convolutional filters in order to extract useful characteristics from the input signal 401.
- the transformation is linear, and a feature h_0 = W_0 X + w_0 is computed, where W_0 and w_0 form the weights and biases of the input layer 403 and X is the input to the deep neural network 400. Consequently, the features are passed through hidden layers 405, and they are transformed via activation functions σ(·) such as rectified linear units (ReLUs), sigmoid functions, or modifications and combinations thereof. Operations occurring in the hidden layers 405 can be represented using compositions of functions, where the output h_L from the final hidden layer is given by repeated application of h_l = σ(W_l h_{l-1} + w_l) for l = 1, ..., L.
- An output layer 407 is used here for regression and, therefore, can be linear, so a prediction output 409 is y = W_out h_L + w_out, and training the deep neural network 400 involves obtaining good values of the weights and biases in the input layer 403, in the output layer 407, and in each of the hidden layers 405.
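- A minimal NumPy sketch of this forward pass, with illustrative layer sizes and random weights standing in for trained values:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(X, params):
    """Forward pass matching the structure described above: a linear
    input layer, ReLU hidden layers, and a linear output layer for
    regression."""
    (W0, w0), hidden, (Wout, wout) = params
    h = W0 @ X + w0                    # input-layer features
    for W, w in hidden:                # hidden layers with activations
        h = relu(W @ h + w)
    return Wout @ h + wout             # linear regression output

rng = np.random.default_rng(1)
params = ((rng.normal(size=(16, 4)), rng.normal(size=16)),
          [(rng.normal(size=(16, 16)), rng.normal(size=16))],
          (rng.normal(size=(2, 16)), rng.normal(size=2)))
print(mlp_forward(rng.normal(size=4), params))  # 2-D prediction
```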
- the reinforcement learning algorithms are used to train the prediction models (or learned function) for the initial trajectory, where another prediction model for the policy is obtained such that a cumulative sum of reward functions is maximized.
- the reinforcement learning algorithms may include Q-learning, state-action-reward-state-action (SARSA), a deep Q network, a deep deterministic policy gradient, an asynchronous actor-critic algorithm, a trust region policy optimization, and/or a proximal policy optimization.
- Some embodiments are based on a recognition that the constraints can include static constraints, such as a wall of a building, and dynamic constraints, such as a moving obstacle changing its position as a function of time. Some embodiments are based on a recognition that if the static constraints are considered during the training of the learned function, the presence of unknown dynamic constraints can still be managed by the convex optimization problem. This is because the dynamic constraints are less stringent with the added flexibility of the time domain. To that end, the constraints managed by the convex optimization problem can differ from the constraints used during the training of the learned function. Thus, the learned function is trained to penalize the violation of only the static constraints, and the convex optimization problem is solved subject to the static constraints and the dynamic constraints.
- some embodiments are based on recognition that stochastic uncertainty may be present in dynamics of the device due to nonlinearities, actuation mismatch, and unmodelled phenomena, and in the positions of the devices and the obstacles due to sensing limitations.
- the constraints are replaced by probabilistic safety constraints (also referred to as chance constraints).
- the optimization-based safety filter is configured to enforce the probabilistic safety constraints as hard constraints to ensure safety with a high probability.
- the optimization-based safety filter enforces the probabilistic safety constraints based on convex chance approximations of the probabilistic safety constraints, as described in FIG. 5 A, FIG. 5B, and FIG. 5C.
- the convex optimization problem can be formulated as a convex quadratic program over a safe control action planning horizon T for N_A devices.
- given the initial trajectory produced by the learned function, the convex quadratic program minimizes a quadratic objective penalizing deviation from the initial trajectory, subject to constraints arising from the dynamics (1a), control constraints on the inputs, constraints on a resulting motion trajectory that perform the task or come closer to performing the task with a high user-specified likelihood within a time horizon T, and recursive feasibility constraints that ensure that the convex quadratic program is feasible at future time steps as well.
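- A minimal cvxpy sketch of such a quadratic program for a single device, with an illustrative speed bound standing in for the dynamics and control constraints, and with the obstacle halfspace constraints left as a comment (they would be built as in the projection sketch above):

```python
import cvxpy as cp
import numpy as np

T, dim, v_max = 15, 2, 1.0                        # horizon, dim, speed bound
x_init = np.linspace([0.0, 0.0], [5.0, 4.0], T)   # initial trajectory
X = cp.Variable((T, dim))

cons = [X[0] == x_init[0]]                        # start at the initial state
for t in range(T - 1):
    cons.append(cp.norm(X[t + 1] - X[t], 2) <= v_max)  # dynamics/control proxy
# Obstacle constraints would be appended here as supporting halfspaces
# built from the initial trajectory: cons.append(a_t @ X[t] <= b_t).

prob = cp.Problem(cp.Minimize(cp.sum_squares(X - x_init)), cons)
prob.solve()
print(X.value[-1])                                # optimized final state
```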
- Some embodiments of the present disclosure consider a task of ensuring that an overall system having multiple devices is probabilistically collectively safe, i.e., satisfies a collection of constraints that ensures safe motion planning for the multiple devices.
- FIG. 5A illustrates a convex approximation of the first constraint, namely that the device 101 must stay within a pre-determined environment polytope 501, according to some embodiments of the present disclosure.
- the constraints are naturally convex for the non-stochastic dynamics (1a) of the device 101.
- for the stochastic dynamics (1c) of the device 101, the device 101 is required to stay within a smaller set 503. Tightening of the environment polytope 501 by a tightened region 505 leads to formation of the smaller set 503, which arises due to stochasticity of the dynamics.
- similar convex approximations can be constructed for additional constraints on the state of the device 101, including its velocity.
- FIG. 5B illustrates a convex approximation of the second constraint, namely that the device 101 does not collide with the N_o static obstacles, according to some embodiments of the present disclosure.
- a region 507 denotes a geometry of an obstacle j with a center 509 (denoted by c_j).
- the region 507 is avoided by the device 101 if the device position is in a linear halfspace 511 that is contained in a complement of a linear halfspace 513 including the region 507.
- the halfspace 511 is constructed from an obstacle border 515 and a normal vector 517 and tightened by a tightened region 519 to account for the stochastic uncertainty. The tightening required is explained in FIG. 5D.
- FIG. 5C illustrates a convex approximation of the third constraint, namely that devices 521 and 523 do not collide with each other, according to some embodiments of the present disclosure.
- a coordinate frame defined by axes 525 and 527 is attached to the device 521.
- the dynamics of the device 523 in the coordinate frame are given by the relative dynamics of the device 523 with respect to the device 521, where w_{523-521}(k) and η_{523-521}(k) are the process and measurement noise in the coordinate frame defined by the axes 525 and 527. Like w(k) and η(k), these noises are Gaussian.
- the device 523 should stay below a boundary 529.
- the boundary 529 is obtained after growing a region 531 with a tightening region 533.
- the region 531 contains a device geometry 535, where a Minkowski sum of the device geometry 535 is considered with its negative image.
- a resulting motion 537 stays below the boundary 529
- FIG. 5D shows a block diagram for formulation of a constraint 539 on state x_t when x_t is stochastic and non-stochastic, according to some embodiments of the present disclosure.
- it is checked if the state x_t is stochastic. If the state x_t is non-stochastic, then constraint tightening is not required 543. On the other hand, if the state x_t is stochastic, then the constraint 539 becomes a chance constraint 545 such that the constraint 539 holds with a likelihood no smaller than a user-specified threshold δ. Consequently, the constraint 539 is tightened using a deterministic reformulation 547.
- some embodiments of the present disclosure are based on the realization that the constraints do not require tightening when the dynamics are non-stochastic, and they simplify to their original convex approximations in such an event.
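- A minimal sketch of the deterministic reformulation for a linear chance constraint, using the common convention P(a @ x <= b) >= 1 - delta for Gaussian x; the covariance and delta below are illustrative:

```python
import numpy as np
from scipy.stats import norm

def tightened_rhs(a, b, Sigma, delta):
    """Deterministic reformulation of the chance constraint
    P(a @ x <= b) >= 1 - delta for Gaussian x with covariance Sigma:
    enforce a @ mean(x) <= b - Phi^{-1}(1 - delta) * sqrt(a @ Sigma @ a)."""
    return b - norm.ppf(1.0 - delta) * np.sqrt(a @ Sigma @ a)

a, b = np.array([1.0, 0.0]), 1.0
Sigma = 0.01 * np.eye(2)                 # illustrative state covariance
print(tightened_rhs(a, b, Sigma, 0.05))  # ~0.84: the constraint tightens
```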
- Some embodiments are based on the realization that recursive feasibility constraints are required for practical implementation of the controller 111 in a model predictive paradigm. Specifically, mean positions of the devices are required to stay inside appropriately defined control invariant sets so that the devices can be controlled safely despite the stochastic uncertainty.
- An example of the control invariant set is described below in FIG. 6.
- FIG. 6 shows an example of a two-dimensional projection of a control-invariant set 603 corresponding to a constraint set 601, according to an embodiment of the present disclosure.
- the constraint set 601 may be a multi-dimensional polytope determined by hyperplanes, which are represented by linear inequalities, along multiple dimensions corresponding to the constraints on the optimal trajectory of the device 101.
- the constraint set 601 encodes the states of the device 101 that are safe. For any state of the device 101 within the control invariant set 603, there exists a control command maintaining the state of the device 101 within the control invariant set 603 for known or admissible future states.
- for any state of the device 101, such as a state 615, within the control invariant set 603, among all possible control commands 617-623 that the controller 111 can execute, there is at least one control command 623 that maintains the state of the device 101 within the control invariant set 603.
- a state 605 can be feasible for one iteration, but all control commands 607-613 that the controller 111 is allowed to take during the next iteration can bring the state of the device 101 outside of the constraint set 601.
- Examples of the constraint set 601 include the convex approximations of the admissible workspace as discussed in FIG. 5A, 5B, and 5C.
- FIG. 7 shows a block diagram of a method 700 for controlling motion of a device from an initial state to a target state in an environment having obstacles, according to an embodiment of the present disclosure.
- the method 700 includes obtaining parameters of the task from the device 101 and/or a remote server.
- the parameters of the task include the state of the device 101.
- the parameters may include constraints that characterize the task, for example, one or a combination of the initial state of the device 101, the target state of the device 101, a geometrical configuration of one or multiple stationary obstacles defining at least a part of the constraint, and motion of moving obstacles defining at least a part of the constraint.
- the parameters are submitted to a learned function.
- the method 700 includes executing the learned function trained with machine learning to produce an initial trajectory connecting the initial state of the device with the target state of the device.
- the learned function is a neural network trained using a reinforcement learning based on a reward function that penalizes an extent of violation of the constraints.
- the method 700 includes solving a convex optimization problem subject to the constraints to produce an optimal trajectory that minimizes deviation from the initial trajectory.
- an optimization-based safety filter solves the convex optimization problem subject to the constraints as hard constraints, to produce the optimal trajectory.
- the method 700 includes controlling the motion of the device 101 according to the optimal trajectory.
- FIG. 8 is a schematic illustrating a computing device 800 for implementing the methods and systems/controllers of the present disclosure.
- the computing device 800 includes a power source 801, a processor 803, a memory 805, and a storage device 807, all connected to a bus 809. Further, a high-speed interface 811, a low-speed interface 813, high-speed expansion ports 815, and low-speed connection ports 819 can be connected to the bus 809. In addition, a low-speed expansion port 817 is in connection with the bus 809. Further, an input interface 821 can be connected via the bus 809 to an external receiver 823 and an output interface 825. A receiver 827 can be connected to an external transmitter 829 and a transmitter 831 via the bus 809.
- the memory 805 can store instructions that are executable by the computing device 800 and any data that can be utilized by the methods and systems of the present disclosure.
- the memory 805 can include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory systems.
- the memory 805 can be a volatile memory unit or units, and/or a non-volatile memory unit or units.
- the memory 805 may also be another form of computer-readable medium, such as a magnetic or optical disk.
- the storage device 807 can be adapted to store supplementary data and/or software modules used by the computer device 800.
- the storage device 807 can include a hard drive, an optical drive, a thumb-drive, an array of drives, or any combinations thereof.
- the storage device 807 can contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations.
- Instructions can be stored in an information carrier. The instructions, when executed by one or more processing devices (for example, the processor 803), perform one or more methods, such as those described above.
- the computing device 800 can be linked through the bus 809, optionally, to a display interface or user interface (HMI) 847 adapted to connect the computing device 800 to a display device 849 and a keyboard 851, wherein the display device 849 can include a computer monitor, camera, television, projector, or mobile device, among others.
- the computer device 800 may include a printer interface to connect to a printing device, wherein the printing device can include a liquid inkjet printer, solid ink printer, large-scale commercial printer, thermal printer, UV printer, or dye-sublimation printer, among others.
- the high-speed interface 811 manages bandwidth-intensive operations for the computing device 800, while the low-speed interface 813 manages lower bandwidth-intensive operations. Such allocation of functions is an example only.
- the high-speed interface 811 can be coupled to the memory 805, the user interface (HMI) 847, and to the keyboard 851 and the display 849 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 815, which may accept various expansion cards via the bus 809.
- the low-speed interface 813 is coupled to the storage device 807 and the low-speed expansion ports 817, via the bus 809.
- the low-speed expansion ports 817 which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet) may be coupled to the one or more input/output devices 841.
- the computing device 800 may be connected to a server 853 and a rack server 855.
- the computing device 800 may be implemented in several different forms. For example, the computing device 800 may be implemented as part of the rack server 855.
- individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments.
- a process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function’s termination can correspond to a return of the function to the calling function or the main function.
- embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically.
- Manual or automatic implementations may be executed, or at least assisted, using machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof.
- the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium.
- a processor(s) may perform the necessary tasks.
- embodiments of the present disclosure and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
- some embodiments of the present disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus.
- program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
- the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- a computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a computer program may, but need not, correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub programs, or portions of code.
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- Computers suitable for the execution of a computer program can be based, by way of example, on general or special purpose microprocessors or both, or any other kind of central processing unit.
- a central processing unit will receive instructions and data from a read only memory or a random access memory or both.
- the essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
- a computer need not have such devices.
- a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
- to provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
- a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on the user's device in response to requests received from the web browser.
- Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components.
- the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network.
- the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202380038874.3A CN119173827A (en) | 2022-05-13 | 2023-05-15 | System and method for controlling the movement of one or more devices along a calculated optimal trajectory |
| EP23730594.1A EP4500288A1 (en) | 2022-05-13 | 2023-05-15 | System and method for controlling motion of one or more devices along a calculated optimal trajectory |
| JP2024554222A JP7738778B2 (en) | 2022-05-13 | 2023-05-15 | SYSTEM AND METHOD FOR CONTROLLING THE MOTION OF ONE OR MORE DEVICES ALONG A CALCULATED OPTIMUM TRAJECTORY - Patent application |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263364650P | 2022-05-13 | 2022-05-13 | |
| US63/364,650 | 2022-05-13 | ||
| US18/049,293 US20230367336A1 (en) | 2022-05-13 | 2022-10-25 | System and Method for Controlling Motion of One or More Devices |
| US18/049,293 | 2022-10-25 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023219178A1 true WO2023219178A1 (en) | 2023-11-16 |
Family
ID=86764982
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2023/018849 Ceased WO2023219178A1 (en) | 2022-05-13 | 2023-05-15 | System and method for controlling motion of one or more devices along a calculated optimal trajectory |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023219178A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118192263A (en) * | 2024-04-22 | 2024-06-14 | 北京航空航天大学 | A spacecraft rendezvous and docking control method and system based on safety reinforcement learning |
| CN119910665A (en) * | 2025-04-02 | 2025-05-02 | 德清县浙工大莫干山研究院 | Robot trajectory tracking method, device and medium based on finite time neural network |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021152047A1 (en) * | 2020-01-28 | 2021-08-05 | Five AI Limited | Planning in mobile robots |
| US20220055651A1 (en) * | 2020-08-24 | 2022-02-24 | Toyota Research Institute, Inc. | Data-driven warm start selection for optimization-based trajectory planning |
- 2023-05-15: PCT/JP2023/018849 filed, published as WO2023219178A1 (status: Ceased)
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23730594 Country of ref document: EP Kind code of ref document: A1 |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2024554222 Country of ref document: JP |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2023730594 Country of ref document: EP |
|
| ENP | Entry into the national phase |
Ref document number: 2023730594 Country of ref document: EP Effective date: 20241024 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |