
CN111273668A - Unmanned vehicle motion track planning system and method for structured road - Google Patents


Info

Publication number: CN111273668A (granted as CN111273668B)
Application number: CN202010099122.6A
Authority: CN (China)
Prior art keywords: decision, module, longitudinal, vehicle, value
Other languages: Chinese (zh)
Inventors: 彭育辉, 张垚, 范贤波, 钟聪
Original and current assignee: Fuzhou University
Application filed by Fuzhou University; priority to CN202010099122.6A
Legal status: Granted; Expired - Fee Related

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 … with means for defining a desired trajectory
    • G05D1/0214 … in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221 … involving a learning process
    • G05D1/0223 … involving speed control of the vehicle
    • G05D1/0225 … involving docking at a fixed facility, e.g. base station or loading bay
    • G05D1/0231 … using optical position detecting means
    • G05D1/0238 … using obstacle or wall sensors
    • G05D1/024 … using obstacle or wall sensors in combination with a laser
    • G05D1/0246 … using a video camera in combination with image processing means
    • G05D1/0257 … using a radar
    • G05D1/0276 … using signals provided by a source external to the vehicle
    • G05D1/0278 … using satellite positioning signals, e.g. GPS

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Optics & Photonics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to a motion trajectory planning system for unmanned vehicles on structured roads, comprising a perception module, a positioning module, a lane-change decision module, a motion planning module, and a trajectory tracking module. The lane-change decision module outputs decision actions based on data collected by the perception and positioning modules; the motion planning module then outputs an optimal trajectory to the trajectory tracking module according to the decision action. The invention uses deep reinforcement learning for decision making, while trajectory planning (driven dynamically by the decision actions), perception, and control are handled by independent modules. Compared with end-to-end methods, this greatly improves the interpretability and operability of the decision-planning process and adapts well to the system architecture of existing unmanned vehicles.

Description

Motion Trajectory Planning System and Method for Unmanned Vehicles on Structured Roads

Technical Field

The invention belongs to the technical field of unmanned driving, and in particular relates to a system and method for planning the motion trajectory of unmanned vehicles on structured roads.

Background

The goal of an unmanned vehicle's decision-making and motion planning system is to produce safe, reasonable driving behavior, like a skilled driver, and to plan a safe driving trajectory. Most traditional decision-planning systems are rule-based; their logic is explicit and their planning and reasoning ability is strong, but they must anticipate in advance every scenario the vehicle may encounter, which makes the system complex. In recent years, end-to-end convolutional neural networks have been applied to unmanned vehicles, greatly simplifying the decision-planning system: each camera frame is fed directly into the network, which directly outputs the target steering-wheel angle. However, such systems are poorly interpretable.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a motion trajectory planning system and method for unmanned vehicles on structured roads.

To achieve the above object, the present invention adopts the following technical solutions:

A motion trajectory planning system for unmanned vehicles on structured roads, comprising a perception module, a positioning module, a lane-change decision module, a motion planning module, and a trajectory tracking module. The lane-change decision module outputs decision actions based on data collected by the perception and positioning modules; the motion planning module outputs an optimal trajectory to the trajectory tracking module according to the decision action.

Further, the perception module includes a lidar, a millimeter-wave radar, and motion cameras.

Further, the positioning module includes a GPS satellite positioning system, an inertial navigation unit, and a network differential module.

A motion trajectory planning method for unmanned vehicles on structured roads, comprising the following steps:

Step S1: collect real-time driving data;

Step S2: perform a coordinate transformation on the collected driving data, converting the vehicle's data in the global coordinate system into data in the Frenet coordinate system;

Step S3: determine the state space, the action space, and the action-value reward function;

Step S4: construct a deep reinforcement learning model;

Step S5: take the surrounding environment information, the decision actions, and the vehicle's own state as input, compute with the deep reinforcement learning model the Q value of the unmanned vehicle executing each action, and select the decision action corresponding to the maximum output Q value;

Step S6: compute the target configuration in Frenet coordinates from the decision action given by the decision module;

Step S7: from the initial configuration at the unmanned vehicle's current position and the target configuration, plan a set of time-stamped trajectories within a preset time interval;

Step S8: establish a loss function and select the optimal trajectory from the generated trajectory set as the trajectory to track.
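The greedy selection in step S5 can be sketched as follows; the function name is an illustrative assumption, and the six action labels are the patent's own action six-tuple:

```python
import numpy as np

# Action labels from the patent's action space A = [KS, ACC, DEC, CL, CR, FOL].
ACTIONS = ("KS", "ACC", "DEC", "CL", "CR", "FOL")

def select_action(q_values, actions=ACTIONS):
    """Step S5: pick the decision action with the maximum predicted Q value."""
    return actions[int(np.argmax(q_values))]
```

For example, a Q-vector whose largest entry is in the fourth position selects the left lane change `CL`.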

Further, the coordinate transformation converts the vehicle's position (x, y), velocity v_x, and acceleration a_x in the global coordinate system into the longitudinal displacement s, longitudinal velocity ṡ, and longitudinal acceleration s̈, and the lateral displacement l, lateral velocity l̇, and lateral acceleration l̈ in the Frenet coordinate system.
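The position part of this Cartesian-to-Frenet conversion can be sketched for a piecewise-linear reference line as follows; the helper name and the left-positive sign convention for l are assumptions of this sketch, and the velocity/acceleration projections are omitted:

```python
import numpy as np

def cartesian_to_frenet(x, y, ref_xy):
    """Project a global (x, y) position onto a piecewise-linear reference
    line and return Frenet (s, l):
      s = arc length along the reference line to the foot point,
      l = signed lateral offset (assumed positive to the left of the path)."""
    ref = np.asarray(ref_xy, dtype=float)
    seg = np.diff(ref, axis=0)                        # segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    s0 = np.concatenate(([0.0], np.cumsum(seg_len)))  # arc length at vertices
    p = np.array([x, y]) - ref[:-1]                   # offsets from segment starts
    t = np.clip(np.einsum('ij,ij->i', p, seg) / seg_len**2, 0.0, 1.0)
    foot = ref[:-1] + t[:, None] * seg                # candidate foot points
    d = np.linalg.norm(np.array([x, y]) - foot, axis=1)
    i = int(np.argmin(d))                             # closest segment
    s = s0[i] + t[i] * seg_len[i]
    # sign of l from the cross product of the segment direction and the offset
    cross = seg[i, 0] * (y - foot[i, 1]) - seg[i, 1] * (x - foot[i, 0])
    l = float(np.sign(cross) * d[i]) if d[i] > 1e-12 else 0.0
    return s, l
```

For a straight reference line along the x-axis, a point at (3, 2) maps to s = 3 and l = +2 under this convention.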

Further, step S3 is specifically: the state space is defined as the motion states, in Frenet coordinates from a bird's-eye view, of the unmanned vehicle and the surrounding vehicles:

S = [s_ego, s_1, s_2, …, s_{N0}]^T

where s_ego is the state of the unmanned vehicle itself and N0 is the number of surrounding vehicles.

The state of the unmanned vehicle and of each other vehicle is described by a six-tuple whose features are the vehicle's longitudinal distance, longitudinal velocity, longitudinal acceleration, lateral distance, lateral velocity, and lateral acceleration:

s_i = [s, ṡ, s̈, l, l̇, l̈]^T

The input decision actions are divided into six kinds: speed keeping, acceleration, deceleration, lane change to the left, lane change to the right, and car following, represented as a six-tuple:

A = [KS, ACC, DEC, CL, CR, FOL]^T
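The state encoding above can be sketched as follows; flattening the ego and surrounding-vehicle six-tuples into a single vector for the Q-network input is an assumption of this sketch, since the patent does not specify the encoding:

```python
import numpy as np

def vehicle_state(s, s_dot, s_ddot, l, l_dot, l_ddot):
    """Six-tuple state of one vehicle in Frenet coordinates."""
    return np.array([s, s_dot, s_ddot, l, l_dot, l_ddot], dtype=float)

def build_state(ego, others):
    """Stack the ego state and the N0 surrounding-vehicle states into a
    flat vector (assumed input layout for the Q-network)."""
    return np.concatenate([ego] + list(others))

ego = vehicle_state(12.0, 8.0, 0.0, 0.0, 0.0, 0.0)
lead = vehicle_state(35.0, 7.0, 0.0, 0.0, 0.0, 0.0)
state = build_state(ego, [lead])   # 6 * (1 + N0) entries
```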

The value reward function is established from driving safety, comfort, headway, and driving speed:

R(s,a) = R_s(s,a) + R_t(s,a) + R_c(s,a) + R_v(s,a)

The headway reward R_t(s,a) is given by the following formula:

[equation image: headway reward R_t(s,a)]

where RE is the headway index; in the present invention RE = 30.

The comfort reward R_c(s,a) is:

[equation image: comfort reward R_c(s,a)]

The driving-speed reward R_v(s,a) is defined as:

[equation image: speed reward R_v(s,a)]

where v_max is the speed limit of the current lane.

Further, step S4 is specifically:

The input layer of the DQN model receives the decision action, the vehicle's own state, and the states of the other vehicles; the output layer gives the Q value of each decision action available in that state. A deep neural network replaces the value function: given the action-value function Q*(s,a), fitting the Q function with a deep neural network gives:

Q(s,a) ≈ Q(s,a;θ)

There is a difference between the Q-function estimate obtained from the Bellman equation and the Q value predicted by the deep neural network; this difference serves as the loss used to update the network. A loss function is therefore introduced to minimize the difference between the Bellman estimate of the Q value and the network's estimate:

L_i(θ_i) = E[(y_i − Q(s,a,θ))²]

where y_i is the Q value given by the Bellman equation, y_i = r + γ max_{a'} Q(s',a',θ), and i is the iteration index.

The above loss is trained by gradient descent:

∇_{θ_i} L_i(θ_i) = E[(y_i − Q(s,a;θ_i)) ∇_{θ_i} Q(s,a;θ_i)]

Further, the deep neural network consists of two networks with identical structure: an evaluation network and a target network. The evaluation network participates in training, continually updating its parameters to improve its predictions, while the target network saves (copies) the evaluation network's parameters after every fixed number of training steps.
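A minimal numeric sketch of the TD update and target-network scheme described above, substituting a linear Q-function for the deep network; the linear parameterization, dimensions, learning rate, and sync interval are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATE, N_ACTION, GAMMA, LR = 12, 6, 0.95, 0.01

# Linear stand-in for the deep network: Q(s, .; theta) = theta @ s.
theta_eval = rng.normal(0.0, 0.1, (N_ACTION, N_STATE))   # evaluation network
theta_target = theta_eval.copy()                         # target network

def q_values(theta, s):
    return theta @ s

def td_update(s, a, r, s_next, done):
    """One gradient step on L(theta) = (y - Q(s,a;theta))^2 with the
    Bellman target y = r + gamma * max_a' Q(s', a'; theta_target)."""
    y = r if done else r + GAMMA * np.max(q_values(theta_target, s_next))
    err = y - q_values(theta_eval, s)[a]
    theta_eval[a] += LR * err * s        # gradient of the squared TD error
    return err

s = rng.normal(size=N_STATE)
s_next = rng.normal(size=N_STATE)
before = abs(td_update(s, 2, 1.0, s_next, False))
for _ in range(200):                     # repeat the same transition
    td_update(s, 2, 1.0, s_next, False)
after = abs(td_update(s, 2, 1.0, s_next, False))
theta_target[:] = theta_eval             # periodic target-network sync
```

With the target network held fixed, repeating the same transition drives the TD error toward zero, which is the behavior the loss above is minimizing.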

Further, step S6 is specifically:

The target position of the unmanned vehicle is determined from the decision action of the lane-change decision module.

When the decision action is speed keeping, acceleration, or deceleration, the lateral target configuration is [l_t, 0, 0], where l_t is the lateral distance l of the current lane's center line and T_j is the time to reach the target position; the longitudinal target configuration is [ṡ_d ± Δṡ, 0], where ṡ_d is the target speed to keep or reach and Δṡ is the permitted speed fluctuation.

When the decision action is a lane change to the left or right, the lateral target configuration is [l_t, 0, 0], where l_t is the lateral distance l of the center line of the left or right lane. If there is no obstacle vehicle ahead of the unmanned vehicle, the longitudinal target configuration is [ṡ_d ± Δṡ, 0]; if there is an obstacle vehicle ahead, the longitudinal target configuration is determined from the obstacle vehicle's motion state, where s_ob, ṡ_ob, s̈_ob are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of the obstacle vehicle ahead.

When the decision action is car following, the lateral target configuration is [l_t, 0, 0], where l_t is the lateral distance l of the current lane's center line; the longitudinal target configuration is determined from the state of the vehicle being followed, where s_lv, ṡ_lv, s̈_lv are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of that vehicle.
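The mapping from a decision action to a target configuration can be sketched as follows. The tuple layouts (end value, end first derivative, end second derivative), the lane width, and the constant keep-clear gap in following mode are assumptions of this sketch, since the exact formulas survive only as images in the source:

```python
LANE_WIDTH = 3.5   # assumed lane width in meters

def target_configuration(action, ego_s_dot, lane_center_l, lead=None,
                         v_target=None, follow_gap=10.0):
    """Return (lateral_target, longitudinal_target) for one decision action.
    lateral_target      = (l, l_dot, l_ddot) at the end of the maneuver
    longitudinal_target = (s_dot, s_ddot) for cruising actions, or
                          (s, s_dot, s_ddot) when following a lead vehicle."""
    if action in ("KS", "ACC", "DEC"):
        lat = (lane_center_l, 0.0, 0.0)
        lon = (v_target if v_target is not None else ego_s_dot, 0.0)
    elif action in ("CL", "CR"):
        shift = LANE_WIDTH if action == "CL" else -LANE_WIDTH
        lat = (lane_center_l + shift, 0.0, 0.0)
        lon = (v_target if v_target is not None else ego_s_dot, 0.0)
    elif action == "FOL":
        s_lv, s_dot_lv, s_ddot_lv = lead   # state of the vehicle being followed
        lat = (lane_center_l, 0.0, 0.0)
        lon = (s_lv - follow_gap, s_dot_lv, s_ddot_lv)
    else:
        raise ValueError(action)
    return lat, lon
```

Ending lane changes at the adjacent lane center with zero lateral velocity and acceleration matches the [l_t, 0, 0] lateral targets above.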

Further, step S8 is specifically:

Step S81: the loss functions are as follows.

The lateral loss function is defined as:

C_l = k_j J_t(l(t)) + k_t T + k_l l_1²

where k_j, k_t, k_l are weights; k_j J_t(l(t)) penalizes trajectories with large jerk; k_t T penalizes the maneuver duration (the larger T, the longer the maneuver takes); and k_l l_1² penalizes deviation from the road center line.

When the longitudinal target configuration specifies a target position, the longitudinal loss function is defined as:

C_s = k_j J_t(s(t)) + k_t T + k_s[s_1 − s_d]²

where k_j, k_t, k_s are weights, and k_s[s_1 − s_d]² penalizes the deviation of the end longitudinal position s_1 of the target configuration from the target position s_d.

When the longitudinal target configuration specifies a target velocity, the longitudinal loss function is defined as:

C_s = k_j J_t(s(t)) + k_t T + k_s[ṡ_1 − ṡ_d]²

where k_j, k_t, k_s are weights, and k_s[ṡ_1 − ṡ_d]² penalizes the deviation of the end longitudinal velocity ṡ_1 of the target configuration from the target velocity ṡ_d.

The total loss of a trajectory is:

C_al = k_la C_l + k_lo C_s

where k_la and k_lo are weights.

Step S82: after the loss has been computed for every trajectory, remove the trajectories that exceed the maximum velocity or maximum acceleration in the s direction, the trajectories that exceed the maximum curvature, and the trajectories that would lead to a collision; from the remaining trajectories, select the one with the smallest loss as the optimal trajectory and send it to the trajectory tracking module for tracking.
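Steps S7 and S81 can be sketched with quintic polynomial candidates. For brevity this sketch samples only over end times T toward a single lateral target (the patent's trajectory set would also vary end states), and the weights are illustrative assumptions:

```python
import numpy as np

def quintic(p0, v0, a0, p1, v1, a1, T):
    """Coefficients (lowest order first) of the quintic polynomial meeting
    position/velocity/acceleration boundary conditions at t=0 and t=T."""
    A = np.array([[T**3,   T**4,    T**5],
                  [3*T**2, 4*T**3,  5*T**4],
                  [6*T,    12*T**2, 20*T**3]], dtype=float)
    b = np.array([p1 - (p0 + v0*T + 0.5*a0*T**2),
                  v1 - (v0 + a0*T),
                  a1 - a0])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([p0, v0, 0.5*a0, c3, c4, c5])

def poly_eval(c, t):
    return sum(ci * t**i for i, ci in enumerate(c))

def jerk_cost(c, T, n=400):
    """Integral of squared jerk over [0, T] (simple Riemann sum)."""
    t = np.linspace(0.0, T, n)
    jerk = 6*c[3] + 24*c[4]*t + 60*c[5]*t**2
    return float(np.sum(jerk**2) * (T / n))

def best_lateral(l0, l_target, k_j=1.0, k_t=0.1, k_l=1.0,
                 horizons=(2.0, 3.0, 4.0, 5.0)):
    """Build one candidate per end time T toward target (l_target, 0, 0) and
    return (cost, T, coeffs) minimizing C_l = k_j*J + k_t*T + k_l*l1**2."""
    best = None
    for T in horizons:
        c = quintic(l0, 0.0, 0.0, l_target, 0.0, 0.0, T)
        cost = k_j * jerk_cost(c, T) + k_t * T + k_l * l_target**2
        if best is None or cost < best[0]:
            best = (cost, T, c)
    return best
```

Each candidate exactly meets its boundary conditions by construction, so the selection reduces to comparing the C_l values, mirroring step S82's minimum-loss choice after infeasible candidates are filtered out.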

Compared with the prior art, the present invention has the following beneficial effects:

The invention uses deep reinforcement learning for decision making, plans trajectories dynamically from the decision actions, and handles perception and control in independent modules. Compared with end-to-end methods, this greatly improves the interpretability and operability of the decision-planning process, and it adapts well to the system architecture of existing unmanned vehicles.

Description of the Drawings

Fig. 1 shows the overall framework of the system of the present invention;

Fig. 2 is a schematic diagram of the conversion between the Cartesian and Frenet coordinate systems in an embodiment of the present invention;

Fig. 3 is a structural diagram of the deep reinforcement learning (DQN) model in an embodiment of the present invention;

Fig. 4 is a vehicle scene diagram of a leftward lane change in an embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments.

Referring to Fig. 1, the present invention provides a motion trajectory planning system for unmanned vehicles on structured roads, comprising a perception module, a positioning module, a lane-change decision module, a motion planning module, and a trajectory tracking module. The lane-change decision module outputs decision actions based on the data collected by the perception and positioning modules; the motion planning module outputs the optimal trajectory to the trajectory tracking module according to the decision action. The perception module includes a lidar, a millimeter-wave radar, and motion cameras. The positioning module includes a GPS satellite positioning system, an inertial navigation unit, and a network differential module.

In this embodiment, the lane-change decision module mainly includes the following:

(1) Coordinate transformation of the information received from the perception layer and the positioning/navigation system. To use the vehicle's information in the Frenet coordinate system, the vehicle's position (x, y), velocity v_x, and acceleration a_x in the global (Cartesian) coordinate system are converted into the longitudinal displacement s, longitudinal velocity ṡ, and longitudinal acceleration s̈, and the lateral displacement l, lateral velocity l̇, and lateral acceleration l̈ in the Frenet coordinate system.

(2) Determining the state space, the action space, and the action-value reward function. The input of the deep neural network is the environment and state, and its output is the action, so the state space, action space, and action-value reward function must be determined. The state space is defined as the motion states, in Frenet coordinates from a bird's-eye view, of the unmanned vehicle and the surrounding vehicles:

S = [s_ego, s_1, s_2, …, s_{N0}]^T

where s_ego is the state of the unmanned vehicle itself and N0 is the number of surrounding vehicles.

The state of the unmanned vehicle and of each other vehicle is described by a six-tuple whose features are the vehicle's longitudinal distance, longitudinal velocity, longitudinal acceleration, lateral distance, lateral velocity, and lateral acceleration:

s_i = [s, ṡ, s̈, l, l̇, l̈]^T

In this embodiment the input decision actions are divided into six kinds: speed keeping, acceleration, deceleration, lane change to the left, lane change to the right, and car following, represented as a six-tuple:

A = [KS, ACC, DEC, CL, CR, FOL]^T

The action-value reward function quantitatively reflects how well the unmanned vehicle completes its task. The present invention builds the reward function mainly from driving safety, comfort, headway, and driving speed:

R(s,a) = R_s(s,a) + R_t(s,a) + R_c(s,a) + R_v(s,a)

Driving safety is the basic requirement and first priority of an unmanned vehicle. When the unmanned vehicle collides with another vehicle, it receives a large negative reward R_s = −1000; if it reaches the target position without a collision, it receives a large positive reward R_s = 800.

Headway is an important measure of safety and road throughput in car-following, defined as the time interval between the fronts of two consecutive vehicles passing the same point. Experimental data indicate that when the headway is greater than 2 s the probability of a rear-end collision is low, and that below 2 s the probability of a rear-end collision is inversely proportional to the headway. The headway reward is defined by the following formula:

[equation image: headway reward R_t(s,a)]

where RE is the headway index; in the present invention RE = 30.

The decisions of the unmanned vehicle must also consider ride comfort: frequent switching of decision actions should be avoided, and harsh accelerations reduced. The comfort reward of the present invention is defined as:

[equation image: comfort reward R_c(s,a)]

Driving speed determines the driving efficiency of the unmanned vehicle; where traffic rules permit, efficiency should be increased. The speed reward of the present invention is defined as:

[equation image: speed reward R_v(s,a)]

where v_max is the speed limit of the current lane.
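A composite reward following the structure R = R_s + R_t + R_c + R_v can be sketched as below. The exact headway, comfort, and speed formulas survive only as images in the source, so the shapes used here (headway decay, squared-jerk penalty, linear speed term) are assumptions; the safety values −1000/+800, the 2 s headway threshold, RE = 30, and v_max come from the text:

```python
RE = 30.0  # headway index from the patent

def reward(collided, reached_goal, headway, jerk, v, v_max):
    r_s = -1000.0 if collided else (800.0 if reached_goal else 0.0)
    r_t = 0.0 if headway >= 2.0 else -RE / max(headway, 1e-3)  # assumed shape
    r_c = -0.1 * jerk**2                                       # assumed shape
    r_v = v / v_max                                            # assumed shape
    return r_s + r_t + r_c + r_v
```

A collision dominates every other term, which matches the stated priority of driving safety.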

(3)搭建深度强化学习(DQN)模型。DQN模型的输入层的信息为决策动作、车辆自身状态以及其他车辆状态,输出层为在该状态下采取的各个决策动作对应的Q值。在DQN模型中,使用深度神经网络来近似替代价值函数,当已知价值回报函数Q*(s,a),用深度神经网络对Q函数进行拟合,则有:(3) Build a deep reinforcement learning (DQN) model. The information of the input layer of the DQN model is the decision-making action, the state of the vehicle itself and other vehicle states, and the output layer is the Q value corresponding to each decision-making action taken in this state. In the DQN model, a deep neural network is used to approximate the replacement value function. When the value return function Q * (s, a) is known, and the Q function is fitted by a deep neural network, there are:

Q(s, a) ≈ Q(s, a; θ)

There is a discrepancy between the Q-function estimate obtained by solving the Bellman equation and the Q value predicted by the deep neural network; this discrepancy can serve as a loss for updating the network. A loss function is therefore introduced to minimize the difference between the Bellman estimate of the Q value and the network's estimate:

L_i(θ_i) = E[(y_i - Q(s, a, θ))^2]

where y_i is the Q value given by the Bellman equation, y_i = r + γ max Q(s', a', θ), and i is the iteration index.

The above loss is minimized by gradient descent:

∇_θi L_i(θ_i) = E[(y_i - Q(s, a, θ_i)) ∇_θi Q(s, a, θ_i)]

The deep reinforcement learning model consists of two neural networks with identical structure: an evaluation network and a target network. Only the evaluation network is trained, continuously updating its parameters to improve its predictions; the target network copies the parameters of the evaluation network after every fixed number of training steps. The two networks therefore differ in their predicted Q values, and this difference is used to update the Q-learning policy.
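Under the assumption of a toy linear Q-approximator, the evaluation/target arrangement and one TD update can be sketched as follows; the state size, discount factor, learning rate, and class names are invented for illustration, not taken from the patent:

```python
import numpy as np

# Minimal sketch of the evaluation/target network scheme described above.
# A single linear layer stands in for the deep network; sizes, gamma and
# the learning rate are illustrative assumptions.

class TinyQNet:
    def __init__(self, n_state, n_action, rng):
        self.W = rng.normal(0.0, 0.1, (n_action, n_state))

    def q_values(self, s):
        return self.W @ s  # one Q value per decision action

    def copy_from(self, other):
        self.W = other.W.copy()  # periodic parameter sync


def td_target(r, s_next, target_net, gamma=0.9):
    # y_i = r + gamma * max_a' Q(s', a'; theta^-): the Bellman target,
    # computed with the (frozen) target network
    return r + gamma * np.max(target_net.q_values(s_next))


rng = np.random.default_rng(0)
eval_net = TinyQNet(n_state=6, n_action=6, rng=rng)
tgt_net = TinyQNet(n_state=6, n_action=6, rng=rng)
tgt_net.copy_from(eval_net)  # target starts as a copy of the evaluation net

s = rng.normal(size=6)
a = int(np.argmax(eval_net.q_values(s)))   # greedy action from eval net
y = td_target(r=1.0, s_next=s, target_net=tgt_net)

# one gradient-descent step on L = (y - Q(s, a; theta))^2
lr = 0.01
q_old = eval_net.q_values(s)[a]
eval_net.W[a] += lr * (y - q_old) * s      # gradient of the squared TD error
```

After enough updates of the evaluation network, `tgt_net.copy_from(eval_net)` would be called again, which is the parameter-saving step the text describes.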

The motion planning module mainly comprises the following:

(1) Determine the target position of the driverless vehicle according to the decision action from the lane-change decision module. When the decision action is speed keeping, acceleration, or deceleration, the lateral target configuration is
Figure BDA0002386326330000092
where l_t is the lateral offset l of the current lane centerline and T_j is the time to reach the target position; the longitudinal target configuration is
Figure BDA0002386326330000093
where
Figure BDA0002386326330000094
is the target speed to be maintained or reached and
Figure BDA0002386326330000095
is the allowed speed fluctuation.

When the decision action is a left or right lane change, the lateral target configuration is
Figure BDA0002386326330000096
Figure BDA0002386326330000097
where l_t is the lateral offset l of the left or right lane centerline. If there is no obstacle vehicle ahead of the driverless vehicle, the longitudinal target configuration is
Figure BDA0002386326330000098
Figure BDA0002386326330000099
If there is an obstacle vehicle ahead, the longitudinal target configuration is
Figure BDA00023863263300000910
Figure BDA00023863263300000911
where s_ob,
Figure BDA00023863263300000912
are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of the obstacle vehicle ahead.

When the decision action is car following, the lateral target configuration is
Figure BDA00023863263300000913
where l_t is the lateral offset l of the current lane centerline; the longitudinal target configuration is
Figure BDA00023863263300000914
Figure BDA0002386326330000101
where
Figure BDA0002386326330000102
s_lv,
Figure BDA0002386326330000103
are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of the vehicle being followed.

(2) Determine the state of the driverless vehicle in the Frenet frame. Information from the positioning and navigation system is received in real time, and the position (x, y), velocity v_x, and acceleration a_x of the vehicle in the Cartesian frame, as computed by the positioning and navigation system, are converted into the longitudinal displacement s, longitudinal velocity
Figure BDA0002386326330000104
longitudinal acceleration
Figure BDA0002386326330000105
lateral displacement l, lateral velocity
Figure BDA0002386326330000106
and lateral acceleration
Figure BDA0002386326330000107
in the Frenet frame.

(3) Within a given time interval, plan a set of time-parameterized trajectories from the current position of the vehicle to the target position. Because the Frenet frame is used, the trajectory planning problem decouples into two optimization problems, one in the longitudinal direction s and one in the lateral direction l. The vehicle jerk is related to passenger comfort; the jerk accumulated by a configuration p over the interval t_0 to t_1 is:

J_t(p(t)) = ∫_{t_0}^{t_1} (d³p/dτ³)² dτ

An execution time for the control action is introduced, T = t_end - t_start; varying the execution time yields a finite trajectory set. Takahashi et al. have shown that the solution of the jerk optimization problem can be represented by a quintic polynomial of the form p(t) = α_0 + α_1 t + α_2 t^2 + α_3 t^3 + α_4 t^4 + α_5 t^5. Given an initial configuration
Figure BDA0002386326330000109
Figure BDA00023863263300001010
and a target configuration
Figure BDA00023863263300001011
the corresponding quintic polynomial (trajectory) of l with respect to time t can be solved for; using several different values T_j yields a trajectory set in the l direction with respect to time t. The trajectory set in the s direction is obtained in the same way.

(4) Establish a loss function and select the optimal trajectory from the generated set as the tracking trajectory for the control module. The lateral loss function is defined as:

C_l = k_j J_t(l(t)) + k_t T + k_l l_1^2

where k_j, k_t, k_l are weights: k_j J_t(l(t)) penalizes trajectories with large jerk; k_t T penalizes the maneuver duration (the larger T, the longer the maneuver takes); k_l l_1^2 penalizes deviation from the lane centerline.

When the longitudinal target configuration is
Figure BDA00023863263300001012
the longitudinal loss function is defined as:

C_s = k_j J_t(s(t)) + k_t T + k_s [s_1 - s_d]^2

where k_j, k_t, k_s are weights, and k_s [s_1 - s_d]^2 penalizes the deviation of the longitudinal position s_1 in the target configuration from the target position s_d.

When the longitudinal target configuration is
Figure BDA00023863263300001013
the longitudinal loss function is defined as:

Figure BDA00023863263300001014

where k_j, k_t, k_s are weights, and
Figure BDA00023863263300001015
penalizes the deviation of the longitudinal velocity
Figure BDA00023863263300001016
in the target configuration from the target velocity
Figure BDA00023863263300001017
.

The total loss function of a trajectory is:

C_al = k_la C_l + k_lo C_s

where k_la and k_lo are weights.

In this embodiment, as shown in Fig. 2, the coordinates of the driverless vehicle at the current moment are:

Figure BDA0002386326330000111

where
Figure BDA0002386326330000112
is the position vector of the driverless vehicle in the global frame; s(t) is the arc length along the reference path from its start to the point r; and l(t) is the normal distance from the vehicle position to the closest point r on the reference path.

From the geometric relationships in the figure:

Figure BDA0002386326330000113

where
Figure BDA0002386326330000114
is the unit normal vector at the point r on the reference path, and
Figure BDA0002386326330000115
is the position vector of the point r in the global frame.

Let θ_x denote the heading angle of the vehicle at its current position in the global frame and θ_r the heading angle of the point r on the reference path in the global frame; k_r is the curvature at the reference point, and v_x and a_x are the current velocity and acceleration of the vehicle. The tangent and normal vectors at the point r on the reference path closest to the vehicle are:

Figure BDA0002386326330000116

Figure BDA0002386326330000117

Transforming (1) and (2) gives:

Figure BDA0002386326330000118

Multiplying both sides of (5) by
Figure BDA0002386326330000119
gives:

Figure BDA00023863263300001110

The first time derivative of the normal vector
Figure BDA00023863263300001111
on the reference path is the tangent vector
Figure BDA00023863263300001112
multiplied by -k_r:

Figure BDA00023863263300001113

Since the directions of
Figure BDA00023863263300001114
and
Figure BDA00023863263300001115
are perpendicular to the direction of
Figure BDA00023863263300001116
their products are also zero, so from (7):

Figure BDA00023863263300001117

vx

Figure BDA00023863263300001118
的模,即:v x is
Figure BDA00023863263300001118
, that is:

Figure BDA00023863263300001119

When the driverless vehicle is driving normally, the following two conditions are satisfied:

Figure BDA0002386326330000121

l < 1/k_r (11)

It can be deduced that:

Figure BDA0002386326330000122

The quantities
Figure BDA0002386326330000123
and
Figure BDA0002386326330000124
can be characterized by the longitudinal and lateral velocity change rates between two adjacent frames, that is:

Figure BDA0002386326330000125

Figure BDA0002386326330000126

where f is the computation frequency, generally greater than 10 Hz.

With the formulas above, conversion between the Cartesian and Frenet coordinate systems can be carried out in both directions.
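A rough sketch of the conversion just described is given below, assuming the reference path is supplied as a discretized polyline of (x, y, s) samples; the projection-to-nearest-sample shortcut and all helper names are illustrative assumptions rather than the patent's formulas:

```python
import math

# Sketch of the Cartesian -> Frenet conversion outlined above, with the
# reference path discretized as a polyline of (x, y, arc-length) samples.
# The nearest-sample projection and helper names are assumptions.

def to_frenet(x, y, path):
    """path: list of (x, y, s) samples of the reference path.
    Returns (s, l): arc length of the closest sample point r and the
    signed normal offset of (x, y) from it (left of the path positive)."""
    i = min(range(len(path)),
            key=lambda k: (path[k][0] - x) ** 2 + (path[k][1] - y) ** 2)
    rx, ry, s = path[i]
    # heading theta_r of the path at r, estimated from the next sample
    j = min(i + 1, len(path) - 1)
    theta_r = math.atan2(path[j][1] - ry, path[j][0] - rx)
    dx, dy = x - rx, y - ry
    l = -dx * math.sin(theta_r) + dy * math.cos(theta_r)
    return s, l


def frenet_rates(prev, cur, f):
    """Approximate s_dot and l_dot from two adjacent frames sampled at
    frequency f, as the text's finite-difference characterization."""
    (s0, l0), (s1, l1) = prev, cur
    return (s1 - s0) * f, (l1 - l0) * f
```

In practice the projection would interpolate between samples and the rates would be filtered, but the finite-difference step mirrors the adjacent-frame formulation above.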

In this embodiment, the input layer of the DQN model receives the decision action, the ego-vehicle state, and the states of the other vehicles; the output layer gives the Q value of each decision action in that state.

In this embodiment, Fig. 4 shows the vehicle scene during a left lane change. The decision action currently received by the motion planning module is a left lane change, so the lateral target configuration is
Figure BDA0002386326330000127
where l_t is the lateral offset l of the left or right lane centerline; the longitudinal target configuration is
Figure BDA0002386326330000128
where s_ob,
Figure BDA0002386326330000129
are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of the obstacle vehicle ahead. From the initial and target configurations, the following system of equations is obtained:

Figure BDA00023863263300001210

Setting t_0 = 0, the coefficients α_l0, α_l1, α_l2 are obtained directly as:

Figure BDA0002386326330000131

Let T = t_1 - t_0; the coefficients α_l3, α_l4, α_l5 can be obtained by solving the following matrix equation:

Figure BDA0002386326330000132

By choosing different values of T, a trajectory set in the l direction is obtained; the trajectory set in the s direction is obtained in the same way.
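The coefficient computation above can be sketched as follows. The first three coefficients follow directly from the initial configuration at t_0 = 0, and the last three solve a 3x3 linear system built from the target configuration at time T; the boundary values and the candidate set of T values below are illustrative:

```python
import numpy as np

# Sketch of the quintic-coefficient computation described above:
# alpha_0..alpha_2 come from the initial configuration at t0 = 0, and
# alpha_3..alpha_5 solve a 3x3 system from the target configuration at T.

def quintic_coeffs(p0, v0, a0, p1, v1, a1, T):
    a_0, a_1, a_2 = p0, v0, a0 / 2.0  # from p(0), p'(0), p''(0)
    A = np.array([
        [T**3,     T**4,      T**5],       # position row
        [3 * T**2, 4 * T**3,  5 * T**4],   # velocity row
        [6 * T,    12 * T**2, 20 * T**3],  # acceleration row
    ])
    b = np.array([
        p1 - (a_0 + a_1 * T + a_2 * T**2),
        v1 - (a_1 + 2 * a_2 * T),
        a1 - 2 * a_2,
    ])
    a_3, a_4, a_5 = np.linalg.solve(A, b)
    return np.array([a_0, a_1, a_2, a_3, a_4, a_5])


def eval_poly(coeffs, t):
    return sum(c * t**k for k, c in enumerate(coeffs))


# a trajectory set: one quintic per candidate execution time T
# (e.g. a 3.5 m lateral offset for a lane change; values are illustrative)
trajectories = {T: quintic_coeffs(0.0, 0.0, 0.0, 3.5, 0.0, 0.0, T)
                for T in (2.0, 3.0, 4.0)}
```

Each entry of `trajectories` is one candidate l(t); the same routine applied to longitudinal boundary conditions yields the s-direction set.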

The loss of each trajectory in the l direction is computed with the following loss function:

C_l = k_j J_t(l(t)) + k_t T + k_l l_1^2 (18)

The loss of each trajectory in the s direction is computed with the following loss function:

C_s = k_j J_t(s(t)) + k_t T + k_s [s_1 - s_d]^2 (19)

The total loss function of a trajectory is:

C_al = k_la C_l + k_lo C_s (20)

After the total loss has been computed for all trajectories, those exceeding the maximum velocity or maximum acceleration in the s direction are removed, as are those exceeding the maximum curvature and those that would lead to a collision; from the remaining trajectories, the one with the smallest loss value is selected as the optimal trajectory and sent to the trajectory tracking module for tracking.
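The filter-then-argmin selection described here can be sketched as follows, with invented candidate records, weights, and limits standing in for real trajectory data:

```python
# Sketch of the final selection step: score each candidate trajectory
# with C_al = k_la * C_l + k_lo * C_s, drop infeasible candidates, and
# keep the cheapest. Records, weights and limits are illustrative.

K_LA, K_LO = 1.0, 1.0  # k_la, k_lo weights (assumed values)

def total_cost(traj):
    return K_LA * traj["C_l"] + K_LO * traj["C_s"]

def feasible(traj, v_max=20.0, a_max=4.0, k_max=0.2):
    # drop trajectories over the speed, acceleration or curvature
    # limits, and those that would collide
    return (traj["v_peak"] <= v_max and traj["a_peak"] <= a_max
            and traj["k_peak"] <= k_max and not traj["collides"])

def select_optimal(candidates):
    valid = [t for t in candidates if feasible(t)]
    return min(valid, key=total_cost) if valid else None

candidates = [
    {"id": 0, "C_l": 1.0, "C_s": 2.0, "v_peak": 25.0, "a_peak": 2.0,
     "k_peak": 0.1, "collides": False},  # rejected: exceeds v_max
    {"id": 1, "C_l": 0.5, "C_s": 0.5, "v_peak": 15.0, "a_peak": 2.0,
     "k_peak": 0.1, "collides": True},   # rejected: collision
    {"id": 2, "C_l": 1.5, "C_s": 1.0, "v_peak": 15.0, "a_peak": 2.0,
     "k_peak": 0.1, "collides": False},
    {"id": 3, "C_l": 1.0, "C_s": 1.0, "v_peak": 15.0, "a_peak": 2.0,
     "k_peak": 0.1, "collides": False},
]
best = select_optimal(candidates)  # the trajectory handed to tracking
```

The surviving minimum-cost candidate plays the role of the optimal trajectory sent to the trajectory tracking module.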

The above are merely preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims (10)

1. A system for planning the motion trail of an unmanned vehicle for a structured road is characterized by comprising a sensing module, a positioning module, a lane change decision module, a motion planning module and a trail tracking module; the lane change decision module outputs decision actions according to data collected by the sensing module and the positioning module; and the motion planning module outputs an optimal track to the track tracking module according to the decision action.
2. The unmanned automotive motion trajectory planning system for structured roads of claim 1, wherein: the sensing module comprises a laser radar, a millimeter wave radar and a motion camera.
3. The unmanned automotive motion trajectory planning system for structured roads of claim 1, wherein: the positioning module comprises a GPS satellite positioning system, an inertial navigator and a network difference module.
4. A method for planning the motion trail of an unmanned vehicle for a structured road is characterized by comprising the following steps:
step S1, collecting real-time driving data;
step S2, coordinate transformation is carried out on the collected driving data, converting the data of the vehicle in the global coordinate system into data in the Frenet coordinate system;
step S3, determining a state space, an action space and an action value return function;
step S4, constructing a deep reinforcement learning model;
step S5, taking the surrounding environment information, decision-making action and the vehicle state as input, calculating through a deep reinforcement learning model, outputting Q values of different actions executed by the unmanned vehicle, and selecting corresponding decision-making action according to the output maximum Q value;
step S6: calculating target configuration under Frenet coordinates according to decision action given by the decision module;
step S7: according to the initial configuration and the target configuration of the current position of the unmanned vehicle, a track set containing a timestamp is planned within a preset time interval;
step S8: and establishing a loss function, and selecting an optimal track from the generated track set as a tracking track.
5. The unmanned vehicle motion trajectory planning method for the structured road according to claim 4, wherein the data in the global coordinate system comprise: the position (x, y), velocity v_x, and acceleration a_x, which are converted into the longitudinal displacement s, longitudinal speed
Figure FDA0002386326320000023
Longitudinal acceleration
Figure FDA0002386326320000024
And transverse displacement l, transverse velocity
Figure FDA0002386326320000026
And lateral acceleration
Figure FDA0002386326320000025
6. The method for planning the unmanned vehicle motion trajectory for the structured road according to claim 4, wherein the step S3 specifically comprises: the state space is defined as the motion state of the unmanned vehicle and other vehicles around the unmanned vehicle in Frenet coordinates based on the bird's eye view state:
Figure FDA0002386326320000021
in the formula: segoUnmanned vehicle self-state, N0-the number of surrounding vehicles.
The state of the unmanned vehicle and the states of the other vehicles are each described by a six-element tuple; the state of the unmanned vehicle mainly comprises:
Figure FDA0002386326320000022
the input decision actions are divided into six types: speed keeping, acceleration, deceleration, left lane change, right lane change and car following, expressed as a six-tuple:
A = [KS, ACC, DEC, CL, CR, FOL]^T
establishing a value return function according to driving safety, comfort level, headway and driving speed, namely:
R(s,a)=Rs(s,a)+Rt(s,a)+Rc(s,a)+Rv(s,a)
the formula of the time-headway return value is as follows:
Figure FDA0002386326320000031
in the formula, RE is a headway indicator, and in the present invention, RE is 30.
The return value of the comfort degree is expressed as:
Figure FDA0002386326320000032
the return value of the driving speed is defined as:
Figure FDA0002386326320000033
in the formula, v_max is the maximum speed allowed in the current lane.
7. The method for planning the unmanned vehicle motion trajectory for the structured road according to claim 4, wherein the step S4 specifically comprises:
the information of the input layer of the DQN model comprises the decision action, the ego-vehicle state and the states of the other vehicles; the output layer gives the Q value corresponding to each decision action taken in that state; a deep neural network is used in place of the value function, and when the value return function Q*(s, a) is known, the Q function is fitted with the deep neural network, then:
Q(s, a) ≈ Q(s, a; θ)
the Q-function estimate obtained by solving the Bellman equation differs from the Q value predicted by the deep neural network, and the difference is used as a loss function to update the neural network;
introducing a loss function to minimize the difference between the bellman equation to the Q value estimate and the neural network to the Q value estimate, namely:
L_i(θ_i) = E[(y_i - Q(s, a, θ))^2]
in the formula, y_i is the Q value determined by the Bellman equation, y_i = r + γ max Q(s', a', θ); i is the number of iterations;
the above is trained by the gradient descent method, with the formula:
Figure FDA0002386326320000041
8. The method of claim 7, wherein the deep reinforcement learning model is composed of two neural networks having the same structure: the evaluation neural network participates in training and continuously updates its parameters to improve its prediction capability, and the target neural network stores the parameters of the evaluation neural network after the evaluation neural network has been trained a certain number of times.
9. The method for planning the unmanned vehicle motion trajectory for the structured road according to claim 4, wherein the step S6 specifically comprises:
determining the target position information of the unmanned vehicle according to the decision action of the lane-change decision module; when the decision action is speed keeping, acceleration or deceleration, the transverse target configuration is
Figure FDA0002386326320000042
Figure FDA0002386326320000043
wherein l_t is the transverse distance l of the current lane center line and T_j is the time to reach the target position; the longitudinal target configuration is
Figure FDA0002386326320000044
wherein
Figure FDA0002386326320000045
is the target vehicle speed to be maintained or achieved, and
Figure FDA0002386326320000046
is the allowed vehicle speed fluctuation;
when the decision action is a left lane change or a right lane change, the lateral target configuration is
Figure FDA0002386326320000047
Figure FDA0002386326320000048
wherein l_t is the lateral distance l of the left or right lane center line; if there is no obstacle vehicle in front of the unmanned vehicle, the longitudinal target configuration is
Figure FDA0002386326320000049
Figure FDA0002386326320000051
if there is an obstacle vehicle in front of the unmanned vehicle, the longitudinal target configuration is
Figure FDA0002386326320000052
Figure FDA0002386326320000053
wherein s_ob,
Figure FDA0002386326320000054
are the longitudinal distance, longitudinal speed and longitudinal acceleration of the preceding obstacle vehicle, respectively;
when the decision-making action is following the car, the transverse target configuration is
Figure FDA0002386326320000055
wherein l_t is the transverse distance l of the current lane center line; the longitudinal target configuration is
Figure FDA0002386326320000056
Figure FDA0002386326320000057
wherein
Figure FDA0002386326320000058
s_lv,
Figure FDA0002386326320000059
are respectively the longitudinal distance, longitudinal speed and longitudinal acceleration of the followed vehicle.
10. The method for planning the unmanned vehicle motion trajectory for the structured road according to claim 4, wherein the step S8 specifically comprises:
step S81: the loss function includes the following:
the lateral loss function is defined as:
C_l = k_j J_t(l(t)) + k_t T + k_l l_1^2
wherein k_j, k_t, k_l are weight values: k_j J_t(l(t)) penalizes trajectories with large jerk; k_t T penalizes the maneuver duration, the larger T the longer the maneuver; k_l l_1^2 penalizes deviation from the road center line;
when the longitudinal target configuration is
Figure FDA00023863263200000510
The longitudinal loss function is defined as:
C_s = k_j J_t(s(t)) + k_t T + k_s [s_1 - s_d]^2
wherein k_j, k_t, k_s are weight values, and k_s [s_1 - s_d]^2 penalizes the deviation of the longitudinal distance s_1 in the target configuration from the target position s_d;
when the longitudinal target configuration is
Figure FDA00023863263200000511
The longitudinal loss function is defined as:
Figure FDA00023863263200000512
wherein k_j, k_t, k_s are weight values, and
Figure FDA00023863263200000513
penalizes the deviation of the longitudinal velocity
Figure FDA00023863263200000514
in the target configuration from the target velocity
Figure FDA00023863263200000515
;
the total loss function of the trace is:
C_al = k_la C_l + k_lo C_s
wherein k_la and k_lo are weight values;
step S82: after the values of the loss functions are calculated for all the tracks, the tracks exceeding the maximum speed or the maximum acceleration in the s direction are removed, the tracks exceeding the maximum curvature are removed, the tracks which can collide are removed, the track with the minimum loss function value is selected from the rest tracks to serve as the optimal track, and the optimal track is sent to a track tracking module to be tracked.
CN202010099122.6A 2020-02-18 2020-02-18 Unmanned vehicle motion track planning system and method for structured road Expired - Fee Related CN111273668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010099122.6A CN111273668B (en) 2020-02-18 2020-02-18 Unmanned vehicle motion track planning system and method for structured road

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010099122.6A CN111273668B (en) 2020-02-18 2020-02-18 Unmanned vehicle motion track planning system and method for structured road

Publications (2)

Publication Number Publication Date
CN111273668A true CN111273668A (en) 2020-06-12
CN111273668B CN111273668B (en) 2021-09-03

Family

ID=70999348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010099122.6A Expired - Fee Related CN111273668B (en) 2020-02-18 2020-02-18 Unmanned vehicle motion track planning system and method for structured road

Country Status (1)

Country Link
CN (1) CN111273668B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111845774A (en) * 2020-07-20 2020-10-30 上海大学 A dynamic trajectory planning and tracking method for autonomous vehicles based on horizontal and vertical coordination
CN112132260A (en) * 2020-09-03 2020-12-25 深圳索信达数据技术有限公司 Training method, calling method, device and storage medium of neural network model
CN112363505A (en) * 2020-11-10 2021-02-12 合肥工业大学 Articulated sweeper speed planning method and system based on target distance
CN113264043A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned driving layered motion decision control method based on deep reinforcement learning
CN113386795A (en) * 2021-07-05 2021-09-14 西安电子科技大学芜湖研究院 Intelligent decision-making and local track planning method for automatic driving vehicle and decision-making system thereof
CN113537782A (en) * 2021-07-19 2021-10-22 福州大学 Contract network-based multi-satellite situation awareness system distributed task planning method
CN113619604A (en) * 2021-08-26 2021-11-09 清华大学 Integrated decision and control method and device for automatic driving automobile and storage medium
CN113721637A (en) * 2021-11-02 2021-11-30 武汉理工大学 Intelligent vehicle dynamic obstacle avoidance path continuous planning method and system and storage medium
CN113844441A (en) * 2021-10-14 2021-12-28 安徽江淮汽车集团股份有限公司 Machine learning method of front collision early warning braking system
CN113879323A (en) * 2021-10-26 2022-01-04 清华大学 Reliable learning autonomous driving decision-making method, system, storage medium and device
CN113942524A (en) * 2020-07-15 2022-01-18 广州汽车集团股份有限公司 Vehicle running control method and system and computer readable storage medium
CN114013443A (en) * 2021-11-12 2022-02-08 哈尔滨工业大学 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN115169951A (en) * 2022-07-26 2022-10-11 中国科学院合肥物质科学研究院 Multi-feature-fused automatic driving course reinforcement learning training method
CN115373388A (en) * 2022-08-09 2022-11-22 上海东古智能科技有限公司 A path planning system and method for a shipboard robot
CN115731261A (en) * 2021-08-27 2023-03-03 河北省交通规划设计研究院有限公司 Vehicle lane-changing behavior recognition method and system based on expressway radar data
CN117141520A (en) * 2023-10-31 2023-12-01 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Real-time track planning method, device and equipment
CN118795874A (en) * 2023-04-07 2024-10-18 中车株洲电力机车研究所有限公司 A local trajectory planning method based on relative positioning information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104267721A (en) * 2014-08-29 2015-01-07 陈业军 Unmanned driving system of intelligent automobile
CN108068817A (en) * 2017-12-06 2018-05-25 张家港天筑基业仪器设备有限公司 A kind of automatic lane change device and method of pilotless automobile
CN108777872A (en) * 2018-05-22 2018-11-09 中国人民解放军陆军工程大学 Deep Q neural network anti-interference model and intelligent anti-interference algorithm
US20190204841A1 (en) * 2017-12-29 2019-07-04 Beijing Didi Infinity Technology And Development Co,, Ltd. Systems and methods for path determination
CN110362096A (en) * 2019-08-13 2019-10-22 东北大学 A kind of automatic driving vehicle dynamic trajectory planing method based on local optimality

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104267721A (en) * 2014-08-29 2015-01-07 陈业军 Unmanned driving system of intelligent automobile
CN108068817A (en) * 2017-12-06 2018-05-25 张家港天筑基业仪器设备有限公司 A kind of automatic lane change device and method of pilotless automobile
US20190204841A1 (en) * 2017-12-29 2019-07-04 Beijing Didi Infinity Technology And Development Co,, Ltd. Systems and methods for path determination
CN108777872A (en) * 2018-05-22 2018-11-09 中国人民解放军陆军工程大学 Deep Q neural network anti-interference model and intelligent anti-interference algorithm
CN110362096A (en) * 2019-08-13 2019-10-22 东北大学 A kind of automatic driving vehicle dynamic trajectory planing method based on local optimality

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙天骏: ""基于学习控制的汽车全速自适应巡航决策与控制算法研究"", 《中国博士学位论文全文数据库 工程科技Ⅱ辑》 *
魏民祥 等: ""基于Frenet 坐标系的自动驾驶轨迹规划与优化算法"", 《控制与决策》 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113942524A (en) * 2020-07-15 2022-01-18 广州汽车集团股份有限公司 Vehicle running control method and system and computer readable storage medium
CN111845774B (en) * 2020-07-20 2021-12-03 上海大学 Automatic driving automobile dynamic trajectory planning and tracking method based on transverse and longitudinal coordination
CN111845774A (en) * 2020-07-20 2020-10-30 上海大学 A dynamic trajectory planning and tracking method for autonomous vehicles based on horizontal and vertical coordination
CN112132260A (en) * 2020-09-03 2020-12-25 深圳索信达数据技术有限公司 Training method, calling method, device and storage medium of neural network model
CN112363505A (en) * 2020-11-10 2021-02-12 合肥工业大学 Articulated sweeper speed planning method and system based on target distance
CN113264043A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned driving layered motion decision control method based on deep reinforcement learning
CN113386795B (en) * 2021-07-05 2022-07-01 西安电子科技大学芜湖研究院 Intelligent decision-making and local track planning method for automatic driving vehicle and decision-making system thereof
CN113386795A (en) * 2021-07-05 2021-09-14 西安电子科技大学芜湖研究院 Intelligent decision-making and local track planning method for automatic driving vehicle and decision-making system thereof
CN113537782A (en) * 2021-07-19 2021-10-22 福州大学 Contract-network-based distributed task planning method for a multi-satellite situational awareness system
CN113537782B (en) * 2021-07-19 2023-08-18 福州大学 Distributed mission planning method for multi-satellite situational awareness system based on contract network
CN113619604A (en) * 2021-08-26 2021-11-09 清华大学 Integrated decision and control method and device for automatic driving automobile and storage medium
CN113619604B (en) * 2021-08-26 2023-08-15 清华大学 Integrated control method, device and storage medium for automatic driving automobile
CN115731261B (en) * 2021-08-27 2023-06-16 河北省交通规划设计研究院有限公司 Vehicle lane change behavior recognition method and system based on expressway radar data
CN115731261A (en) * 2021-08-27 2023-03-03 河北省交通规划设计研究院有限公司 Vehicle lane-changing behavior recognition method and system based on expressway radar data
CN113844441A (en) * 2021-10-14 2021-12-28 安徽江淮汽车集团股份有限公司 Machine learning method of front collision early warning braking system
CN113879323B (en) * 2021-10-26 2023-03-14 清华大学 Reliable learning type automatic driving decision-making method, system, storage medium and equipment
CN113879323A (en) * 2021-10-26 2022-01-04 清华大学 Reliable learning autonomous driving decision-making method, system, storage medium and device
CN113721637A (en) * 2021-11-02 2021-11-30 武汉理工大学 Intelligent vehicle dynamic obstacle avoidance path continuous planning method and system and storage medium
CN114013443A (en) * 2021-11-12 2022-02-08 哈尔滨工业大学 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN115169951A (en) * 2022-07-26 2022-10-11 中国科学院合肥物质科学研究院 Multi-feature-fused automatic driving course reinforcement learning training method
CN115373388A (en) * 2022-08-09 2022-11-22 上海东古智能科技有限公司 A path planning system and method for a shipboard robot
CN118795874A (en) * 2023-04-07 2024-10-18 中车株洲电力机车研究所有限公司 A local trajectory planning method based on relative positioning information
CN117141520A (en) * 2023-10-31 2023-12-01 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Real-time track planning method, device and equipment
CN117141520B (en) * 2023-10-31 2024-01-12 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Real-time track planning method, device and equipment

Also Published As

Publication number Publication date
CN111273668B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
CN111273668B (en) Unmanned vehicle motion track planning system and method for structured road
Li et al. Planning and decision-making for connected autonomous vehicles at road intersections: A review
CN115257746B (en) A lane-changing decision control method for autonomous driving vehicles considering uncertainty
Lin et al. Anti-jerk on-ramp merging using deep reinforcement learning
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
Huang et al. Path planning and cooperative control for automated vehicle platoon using hybrid automata
CN114312830B (en) Intelligent vehicle coupling decision model and method considering dangerous driving conditions
CN113291308A (en) Vehicle self-learning lane-changing decision-making system and method considering driving behavior characteristics
CN112965476A (en) High-speed unmanned vehicle trajectory planning system and method based on multi-window sampling
CN118212808B (en) Method, system and equipment for planning traffic decision of signalless intersection
CN108919795A (en) Lane-change decision-making method and device for autonomous vehicles
Goulet et al. Distributed maneuver planning with connected and automated vehicles for boosting traffic efficiency
Ma et al. Overtaking path planning for CAV based on improved artificial potential field
CN115042770A (en) A Vehicle Queue Lateral Control Method Based on Distributed Robust Model Prediction
CN114889589A (en) Intelligent vehicle steering and braking cooperative collision avoidance control system and method
CN114895676A (en) Method for realizing high-speed running of ground automatic driving vehicle based on space intelligent system
Gratzer et al. Two-layer mpc architecture for efficient mixed-integer-informed obstacle avoidance in real-time
Liu et al. Impact of sharing driving attitude information: A quantitative study on lane changing
Yang et al. Vehicle local path planning and time consistency of unmanned driving system based on convolutional neural network
WO2025222554A1 (en) Multi-agent vehicle-road-cloud integrated collaborative decision-making and control architecture system and method based on federated reinforcement learning
CN118861963A (en) An end-to-end autonomous driving lane change decision method based on multimodal input
CN115257820A (en) Forward collision-avoidance driving decision-making method for commercial vehicles in open interference scenarios
CN117452936A (en) Unmanned vehicle tracking control method based on fuzzy Petri and DDPG network
CN113460083A (en) Vehicle control device, vehicle control method, and storage medium
Yu et al. Game-Theoretic Model Predictive Control for Safety-Assured Autonomous Vehicle Overtaking in Mixed-Autonomy Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210903

Termination date: 20220218