
CN111273668A - Unmanned vehicle motion track planning system and method for structured road - Google Patents


Info

Publication number: CN111273668A (granted as CN111273668B)
Application number: CN202010099122.6A
Authority: CN (China)
Prior art keywords: decision, module, longitudinal, vehicle, value
Other languages: Chinese (zh)
Inventors: 彭育辉, 张垚, 范贤波, 钟聪
Original and current assignee: Fuzhou University
Application filed by Fuzhou University; priority to CN202010099122.6A
Legal status: Granted; Expired - Fee Related

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 … with means for defining a desired trajectory
    • G05D1/0214 … in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221 … involving a learning process
    • G05D1/0223 … involving speed control of the vehicle
    • G05D1/0225 … involving docking at a fixed facility, e.g. base station or loading bay
    • G05D1/0231 … using optical position detecting means
    • G05D1/0238 … using obstacle or wall sensors
    • G05D1/024 … using obstacle or wall sensors in combination with a laser
    • G05D1/0246 … using a video camera in combination with image processing means
    • G05D1/0257 … using a radar
    • G05D1/0276 … using signals provided by a source external to the vehicle
    • G05D1/0278 … using satellite positioning signals, e.g. GPS

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Optics & Photonics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to a motion trajectory planning system for unmanned vehicles on structured roads, comprising a perception module, a positioning module, a lane-change decision module, a motion planning module, and a trajectory tracking module. The lane-change decision module outputs decision actions based on data collected by the perception and positioning modules; the motion planning module then outputs an optimal trajectory to the trajectory tracking module according to the decision action. The invention uses deep reinforcement learning for decision making, while trajectory planning (driven dynamically by the decision actions), perception, and control are handled by independent modules. Compared with end-to-end methods, this greatly improves the interpretability and operability of the decision-planning process and adapts well to the system architecture of existing unmanned vehicles.

Description

Motion Trajectory Planning System and Method for Unmanned Vehicles on Structured Roads

Technical Field

The invention belongs to the technical field of unmanned driving, and in particular relates to a system and method for planning the motion trajectory of unmanned vehicles on structured roads.

Background

The goal of an unmanned vehicle's decision-making and motion planning system is to produce safe, reasonable driving behavior, like a skilled driver, and to plan a safe driving trajectory. Most traditional decision-planning systems are rule-based; their logic is explicit and their planning and reasoning ability is strong, but they must anticipate in advance every scenario the vehicle may encounter, which makes the system complex. In recent years, end-to-end convolutional neural networks have been applied to unmanned vehicles, greatly simplifying the decision-planning system: each camera frame is fed directly into the network, which directly outputs the target steering-wheel angle. However, such systems are poorly interpretable.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a motion trajectory planning system and method for unmanned vehicles on structured roads.

To achieve the above object, the present invention adopts the following technical solutions:

A motion trajectory planning system for unmanned vehicles on structured roads, comprising a perception module, a positioning module, a lane-change decision module, a motion planning module, and a trajectory tracking module. The lane-change decision module outputs decision actions based on data collected by the perception and positioning modules; the motion planning module outputs an optimal trajectory to the trajectory tracking module according to the decision action.

Further, the perception module includes a lidar, a millimeter-wave radar, and motion cameras.

Further, the positioning module includes a GPS satellite positioning system, an inertial navigation unit, and a network differential module.

A motion trajectory planning method for unmanned vehicles on structured roads, comprising the following steps:

Step S1: collect real-time driving data;

Step S2: perform a coordinate transformation on the collected driving data, converting the vehicle's data in the global coordinate system into data in the Frenet coordinate system;

Step S3: determine the state space, the action space, and the action-value reward function;

Step S4: construct a deep reinforcement learning model;

Step S5: take the surrounding environment information, the decision actions, and the vehicle's own state as input, compute with the deep reinforcement learning model the Q value of the unmanned vehicle executing each action, and select the decision action corresponding to the maximum output Q value;

Step S6: compute the target configuration in Frenet coordinates from the decision action given by the decision module;

Step S7: from the initial configuration at the unmanned vehicle's current position and the target configuration, plan a set of time-stamped trajectories within a preset time interval;

Step S8: establish a loss function and select the optimal trajectory from the generated trajectory set as the trajectory to track.
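The greedy selection in step S5 can be sketched as follows; the function name is an illustrative assumption, and the six action labels are the patent's own action six-tuple:

```python
import numpy as np

# Action labels from the patent's action space A = [KS, ACC, DEC, CL, CR, FOL].
ACTIONS = ("KS", "ACC", "DEC", "CL", "CR", "FOL")

def select_action(q_values, actions=ACTIONS):
    """Step S5: pick the decision action with the maximum predicted Q value."""
    return actions[int(np.argmax(q_values))]
```

For example, a Q-vector whose largest entry is in the fourth position selects the left lane change `CL`.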

Further, the coordinate transformation converts the vehicle's position (x, y), velocity v_x, and acceleration a_x in the global coordinate system into the longitudinal displacement s, longitudinal velocity ṡ, and longitudinal acceleration s̈, and the lateral displacement l, lateral velocity l̇, and lateral acceleration l̈ in the Frenet coordinate system.
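The position part of this Cartesian-to-Frenet conversion can be sketched for a piecewise-linear reference line as follows; the helper name and the left-positive sign convention for l are assumptions of this sketch, and the velocity/acceleration projections are omitted:

```python
import numpy as np

def cartesian_to_frenet(x, y, ref_xy):
    """Project a global (x, y) position onto a piecewise-linear reference
    line and return Frenet (s, l):
      s = arc length along the reference line to the foot point,
      l = signed lateral offset (assumed positive to the left of the path)."""
    ref = np.asarray(ref_xy, dtype=float)
    seg = np.diff(ref, axis=0)                        # segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    s0 = np.concatenate(([0.0], np.cumsum(seg_len)))  # arc length at vertices
    p = np.array([x, y]) - ref[:-1]                   # offsets from segment starts
    t = np.clip(np.einsum('ij,ij->i', p, seg) / seg_len**2, 0.0, 1.0)
    foot = ref[:-1] + t[:, None] * seg                # candidate foot points
    d = np.linalg.norm(np.array([x, y]) - foot, axis=1)
    i = int(np.argmin(d))                             # closest segment
    s = s0[i] + t[i] * seg_len[i]
    # sign of l from the cross product of the segment direction and the offset
    cross = seg[i, 0] * (y - foot[i, 1]) - seg[i, 1] * (x - foot[i, 0])
    l = float(np.sign(cross) * d[i]) if d[i] > 1e-12 else 0.0
    return s, l
```

For a straight reference line along the x-axis, a point at (3, 2) maps to s = 3 and l = +2 under this convention.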

Further, step S3 is specifically: the state space is defined as the motion states, in Frenet coordinates from a bird's-eye view, of the unmanned vehicle and the surrounding vehicles:

S = [s_ego, s_1, s_2, …, s_{N0}]^T

where s_ego is the state of the unmanned vehicle itself and N0 is the number of surrounding vehicles.

The state of the unmanned vehicle and of each other vehicle is described by a six-tuple whose features are the vehicle's longitudinal distance, longitudinal velocity, longitudinal acceleration, lateral distance, lateral velocity, and lateral acceleration:

s_i = [s, ṡ, s̈, l, l̇, l̈]^T

The input decision actions are divided into six kinds: speed keeping, acceleration, deceleration, lane change to the left, lane change to the right, and car following, represented as a six-tuple:

A = [KS, ACC, DEC, CL, CR, FOL]^T
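The state encoding above can be sketched as follows; flattening the ego and surrounding-vehicle six-tuples into a single vector for the Q-network input is an assumption of this sketch, since the patent does not specify the encoding:

```python
import numpy as np

def vehicle_state(s, s_dot, s_ddot, l, l_dot, l_ddot):
    """Six-tuple state of one vehicle in Frenet coordinates."""
    return np.array([s, s_dot, s_ddot, l, l_dot, l_ddot], dtype=float)

def build_state(ego, others):
    """Stack the ego state and the N0 surrounding-vehicle states into a
    flat vector (assumed input layout for the Q-network)."""
    return np.concatenate([ego] + list(others))

ego = vehicle_state(12.0, 8.0, 0.0, 0.0, 0.0, 0.0)
lead = vehicle_state(35.0, 7.0, 0.0, 0.0, 0.0, 0.0)
state = build_state(ego, [lead])   # 6 * (1 + N0) entries
```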

The value reward function is established from driving safety, comfort, headway, and driving speed:

R(s,a) = R_s(s,a) + R_t(s,a) + R_c(s,a) + R_v(s,a)

The headway reward R_t(s,a) is given by the following formula:

[equation image: headway reward R_t(s,a)]

where RE is the headway index; in the present invention RE = 30.

The comfort reward R_c(s,a) is:

[equation image: comfort reward R_c(s,a)]

The driving-speed reward R_v(s,a) is defined as:

[equation image: speed reward R_v(s,a)]

where v_max is the speed limit of the current lane.

Further, step S4 is specifically:

The input layer of the DQN model receives the decision action, the vehicle's own state, and the states of the other vehicles; the output layer gives the Q value of each decision action available in that state. A deep neural network replaces the value function: given the action-value function Q*(s,a), fitting the Q function with a deep neural network gives:

Q(s,a) ≈ Q(s,a;θ)

There is a difference between the Q-function estimate obtained from the Bellman equation and the Q value predicted by the deep neural network; this difference serves as the loss used to update the network. A loss function is therefore introduced to minimize the difference between the Bellman estimate of the Q value and the network's estimate:

L_i(θ_i) = E[(y_i − Q(s,a,θ))²]

where y_i is the Q value given by the Bellman equation, y_i = r + γ max_{a'} Q(s',a',θ), and i is the iteration index.

The above loss is trained by gradient descent:

∇_{θ_i} L_i(θ_i) = E[(y_i − Q(s,a;θ_i)) ∇_{θ_i} Q(s,a;θ_i)]

Further, the deep neural network consists of two networks with identical structure: an evaluation network and a target network. The evaluation network participates in training, continually updating its parameters to improve its predictions, while the target network saves (copies) the evaluation network's parameters after every fixed number of training steps.
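A minimal numeric sketch of the TD update and target-network scheme described above, substituting a linear Q-function for the deep network; the linear parameterization, dimensions, learning rate, and sync interval are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATE, N_ACTION, GAMMA, LR = 12, 6, 0.95, 0.01

# Linear stand-in for the deep network: Q(s, .; theta) = theta @ s.
theta_eval = rng.normal(0.0, 0.1, (N_ACTION, N_STATE))   # evaluation network
theta_target = theta_eval.copy()                         # target network

def q_values(theta, s):
    return theta @ s

def td_update(s, a, r, s_next, done):
    """One gradient step on L(theta) = (y - Q(s,a;theta))^2 with the
    Bellman target y = r + gamma * max_a' Q(s', a'; theta_target)."""
    y = r if done else r + GAMMA * np.max(q_values(theta_target, s_next))
    err = y - q_values(theta_eval, s)[a]
    theta_eval[a] += LR * err * s        # gradient of the squared TD error
    return err

s = rng.normal(size=N_STATE)
s_next = rng.normal(size=N_STATE)
before = abs(td_update(s, 2, 1.0, s_next, False))
for _ in range(200):                     # repeat the same transition
    td_update(s, 2, 1.0, s_next, False)
after = abs(td_update(s, 2, 1.0, s_next, False))
theta_target[:] = theta_eval             # periodic target-network sync
```

With the target network held fixed, repeating the same transition drives the TD error toward zero, which is the behavior the loss above is minimizing.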

Further, step S6 is specifically:

The target position of the unmanned vehicle is determined from the decision action of the lane-change decision module.

When the decision action is speed keeping, acceleration, or deceleration, the lateral target configuration is [l_t, 0, 0], where l_t is the lateral distance l of the current lane's center line and T_j is the time to reach the target position; the longitudinal target configuration is [ṡ_d ± Δṡ, 0], where ṡ_d is the target speed to keep or reach and Δṡ is the permitted speed fluctuation.

When the decision action is a lane change to the left or right, the lateral target configuration is [l_t, 0, 0], where l_t is the lateral distance l of the center line of the left or right lane. If there is no obstacle vehicle ahead of the unmanned vehicle, the longitudinal target configuration is [ṡ_d ± Δṡ, 0]; if there is an obstacle vehicle ahead, the longitudinal target configuration is determined from the obstacle vehicle's motion state, where s_ob, ṡ_ob, s̈_ob are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of the obstacle vehicle ahead.

When the decision action is car following, the lateral target configuration is [l_t, 0, 0], where l_t is the lateral distance l of the current lane's center line; the longitudinal target configuration is determined from the state of the vehicle being followed, where s_lv, ṡ_lv, s̈_lv are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of that vehicle.
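The mapping from a decision action to a target configuration can be sketched as follows. The tuple layouts (end value, end first derivative, end second derivative), the lane width, and the constant keep-clear gap in following mode are assumptions of this sketch, since the exact formulas survive only as images in the source:

```python
LANE_WIDTH = 3.5   # assumed lane width in meters

def target_configuration(action, ego_s_dot, lane_center_l, lead=None,
                         v_target=None, follow_gap=10.0):
    """Return (lateral_target, longitudinal_target) for one decision action.
    lateral_target      = (l, l_dot, l_ddot) at the end of the maneuver
    longitudinal_target = (s_dot, s_ddot) for cruising actions, or
                          (s, s_dot, s_ddot) when following a lead vehicle."""
    if action in ("KS", "ACC", "DEC"):
        lat = (lane_center_l, 0.0, 0.0)
        lon = (v_target if v_target is not None else ego_s_dot, 0.0)
    elif action in ("CL", "CR"):
        shift = LANE_WIDTH if action == "CL" else -LANE_WIDTH
        lat = (lane_center_l + shift, 0.0, 0.0)
        lon = (v_target if v_target is not None else ego_s_dot, 0.0)
    elif action == "FOL":
        s_lv, s_dot_lv, s_ddot_lv = lead   # state of the vehicle being followed
        lat = (lane_center_l, 0.0, 0.0)
        lon = (s_lv - follow_gap, s_dot_lv, s_ddot_lv)
    else:
        raise ValueError(action)
    return lat, lon
```

Ending lane changes at the adjacent lane center with zero lateral velocity and acceleration matches the [l_t, 0, 0] lateral targets above.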

Further, step S8 is specifically:

Step S81: the loss functions are as follows.

The lateral loss function is defined as:

C_l = k_j J_t(l(t)) + k_t T + k_l l_1²

where k_j, k_t, k_l are weights; k_j J_t(l(t)) penalizes trajectories with large jerk; k_t T penalizes the maneuver duration (the larger T, the longer the maneuver takes); and k_l l_1² penalizes deviation from the road center line.

When the longitudinal target configuration specifies a target position, the longitudinal loss function is defined as:

C_s = k_j J_t(s(t)) + k_t T + k_s[s_1 − s_d]²

where k_j, k_t, k_s are weights, and k_s[s_1 − s_d]² penalizes the deviation of the end longitudinal position s_1 of the target configuration from the target position s_d.

When the longitudinal target configuration specifies a target velocity, the longitudinal loss function is defined as:

C_s = k_j J_t(s(t)) + k_t T + k_s[ṡ_1 − ṡ_d]²

where k_j, k_t, k_s are weights, and k_s[ṡ_1 − ṡ_d]² penalizes the deviation of the end longitudinal velocity ṡ_1 of the target configuration from the target velocity ṡ_d.

The total loss of a trajectory is:

C_al = k_la C_l + k_lo C_s

where k_la and k_lo are weights.

Step S82: after the loss has been computed for every trajectory, remove the trajectories that exceed the maximum velocity or maximum acceleration in the s direction, the trajectories that exceed the maximum curvature, and the trajectories that would lead to a collision; from the remaining trajectories, select the one with the smallest loss as the optimal trajectory and send it to the trajectory tracking module for tracking.
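Steps S7 and S81 can be sketched with quintic polynomial candidates. For brevity this sketch samples only over end times T toward a single lateral target (the patent's trajectory set would also vary end states), and the weights are illustrative assumptions:

```python
import numpy as np

def quintic(p0, v0, a0, p1, v1, a1, T):
    """Coefficients (lowest order first) of the quintic polynomial meeting
    position/velocity/acceleration boundary conditions at t=0 and t=T."""
    A = np.array([[T**3,   T**4,    T**5],
                  [3*T**2, 4*T**3,  5*T**4],
                  [6*T,    12*T**2, 20*T**3]], dtype=float)
    b = np.array([p1 - (p0 + v0*T + 0.5*a0*T**2),
                  v1 - (v0 + a0*T),
                  a1 - a0])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([p0, v0, 0.5*a0, c3, c4, c5])

def poly_eval(c, t):
    return sum(ci * t**i for i, ci in enumerate(c))

def jerk_cost(c, T, n=400):
    """Integral of squared jerk over [0, T] (simple Riemann sum)."""
    t = np.linspace(0.0, T, n)
    jerk = 6*c[3] + 24*c[4]*t + 60*c[5]*t**2
    return float(np.sum(jerk**2) * (T / n))

def best_lateral(l0, l_target, k_j=1.0, k_t=0.1, k_l=1.0,
                 horizons=(2.0, 3.0, 4.0, 5.0)):
    """Build one candidate per end time T toward target (l_target, 0, 0) and
    return (cost, T, coeffs) minimizing C_l = k_j*J + k_t*T + k_l*l1**2."""
    best = None
    for T in horizons:
        c = quintic(l0, 0.0, 0.0, l_target, 0.0, 0.0, T)
        cost = k_j * jerk_cost(c, T) + k_t * T + k_l * l_target**2
        if best is None or cost < best[0]:
            best = (cost, T, c)
    return best
```

Each candidate exactly meets its boundary conditions by construction, so the selection reduces to comparing the C_l values, mirroring step S82's minimum-loss choice after infeasible candidates are filtered out.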

Compared with the prior art, the present invention has the following beneficial effects:

The invention uses deep reinforcement learning for decision making, plans trajectories dynamically from the decision actions, and handles perception and control in independent modules. Compared with end-to-end methods, this greatly improves the interpretability and operability of the decision-planning process, and it adapts well to the system architecture of existing unmanned vehicles.

Description of the Drawings

Fig. 1 shows the overall framework of the system of the present invention;

Fig. 2 is a schematic diagram of the conversion between the Cartesian and Frenet coordinate systems in an embodiment of the present invention;

Fig. 3 is a structural diagram of the deep reinforcement learning (DQN) model in an embodiment of the present invention;

Fig. 4 is a vehicle scene diagram of a leftward lane change in an embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments.

Referring to Fig. 1, the present invention provides a motion trajectory planning system for unmanned vehicles on structured roads, comprising a perception module, a positioning module, a lane-change decision module, a motion planning module, and a trajectory tracking module. The lane-change decision module outputs decision actions based on the data collected by the perception and positioning modules; the motion planning module outputs the optimal trajectory to the trajectory tracking module according to the decision action. The perception module includes a lidar, a millimeter-wave radar, and motion cameras. The positioning module includes a GPS satellite positioning system, an inertial navigation unit, and a network differential module.

In this embodiment, the lane-change decision module mainly includes the following:

(1) Coordinate transformation of the information received from the perception layer and the positioning/navigation system. To use the vehicle's information in the Frenet coordinate system, the vehicle's position (x, y), velocity v_x, and acceleration a_x in the global (Cartesian) coordinate system are converted into the longitudinal displacement s, longitudinal velocity ṡ, and longitudinal acceleration s̈, and the lateral displacement l, lateral velocity l̇, and lateral acceleration l̈ in the Frenet coordinate system.

(2) Determining the state space, the action space, and the action-value reward function. The input of the deep neural network is the environment and state, and its output is the action, so the state space, action space, and action-value reward function must be determined. The state space is defined as the motion states, in Frenet coordinates from a bird's-eye view, of the unmanned vehicle and the surrounding vehicles:

S = [s_ego, s_1, s_2, …, s_{N0}]^T

where s_ego is the state of the unmanned vehicle itself and N0 is the number of surrounding vehicles.

The state of the unmanned vehicle and of each other vehicle is described by a six-tuple whose features are the vehicle's longitudinal distance, longitudinal velocity, longitudinal acceleration, lateral distance, lateral velocity, and lateral acceleration:

s_i = [s, ṡ, s̈, l, l̇, l̈]^T

In this embodiment the input decision actions are divided into six kinds: speed keeping, acceleration, deceleration, lane change to the left, lane change to the right, and car following, represented as a six-tuple:

A = [KS, ACC, DEC, CL, CR, FOL]^T

The action-value reward function quantitatively reflects how well the unmanned vehicle completes its task. The present invention builds the reward function mainly from driving safety, comfort, headway, and driving speed:

R(s,a) = R_s(s,a) + R_t(s,a) + R_c(s,a) + R_v(s,a)

Driving safety is the basic requirement and first priority of an unmanned vehicle. When the unmanned vehicle collides with another vehicle, it receives a large negative reward R_s = −1000; if it reaches the target position without a collision, it receives a large positive reward R_s = 800.

Headway is an important measure of safety and road throughput in car-following, defined as the time interval between the fronts of two consecutive vehicles passing the same point. Experimental data indicate that when the headway is greater than 2 s the probability of a rear-end collision is low, and that below 2 s the probability of a rear-end collision is inversely proportional to the headway. The headway reward is defined by the following formula:

[equation image: headway reward R_t(s,a)]

where RE is the headway index; in the present invention RE = 30.

The decisions of the unmanned vehicle must also consider ride comfort: frequent switching of decision actions should be avoided, and harsh accelerations reduced. The comfort reward of the present invention is defined as:

[equation image: comfort reward R_c(s,a)]

Driving speed determines the driving efficiency of the unmanned vehicle; where traffic rules permit, efficiency should be increased. The speed reward of the present invention is defined as:

[equation image: speed reward R_v(s,a)]

where v_max is the speed limit of the current lane.
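A composite reward following the structure R = R_s + R_t + R_c + R_v can be sketched as below. The exact headway, comfort, and speed formulas survive only as images in the source, so the shapes used here (headway decay, squared-jerk penalty, linear speed term) are assumptions; the safety values −1000/+800, the 2 s headway threshold, RE = 30, and v_max come from the text:

```python
RE = 30.0  # headway index from the patent

def reward(collided, reached_goal, headway, jerk, v, v_max):
    r_s = -1000.0 if collided else (800.0 if reached_goal else 0.0)
    r_t = 0.0 if headway >= 2.0 else -RE / max(headway, 1e-3)  # assumed shape
    r_c = -0.1 * jerk**2                                       # assumed shape
    r_v = v / v_max                                            # assumed shape
    return r_s + r_t + r_c + r_v
```

A collision dominates every other term, which matches the stated priority of driving safety.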

(3)搭建深度强化学习(DQN)模型。DQN模型的输入层的信息为决策动作、车辆自身状态以及其他车辆状态,输出层为在该状态下采取的各个决策动作对应的Q值。在DQN模型中,使用深度神经网络来近似替代价值函数,当已知价值回报函数Q*(s,a),用深度神经网络对Q函数进行拟合,则有:(3) Build a deep reinforcement learning (DQN) model. The information of the input layer of the DQN model is the decision-making action, the state of the vehicle itself and other vehicle states, and the output layer is the Q value corresponding to each decision-making action taken in this state. In the DQN model, a deep neural network is used to approximate the replacement value function. When the value return function Q * (s, a) is known, and the Q function is fitted by a deep neural network, there are:

Q(s, a) ≈ Q(s, a; θ)

There is a discrepancy between the Q-function estimate obtained by solving the Bellman equation and the Q value predicted by the deep neural network; this discrepancy can serve as a loss for updating the network. A loss function is therefore introduced to minimize the difference between the Bellman estimate of the Q value and the network's estimate:

L_i(θ_i) = E[(y_i - Q(s, a, θ))^2]

where y_i is the Q value given by the Bellman equation, y_i = r + γ max Q(s', a', θ), and i is the iteration index.

The above loss is minimized by gradient descent:

∇_θi L_i(θ_i) = E[(y_i - Q(s, a, θ_i)) ∇_θi Q(s, a, θ_i)]

The deep reinforcement learning model consists of two neural networks with identical structure: an evaluation network and a target network. Only the evaluation network is trained, continuously updating its parameters to improve its predictions; the target network copies the parameters of the evaluation network after every fixed number of training steps. The two networks therefore differ in their predicted Q values, and this difference is used to update the Q-learning policy.
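Under the assumption of a toy linear Q-approximator, the evaluation/target arrangement and one TD update can be sketched as follows; the state size, discount factor, learning rate, and class names are invented for illustration, not taken from the patent:

```python
import numpy as np

# Minimal sketch of the evaluation/target network scheme described above.
# A single linear layer stands in for the deep network; sizes, gamma and
# the learning rate are illustrative assumptions.

class TinyQNet:
    def __init__(self, n_state, n_action, rng):
        self.W = rng.normal(0.0, 0.1, (n_action, n_state))

    def q_values(self, s):
        return self.W @ s  # one Q value per decision action

    def copy_from(self, other):
        self.W = other.W.copy()  # periodic parameter sync


def td_target(r, s_next, target_net, gamma=0.9):
    # y_i = r + gamma * max_a' Q(s', a'; theta^-): the Bellman target,
    # computed with the (frozen) target network
    return r + gamma * np.max(target_net.q_values(s_next))


rng = np.random.default_rng(0)
eval_net = TinyQNet(n_state=6, n_action=6, rng=rng)
tgt_net = TinyQNet(n_state=6, n_action=6, rng=rng)
tgt_net.copy_from(eval_net)  # target starts as a copy of the evaluation net

s = rng.normal(size=6)
a = int(np.argmax(eval_net.q_values(s)))   # greedy action from eval net
y = td_target(r=1.0, s_next=s, target_net=tgt_net)

# one gradient-descent step on L = (y - Q(s, a; theta))^2
lr = 0.01
q_old = eval_net.q_values(s)[a]
eval_net.W[a] += lr * (y - q_old) * s      # gradient of the squared TD error
```

After enough updates of the evaluation network, `tgt_net.copy_from(eval_net)` would be called again, which is the parameter-saving step the text describes.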

The motion planning module mainly comprises the following:

(1) Determine the target position of the driverless vehicle according to the decision action from the lane-change decision module. When the decision action is speed keeping, acceleration, or deceleration, the lateral target configuration is
Figure BDA0002386326330000092
where l_t is the lateral offset l of the current lane centerline and T_j is the time to reach the target position; the longitudinal target configuration is
Figure BDA0002386326330000093
where
Figure BDA0002386326330000094
is the target speed to be maintained or reached and
Figure BDA0002386326330000095
is the allowed speed fluctuation.

When the decision action is a left or right lane change, the lateral target configuration is
Figure BDA0002386326330000096
Figure BDA0002386326330000097
where l_t is the lateral offset l of the left or right lane centerline. If there is no obstacle vehicle ahead of the driverless vehicle, the longitudinal target configuration is
Figure BDA0002386326330000098
Figure BDA0002386326330000099
If there is an obstacle vehicle ahead, the longitudinal target configuration is
Figure BDA00023863263300000910
Figure BDA00023863263300000911
where s_ob,
Figure BDA00023863263300000912
are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of the obstacle vehicle ahead.

When the decision action is car following, the lateral target configuration is
Figure BDA00023863263300000913
where l_t is the lateral offset l of the current lane centerline; the longitudinal target configuration is
Figure BDA00023863263300000914
Figure BDA0002386326330000101
where
Figure BDA0002386326330000102
s_lv,
Figure BDA0002386326330000103
are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of the vehicle being followed.

(2) Determine the state of the driverless vehicle in the Frenet frame. Information from the positioning and navigation system is received in real time, and the position (x, y), velocity v_x, and acceleration a_x of the vehicle in the Cartesian frame, as computed by the positioning and navigation system, are converted into the longitudinal displacement s, longitudinal velocity
Figure BDA0002386326330000104
longitudinal acceleration
Figure BDA0002386326330000105
lateral displacement l, lateral velocity
Figure BDA0002386326330000106
and lateral acceleration
Figure BDA0002386326330000107
in the Frenet frame.

(3) Within a given time interval, plan a set of time-parameterized trajectories from the current position of the vehicle to the target position. Because the Frenet frame is used, the trajectory planning problem decouples into two optimization problems, one in the longitudinal direction s and one in the lateral direction l. The vehicle jerk is related to passenger comfort; the jerk accumulated by a configuration p over the interval t_0 to t_1 is:

J_t(p(t)) = ∫_{t_0}^{t_1} (d³p/dτ³)² dτ

An execution time for the control action is introduced, T = t_end - t_start; varying the execution time yields a finite trajectory set. Takahashi et al. have shown that the solution of the jerk optimization problem can be represented by a quintic polynomial of the form p(t) = α_0 + α_1 t + α_2 t^2 + α_3 t^3 + α_4 t^4 + α_5 t^5. Given an initial configuration
Figure BDA0002386326330000109
Figure BDA00023863263300001010
and a target configuration
Figure BDA00023863263300001011
the corresponding quintic polynomial (trajectory) of l with respect to time t can be solved for; using several different values T_j yields a trajectory set in the l direction with respect to time t. The trajectory set in the s direction is obtained in the same way.

(4) Establish a loss function and select the optimal trajectory from the generated set as the tracking trajectory for the control module. The lateral loss function is defined as:

C_l = k_j J_t(l(t)) + k_t T + k_l l_1^2

where k_j, k_t, k_l are weights: k_j J_t(l(t)) penalizes trajectories with large jerk; k_t T penalizes the maneuver duration (the larger T, the longer the maneuver takes); k_l l_1^2 penalizes deviation from the lane centerline.

When the longitudinal target configuration is
Figure BDA00023863263300001012
the longitudinal loss function is defined as:

C_s = k_j J_t(s(t)) + k_t T + k_s [s_1 - s_d]^2

where k_j, k_t, k_s are weights, and k_s [s_1 - s_d]^2 penalizes the deviation of the longitudinal position s_1 in the target configuration from the target position s_d.

When the longitudinal target configuration is
Figure BDA00023863263300001013
the longitudinal loss function is defined as:

Figure BDA00023863263300001014

where k_j, k_t, k_s are weights, and
Figure BDA00023863263300001015
penalizes the deviation of the longitudinal velocity
Figure BDA00023863263300001016
in the target configuration from the target velocity
Figure BDA00023863263300001017
.

The total loss function of a trajectory is:

C_al = k_la C_l + k_lo C_s

where k_la and k_lo are weights.

In this embodiment, as shown in Fig. 2, the coordinates of the driverless vehicle at the current moment are:

Figure BDA0002386326330000111

where
Figure BDA0002386326330000112
is the position vector of the driverless vehicle in the global frame; s(t) is the arc length along the reference path from its start to the point r; and l(t) is the normal distance from the vehicle position to the closest point r on the reference path.

From the geometric relationships in the figure:

Figure BDA0002386326330000113

where
Figure BDA0002386326330000114
is the unit normal vector at the point r on the reference path, and
Figure BDA0002386326330000115
is the position vector of the point r in the global frame.

Let θ_x denote the heading angle of the vehicle at its current position in the global frame and θ_r the heading angle of the point r on the reference path in the global frame; k_r is the curvature at the reference point, and v_x and a_x are the current velocity and acceleration of the vehicle. The tangent and normal vectors at the point r on the reference path closest to the vehicle are:

Figure BDA0002386326330000116

Figure BDA0002386326330000117

Transforming (1) and (2) gives:

Figure BDA0002386326330000118

Multiplying both sides of (5) by
Figure BDA0002386326330000119
gives:

Figure BDA00023863263300001110

The first time derivative of the normal vector
Figure BDA00023863263300001111
on the reference path is the tangent vector
Figure BDA00023863263300001112
multiplied by -k_r:

Figure BDA00023863263300001113

Since the directions of
Figure BDA00023863263300001114
and
Figure BDA00023863263300001115
are perpendicular to the direction of
Figure BDA00023863263300001116
their products are also zero, so from (7):

Figure BDA00023863263300001117

vx

Figure BDA00023863263300001118
的模,即:v x is
Figure BDA00023863263300001118
, that is:

Figure BDA00023863263300001119

When the driverless vehicle is driving normally, the following two conditions are satisfied:

Figure BDA0002386326330000121

l < 1/k_r (11)

It can be deduced that:

Figure BDA0002386326330000122

The quantities
Figure BDA0002386326330000123
and
Figure BDA0002386326330000124
can be characterized by the longitudinal and lateral velocity change rates between two adjacent frames, that is:

Figure BDA0002386326330000125

Figure BDA0002386326330000126

where f is the computation frequency, generally greater than 10 Hz.

With the formulas above, conversion between the Cartesian and Frenet coordinate systems can be carried out in both directions.
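A rough sketch of the conversion just described is given below, assuming the reference path is supplied as a discretized polyline of (x, y, s) samples; the projection-to-nearest-sample shortcut and all helper names are illustrative assumptions rather than the patent's formulas:

```python
import math

# Sketch of the Cartesian -> Frenet conversion outlined above, with the
# reference path discretized as a polyline of (x, y, arc-length) samples.
# The nearest-sample projection and helper names are assumptions.

def to_frenet(x, y, path):
    """path: list of (x, y, s) samples of the reference path.
    Returns (s, l): arc length of the closest sample point r and the
    signed normal offset of (x, y) from it (left of the path positive)."""
    i = min(range(len(path)),
            key=lambda k: (path[k][0] - x) ** 2 + (path[k][1] - y) ** 2)
    rx, ry, s = path[i]
    # heading theta_r of the path at r, estimated from the next sample
    j = min(i + 1, len(path) - 1)
    theta_r = math.atan2(path[j][1] - ry, path[j][0] - rx)
    dx, dy = x - rx, y - ry
    l = -dx * math.sin(theta_r) + dy * math.cos(theta_r)
    return s, l


def frenet_rates(prev, cur, f):
    """Approximate s_dot and l_dot from two adjacent frames sampled at
    frequency f, as the text's finite-difference characterization."""
    (s0, l0), (s1, l1) = prev, cur
    return (s1 - s0) * f, (l1 - l0) * f
```

In practice the projection would interpolate between samples and the rates would be filtered, but the finite-difference step mirrors the adjacent-frame formulation above.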

In this embodiment, the input layer of the DQN model receives the decision action, the ego-vehicle state, and the states of the other vehicles; the output layer gives the Q value of each decision action in that state.

In this embodiment, Fig. 4 shows the vehicle scene during a left lane change. The decision action currently received by the motion planning module is a left lane change, so the lateral target configuration is
Figure BDA0002386326330000127
where l_t is the lateral offset l of the left or right lane centerline; the longitudinal target configuration is
Figure BDA0002386326330000128
where s_ob,
Figure BDA0002386326330000129
are the longitudinal distance, longitudinal velocity, and longitudinal acceleration of the obstacle vehicle ahead. From the initial and target configurations, the following system of equations is obtained:

Figure BDA00023863263300001210

Setting t_0 = 0, the coefficients α_l0, α_l1, α_l2 are obtained directly as:

Figure BDA0002386326330000131

Let T = t_1 - t_0; the coefficients α_l3, α_l4, α_l5 can be obtained by solving the following matrix equation:

Figure BDA0002386326330000132

By choosing different values of T, a trajectory set in the l direction is obtained; the trajectory set in the s direction is obtained in the same way.
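The coefficient computation above can be sketched as follows. The first three coefficients follow directly from the initial configuration at t_0 = 0, and the last three solve a 3x3 linear system built from the target configuration at time T; the boundary values and the candidate set of T values below are illustrative:

```python
import numpy as np

# Sketch of the quintic-coefficient computation described above:
# alpha_0..alpha_2 come from the initial configuration at t0 = 0, and
# alpha_3..alpha_5 solve a 3x3 system from the target configuration at T.

def quintic_coeffs(p0, v0, a0, p1, v1, a1, T):
    a_0, a_1, a_2 = p0, v0, a0 / 2.0  # from p(0), p'(0), p''(0)
    A = np.array([
        [T**3,     T**4,      T**5],       # position row
        [3 * T**2, 4 * T**3,  5 * T**4],   # velocity row
        [6 * T,    12 * T**2, 20 * T**3],  # acceleration row
    ])
    b = np.array([
        p1 - (a_0 + a_1 * T + a_2 * T**2),
        v1 - (a_1 + 2 * a_2 * T),
        a1 - 2 * a_2,
    ])
    a_3, a_4, a_5 = np.linalg.solve(A, b)
    return np.array([a_0, a_1, a_2, a_3, a_4, a_5])


def eval_poly(coeffs, t):
    return sum(c * t**k for k, c in enumerate(coeffs))


# a trajectory set: one quintic per candidate execution time T
# (e.g. a 3.5 m lateral offset for a lane change; values are illustrative)
trajectories = {T: quintic_coeffs(0.0, 0.0, 0.0, 3.5, 0.0, 0.0, T)
                for T in (2.0, 3.0, 4.0)}
```

Each entry of `trajectories` is one candidate l(t); the same routine applied to longitudinal boundary conditions yields the s-direction set.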

The loss of each trajectory in the l direction is computed with the following loss function:

C_l = k_j J_t(l(t)) + k_t T + k_l l_1^2 (18)

The loss of each trajectory in the s direction is computed with the following loss function:

C_s = k_j J_t(s(t)) + k_t T + k_s [s_1 - s_d]^2 (19)

The total loss function of a trajectory is:

C_al = k_la C_l + k_lo C_s (20)

After the total loss has been computed for all trajectories, those exceeding the maximum velocity or maximum acceleration in the s direction are removed, as are those exceeding the maximum curvature and those that would lead to a collision; from the remaining trajectories, the one with the smallest loss value is selected as the optimal trajectory and sent to the trajectory tracking module for tracking.
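The filter-then-argmin selection described here can be sketched as follows, with invented candidate records, weights, and limits standing in for real trajectory data:

```python
# Sketch of the final selection step: score each candidate trajectory
# with C_al = k_la * C_l + k_lo * C_s, drop infeasible candidates, and
# keep the cheapest. Records, weights and limits are illustrative.

K_LA, K_LO = 1.0, 1.0  # k_la, k_lo weights (assumed values)

def total_cost(traj):
    return K_LA * traj["C_l"] + K_LO * traj["C_s"]

def feasible(traj, v_max=20.0, a_max=4.0, k_max=0.2):
    # drop trajectories over the speed, acceleration or curvature
    # limits, and those that would collide
    return (traj["v_peak"] <= v_max and traj["a_peak"] <= a_max
            and traj["k_peak"] <= k_max and not traj["collides"])

def select_optimal(candidates):
    valid = [t for t in candidates if feasible(t)]
    return min(valid, key=total_cost) if valid else None

candidates = [
    {"id": 0, "C_l": 1.0, "C_s": 2.0, "v_peak": 25.0, "a_peak": 2.0,
     "k_peak": 0.1, "collides": False},  # rejected: exceeds v_max
    {"id": 1, "C_l": 0.5, "C_s": 0.5, "v_peak": 15.0, "a_peak": 2.0,
     "k_peak": 0.1, "collides": True},   # rejected: collision
    {"id": 2, "C_l": 1.5, "C_s": 1.0, "v_peak": 15.0, "a_peak": 2.0,
     "k_peak": 0.1, "collides": False},
    {"id": 3, "C_l": 1.0, "C_s": 1.0, "v_peak": 15.0, "a_peak": 2.0,
     "k_peak": 0.1, "collides": False},
]
best = select_optimal(candidates)  # the trajectory handed to tracking
```

The surviving minimum-cost candidate plays the role of the optimal trajectory sent to the trajectory tracking module.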

The above are merely preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims (10)

1. A system for planning the motion trail of an unmanned vehicle for a structured road is characterized by comprising a sensing module, a positioning module, a lane change decision module, a motion planning module and a trail tracking module; the lane change decision module outputs decision actions according to data collected by the sensing module and the positioning module; and the motion planning module outputs an optimal track to the track tracking module according to the decision action.
2. The unmanned automotive motion trajectory planning system for structured roads of claim 1, wherein: the sensing module comprises a laser radar, a millimeter wave radar and a motion camera.
3. The unmanned automotive motion trajectory planning system for structured roads of claim 1, wherein: the positioning module comprises a GPS satellite positioning system, an inertial navigator and a network difference module.
4. A method for planning the motion trail of an unmanned vehicle for a structured road is characterized by comprising the following steps:
step S1, collecting real-time driving data;
step S2, coordinate transformation is carried out on the collected driving data, converting the data of the vehicle in the global coordinate system into data in the Frenet coordinate system;
step S3, determining a state space, an action space and an action value return function;
step S4, constructing a deep reinforcement learning model;
step S5, taking the surrounding environment information, decision-making action and the vehicle state as input, calculating through a deep reinforcement learning model, outputting Q values of different actions executed by the unmanned vehicle, and selecting corresponding decision-making action according to the output maximum Q value;
step S6: calculating target configuration under Frenet coordinates according to decision action given by the decision module;
step S7: according to the initial configuration and the target configuration of the current position of the unmanned vehicle, a track set containing a timestamp is planned within a preset time interval;
step S8: and establishing a loss function, and selecting an optimal track from the generated track set as a tracking track.
5. The unmanned vehicle motion trajectory planning method for the structured road according to claim 4, wherein the data in the global coordinate system comprise: the position (x, y), velocity v_x, and acceleration a_x, which are converted into the longitudinal displacement s, longitudinal speed
Figure FDA0002386326320000023
Longitudinal acceleration
Figure FDA0002386326320000024
And transverse displacement l, transverse velocity
Figure FDA0002386326320000026
And lateral acceleration
Figure FDA0002386326320000025
6. The method for planning the unmanned vehicle motion trajectory for the structured road according to claim 4, wherein the step S3 specifically comprises: the state space is defined as the motion state of the unmanned vehicle and other vehicles around the unmanned vehicle in Frenet coordinates based on the bird's eye view state:
Figure FDA0002386326320000021
in the formula: segoUnmanned vehicle self-state, N0-the number of surrounding vehicles.
The state of the unmanned vehicle and the states of the other vehicles are each described by a six-element tuple; the state of the unmanned vehicle mainly comprises:
Figure FDA0002386326320000022
the input decision actions are divided into six types: speed keeping, acceleration, deceleration, left lane change, right lane change and car following, expressed as a six-tuple:
A = [KS, ACC, DEC, CL, CR, FOL]^T
establishing a value return function according to driving safety, comfort level, headway and driving speed, namely:
R(s,a)=Rs(s,a)+Rt(s,a)+Rc(s,a)+Rv(s,a)
the formula of the time-headway return value is as follows:
Figure FDA0002386326320000031
in the formula, RE is a headway indicator, and in the present invention, RE is 30.
The return value of the comfort degree is expressed as:
Figure FDA0002386326320000032
the return value of the driving speed is defined as:
Figure FDA0002386326320000033
in the formula, v_max is the maximum speed allowed in the current lane.
7. The method for planning the unmanned vehicle motion trajectory for the structured road according to claim 4, wherein the step S4 specifically comprises:
the information of the input layer of the DQN model comprises the decision action, the ego-vehicle state and the states of the other vehicles; the output layer gives the Q value corresponding to each decision action taken in that state; a deep neural network is used in place of the value function, and when the value return function Q*(s, a) is known, the Q function is fitted with the deep neural network, then:
Q(s, a) ≈ Q(s, a; θ)
the Q-function estimate obtained by solving the Bellman equation differs from the Q value predicted by the deep neural network, and the difference is used as a loss function to update the neural network;
introducing a loss function to minimize the difference between the bellman equation to the Q value estimate and the neural network to the Q value estimate, namely:
L_i(θ_i) = E[(y_i - Q(s, a, θ))^2]
in the formula, y_i is the Q value determined by the Bellman equation, y_i = r + γ max Q(s', a', θ); i is the number of iterations;
the above is trained by the gradient descent method, with the formula:
Figure FDA0002386326320000041
8. The method of claim 7, wherein the deep reinforcement learning model is composed of two neural networks having the same structure: the evaluation neural network participates in training and continuously updates its parameters to improve its prediction capability, and the target neural network stores the parameters of the evaluation neural network after the evaluation neural network has been trained a certain number of times.
9. The method for planning the unmanned vehicle motion trajectory for the structured road according to claim 4, wherein the step S6 specifically comprises:
determining the target position information of the unmanned vehicle according to the decision action of the lane-change decision module; when the decision action is speed keeping, acceleration or deceleration, the transverse target configuration is
Figure FDA0002386326320000042
Figure FDA0002386326320000043
wherein l_t is the transverse distance l of the current lane center line and T_j is the time to reach the target position; the longitudinal target configuration is
Figure FDA0002386326320000044
wherein
Figure FDA0002386326320000045
is the target vehicle speed to be maintained or achieved, and
Figure FDA0002386326320000046
is the allowed vehicle speed fluctuation;
when the decision action is a left lane change or a right lane change, the lateral target configuration is
Figure FDA0002386326320000047
Figure FDA0002386326320000048
wherein l_t is the lateral distance l of the left or right lane center line; if there is no obstacle vehicle in front of the unmanned vehicle, the longitudinal target configuration is
Figure FDA0002386326320000049
Figure FDA0002386326320000051
if there is an obstacle vehicle in front of the unmanned vehicle, the longitudinal target configuration is
Figure FDA0002386326320000052
Figure FDA0002386326320000053
wherein s_ob,
Figure FDA0002386326320000054
are the longitudinal distance, longitudinal speed and longitudinal acceleration of the preceding obstacle vehicle, respectively;
when the decision-making action is following the car, the transverse target configuration is
Figure FDA0002386326320000055
wherein l_t is the transverse distance l of the current lane center line; the longitudinal target configuration is
Figure FDA0002386326320000056
Figure FDA0002386326320000057
wherein
Figure FDA0002386326320000058
s_lv,
Figure FDA0002386326320000059
are respectively the longitudinal distance, longitudinal speed and longitudinal acceleration of the followed vehicle.
10. The method for planning the unmanned vehicle motion trajectory for the structured road according to claim 4, wherein the step S8 specifically comprises:
step S81: the loss function includes the following:
the lateral loss function is defined as:
C_l = k_j J_t(l(t)) + k_t T + k_l l_1^2
wherein k_j, k_t, k_l are weight values: k_j J_t(l(t)) penalizes trajectories with large jerk; k_t T penalizes the maneuver duration, the larger T the longer the maneuver; k_l l_1^2 penalizes deviation from the road center line;
when the longitudinal target configuration is
Figure FDA00023863263200000510
The longitudinal loss function is defined as:
C_s = k_j J_t(s(t)) + k_t T + k_s [s_1 - s_d]^2
wherein k_j, k_t, k_s are weight values, and k_s [s_1 - s_d]^2 penalizes the deviation of the longitudinal distance s_1 in the target configuration from the target position s_d;
when the longitudinal target configuration is
Figure FDA00023863263200000511
The longitudinal loss function is defined as:
Figure FDA00023863263200000512
wherein k_j, k_t, k_s are weight values, and
Figure FDA00023863263200000513
penalizes the deviation of the longitudinal velocity
Figure FDA00023863263200000514
in the target configuration from the target velocity
Figure FDA00023863263200000515
;
the total loss function of the trace is:
C_al = k_la C_l + k_lo C_s
wherein k_la and k_lo are weight values;
step S82: after the values of the loss functions are calculated for all the tracks, the tracks exceeding the maximum speed or the maximum acceleration in the s direction are removed, the tracks exceeding the maximum curvature are removed, the tracks which can collide are removed, the track with the minimum loss function value is selected from the rest tracks to serve as the optimal track, and the optimal track is sent to a track tracking module to be tracked.
CN202010099122.6A 2020-02-18 2020-02-18 Unmanned vehicle motion track planning system and method for structured road Expired - Fee Related CN111273668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010099122.6A CN111273668B (en) 2020-02-18 2020-02-18 Unmanned vehicle motion track planning system and method for structured road

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010099122.6A CN111273668B (en) 2020-02-18 2020-02-18 Unmanned vehicle motion track planning system and method for structured road

Publications (2)

Publication Number Publication Date
CN111273668A true CN111273668A (en) 2020-06-12
CN111273668B CN111273668B (en) 2021-09-03

Family

ID=70999348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010099122.6A Expired - Fee Related CN111273668B (en) 2020-02-18 2020-02-18 Unmanned vehicle motion track planning system and method for structured road

Country Status (1)

Country Link
CN (1) CN111273668B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111845774A (en) * 2020-07-20 2020-10-30 上海大学 A dynamic trajectory planning and tracking method for autonomous vehicles based on horizontal and vertical coordination
CN112132260A (en) * 2020-09-03 2020-12-25 深圳索信达数据技术有限公司 Training method, calling method, device and storage medium of neural network model
CN112363505A (en) * 2020-11-10 2021-02-12 合肥工业大学 Articulated sweeper speed planning method and system based on target distance
CN113264043A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned driving layered motion decision control method based on deep reinforcement learning
CN113386795A (en) * 2021-07-05 2021-09-14 西安电子科技大学芜湖研究院 Intelligent decision-making and local track planning method for automatic driving vehicle and decision-making system thereof
CN113537782A (en) * 2021-07-19 2021-10-22 福州大学 Contract network-based multi-satellite situation awareness system distributed task planning method
CN113619604A (en) * 2021-08-26 2021-11-09 清华大学 Integrated decision and control method and device for automatic driving automobile and storage medium
CN113721637A (en) * 2021-11-02 2021-11-30 武汉理工大学 Intelligent vehicle dynamic obstacle avoidance path continuous planning method and system and storage medium
CN113844441A (en) * 2021-10-14 2021-12-28 安徽江淮汽车集团股份有限公司 Machine learning method of front collision early warning braking system
CN113879323A (en) * 2021-10-26 2022-01-04 清华大学 Reliable learning autonomous driving decision-making method, system, storage medium and device
CN113942524A (en) * 2020-07-15 2022-01-18 广州汽车集团股份有限公司 Vehicle running control method and system and computer readable storage medium
CN114013443A (en) * 2021-11-12 2022-02-08 哈尔滨工业大学 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN115169951A (en) * 2022-07-26 2022-10-11 中国科学院合肥物质科学研究院 Multi-feature-fused automatic driving course reinforcement learning training method
CN115373388A (en) * 2022-08-09 2022-11-22 上海东古智能科技有限公司 A path planning system and method for a shipboard robot
CN115731261A (en) * 2021-08-27 2023-03-03 河北省交通规划设计研究院有限公司 Vehicle lane-changing behavior recognition method and system based on expressway radar data
CN117141520A (en) * 2023-10-31 2023-12-01 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Real-time track planning method, device and equipment
CN118795874A (en) * 2023-04-07 2024-10-18 中车株洲电力机车研究所有限公司 A local trajectory planning method based on relative positioning information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104267721A (en) * 2014-08-29 2015-01-07 陈业军 Unmanned driving system of intelligent automobile
CN108068817A (en) * 2017-12-06 2018-05-25 张家港天筑基业仪器设备有限公司 A kind of automatic lane change device and method of pilotless automobile
CN108777872A (en) * 2018-05-22 2018-11-09 中国人民解放军陆军工程大学 Deep Q neural network anti-interference model and intelligent anti-interference algorithm
US20190204841A1 (en) * 2017-12-29 2019-07-04 Beijing Didi Infinity Technology And Development Co,, Ltd. Systems and methods for path determination
CN110362096A (en) * 2019-08-13 2019-10-22 东北大学 A kind of automatic driving vehicle dynamic trajectory planing method based on local optimality

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104267721A (en) * 2014-08-29 2015-01-07 陈业军 Unmanned driving system of intelligent automobile
CN108068817A (en) * 2017-12-06 2018-05-25 张家港天筑基业仪器设备有限公司 A kind of automatic lane change device and method of pilotless automobile
US20190204841A1 (en) * 2017-12-29 2019-07-04 Beijing Didi Infinity Technology And Development Co,, Ltd. Systems and methods for path determination
CN108777872A (en) * 2018-05-22 2018-11-09 中国人民解放军陆军工程大学 Deep Q neural network anti-interference model and intelligent anti-interference algorithm
CN110362096A (en) * 2019-08-13 2019-10-22 东北大学 A kind of automatic driving vehicle dynamic trajectory planing method based on local optimality

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙天骏: ""基于学习控制的汽车全速自适应巡航决策与控制算法研究"", 《中国博士学位论文全文数据库 工程科技Ⅱ辑》 *
魏民祥 等: ""基于Frenet 坐标系的自动驾驶轨迹规划与优化算法"", 《控制与决策》 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113942524A (en) * 2020-07-15 2022-01-18 广州汽车集团股份有限公司 Vehicle running control method and system and computer readable storage medium
CN111845774B (en) * 2020-07-20 2021-12-03 上海大学 Automatic driving automobile dynamic trajectory planning and tracking method based on transverse and longitudinal coordination
CN111845774A (en) * 2020-07-20 2020-10-30 上海大学 A dynamic trajectory planning and tracking method for autonomous vehicles based on horizontal and vertical coordination
CN112132260A (en) * 2020-09-03 2020-12-25 深圳索信达数据技术有限公司 Training method, calling method, device and storage medium of neural network model
CN112363505A (en) * 2020-11-10 2021-02-12 合肥工业大学 Articulated sweeper speed planning method and system based on target distance
CN113264043A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned driving layered motion decision control method based on deep reinforcement learning
CN113386795B (en) * 2021-07-05 2022-07-01 西安电子科技大学芜湖研究院 Intelligent decision-making and local track planning method for automatic driving vehicle and decision-making system thereof
CN113386795A (en) * 2021-07-05 2021-09-14 西安电子科技大学芜湖研究院 Intelligent decision-making and local track planning method for automatic driving vehicle and decision-making system thereof
CN113537782A (en) * 2021-07-19 2021-10-22 福州大学 Contract-network-based distributed task planning method for a multi-satellite situational awareness system
CN113537782B (en) * 2021-07-19 2023-08-18 福州大学 Distributed mission planning method for multi-satellite situational awareness system based on contract network
CN113619604A (en) * 2021-08-26 2021-11-09 清华大学 Integrated decision and control method and device for automatic driving automobile and storage medium
CN113619604B (en) * 2021-08-26 2023-08-15 清华大学 Integrated control method, device and storage medium for automatic driving automobile
CN115731261B (en) * 2021-08-27 2023-06-16 河北省交通规划设计研究院有限公司 Vehicle lane change behavior recognition method and system based on expressway radar data
CN115731261A (en) * 2021-08-27 2023-03-03 河北省交通规划设计研究院有限公司 Vehicle lane-changing behavior recognition method and system based on expressway radar data
CN113844441A (en) * 2021-10-14 2021-12-28 安徽江淮汽车集团股份有限公司 Machine learning method of front collision early warning braking system
CN113879323B (en) * 2021-10-26 2023-03-14 清华大学 Reliable learning type automatic driving decision-making method, system, storage medium and equipment
CN113879323A (en) * 2021-10-26 2022-01-04 清华大学 Reliable learning autonomous driving decision-making method, system, storage medium and device
CN113721637A (en) * 2021-11-02 2021-11-30 武汉理工大学 Intelligent vehicle dynamic obstacle avoidance path continuous planning method and system and storage medium
CN114013443A (en) * 2021-11-12 2022-02-08 哈尔滨工业大学 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN115169951A (en) * 2022-07-26 2022-10-11 中国科学院合肥物质科学研究院 Multi-feature-fused automatic driving course reinforcement learning training method
CN115373388A (en) * 2022-08-09 2022-11-22 上海东古智能科技有限公司 A path planning system and method for a shipboard robot
CN118795874A (en) * 2023-04-07 2024-10-18 中车株洲电力机车研究所有限公司 A local trajectory planning method based on relative positioning information
CN117141520A (en) * 2023-10-31 2023-12-01 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Real-time track planning method, device and equipment
CN117141520B (en) * 2023-10-31 2024-01-12 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Real-time track planning method, device and equipment

Also Published As

Publication number Publication date
CN111273668B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
CN111273668B (en) Unmanned vehicle motion track planning system and method for structured road
Li et al. Planning and decision-making for connected autonomous vehicles at road intersections: A review
CN115257746B (en) A lane-changing decision control method for autonomous driving vehicles considering uncertainty
Lin et al. Anti-jerk on-ramp merging using deep reinforcement learning
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
Huang et al. Path planning and cooperative control for automated vehicle platoon using hybrid automata
CN114312830B (en) Intelligent vehicle coupling decision model and method considering dangerous driving conditions
CN113291308A (en) Vehicle self-learning lane-changing decision-making system and method considering driving behavior characteristics
CN112965476A (en) High-speed unmanned vehicle trajectory planning system and method based on multi-window sampling
CN118212808B (en) Method, system and equipment for planning traffic decision of signalless intersection
CN108919795A (en) Lane-change decision-making method and device for autonomous vehicles
Goulet et al. Distributed maneuver planning with connected and automated vehicles for boosting traffic efficiency
Ma et al. Overtaking path planning for CAV based on improved artificial potential field
CN115042770A (en) A Vehicle Queue Lateral Control Method Based on Distributed Robust Model Prediction
CN114889589A (en) Intelligent vehicle steering and braking cooperative collision avoidance control system and method
CN114895676A (en) Method for realizing high-speed running of ground automatic driving vehicle based on space intelligent system
Gratzer et al. Two-layer mpc architecture for efficient mixed-integer-informed obstacle avoidance in real-time
Liu et al. Impact of sharing driving attitude information: A quantitative study on lane changing
Yang et al. Vehicle local path planning and time consistency of unmanned driving system based on convolutional neural network
WO2025222554A1 (en) Multi-agent vehicle-road-cloud integrated collaborative decision-making and control architecture system and method based on federated reinforcement learning
CN118861963A (en) An end-to-end autonomous driving lane change decision method based on multimodal input
CN115257820A (en) Forward collision-avoidance driving decision-making method for commercial vehicles in open interference scenarios
CN117452936A (en) Unmanned vehicle tracking control method based on fuzzy Petri and DDPG network
CN113460083A (en) Vehicle control device, vehicle control method, and storage medium
Yu et al. Game-Theoretic Model Predictive Control for Safety-Assured Autonomous Vehicle Overtaking in Mixed-Autonomy Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210903

Termination date: 20220218