
CN110673637A - A method for UAV pseudo-path planning based on deep reinforcement learning - Google Patents


Info

Publication number
CN110673637A
Authority
CN
China
Prior art keywords
flight
reinforcement learning
uav
action
deep reinforcement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910948346.7A
Other languages
Chinese (zh)
Other versions
CN110673637B (en)
Inventor
陈鲤文
周瑶
郑日晶
张文吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian University of Science and Technology
Original Assignee
Fujian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian University of Technology filed Critical Fujian University of Technology
Priority to CN201910948346.7A priority Critical patent/CN110673637B/en
Publication of CN110673637A publication Critical patent/CN110673637A/en
Application granted granted Critical
Publication of CN110673637B publication Critical patent/CN110673637B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a method for UAV pseudo-path planning based on deep reinforcement learning. First, the boundary coordinates of the no-fly area are divided on the flight map, and the start and end coordinates of the UAV's flight task are marked. Before the task is executed, the current environment state of the UAV is perceived, and a deep reinforcement learning algorithm selects the deflection angle and flight action in the current environment according to the obtained Q function value. During flight, the UAV continuously receives flight position data from the ground base station transmitting equipment and updates the Q function with the rewards obtained by interacting with the environment. The no-fly area is treated as a virtual obstacle, and the method judges whether the UAV is flying along the preset route; if the UAV approaches the edge of the no-fly area, the reward function guides it to plan a pseudo navigation path that avoids the no-fly area. The invention realizes pseudo-path planning for UAVs in unknown environments and improves the intelligence and safety of UAV flight.

Description

Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to an unmanned aerial vehicle pseudo-path planning method based on deep reinforcement learning.
Background
With major advances in computing power and artificial intelligence, unmanned aerial vehicles are applied in more and more fields, particularly in military aviation, and the tasks they execute are increasingly complex; they play an important role in military reconnaissance and air transportation. The demand for intelligence in unmanned aerial vehicle flight path planning is correspondingly rising: when an unmanned aerial vehicle executes a special task, it is required to fly from the start point to the end point as prescribed while also avoiding normal civil aviation flight areas and radar monitoring areas, so as not to interfere with civil aviation flights and radar monitoring. To better serve applications in these various fields, research on unmanned aerial vehicle pseudo-path planning has become a hotspot and a difficulty of current unmanned aerial vehicle flight path planning.
With the progress of artificial intelligence technology, agent control methods based on deep neural networks and deep reinforcement learning have entered public view in recent years. Reinforcement learning is an important branch of machine learning: it models the environment, feeds back on each action of the agent, and, by setting an objective function of accumulated rewards, maximizes the expected future return the agent can obtain in the current state, helping the agent take more intelligent actions in each state. Deep reinforcement learning optimizes the agent's strategy with a neural network; it sidesteps the curse-of-dimensionality problem that afflicts traditional learning methods such as temporal-difference and on-policy algorithms, providing an approach suited to real-time computation.
When actually solving the unmanned aerial vehicle flight path planning problem, an intelligent algorithm suited to the task and the complexity of the terrain environment must be selected. Existing algorithms plan and navigate according to the unmanned aerial vehicle's real-time flight path and obstacle avoidance. In practice, however, some no-fly areas in the airspace are undetectable, invisible obstacles: during flight the unmanned aerial vehicle can easily stray into a no-fly area, endangering flight in other parts of the airspace.
Disclosure of Invention
The invention aims to move beyond the conventional thinking of existing unmanned aerial vehicle track planning and provides a method for planning a pseudo path of an unmanned aerial vehicle based on deep reinforcement learning. The invention plans a pseudo flight path for the case where the unmanned aerial vehicle must avoid a restricted flight area: when the actually planned flight path of the unmanned aerial vehicle conflicts with the restricted area, the pseudo flight path guides the unmanned aerial vehicle around it, ensuring the flight safety of the unmanned aerial vehicle in the airspace and the normal operation of other areas.
The technical scheme adopted by the invention is as follows: a method for planning pseudo paths of an unmanned aerial vehicle based on deep reinforcement learning is characterized by comprising the following steps:
step 1: dividing boundary coordinates of a no-fly area on a flight map, and marking coordinates of a starting point and an end point of the unmanned aerial vehicle flight;
step 2: sensing the current environment state of the unmanned aerial vehicle before executing a flight task, wherein the current environment state comprises low- and high-altitude climate data, the flight height of the unmanned aerial vehicle and the flight position coordinates of the unmanned aerial vehicle; based on the current environment state information, selecting a flight deflection angle and an action in the current environment according to the obtained Q function value by using a deep reinforcement learning algorithm; the unmanned aerial vehicle continuously receives position data given by the ground base station transmitting equipment during flight and updates the Q function with the rewards obtained by interacting with the environment;
step 3: in the flight process, the no-fly area is used as a virtual obstacle, and it is judged whether the unmanned aerial vehicle flies according to the normal air route;
if the unmanned aerial vehicle is far away from the no-fly zone, the unmanned aerial vehicle continues to interactively plan a path with the environment, and the step 2 is executed;
if the unmanned aerial vehicle approaches the edge of the no-fly zone, guiding the unmanned aerial vehicle to plan a pseudo navigation route through a reward function of deep reinforcement learning, and avoiding the no-fly zone;
step 4: if the unmanned aerial vehicle reaches the end point, the flight ends; otherwise, step 2 is executed again.
The invention has the advantages that:
1. the invention can realize the path planning of the unmanned aerial vehicle in a complex environment, so that the unmanned aerial vehicle can efficiently fly to a target position to complete subsequent tasks.
2. Using deep reinforcement learning, the invention can plan a pseudo flight path that keeps the unmanned aerial vehicle out of no-fly airspace, ensuring that the unmanned aerial vehicle does not mistakenly fly into aviation-restricted areas or radar monitoring areas even though no solid obstacle is present, and avoiding interference with the normal work of other airspaces; the method is efficient, safe and intelligent.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a deep reinforcement learning Double DQN algorithm according to an embodiment of the present invention;
fig. 3 is a schematic diagram of unmanned aerial vehicle pseudo path planning using the deep reinforcement learning Double DQN algorithm in the embodiment of the present invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention by those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples; it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
The invention adopts a method for planning the pseudo path of the unmanned aerial vehicle based on deep reinforcement learning to avoid the danger of the unmanned aerial vehicle mistakenly entering an aviation no-fly zone during flight. It combines a deep reinforcement learning algorithm with grid map positioning and takes the forbidden airspace as a virtual obstacle: when the flight path planned by the unmanned aerial vehicle would mistakenly enter the forbidden zone, the reinforcement learning algorithm replans a pseudo path so that the unmanned aerial vehicle avoids the aviation forbidden zone. This ensures the flight safety of the unmanned aerial vehicle and the normal operation of other aviation zones, while improving the efficiency and safety of the unmanned aerial vehicle's route planning.
Referring to fig. 1, the method for planning the pseudo path of the unmanned aerial vehicle based on deep reinforcement learning provided by the invention comprises the following steps:
step 1: dividing boundary coordinates of a no-fly area on a flight map, and marking coordinates of a starting point and an end point of the unmanned aerial vehicle flight;
the no-fly area in the embodiment comprises a normal civil aviation area and a radar area;
in the embodiment, a flight map is simulated into a grid environment model, the grid environment model divides the flight environment of the unmanned aerial vehicle into a series of cells with binary information and the same or different sizes, and some of the cells are divided into no-fly areas; the boundary coordinates of the no-fly area are definitely marked as { (x) on the grid environment modeli,yi),(xi+1,yi+1),(xi+2,yi+2)……(xi+m,yi+n) | m, n is more than 0, and i is more than or equal to 1 }; marking the starting point (X) of the unmanned plane flight on the flight mapstart,Ystart) And end point (X)end,Yend) The position coordinates of (a).
Step 2: sensing the current environment state of the unmanned aerial vehicle before executing a flight task, wherein the current environment state comprises low- and high-altitude climate data, the flight height of the unmanned aerial vehicle and the flight position coordinates of the unmanned aerial vehicle; based on the current environment state information, selecting a flight deflection angle and an action in the current environment according to the obtained Q function value by using a deep reinforcement learning algorithm; the unmanned aerial vehicle updates the Q function according to the rewards obtained by continuously receiving position data given by the ground base station transmitting equipment during flight and interacting with the environment;
in this embodiment, the deep reinforcement learning network is Double DQN, which is a deep reinforcement learning network of a Double DQN neural network;
the Double DQN is an improved deep convolutional neural network combining a convolutional neural network in deep learning and a Q-learning algorithm of reinforcement learning;
the deep reinforcement learning network comprises a state set S of the unmanned aerial vehicle in flight1,S2,S3……StT is equal to or greater than 1, and action set { a ≧ 1}1,a2,a3……atT is more than or equal to 1}, a reward function R(s) and a deep reinforcement learning network weight value theta;
deep reinforcement learning is carried out according to the state set, the action set and the reward function which are substituted into a state action value function Qt(st,at) Performing the following steps; qt(st,at) The function of (d) is:
Figure BDA0002224929740000041
wherein Qt+1(st,at) Is the Q value corresponding to the time t +1, Qt(st,at) Is the Q value at time t, alpha ∈ (0.5, 1)]For learning rate, γ ∈ (0, 1) is the discount factor, RtThe return value is the return value when the action at the moment t is executed; max is Qt+1(st+1,at) Or Qt(st,at) The Q value corresponding to the maximum value; if the state s reaches the target point grid after the action a, then R (s, a) is 1; if the state s reaches the barrier grid after the action a, then R (s, a) is-1; otherwise, R (s, a) ═ 0.
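A minimal sketch of this tabular update and the reward scheme just described, assuming a grid state space and a small discrete action set (the hyperparameter values and the eight-action assumption are illustrative):

```python
ALPHA, GAMMA = 0.7, 0.9   # alpha in (0.5, 1], gamma in (0, 1), as above
N_ACTIONS = 8             # e.g. eight deflection directions (assumed)
Q = {}                    # (state, action) -> Q value, default 0

def reward(next_state, goal, no_fly_cells):
    """R(s, a) = 1 at the goal cell, -1 in an obstacle cell, 0 otherwise."""
    if next_state == goal:
        return 1.0
    if next_state in no_fly_cells:
        return -1.0
    return 0.0

def q_update(s, a, r, s_next):
    """Q_{t+1}(s,a) = Q_t(s,a) + alpha*(R_t + gamma*max_a' Q_t(s',a') - Q_t(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in range(N_ACTIONS))
    q = Q.get((s, a), 0.0)
    Q[(s, a)] = q + ALPHA * (r + GAMMA * best_next - q)
```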
After the target network weight θ is added, the action-value function update becomes:

Q_{t+1}(s_t, a_t; θ) = Q_t(s_t, a_t; θ) + α[R_t + γ V_{t+1} - Q_t(s_t, a_t; θ)]

where V_{t+1} is the behavior value obtained at time t+1 from the current state-action value function Q_t(s_t, a_t; θ) and is used to update the state-action value at time t+1; in the deep reinforcement learning Double DQN, the selection of the action and the evaluation of the action are realized by different value functions.
the value function formula when the action is selected is as follows:
Yt Q=Rt+1+ymaxaQ(St+1,a;θ);
value function in action selection when making a selection, an action a is first selected*The action a*Should be satisfied in state St+1Process Q (S)t+1A) maximum; wherein R ist+1Represents the prize value at time t + 1;
the value function in the evaluation of the movement is that the largest movement a is selected*Then selecting different network weight theta' action evaluation formulas;
Figure BDA0002224929740000043
wherein,
Figure BDA0002224929740000044
the value of the state action value function after the calculation of the Double DQN by the deep reinforcement learning network is used.
In this embodiment, the deep reinforcement learning network weight θ is selected by priority replay; referring to fig. 2, the specific implementation includes the following sub-steps:
step 2.1: the unmanned aerial vehicle first trains in an aerial flight environment; state-action data sets collected from the interaction between the unmanned aerial vehicle and the environment are placed in a replay memory unit;
step 2.2: the neural network of the deep reinforcement learning is divided into a real network and an estimation network; when the experience data stored in the replay memory unit exceeds the set data-set size, the agent (the learning "brain" of reinforcement learning) starts training;
step 2.3: interacting with the environment, the unmanned aerial vehicle selects an action according to its current state; the real network has the same structure as the estimation network, only the parameters of the neural network used for training differ. The real network is trained on the current state of the unmanned aerial vehicle to obtain the maximum state-action value Q(s, a; θ), while the estimation network is trained to obtain the state-action value max_{a′} Q(s′, a′; θ′) in the next state; the error function between the real network and the estimation network is obtained, and the stochastic gradient descent method yields the maximum state-action value function argmax_a Q(s, a; θ) under a greedy strategy. The unmanned aerial vehicle selects its next action according to the state-action value function and continues to interact with the environment.
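A minimal sketch of sub-steps 2.1 to 2.3 as one training step over a replay memory; for simplicity the sketch samples uniformly (the patent uses priority replay), and the buffer size, batch size and loss are illustrative assumptions:

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

memory = deque(maxlen=10_000)           # replay memory unit (step 2.1)
BATCH, MIN_FILL, GAMMA = 64, 1_000, 0.9

def train_step(real_net, estimation_net, optimizer):
    if len(memory) < MIN_FILL:          # step 2.2: train only once enough data is stored
        return
    batch = random.sample(memory, BATCH)
    s, a, r, s2, done = (torch.stack(x) for x in zip(*batch))
    # real network: Q(s, a; theta) for the actions actually taken
    q = real_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():               # estimation network: max_a' Q(s', a'; theta')
        q_next = estimation_net(s2).max(dim=1).values
        y = r + GAMMA * q_next * (1 - done.float())
    loss = F.smooth_l1_loss(q, y)       # error between real and estimation networks
    optimizer.zero_grad()
    loss.backward()                     # stochastic gradient descent (step 2.3)
    optimizer.step()
```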
In this embodiment, the unmanned aerial vehicle continuously interacts with the environment during flight, and the state-action value function Q_t(s_t, a_t; θ) is continuously updated according to the Double DQN algorithm, updating the route trajectory.
step 3: in the flight process, the no-fly area is used as a virtual obstacle, and it is judged whether the unmanned aerial vehicle flies according to the normal air route;
if the unmanned aerial vehicle is far away from the no-fly zone, the unmanned aerial vehicle continues to interactively plan a path with the environment, and the step 2 is executed;
if the unmanned aerial vehicle approaches the edge of the no-fly zone, guiding the unmanned aerial vehicle to plan a pseudo navigation route through a reward function of deep reinforcement learning, and avoiding the no-fly zone;
step 4: if the unmanned aerial vehicle reaches the end point, the flight ends; otherwise, step 2 is executed again.
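A minimal sketch of the reward guidance in step 3: the closer the unmanned aerial vehicle gets to the edge of the no-fly area, the stronger the penalty, which steers the planner onto a pseudo route around the zone. The margin and the linear penalty shape are illustrative assumptions, not values from the patent.

```python
EDGE_MARGIN = 2.0   # grid cells; "close to the edge" threshold (assumed)

def shaped_reward(base_reward, dist_to_no_fly):
    if dist_to_no_fly <= 0:              # inside the virtual obstacle
        return -1.0
    if dist_to_no_fly < EDGE_MARGIN:     # near the edge: push the route away
        return base_reward - (EDGE_MARGIN - dist_to_no_fly) / EDGE_MARGIN
    return base_reward                   # far from the zone: normal reward
```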
In this embodiment, a pseudo path diagram of the flight path planning is shown in fig. 3.
The invention plans the unmanned aerial vehicle pseudo path with a deep reinforcement learning method that combines reinforcement learning with deep neural networks: the strategy function value is obtained through the interaction of an intelligent agent with the environment and guides the selection of the unmanned aerial vehicle's flight actions. The method has strong convergence and generalization capability and raises the degree of intelligence of unmanned aerial vehicle flight.
It should be understood that parts of the specification not set forth in detail are prior art; the above description of the preferred embodiments is intended to be illustrative, and not to be construed as limiting the scope of the invention, which is defined by the appended claims, and all changes and modifications that fall within the metes and bounds of the claims, or equivalences of such metes and bounds are therefore intended to be embraced by the appended claims.

Claims (5)

1. A method for UAV pseudo-path planning based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: dividing the boundary coordinates of the no-fly area on the flight map, and marking the position coordinates of the start point and the end point of the UAV flight;
Step 2: before executing the flight task, perceiving the current environment state of the UAV, including low- and high-altitude climate data, the UAV flight height and the UAV flight position coordinates; based on the current environment state information, using a deep reinforcement learning algorithm to select the flight deflection angle and action in the current environment according to the obtained Q function value; the UAV updating the Q function according to the rewards obtained by continuously receiving flight position data from the ground base station transmitting equipment during flight and interacting with the environment;
Step 3: during flight, taking the no-fly area as a virtual obstacle and judging whether the UAV flies according to the normal route;
if it is far from the no-fly zone, the UAV continues to plan a path by interacting with the environment, and step 2 is executed;
if it is close to the edge of the no-fly area, the reward function of the deep reinforcement learning algorithm guides the UAV to plan a pseudo navigation route that avoids the no-fly area;
Step 4: if the UAV reaches the end point, the flight ends; otherwise, step 2 continues.
2. The method for UAV pseudo-path planning based on deep reinforcement learning according to claim 1, characterized in that in step 1 the flight map is first simulated as a grid environment model, which divides the flight environment of the UAV into a series of cells with binary information, of the same or different sizes, some of which are designated as no-fly areas; the boundary coordinates of the no-fly areas are explicitly marked on the grid environment model as {(x_i, y_i), (x_{i+1}, y_{i+1}), (x_{i+2}, y_{i+2}), ..., (x_{i+m}, y_{i+n}) | m, n > 0, i ≥ 1}; the position coordinates of the start point (X_start, Y_start) and the end point (X_end, Y_end) of the UAV flight are also marked on the flight map.
3. The method for UAV pseudo-path planning based on deep reinforcement learning according to claim 1, characterized in that in step 2 the Double DQN algorithm is an improved deep convolutional neural network algorithm combining the convolutional neural network of deep learning with the Q-learning algorithm of reinforcement learning;
the deep reinforcement learning algorithm comprises the state set {S_1, S_2, S_3, ..., S_t | t ≥ 1} of the UAV in flight, the action set {a_1, a_2, a_3, ..., a_t | t ≥ 1}, the reward function R(s), and the deep reinforcement learning target network weight θ;
the state set, action set and reward function are substituted into the state-action value function Q_t(s_t, a_t), which is updated as:
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α[R_t + γ max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t)]
where Q_{t+1}(s_t, a_t) is the Q value at time t+1, Q_t(s_t, a_t) is the Q value at time t, α is the learning rate, γ is the discount factor, and R_t is the reward for the action executed at time t;
after the target network weight θ is added, the action-value function update becomes:
Q_{t+1}(s_t, a_t; θ) = Q_t(s_t, a_t; θ) + α[R_t + γ V_{t+1} - Q_t(s_t, a_t; θ)]
where V_{t+1} is the behavior value obtained at time t+1 from the current state-action value function Q_t(s_t, a_t; θ) and is used to update the state-action value at time t+1; in Double DQN, the selection of the action and the evaluation of the action are realized by different value functions;
the value function for action selection is:
Y_t^Q = R_{t+1} + γ max_a Q(S_{t+1}, a; θ);
when the action-selection value function makes a choice, an action a* is first selected, where a* should maximize Q(S_{t+1}, a) at state S_{t+1}; R_{t+1} denotes the reward value at time t+1;
the value function for action evaluation selects, after the largest action a* has been chosen, a different network weight θ′ for the evaluation formula:
Y_t^{DoubleQ} = R_{t+1} + γ Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ); θ′)
where Y_t^{DoubleQ} is the value of the state-action value function computed by the deep reinforcement learning network Double DQN.
4. The method for UAV pseudo-path planning based on deep reinforcement learning according to claim 3, characterized in that in step 2 the deep reinforcement learning algorithm weight θ is selected by priority replay, implemented by the following sub-steps:
Step 2.1: the UAV first trains in the aerial flight environment; state-action data sets collected from the interaction between the UAV and the environment are placed in a replay memory unit;
Step 2.2: the neural network of the deep reinforcement learning is divided into a real network and an estimation network; when the experience data stored in the replay memory unit exceeds the set data-set size, the agent starts training;
Step 2.3: interacting with the environment, the UAV selects an action according to its current state; the real network has the same structure as the estimation network, only the parameters of the neural network used for training differ; the real network is trained on the current state of the UAV to obtain the maximum state-action value Q(s, a; θ), while the estimation network is trained to obtain the state-action value max_{a′} Q(s′, a′; θ′) in the next state; the error function between the real network and the estimation network is obtained, and the stochastic gradient descent method yields the maximum state-action value function argmax_a Q(s, a; θ) under a greedy strategy; the UAV selects its next action according to the state-action value function and continues to interact with the environment.
5. The method for UAV pseudo-path planning based on deep reinforcement learning according to claim 4, characterized in that in step 2 the UAV continuously interacts with the environment during flight, continuously updating the state-action value function Q(s, a; θ) according to the Double DQN algorithm and updating the route trajectory.
CN201910948346.7A 2019-10-08 2019-10-08 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning Active CN110673637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910948346.7A CN110673637B (en) 2019-10-08 2019-10-08 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910948346.7A CN110673637B (en) 2019-10-08 2019-10-08 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN110673637A true CN110673637A (en) 2020-01-10
CN110673637B CN110673637B (en) 2022-05-13

Family

ID=69080721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910948346.7A Active CN110673637B (en) 2019-10-08 2019-10-08 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN110673637B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Unmanned aerial vehicle path planning method and device based on reinforcement learning
WO2018156891A1 (en) * 2017-02-24 2018-08-30 Google Llc Training policy neural networks using path consistency learning
CN109032168A (en) * 2018-05-07 2018-12-18 西安电子科技大学 A kind of Route planner of the multiple no-manned plane Cooperative Area monitoring based on DQN
CN109655066A (en) * 2019-01-25 2019-04-19 南京邮电大学 One kind being based on the unmanned plane paths planning method of Q (λ) algorithm
CN109974737A (en) * 2019-04-11 2019-07-05 山东师范大学 Route planning method and system based on combination of safety evacuation signs and reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MUNOZ, GUILLEM et al.: "Deep Reinforcement Learning for Drone Delivery", Drones *
虞晓霞 et al.: "A deep-learning-based UAV target recognition method for no-fly zones", Journal of Changchun University of Science and Technology (Natural Science Edition) *
韩晓雷: "Path planning for power-tower inspection by a flying robot based on a safety region model", China Master's Theses Full-text Database, Engineering Science and Technology II *

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112783192A (en) * 2019-11-11 2021-05-11 中国移动通信集团上海有限公司 Unmanned aerial vehicle path planning method, device, equipment and storage medium
CN112783192B (en) * 2019-11-11 2022-11-22 中国移动通信集团上海有限公司 Unmanned aerial vehicle path planning method, device, equipment and storage medium
CN113283424A (en) * 2020-02-19 2021-08-20 通用汽车有限责任公司 System for collecting aerial images by unmanned aerial vehicle following target and based on reinforcement learning
CN111381499A (en) * 2020-03-10 2020-07-07 东南大学 Adaptive control method of networked aircraft based on 3D space radio frequency map learning
CN111504319A (en) * 2020-04-08 2020-08-07 安徽舒州农业科技有限责任公司 Automatic driving control method and system based on agricultural unmanned aerial vehicle
US12416925B2 (en) 2020-04-30 2025-09-16 Rakuten Group, Inc. Learning device, information processing device, and learned control model
CN113892070B (en) * 2020-04-30 2024-04-26 乐天集团股份有限公司 Learning device, information processing device, and control model for completing learning
CN113892070A (en) * 2020-04-30 2022-01-04 乐天集团股份有限公司 Learning device, information processing device, and control model for completing learning
CN111580533A (en) * 2020-05-07 2020-08-25 北京邮电大学 Aerodynamics-based UAV information collection method and device
CN111998847A (en) * 2020-07-16 2020-11-27 西北工业大学 Underwater vehicle bionic geomagnetic navigation method based on deep reinforcement learning
CN111880563A (en) * 2020-07-17 2020-11-03 西北工业大学 Multi-unmanned aerial vehicle task decision method based on MADDPG
CN111880563B (en) * 2020-07-17 2022-07-15 西北工业大学 A MADDPG-Based Multi-UAV Mission Decision-Making Method
CN111950873B (en) * 2020-07-30 2022-11-15 上海卫星工程研究所 Satellite real-time guiding task planning method and system based on deep reinforcement learning
CN111950873A (en) * 2020-07-30 2020-11-17 上海卫星工程研究所 Satellite real-time guiding task planning method and system based on deep reinforcement learning
CN114578843A (en) * 2020-12-01 2022-06-03 中移(成都)信息通信科技有限公司 Flight path planning method and device, aircraft and storage medium
CN112636811A (en) * 2020-12-08 2021-04-09 北京邮电大学 Relay unmanned aerial vehicle deployment method and device
CN112711271A (en) * 2020-12-16 2021-04-27 中山大学 Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning
CN112580537A (en) * 2020-12-23 2021-03-30 中国人民解放军国防科技大学 Deep reinforcement learning method for multi-unmanned aerial vehicle system to continuously cover specific area
CN112650058A (en) * 2020-12-23 2021-04-13 西北工业大学 Four-rotor unmanned aerial vehicle trajectory control method based on reinforcement learning
CN112902969A (en) * 2021-02-03 2021-06-04 重庆大学 Path planning method for unmanned aerial vehicle in data collection process
CN112902969B (en) * 2021-02-03 2023-08-01 重庆大学 A path planning method for unmanned aerial vehicles in the process of data collection
CN112906542B (en) * 2021-02-08 2023-11-24 北京理工大学 An obstacle avoidance method and device for unmanned vehicles based on reinforcement learning
CN112906542A (en) * 2021-02-08 2021-06-04 北京理工大学 Unmanned vehicle obstacle avoidance method and device based on reinforcement learning
CN113283827A (en) * 2021-04-16 2021-08-20 北京航空航天大学合肥创新研究院(北京航空航天大学合肥研究生院) Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning
CN113283827B (en) * 2021-04-16 2024-03-12 北京航空航天大学合肥创新研究院(北京航空航天大学合肥研究生院) Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning
CN113342029A (en) * 2021-04-16 2021-09-03 山东师范大学 Maximum sensor data acquisition path planning method and system based on unmanned aerial vehicle cluster
CN113110516B (en) * 2021-05-20 2023-12-22 广东工业大学 Operation planning method for limited space robot with deep reinforcement learning
CN113110516A (en) * 2021-05-20 2021-07-13 广东工业大学 Restricted space robot operation planning method for deep reinforcement learning
CN113359820A (en) * 2021-05-28 2021-09-07 中国地质大学(武汉) DQN-based unmanned aerial vehicle path planning method
CN113436213A (en) * 2021-06-23 2021-09-24 上海极维信息科技有限公司 Method for processing region edge problem of positioning algorithm by using reinforcement learning
CN113641192B (en) * 2021-07-06 2023-07-18 暨南大学 A Path Planning Method for Unmanned Aerial Vehicle Crowd Sensing Task Based on Reinforcement Learning
CN113641192A (en) * 2021-07-06 2021-11-12 暨南大学 A Path Planning Method for UAV Swarm Perception Task Based on Reinforcement Learning
CN114021773A (en) * 2021-09-26 2022-02-08 北京百度网讯科技有限公司 Path planning method and device, electronic equipment and storage medium
CN113942616A (en) * 2021-09-30 2022-01-18 华能盐城大丰新能源发电有限责任公司 Inspection mechanism and method for offshore wind farm
CN113867396B (en) * 2021-10-22 2024-04-26 吉林大学 A method and device for route planning and route smoothing of networked unmanned aerial vehicles
CN113867396A (en) * 2021-10-22 2021-12-31 吉林大学 Method and device for route planning and route smoothing of network-connected unmanned aerial vehicles
CN114115304A (en) * 2021-10-26 2022-03-01 南京航空航天大学 Aircraft four-dimensional climbing track planning method and system
CN114115340A (en) * 2021-11-15 2022-03-01 南京航空航天大学 Airspace cooperative control method based on reinforcement learning
CN114167880A (en) * 2021-12-02 2022-03-11 大连海事大学 A time-optimized multi-underwater glider path planning system
CN114518758B (en) * 2022-02-08 2023-12-12 中建八局第三建设有限公司 Indoor measurement robot multi-target point moving path planning method based on Q learning
CN114518758A (en) * 2022-02-08 2022-05-20 中建八局第三建设有限公司 Q learning-based indoor measuring robot multi-target-point moving path planning method
CN114550540A (en) * 2022-02-10 2022-05-27 北方天途航空技术发展(北京)有限公司 Intelligent monitoring method, device, equipment and medium for training machine
CN114594793A (en) * 2022-03-07 2022-06-07 四川大学 A path planning method for base station UAV
CN115032996A (en) * 2022-06-21 2022-09-09 中国电信股份有限公司 Path planning method and device, electronic equipment and storage medium
CN115167506A (en) * 2022-06-27 2022-10-11 华南师范大学 Method, device, device and storage medium for UAV flight route update planning
CN115290096A (en) * 2022-09-29 2022-11-04 广东技术师范大学 A dynamic trajectory planning method for UAV based on reinforcement learning differential algorithm
CN115562357A (en) * 2022-11-23 2023-01-03 南京邮电大学 An intelligent path planning method for UAV swarms
CN115562357B (en) * 2022-11-23 2023-03-14 南京邮电大学 Intelligent path planning method for unmanned aerial vehicle cluster
CN116402273A (en) * 2023-03-01 2023-07-07 中国电子科技集团公司第二十八研究所 An intelligent scheduling method for airport taxiing based on multi-agent reinforcement learning
CN116185079B (en) * 2023-04-28 2023-08-04 西安迈远科技有限公司 Unmanned aerial vehicle construction inspection route planning method based on self-adaptive cruising
CN116185079A (en) * 2023-04-28 2023-05-30 西安迈远科技有限公司 Unmanned aerial vehicle construction inspection route planning method based on self-adaptive cruising
CN117806340A (en) * 2023-11-24 2024-04-02 中国电子科技集团公司第十五研究所 Airspace training flight path automatic planning method and device based on reinforcement learning
CN118794303A (en) * 2024-04-22 2024-10-18 西北工业大学 Methods for close search and precise strike of mobile targets by cruise missiles under information opacity
CN118674227A (en) * 2024-07-04 2024-09-20 佛山市南海区大数据投资建设有限公司 Unmanned aerial vehicle inspection scheduling method based on ant colony algorithm and related equipment
CN118915795A (en) * 2024-10-10 2024-11-08 长江三峡集团实业发展(北京)有限公司 Multi-unmanned aerial vehicle cooperative control method and device
CN119759066A (en) * 2025-03-06 2025-04-04 北方天途航空技术发展(北京)有限公司 A UAV control method and system based on cloud box remote communication

Also Published As

Publication number Publication date
CN110673637B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN110673637A (en) A method for UAV false path planning based on deep reinforcement learning
Hu et al. Multi-UAV coverage path planning: A distributed online cooperation method
CN109254588B (en) Unmanned aerial vehicle cluster cooperative reconnaissance method based on cross variation pigeon swarm optimization
CN109933086B (en) Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning
CN107314772B (en) Unmanned aerial vehicle self-learning waypoint track flight method and system thereof
CN109521794A (en) A kind of multiple no-manned plane routeing and dynamic obstacle avoidance method
CN110825113A (en) Formation keeping method suitable for quad-rotor unmanned aerial vehicle cluster flight
CN114740846A (en) Hierarchical path planning method for topology-raster-metric hybrid map
CN106595671A (en) Unmanned aerial vehicle path planning method and device based on reinforcement learning
CN108459616B (en) A route planning method for UAV swarm cooperative coverage based on artificial bee colony algorithm
CN115951598A (en) Virtual-real combined simulation method, device and system for multiple unmanned aerial vehicles
CN107065929A (en) A kind of unmanned plane is around flying method and system
CN116518974B (en) Conflict-free route planning method based on airspace grids
CN113625733B (en) DDPG-based multi-target three-dimensional unmanned aerial vehicle path planning method
Meng et al. Advances in UAV Path Planning: A Comprehensive Review of Methods, Challenges, and Future Directions.
CN114721427A (en) Multi-unmanned aerial vehicle collaborative search and rescue reconnaissance planning method in dynamic environment
CN114339842B (en) Method and device for dynamic trajectory design of UAV swarms in time-varying scenarios based on deep reinforcement learning
CN117991804A (en) A trajectory planning method for high dynamic fixed-wing UAV based on improved RRT
CN116880543A (en) Aircraft motion state decision-making method, device, electronic equipment and storage medium
CN110084414B (en) Empty pipe anti-collision method based on K-time control deep reinforcement learning
CN107037826A (en) Unmanned plane detection mission distribution method and device
CN112867023B (en) Method for minimizing perception data acquisition delay through dynamic scheduling of unmanned terminal
Beck Collaborative search and rescue by autonomous robots
CN119065400B (en) A method for coordinated regional coverage mission of fixed-wing UAV swarm
Ma et al. Adaptive deployment of UAV-aided networks based on hybrid deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: 350000, Fujian, Fuzhou province Minhou County town street, Fuzhou District, the new campus of the School Road

Patentee after: Fujian University of Science and Technology

Country or region after: China

Address before: 350000, Fujian, Fuzhou province Minhou County town street, Fuzhou District, the new campus of the School Road

Patentee before: FUJIAN University OF TECHNOLOGY

Country or region before: China

CP03 Change of name, title or address
OL01 Intention to license declared
OL01 Intention to license declared