CN112560571A - Intelligent autonomous visual navigation method based on convolutional neural network - Google Patents
Intelligent autonomous visual navigation method based on convolutional neural network
- Publication number
- CN112560571A (application CN202011072214.1A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- navigation
- robot
- visual navigation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
- G01C11/04—Interpretation of pictures
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/588—Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Remote Sensing (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Radar, Positioning & Navigation (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Automation & Control Theory (AREA)
- Bioinformatics & Computational Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the fields of computer vision, image processing, and autonomous robot navigation, and provides a better-performing visual navigation algorithm for intelligent autonomous visual navigation of a robot. The technical scheme adopted is an intelligent autonomous visual navigation method based on a convolutional neural network. In the first part, a Canny edge detection algorithm processes the picture to obtain the edge information of the input picture; this edge information serves as prior information and, together with the original RGB image, forms a four-channel input to the convolutional neural network. In the second part, the convolutional neural network performs forward computation on the input picture, and two fully connected classifiers classify the features to obtain the navigation direction and the collision probability used for visual navigation of the robot. The invention is mainly applied to autonomous robot navigation.
Description
Technical Field
The invention relates to the fields of computer vision, image processing, and autonomous robot navigation, and addresses the problem of how a robot in an unknown environment can realize intelligent autonomous visual navigation by obtaining a navigation direction from a monocular image alone.
Background
With the development of robot technology, the demand for intelligent autonomy of robots keeps growing, and with it the demand for safe and reliable autonomous navigation. As a prerequisite for a robot to perform other tasks, autonomous navigation in unfamiliar environments is crucial. The problem is further complicated in corridor or urban environments, where the robot must avoid static obstacles as well as surrounding dynamic objects such as pedestrians and vehicles. Autonomous navigation has therefore become an increasingly important research focus in the robotics community.
Traditional autonomous navigation methods usually consist of two basic steps: first, real-time localization and environment map building in the current environment, commonly by using GPS (Global Positioning System) or a vision- or radar-based SLAM algorithm (Simultaneous Localization and Mapping); second, analysis of the built environment map to plan obstacle avoidance within it and realize autonomous navigation of the robot, as in [1][2]. In this traditional setting, GPS suffers from excessive positioning error and the inability to work indoors, while SLAM suffers from error accumulation, difficulty in coping with rapidly changing scenes, and heavy consumption of computing resources. In addition, the perception task and the navigation task are handled separately, which introduces the further problem of designing the logic that derives a navigation direction from an environment map. Autonomous robot navigation therefore remains a very difficult subject.
With the continuous development of deep learning, computer vision techniques based on convolutional neural networks have had a large impact on many industries. In recent years, many researchers have used convolutional neural networks to obtain the navigation direction of a robot directly from raw perception data, providing a new approach to autonomous robot navigation, e.g. [3]-[10]. Visual navigation based on a convolutional neural network means designing a model structure and an objective function (the convolutional neural network, a loss function, and so on) and training the model with a large amount of labeled data (raw perception data paired with control strategies), so that the trained model can derive a reasonable control strategy directly from raw perception data and realize autonomous navigation of the robot without steps such as map building and localization. The approaches fall into two main categories: visual navigation based on Reinforcement Learning (RL) [9][10] and visual navigation based on Supervised Learning [4]-[8]. Compared with the traditional pipeline, this new approach removes the need to design the logic that derives a navigation direction from an environment map, simplifying the problem, and it eliminates complicated steps such as real-time map building, saving a large amount of computing resources. However, reinforcement-learning-based methods require a large number of repeated trials, are costly and complex, and are difficult to apply at scale, while supervised-learning-based methods are stable and reliable but require a large number of real data samples for training to achieve good results.
Visual navigation based on convolutional neural networks is one way to realize autonomous robot navigation, but as a relatively new technique its accuracy still needs improvement. During actual robot operation, any erroneous navigation output may have serious consequences, affecting production and personnel safety. How to improve the performance of the visual navigation algorithm and thereby the intelligent autonomous navigation effect is therefore an urgent problem. Owing to the limited computing resources of embedded systems, existing algorithms are often too simple to meet the accuracy requirements, so accuracy must be improved by refining the network or introducing prior features.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide an intelligent autonomous visual navigation method based on a convolutional neural network for a ground robot. The method addresses the difficulties of traditional autonomous navigation, such as coping with rapidly changing scenes and heavy computing-resource consumption, as well as the limited accuracy of existing convolutional-neural-network visual navigation methods whose algorithms are too simple. The invention studies and improves the current visual navigation algorithm from two aspects, the introduction of prior information and the network structure, and provides a better-performing visual navigation algorithm for intelligent autonomous visual navigation of the robot. The technical scheme adopted is an intelligent autonomous visual navigation method based on a convolutional neural network. In the first part, a Canny edge detection algorithm processes the picture to obtain the edge information of the input picture; this edge information serves as prior information and, together with the original RGB image, forms a four-channel input to the convolutional neural network. In the second part, the convolutional neural network performs forward computation on the input picture, and two fully connected classifiers classify the features to obtain the navigation direction and the collision probability used for visual navigation of the robot. In the training stage, the parameters of the convolutional neural network are adjusted with the training split of a public data set to obtain an algorithm that performs well on the validation set. In the testing stage, the trained convolutional neural network is used with its parameters fixed to perform inference on the input image, and the result is used for visual navigation of the robot.
The network is visualized as follows. The convolution visualization method multiplies the different channels of the last convolutional layer by their respective weights and superimposes them to obtain an output heat map (saliency map). The heat map for class $c$ is given by formula (1):

$$L^{c}=\mathrm{ReLU}\Big(\sum_{k}\alpha_{k}^{c}A^{k}\Big),\qquad \alpha_{k}^{c}=\frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A_{ij}^{k}}\qquad(1)$$

where ReLU is the common activation function, $k$ is the channel index, $\alpha_{k}^{c}$ is the weight of the $k$-th channel for class $c$, $A^{k}$ is the feature map of the $k$-th channel, $y^{c}$ is the output value of the class, and $Z$ is the number of feature points in a single-channel feature map.
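For illustration, a minimal sketch of this saliency-map computation is given below, assuming a Keras/TensorFlow model with a single output head; the layer name "last_conv" and the function name are illustrative assumptions rather than part of the described method.

```python
# Hedged sketch of the heat-map computation in formula (1) (Grad-CAM style [11]).
# Assumptions: a tf.keras model with one output head and a last convolutional
# layer named "last_conv"; both names are illustrative.
import tensorflow as tf

def saliency_map(model, image, class_index, last_conv_name="last_conv"):
    """Return L^c = ReLU(sum_k alpha_k^c * A^k) for one input image (H, W, C)."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])  # A^k and network output y
        y_c = preds[:, class_index]                     # output value y^c
    grads = tape.gradient(y_c, conv_out)                # dy^c / dA^k_ij
    alpha = tf.reduce_mean(grads, axis=(1, 2))          # alpha_k^c = (1/Z) sum_ij grads
    heat = tf.nn.relu(tf.reduce_sum(alpha[:, None, None, :] * conv_out, axis=-1))
    return heat[0].numpy()
```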
For the model loss functions, the navigation direction branch uses a mean squared error (MSE) loss, as shown in formula (2), and the collision probability prediction branch, whose labels appear in the data set as a 0/1 classification, uses a cross-entropy loss, as shown in formula (3):

$$L_{steer}=\frac{1}{N}\sum_{i=1}^{N}\big(\hat{s}_{i}-s_{i}\big)^{2}\qquad(2)$$

$$L_{coll}=-\frac{1}{N}\sum_{i=1}^{N}\Big[c_{i}\log\hat{c}_{i}+(1-c_{i})\log\big(1-\hat{c}_{i}\big)\Big]\qquad(3)$$

where $N$ is the total number of samples, $\hat{s}_{i}$ is the predicted direction, $s_{i}$ is the true direction, $\hat{c}_{i}$ is the predicted collision probability, and $c_{i}$ is the true collision probability.
the invention has the characteristics and beneficial effects that:
(1) The invention applies a visual navigation algorithm based on a convolutional neural network to intelligent autonomous visual navigation of a robot. The navigation direction is obtained directly from raw perception information, complicated steps such as real-time map building are eliminated, a large amount of computing resources is saved, and there is no need to design the logic that derives a navigation direction from an environment map, which simplifies the problem.
(2) The invention improves the existing visual navigation algorithm based on a convolutional neural network; by adding prior information and optimizing the network structure, the model achieves better results on public data sets.
(3) The invention designs a ground-robot experimental verification platform, deploys the proposed algorithm on the robot, verifies its effectiveness, and realizes intelligent autonomous visual navigation of the robot.
Description of the drawings:
FIG. 1 is a visual navigation algorithm visualization effect graph.
FIG. 2 is a general structure diagram of a visual navigation method model combined with edge features.
FIG. 3 is a process diagram for implementing visual navigation algorithm inference in conjunction with edge features.
FIG. 4 shows partial samples of the navigation direction data set.
FIG. 5 shows partial samples of the collision probability data set.
FIG. 6 is a block diagram of the visual navigation network.
FIG. 7 is a parameter diagram of a visual navigation network architecture.
Fig. 8 is a front view and a back view of the intelligent autonomous navigation platform of the ground robot.
FIG. 9 is the ROS topic node design diagram.
FIG. 10 is the experimental scene.
FIG. 11 shows part of the experimental process with the corresponding navigation directions.
Detailed Description
The ground robot carries a Kinect as the sensing device and a Jetson TX2 as the processor, and the algorithm provided by the invention is used to conduct a real-robot intelligent autonomous visual navigation experiment to verify its effectiveness.
The proposed algorithm is trained on public data sets to obtain a well-performing algorithm.
The invention provides an intelligent autonomous visual navigation method based on a convolutional neural network. The hardware environment for the network training stage and the public data set verification stage is a single NVIDIA GeForce GTX 1080 GPU, an eight-core Intel i7-7700 CPU, and 16 GB of memory. In the actual experiment stage, the software environment relies on the distributed node framework of ROS, and the hardware system comprises the ground robot, a Kinect binocular camera, a Jetson TX2 processor, and so on. Both stages are based on Ubuntu 16.04 and the Keras deep learning framework.
The ground experimental robot carries a Kinect camera and a Jetson TX2 processor. The Jetson TX2 is a relatively advanced embedded vision computing system, an upgraded and improved version of the Jetson TX1. Based on the NVIDIA Pascal architecture with 256 CUDA cores, it is an excellent development platform for computer vision, deep learning, GPU computing, and image processing. The processor performs inference of the convolutional neural network, and the Kinect camera collects environment images as the perception input of the neural network.
The technical scheme of the invention is as follows:
use for reference Selvaraju and the like[11]The idea of the method is to perform visual analysis on a visual navigation algorithm of a convolutional neural network, as shown in fig. 1, find that the visual navigation network focuses more on edge features, linear features, and the like in an image, such as lane lines, textures of remote obstacles, and the like. The idea is then presented here: the edge information in the image is detected by using a simple and quick edge detection algorithm, such as a canny operator, and is used as prior information, and the prior information and the RGB image three channels are combined to form four channels to be used as neural network input, so that the effect of the visual navigation algorithm can be improved. The method provided by the invention is based on Dronet[8]The improvement is made, and the whole framework is shown in figure 2. The method mainly comprises two parts, wherein the first part is to process the picture by using a canny edge detection algorithm to obtain the edge information of the input picture, and the edge information is used as prior information and an original RGB image to form four-channel input to be used as an input part of the network. And the second part is that the convolutional neural network carries out forward calculation on an input picture, and two fully-connected classifiers are used for classifying the features to obtain the navigation direction and the collision probability for the visual navigation of the robot. In the training stage of the convolutional neural network, the parameters of the convolutional neural network are adjusted by using the part of the training set of the public data set to obtain the parameters on the verification setAnd (4) an algorithm with good performance. In the testing stage, the trained convolutional neural network is used, the parameters of the fixed network are unchanged, the input image is reasoned, the result is used for the visual navigation of the robot, and the reasoning implementation process is shown in fig. 3.
To improve accuracy while preserving real-time performance, edge features are used as prior information and combined with the RGB image into a four-channel image, and the relatively large scale of 320 × 320 is used as the input of the convolutional neural network, providing the model with more feature information. On the other hand, a 15-layer convolutional neural network is designed; the model structure is shown in Fig. 6. To extract features as quickly as possible while reducing the number of model parameters, the model repeats a basic module of one convolutional layer and one max-pooling layer five times, and cross-layer connections are added to fuse information at different scales. The last convolutional layer is connected to two fully connected layers, whose two branches respectively predict the navigation angle and the collision probability of the robot, used for intelligent autonomous visual navigation.
The proposed algorithm is trained on a server with two public data sets: the Udacity "An Open Source Self-Driving Car" data set [12] and the Zurich collision data set [13]. Partial data samples are shown in Fig. 4 and Fig. 5, respectively. A well-performing visual navigation algorithm is obtained by training on these data sets in advance and is used for the subsequent intelligent autonomous visual navigation experiment. The trained algorithm is deployed on the ground-robot experimental platform, the robot control logic and topic nodes are designed, a corridor environment with several obstacles is set up, and the practicability of the algorithm is verified by experiment.
Because visual equipment can acquire rich environmental information and a convolutional neural network can extract abstract features, the visual navigation method can achieve good results by learning from a large amount of data and thus realize intelligent autonomous navigation. The vision-based intelligent autonomous navigation method provided by the invention is therefore of great significance for research on autonomous robot navigation. The method is stable, reliable, and well expandable; because the entire algorithm runs on the onboard processor, the stability and reliability of the whole system are improved. In summary, as social demands increase, robots face more and more functional requirements and application scenarios and are required to have stronger perception, navigation, and execution capabilities, and research on visual navigation algorithms based on convolutional neural networks has become a key to solving this problem.
The invention mainly has the following characteristics and advantages:
(1) A better-performing intelligent autonomous visual navigation algorithm based on a convolutional neural network: the invention provides a visual navigation algorithm combined with edge features; experiments on public data sets show that its performance is superior to related algorithms while good real-time performance is retained.
(2) Autonomous robot navigation realized with the convolutional-neural-network visual navigation algorithm: the invention provides an intelligent autonomous navigation method that obtains the navigation direction of the robot directly from raw perception information, eliminating complicated steps such as real-time map building and saving a large amount of computing resources.
(3) An experimental platform for practical verification: to verify the practicability of the method, an intelligent autonomous visual navigation experimental platform is built, the experimental scene and control logic are designed, and the feasibility of the algorithm is verified by experiment.
The invention is further described below with reference to the accompanying drawings.
To explore which image features the convolutional-neural-network visual navigation algorithm attends to, the invention follows the idea of Selvaraju et al. [11] and visualizes the network. The convolution visualization method multiplies the different channels of the last convolutional layer by their respective weights and superimposes them to obtain an output heat map (saliency map). The heat map for class $c$ is given by formula (1):

$$L^{c}=\mathrm{ReLU}\Big(\sum_{k}\alpha_{k}^{c}A^{k}\Big),\qquad \alpha_{k}^{c}=\frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A_{ij}^{k}}\qquad(1)$$

where ReLU is the common activation function, $k$ is the channel index, $\alpha_{k}^{c}$ is the weight of the $k$-th channel for class $c$, $A^{k}$ is the feature map of the $k$-th channel, $y^{c}$ is the output value of the class, and $Z$ is the number of feature points in a single-channel feature map.
The visualization effect is shown in Fig. 1. From the convolution visualization one can conclude that the visual navigation network focuses more on edge features, line features, and the like in the image, such as lane lines and the texture of distant obstacles. This suggests the following idea: detect the edge information in the image with a simple and fast edge detection algorithm, such as the Canny operator, use it as prior information, and feed it together with the RGB image as the neural network input, which can improve the effect of the visual navigation algorithm.
The framework of the visual navigation algorithm combined with edge features proposed by the invention is shown in Fig. 2. It consists of two parts. The first part processes the picture with the Canny edge detection algorithm [14] to obtain the edge information of the input picture, which is then combined with the original RGB image into a four-channel image as input. The second part is the convolutional neural network proposed by the invention, which performs forward computation on the input picture to obtain the navigation direction and the collision probability.
The convolutional neural network proposed by the invention for visual navigation is shown in Fig. 6. The network is a 15-layer convolutional neural network. To extract features as quickly as possible while reducing the number of model parameters, the model repeats a basic module of one convolutional layer and one max-pooling layer five times, down-sampling the feature map to 1/32 of the input size. The detailed parameter structure of the network is shown in Fig. 7.
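A minimal Keras sketch of this backbone is given below: five convolution + max-pooling stages that reduce a 320 × 320 input to 1/32 of its size, cross-layer connections, and two fully connected heads for the navigation angle and the collision probability. The filter counts are illustrative assumptions, since the text does not list them; the exact parameters are those of Fig. 7.

```python
# Hedged sketch of the described network: five (conv + max-pool) stages, cross-layer
# connections, and two output branches. Filter counts are illustrative assumptions.
from tensorflow.keras import layers, Model

def build_navigation_net(input_shape=(320, 320, 4), filters=(32, 64, 64, 128, 128)):
    x_in = layers.Input(shape=input_shape)
    x = x_in
    for f in filters:
        y = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        y = layers.MaxPooling2D(2)(y)                              # halve resolution; x5 gives 1/32
        skip = layers.Conv2D(f, 1, strides=2, padding="same")(x)   # cross-layer connection
        x = layers.Add()([y, skip])
    x = layers.Flatten()(x)
    steer = layers.Dense(1, name="steer")(x)                       # navigation angle (regression)
    coll = layers.Dense(1, activation="sigmoid", name="coll")(x)   # collision probability
    return Model(x_in, [steer, coll])
```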
For the model loss functions, the direction output is continuous, so the navigation direction branch is treated as a regression problem and uses a mean squared error (MSE) loss, as shown in formula (2). For the collision probability prediction branch, the collision probability is represented in the data set as a 0/1 classification, so it is treated as a classification problem and uses a cross-entropy loss, as shown in formula (3):

$$L_{steer}=\frac{1}{N}\sum_{i=1}^{N}\big(\hat{s}_{i}-s_{i}\big)^{2}\qquad(2)$$

$$L_{coll}=-\frac{1}{N}\sum_{i=1}^{N}\Big[c_{i}\log\hat{c}_{i}+(1-c_{i})\log\big(1-\hat{c}_{i}\big)\Big]\qquad(3)$$

where $N$ is the total number of samples, $\hat{s}_{i}$ is the predicted direction, $s_{i}$ is the true direction, $\hat{c}_{i}$ is the predicted collision probability, and $c_{i}$ is the true collision probability.
The goal of training is for the navigation direction branch and the collision probability branch to reach good performance simultaneously. Because the gradient of the mean squared error loss of the navigation direction branch is proportional to the absolute steering error, it is much larger than the gradient of the collision probability branch at the beginning of training. To ensure that the two tasks eventually converge together, the loss of the collision probability branch is ignored for the first 5 epochs, after which its weight is gradually increased. In this way the navigation direction loss can first converge to a reasonable value, the losses of the two tasks do not differ too much, and a model with minimal losses on both subtasks is obtained. The total loss is computed as in formula (4), where $L_{steer}$ and $L_{coll}$ are computed as in formulas (2) and (3).
$$L_{control}=L_{steer}+\max\!\big(0,\,1-e^{-0.1\,(epoch-5)}\big)\times L_{coll}\qquad(4)$$
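A sketch of this epoch-dependent weighting is given below, assuming the Keras loss classes; the function name is illustrative. For the first five epochs the collision term has zero weight, and the weight then rises toward one.

```python
# Hedged sketch of the total loss in formula (4): the collision-branch weight is zero
# for the first five epochs and then gradually approaches one.
import math
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()        # L_steer, formula (2)
bce = tf.keras.losses.BinaryCrossentropy()      # L_coll, formula (3)

def control_loss(steer_true, steer_pred, coll_true, coll_pred, epoch):
    beta = max(0.0, 1.0 - math.exp(-0.1 * (epoch - 5)))   # weight of the collision branch
    return mse(steer_true, steer_pred) + beta * bce(coll_true, coll_pred)
```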
Because no sample in the data sets carries both a navigation direction label and a collision probability label, at each training step one batch of samples is randomly drawn from each of the two data sets, and the loss function is computed and averaged over the randomly drawn samples.
During the experiments, hard example mining [15] is adopted: at each step the N samples with the largest loss values in a training batch are selected for gradient back-propagation, the other, easier samples are ignored, and N is gradually reduced as training proceeds. N is computed as in formulas (5) and (6) below, where BS is the batch size and epoch is the current training epoch.
$$N_{steer}=BS-(BS-10)\times\max\!\big(0,\,1-e^{-\frac{1}{30}(epoch-30)}\big)\qquad(5)$$

$$N_{coll}=BS-(BS-5)\times\max\!\big(0,\,1-e^{-\frac{1}{30}(epoch-30)}\big)\qquad(6)$$
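The following is a minimal sketch of this selection rule, assuming per-sample losses computed with TensorFlow; the helper names are illustrative.

```python
# Hedged sketch of hard example mining per formulas (5)-(6): only the N largest
# per-sample losses in a batch contribute to the gradient, and N shrinks from the
# batch size toward 10 (steering) or 5 (collision) as training proceeds.
import math
import tensorflow as tf

def n_hard(batch_size, epoch, floor):
    return int(batch_size - (batch_size - floor) *
               max(0.0, 1.0 - math.exp(-(epoch - 30) / 30.0)))

def topk_mean_loss(per_sample_loss, k):
    hardest = tf.math.top_k(per_sample_loss, k=k).values   # keep the k hardest samples
    return tf.reduce_mean(hardest)

# e.g. k_steer = n_hard(32, epoch, floor=10); k_coll = n_hard(32, epoch, floor=5)
```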
The following data augmentation methods are used during network training:
(1) Rotation: the input RGB image is rotated by a random angle in the interval [-20, 20] degrees and then center-cropped so that the rotated image keeps the original aspect ratio.
(2) Random lateral translation: the input RGB image is translated laterally by a random proportion in the interval [-0.2, 0.2].
(3) Random longitudinal translation: the input RGB image is translated longitudinally by a random proportion in the interval [-0.2, 0.2].
The RGB image is resized to 320 × 320 as the input of the neural network. During testing the network is unchanged and no data augmentation is used; only the evaluation indices are computed from the results. The Adam optimizer is chosen, the initial learning rate is set to 0.001, the weight decay is 10^-5, and the batch size is set to 32. Early stopping is added: training stops when the loss on the validation set does not decrease for 5 consecutive epochs.
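An illustrative Keras training setup matching these settings and the augmentation intervals above is sketched below; how the 10^-5 weight decay is realized (L2 regularization or an AdamW-style optimizer) is an assumption, as the text does not specify it.

```python
# Hedged sketch of the training configuration: Adam, learning rate 0.001, batch size 32,
# early stopping with patience 5, and the rotation/translation augmentation above.
# The weight-decay mechanism and the commented compile/fit calls are assumptions.
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,          # random rotation within [-20, 20] degrees
    width_shift_range=0.2,      # random lateral translation, +/- 20%
    height_shift_range=0.2)     # random longitudinal translation, +/- 20%

optimizer = Adam(learning_rate=0.001)   # the 1e-5 weight decay could be added via L2
                                        # kernel regularization or an AdamW variant
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

# model.compile(optimizer=optimizer,
#               loss={"steer": "mse", "coll": "binary_crossentropy"})
# model.fit(..., batch_size=32, callbacks=[early_stop])
```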
Comparative experiments on the public data sets show that the proposed method achieves the best results on most evaluation indices, proving its effectiveness for the visual navigation task. The model runs slightly slower than the DroNet-based model, but it is more accurate and its running speed already meets the requirement.
To examine the contribution of each module of the proposed method and visualize the effect of the improvements, the two improvements, the edge extraction module and the modified network structure, are analyzed in separate experiments using the objective evaluation indices. The analysis applies only one improvement at a time relative to the baseline model, keeps the other parts unchanged, retrains for a fixed number of batches with the data set, and then evaluates on the test set against the baseline.
The comparison of different models with the edge extraction module added shows that network performance generally improves after adding the edge extraction part. The edge extraction module provides prior features to the network and effectively improves the accuracy of the algorithm. However, when the edge extraction module is added to DroNet, part of its performance drops slightly, probably because that model is too simple and its feature extraction capacity is insufficient.
Experiments with different network structures, comparing the output evaluations, show that the network structure used by the proposed method yields a larger performance improvement than the others, proving the effectiveness of the network structure improvement.
The proposed method is then deployed on the robot for experiments. Fig. 8 shows the front and back views of the ground-robot intelligent autonomous navigation platform built by the invention. Besides the body, the ground robot (TurtleBot 2) comprises motors, an odometer, and so on, together with a Kinect binocular camera and a Jetson TX2 processor.
The ROS topic nodes set up for the experiment comprise the following (a minimal sketch of the visual navigation node is given after this list):
(1) The Kinect driver node publishes the RGB image topic /camera/image_raw and the depth topic /camera/depth/image_raw as the perception information acquired by the system and used as input to the subsequent algorithms.
(2) The visual navigation node visual_determination subscribes to the RGB image topic, feeds the image into the convolutional neural network to obtain the navigation direction and collision probability, and publishes the navigation direction topic /visual_determination/control_prediction for the subsequent computation of robot control commands.
(3) The robot control node object_follower subscribes to the depth information and the target detection result topics, computes control commands through its logic, and publishes the robot control command topic /cmd_vel_mux/input/level for robot control.
(4) The robot driver node subscribes to the robot control command topic and controls the robot motors to realize actions such as rotation and translation.
(5) The PC-side monitoring node pc_browser subscribes to the RGB image and the visual navigation result and displays the visual navigation result on the PC in real time for monitoring the running state of the robot.
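A minimal rospy sketch of the visual navigation node in item (2) is given below; the topic names follow the list above, while the class name, the message type used for the prediction, and the injected preprocessing/model objects are illustrative assumptions.

```python
# Hedged sketch of the visual navigation node (item (2)): subscribe to the RGB topic,
# run the network, publish navigation direction and collision probability.
# Message type Float32MultiArray and the injected helpers are assumptions.
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from std_msgs.msg import Float32MultiArray

class VisualNavigationNode:
    def __init__(self, model, preprocess):
        self.model = model              # trained convolutional neural network
        self.preprocess = preprocess    # e.g. the four-channel input builder above
        self.bridge = CvBridge()
        self.pub = rospy.Publisher("/visual_determination/control_prediction",
                                   Float32MultiArray, queue_size=1)
        rospy.Subscriber("/camera/image_raw", Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        steer, coll = self.model.predict(self.preprocess(frame)[None, ...])
        self.pub.publish(Float32MultiArray(data=[float(steer[0, 0]), float(coll[0, 0])]))

# rospy.init_node("visual_determination"); VisualNavigationNode(model, preprocess); rospy.spin()
```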
The flow of the individual nodes is shown in Fig. 9. The logic for computing the control command from the visual navigation result is given in formulas (7) and (8), where $w_{k}$ and $v_{k}$ are the desired angle and speed outputs at time $k$, $\alpha_{w}$ and $\alpha_{v}$ are the weight coefficients for angle and speed, $w_{output}$ and $p_{output}$ are the navigation direction and collision probability output by the neural network, and $v_{max}$ is the set maximum speed of the robot. The weight coefficients prevent sudden changes of the robot speed and ensure that the moving speed changes smoothly. When the collision probability is high, the robot is considered too close to an obstacle ahead to avoid it and therefore stands still.
$$w_{k}=(1-\alpha_{w})\times w_{k-1}+\alpha_{w}\times w_{output}\qquad(7)$$
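A sketch of this control computation follows. The angle update implements formula (7); formula (8) for the speed is not reproduced in the text, so its form here, the collision probability scaling the maximum speed with the same first-order smoothing and a stand-still threshold, is an assumption, as are the numeric parameter values.

```python
# Hedged sketch of the control-command logic around formulas (7)-(8). The angle update
# follows formula (7); the speed update and all parameter values are assumptions.
def update_command(w_prev, v_prev, w_output, p_output,
                   alpha_w=0.7, alpha_v=0.7, v_max=0.3, p_stop=0.9):
    w_k = (1.0 - alpha_w) * w_prev + alpha_w * w_output        # formula (7)
    if p_output >= p_stop:                                     # too close to an obstacle:
        v_target = 0.0                                         # stand still
    else:
        v_target = (1.0 - p_output) * v_max                    # assumed form of formula (8)
    v_k = (1.0 - alpha_v) * v_prev + alpha_v * v_target        # same first-order smoothing
    return w_k, v_k
```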
Although the data sets used here mainly contain outdoor scenes, the visual navigation method has been shown in [8] to have a certain degree of generalization, and in a relatively simple indoor environment the network trained on outdoor data also works well. Since experiments are more convenient indoors, the actual experimental environment is set as the corridor containing several obstacles shown in Fig. 10, in which the robot must navigate autonomously along the path until it finds a target object to detect and track.
Part of the experimental process with the corresponding navigation directions is shown in Fig. 11. The yellow arrow is the motion direction obtained from the network, and the red text in the lower right corner is the collision probability of the robot, from which the current running speed of the robot is set.
Experiments prove that the method provided by the invention can realize the intelligent autonomous navigation function of the robot.
Reference to the literature
[1]Qin T,Li P,Shen S,et al.VINS-Mono:A Robust and Versatile Monocular Visual-Inertial State Estimator[J].IEEE Transactions on Robotics,2018,34(4):1004-1020.
[2]Mur-Artal R,Tardos J D.ORB-SLAM2:An Open-Source SLAM System for Monocular,Stereo,and RGB-D Cameras[J].IEEE Transactions on Robotics,2017,33(5):1255-1262.
[3]Pomerleau D A.ALVINN:An Autonomous Land Vehicle in a Neural Network[C].Advances in Neural Information Processing Systems,1988:305-313.
[4]Ross S,Melik-Barkhudarov N,Shankar K S,et al.Learning monocular reactive UAV control in cluttered natural environments[J].international conference on robotics and automation,2013:1765-1772.
[5]Gandhi D,Pinto L,Gupta A,et al.Learning to fly by crashing[J].intelligent robots and systems, 2017:3948-3955.
[6]Giusti A,Guzzi J,Ciresan D C,et al.A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots[C].international conference on robotics and automation,2016, 1(2):661-667.
[7]Smolyanskiy N,Kamenev A,Smith J M,et al.Toward low-flying autonomous MAV trail navigation using deep neural networks for environmental awareness[J].intelligent robots and systems,2017:4241-4247.
[8]Loquercio A,Maqueda A I,Del-Blanco C R,et al.DroNet:Learning to Fly by Driving[C].international conference on robotics and automation,2018,3(2):1088-1095.
[9]Lillicrap T P,Hunt J J,Pritzel A,et al.Continuous control with deep reinforcement learning[J]. international conference on learning representations,2016.
[10]Zhu Y,Mottaghi R,Kolve E,et al.Target-driven visual navigation in indoor scenes using deep reinforcement learning[J].international conference on robotics and automation,2017: 3357-3364.
[11]Selvaraju R R,Cogswell M,Das A,et al.Grad-CAM:Visual Explanations from Deep Networks via Gradient-Based Localization[J].international conference on computer vision, 2017:618-626.
[12]Udacity,“An Open Source Self-Driving Car,”https://www.udacity.com/self-driving-car,2016.
[13]http://rpg.ifi.uzh.ch/data/collision.zip
[14]Canny J F.A Computational Approach to Edge Detection[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1986,8(6):679-698.
[15]Shrivastava A,Gupta A,Girshick R B,et al.Training Region-Based Object Detectors with Online Hard Example Mining[J].computer vision and pattern recognition,2016:761-769.
[16]Howard A G,Zhu M,Chen B,et al.Mobilenets:Efficient convolutional neural networks for mobile vision applications[J].arXiv preprint arXiv:1704.04861,2017.
Claims (2)
1. An intelligent autonomous visual navigation method based on a convolutional neural network, characterized in that, in a first part, a Canny edge detection algorithm processes a picture to obtain the edge information of the input picture, and this edge information serves as prior information and, together with the original RGB image, forms a four-channel input to the convolutional neural network; in a second part, the convolutional neural network performs forward computation on the input picture, and two fully connected classifiers classify the features to obtain the navigation direction and the collision probability used for visual navigation of the robot; in the training stage of the convolutional neural network, the parameters of the convolutional neural network are adjusted with the training split of a public data set to obtain an algorithm that performs well on the validation set; and in the testing stage, the trained convolutional neural network is used with its parameters fixed to perform inference on the input image, and the result is used for visual navigation of the robot.
2. The intelligent autonomous visual navigation method based on a convolutional neural network according to claim 1, characterized in that the network is visualized as follows: the convolution visualization method multiplies the different channels of the last convolutional layer by their respective weights and superimposes them to obtain an output heat map, and the heat map for class $c$ is given by formula (1):

$$L^{c}=\mathrm{ReLU}\Big(\sum_{k}\alpha_{k}^{c}A^{k}\Big),\qquad \alpha_{k}^{c}=\frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A_{ij}^{k}}\qquad(1)$$

where ReLU is the common activation function, $k$ is the channel index, $\alpha_{k}^{c}$ is the weight of the $k$-th channel for class $c$, $A^{k}$ is the feature map of the $k$-th channel, $y^{c}$ is the output value of the class, and $Z$ is the number of feature points in a single-channel feature map;
in the aspect of model loss function setting, aiming at navigation direction branches, a Mean Square Error (MSE) (mean Squared error) loss function is used, and the formula is shown in a formula 2; for the collision probability prediction branch, since the collision probability is expressed in the form of 0 and 1 classification in the data set, a Cross Entropy Cross entry loss function is used, as shown in equation 3; in the following formula, N is the total number of samples,is a predicted value of the direction,is a predicted value of the direction,is a predictive value of the probability of a collision,is the true value of the collision probability:
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011072214.1A CN112560571A (en) | 2020-10-09 | 2020-10-09 | Intelligent autonomous visual navigation method based on convolutional neural network |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011072214.1A CN112560571A (en) | 2020-10-09 | 2020-10-09 | Intelligent autonomous visual navigation method based on convolutional neural network |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN112560571A true CN112560571A (en) | 2021-03-26 |
Family
ID=75042825
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011072214.1A Pending CN112560571A (en) | 2020-10-09 | 2020-10-09 | Intelligent autonomous visual navigation method based on convolutional neural network |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112560571A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114859971A (en) * | 2022-05-07 | 2022-08-05 | 北京卓翼智能科技有限公司 | Intelligent unmanned aerial vehicle for monitoring wind turbine |
| TWI795306B (en) * | 2022-06-14 | 2023-03-01 | 艾歐圖科技股份有限公司 | Localization failure detection system for autonomous mobile robots using deep learning based 6d pose regression |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107179077A (en) * | 2017-05-15 | 2017-09-19 | 北京航空航天大学 | A kind of self-adaptive visual air navigation aid based on ELM LRF |
| WO2019136946A1 (en) * | 2018-01-15 | 2019-07-18 | 中山大学 | Deep learning-based weakly supervised salient object detection method and system |
| CN110110783A (en) * | 2019-04-30 | 2019-08-09 | 天津大学 | A kind of deep learning object detection method based on the connection of multilayer feature figure |
| CN111311666A (en) * | 2020-05-13 | 2020-06-19 | 南京晓庄学院 | Monocular vision odometer method integrating edge features and deep learning |
- 2020-10-09: CN application CN202011072214.1A, patent CN112560571A/en, status: active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107179077A (en) * | 2017-05-15 | 2017-09-19 | 北京航空航天大学 | A kind of self-adaptive visual air navigation aid based on ELM LRF |
| WO2019136946A1 (en) * | 2018-01-15 | 2019-07-18 | 中山大学 | Deep learning-based weakly supervised salient object detection method and system |
| CN110110783A (en) * | 2019-04-30 | 2019-08-09 | 天津大学 | A kind of deep learning object detection method based on the connection of multilayer feature figure |
| CN111311666A (en) * | 2020-05-13 | 2020-06-19 | 南京晓庄学院 | Monocular vision odometer method integrating edge features and deep learning |
Non-Patent Citations (3)
| Title |
|---|
| ANTONIO LOQUERCIO等: "《DroNet: Learning to Fly by Driving》", 《IEEE ROBOTICS AND AUTOMATION LETTERS》 * |
| RAMPRASAATH R. SELVARAJU等: "《Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization》", 《INTERNATIONAL JOURNAL OF COMPUTER VISION (2020)》 * |
| 王世峰: "《基于信息融合的图像显著区域检测算法研究》", 《万方数据知识服务平台》 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114859971A (en) * | 2022-05-07 | 2022-08-05 | 北京卓翼智能科技有限公司 | Intelligent unmanned aerial vehicle for monitoring wind turbine |
| TWI795306B (en) * | 2022-06-14 | 2023-03-01 | 艾歐圖科技股份有限公司 | Localization failure detection system for autonomous mobile robots using deep learning based 6d pose regression |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN114970321B (en) | A scene flow digital twin method and system based on dynamic trajectory flow | |
| Sleaman et al. | Monocular vision with deep neural networks for autonomous mobile robots navigation | |
| CN116182875A (en) | A temporary road path planning method and system based on graph neural network | |
| Li et al. | A method of dense point cloud SLAM based on improved YOLOV8 and fused with ORB-SLAM3 to cope with dynamic environments | |
| Tsai et al. | Mapless lidar navigation control of wheeled mobile robots based on deep imitation learning | |
| CN112258565A (en) | Image processing method and device | |
| CN114882457B (en) | A model training method, lane line detection method and device | |
| CN108320051B (en) | Mobile robot dynamic collision avoidance planning method based on GRU network model | |
| Yu et al. | A deep-learning-based strategy for kidnapped robot problem in similar indoor environment | |
| CN112560571A (en) | Intelligent autonomous visual navigation method based on convolutional neural network | |
| Ye et al. | Neural network‐based semantic segmentation model for robot perception of driverless vision | |
| Dang et al. | Real-time semantic plane reconstruction on a monocular drone using sparse fusion | |
| Zhuang et al. | 3D-SeqMOS: A Novel Sequential 3D Moving Object Segmentation in Autonomous Driving | |
| Zhao et al. | A multi-sensor fusion system for improving indoor mobility of the visually impaired | |
| Yubo et al. | Survey of UAV autonomous landing based on vision processing | |
| CN113836975A (en) | Obstacle avoidance method for UAV with binocular vision based on YOLOV3 | |
| Mateus et al. | Active perception applied to unmanned aerial vehicles through deep reinforcement learning | |
| Lyu et al. | Sensor Fusion and Motion Planning with Unified Bird’s-Eye View Representation for End-to-end Autonomous Driving | |
| Kayalvizhi et al. | A comprehensive study on supermarket indoor navigation for visually impaired using computer vision techniques | |
| CN113483769A (en) | Particle filter based vehicle self-positioning method, system, device and medium | |
| Jiao et al. | LuSeg: Efficient Negative and Positive Obstacles Segmentation via Contrast-Driven Multi-Modal Feature Fusion on the Lunar | |
| Sineglazov et al. | Visual Navigation Systems Implementation Using Hybrid Convolutional Neural Networks | |
| Li et al. | A continual imitation learning benchmark for mobile robot navigation in sequential environments | |
| Chen et al. | GRRL: Goal-Guided Risk-Inspired Reinforcement Learning for Efficient Autonomous Driving in Off-Road Environment | |
| Ming et al. | Research on guideline navigation system based on deep learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
| RJ01 | Rejection of invention patent application after publication |
Application publication date: 2021-03-26