Disclosure of Invention
The invention aims to provide a few-sample unmanned aerial vehicle image identification method based on virtual sample generation.
The purpose of the invention is realized as follows:
a few-sample unmanned aerial vehicle image identification method based on virtual sample generation comprises the following steps:
step 1: shooting, at long range from ground to air with a camera device, a short video of unmanned aerial vehicle flight containing N frames, obtaining N unmanned aerial vehicle regions as positive samples, and collecting small interfering regions such as trees, buildings, clouds, birds, kites and balloons from this and other relevant videos as negative samples, the two together forming the training sample set;
step 2: generating virtual samples with a convolutional-network-based W-GAN method: training the GAN model on the N positive sample images and randomly generating 3N virtual samples with the generator in the model;
step 3: dividing the N positive sample images into 0.5N groups of two positive sample images each, generating 4 images for each group of samples with a QR reconstruction weighted fusion method, and obtaining 2N virtual samples in total;
step 4: obtaining a further 4N virtual samples from the N positive sample images with other virtual sample generation methods, including mirror flipping and resampling;
step 5: using the 9N generated virtual samples to improve the generalization ability of the recognition model: the virtual samples are added to the training set as positive samples and a fast DPM model is trained, yielding a recognizer that accurately recognizes unmanned aerial vehicle images under few-sample conditions (a sketch of the overall pipeline follows these steps).
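As an overview, the following minimal Python sketch shows how the sample budgets of steps 2 through 5 combine; the helper functions are hypothetical placeholders for the methods detailed below, not part of the invention as claimed:

```python
def build_positive_set(real_positives):
    """Assemble the expanded positive set from the N real UAV crops (steps 2-5).

    wgan_generate, qr_weighted_fusion and mirror_and_resample are hypothetical
    placeholders for the generation methods described in the detailed steps.
    """
    n = len(real_positives)
    gan_samples = wgan_generate(real_positives, count=3 * n)  # step 2: 3N samples
    qr_samples = qr_weighted_fusion(real_positives)           # step 3: 2N samples
    aug_samples = mirror_and_resample(real_positives)         # step 4: 4N samples
    # 3N + 2N + 4N = 9N virtual samples; with the N real crops, 10N positives
    return real_positives + gan_samples + qr_samples + aug_samples
```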
The convolutional-network-based W-GAN generation method of step 2 specifically includes:
(2-a) the generator network in the model consists of 2 fully connected layers and 2 deconvolution layers; the network structure is specifically:
layer 1 is a fully connected layer activated by a ReLU function, outputting a vector of size 1024;
layer 2 is a fully connected layer activated by a ReLU function, outputting a vector of size 8192, which is then reshaped into a tensor of size 8 × 8 × 128;
layer 3 is a deconvolution layer with a 4 × 4 kernel and a stride of 2, activated by a ReLU function, outputting a feature map of size 16 × 16 × 64;
layer 4 is a deconvolution layer with a 4 × 4 kernel and a stride of 2, activated by a Tanh function, outputting an image of size 32 × 32 × 3;
(2-b) the discriminator network in the model consists of 2 convolution layers and 2 fully connected layers; the network structure is specifically:
layer 1 is a convolution layer with a 4 × 4 kernel and a stride of 2, activated by a LeakyReLU function, outputting a feature map of size 16 × 16 × 128;
layer 2 is a convolution layer with a 4 × 4 kernel and a stride of 2, activated by a LeakyReLU function, outputting a feature map of size 8 × 8 × 256, which is then reshaped into a vector of size 16384;
layer 3 is a fully connected layer activated by a LeakyReLU function, outputting a vector of size 1024;
layer 4 is a fully connected layer without an activation function, outputting a vector of size 1;
(2-c) training the model on the set of N positive samples to obtain a trained generator, then randomly generating 3N virtual samples with the generator.
The QR reconstruction weighted fusion method for generating virtual samples in step 3 specifically includes:
(3-a) from the unitary matrix Q and the upper triangular matrix R obtained by QR decomposition, the complete information map can be reconstructed directly:
I(Q, R) = Q · R
Owing to the structure of the upper triangular matrix R, the i-th column vector of the matrix Q contributes to the element values from the i-th column to the last column of the original image matrix without affecting the element values before the i-th column, so the completeness of the element values of the information map increases gradually toward the left. QR decomposition is performed on one of the images I_l of a group to compute the unitary matrix Q_l and the upper triangular matrix R_l containing the image information; these matrices and a reconstruction coefficient w are then used to reconstruct a left information map I:
I(Q_l, R_l, w) = Q_l^(32×32w) · R_l^(32w×32)
where the superscripts denote the retained sub-matrices, i.e. the first 32w columns of Q_l and the first 32w rows of R_l. The 4 reconstruction coefficients 0.25, 0.5, 0.75 and 1.0 yield 4 corresponding left information maps;
(3-b) the other image I_r in the group is first mirror-flipped to obtain a symmetric image; its unitary matrix Q_r and upper triangular matrix R_r are obtained in the same manner as in (3-a), the left information map of the symmetric image is computed, and mirror-flipping again yields the 4 right information maps I_G:
I_G(Q_r, R_r, w) = G(Q_r^(32×32w) · R_r^(32w×32))
where G(·) denotes mirror-flipping a matrix;
(3-c) the left information map I and the right information map I_G under the same reconstruction coefficient are weighted and fused, obtaining 4 fused virtual images I_w; finally, QR reconstruction weighted fusion over the 0.5N groups of images yields 2N virtual samples.
The fast DPM model in step 5 specifically includes:
(5-a) selecting 5 component filters for the DPM model according to the salient features of the four wings and the fuselage of the unmanned aerial vehicle;
(5-b) computing a two-layer HOG feature pyramid of the image, the bottom-layer feature map serving as the input to the root filter and the top-layer feature map serving as the input to the component filters;
(5-c) fixing the anchor point of the root filter to the image center (x_0, y_0); the total response score of the image is formulated as
score(x_0, y_0) = R_0(x_0, y_0) + Σ_{i=1}^{5} max_{(x_i, y_i)} ( R_i(x_i, y_i) − d_i )
where R_0 is the root filter score, R_i is the score of the i-th component filter, and d_i is the distance between the root filter anchor point and the component detection point.
The beneficial effects of the invention are as follows. In the virtual sample generation stage, the convolutional-network-based W-GAN enables the model to generate high-quality virtual unmanned aerial vehicle images, while the virtual images generated by the QR reconstruction weighted fusion method carry the feature distributions of two images, increasing the effective information and diversity of the samples without excessive distortion and thereby improving the generalization ability of the model. In the recognition stage, the fast DPM model fixes the feature map input to the root filter and the anchor point position, and adopts a component model that matches the modular features of the unmanned aerial vehicle, improving both running speed and accuracy.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Aiming at the defects of the prior art, the invention provides a few-sample unmanned aerial vehicle image recognition method based on virtual sample generation, which alleviates the insufficient generalization ability of models trained on few samples and, by exploiting the modular characteristics of unmanned aerial vehicle images, solves the unmanned aerial vehicle recognition problem under few-sample conditions. To achieve the above object, the technical scheme of the invention is as follows:
a few-sample unmanned aerial vehicle image identification method based on virtual sample generation is characterized by comprising the following steps:
(1) shooting, at long range from ground to air with a camera device, a short video of unmanned aerial vehicle flight containing N frames, obtaining N unmanned aerial vehicle regions as positive samples, and collecting small interfering regions such as trees, buildings, clouds, birds, kites and balloons from this and other relevant videos as negative samples, the two together forming the training sample set;
(2) generating virtual samples with a convolutional-network-based W-GAN method: training the GAN model on the N positive sample images and randomly generating 3N virtual samples with the generator in the model;
(3) dividing the N positive sample images into 0.5N groups of two positive sample images each, generating 4 images for each group of samples with a QR reconstruction weighted fusion method, and obtaining 2N virtual samples in total;
(4) obtaining a further 4N virtual samples from the N positive sample images with other virtual sample generation methods, including mirror flipping and resampling;
(5) using the 9N generated virtual samples to improve the generalization ability of the recognition model: the virtual samples are added to the training set as positive samples and a fast DPM model is trained, yielding a recognizer that accurately recognizes unmanned aerial vehicle images under few-sample conditions.
Further, the convolutional-network-based W-GAN generation method specifically includes:
(2-a) the generator network in the model consists of 2 fully connected layers and 2 deconvolution layers; the network structure is specifically:
layer 1 is a fully connected layer activated by a ReLU function, outputting a vector of size 1024;
layer 2 is a fully connected layer activated by a ReLU function, outputting a vector of size 8192, which is then reshaped into a tensor of size 8 × 8 × 128;
layer 3 is a deconvolution layer with a 4 × 4 kernel and a stride of 2, activated by a ReLU function, outputting a feature map of size 16 × 16 × 64;
layer 4 is a deconvolution layer with a 4 × 4 kernel and a stride of 2, activated by a Tanh function, outputting an image of size 32 × 32 × 3.
(2-b) the discriminator network in the model consists of 2 convolution layers and 2 fully connected layers; the network structure is specifically:
layer 1 is a convolution layer with a 4 × 4 kernel and a stride of 2, activated by a LeakyReLU function, outputting a feature map of size 16 × 16 × 128;
layer 2 is a convolution layer with a 4 × 4 kernel and a stride of 2, activated by a LeakyReLU function, outputting a feature map of size 8 × 8 × 256, which is then reshaped into a vector of size 16384;
layer 3 is a fully connected layer activated by a LeakyReLU function, outputting a vector of size 1024;
layer 4 is a fully connected layer without an activation function, outputting a vector of size 1.
(2-c) training the model on the set of N positive samples to obtain a trained generator, then randomly generating 3N virtual samples with the generator.
Further, the specific steps of the QR reconstruction weighted fusion method for generating virtual samples include:
(3-a) from the unitary matrix Q and the upper triangular matrix R obtained by QR decomposition, the complete information map can be reconstructed directly:
I(Q, R) = Q · R
Owing to the structure of the upper triangular matrix R, the i-th column vector of the matrix Q contributes to the element values from the i-th column to the last column of the original image matrix without affecting the element values before the i-th column, so the completeness of the element values of the information map increases gradually toward the left.
QR decomposition is performed on one of the images I_l to compute the unitary matrix Q_l and the upper triangular matrix R_l containing the image information. These matrices and a reconstruction coefficient w are then used to reconstruct a left information map I:
I(Q_l, R_l, w) = Q_l^(32×32w) · R_l^(32w×32)
where the superscripts denote the retained sub-matrices, i.e. the first 32w columns of Q_l and the first 32w rows of R_l. The 4 reconstruction coefficients 0.25, 0.5, 0.75 and 1.0 yield 4 corresponding left information maps.
(3-b) the other image I_r in the group is first mirror-flipped to obtain a symmetric image; the unitary matrix Q_r and the upper triangular matrix R_r are obtained in the same manner as in (3-a), the left information map of the symmetric image is computed, and mirror-flipping again yields the 4 right information maps I_G:
I_G(Q_r, R_r, w) = G(Q_r^(32×32w) · R_r^(32w×32))
where G(·) denotes mirror-flipping a matrix.
(3-c) the left information map I and the right information map I_G under the same reconstruction coefficient are weighted and fused, obtaining 4 fused virtual images I_w.
Finally, QR reconstruction weighted fusion over the 0.5N groups of images yields 2N virtual samples.
Further, the fast DPM model specifically includes:
(5-a) selecting 5 component filters for the DPM model according to the salient features of the four wings and the fuselage of the unmanned aerial vehicle.
(5-b) computing a two-layer HOG feature pyramid of the image, the bottom-layer feature map serving as the input to the root filter and the top-layer feature map serving as the input to the component filters.
(5-c) fixing the anchor point of the root filter to the image center (x_0, y_0); the total response score of the image is formulated as
score(x_0, y_0) = R_0(x_0, y_0) + Σ_{i=1}^{5} max_{(x_i, y_i)} ( R_i(x_i, y_i) − d_i )
where R_0 is the root filter score, R_i is the score of the i-th component filter, and d_i is the distance between the root filter anchor point and the component detection point.
The invention provides a few-sample unmanned aerial vehicle image identification method based on virtual sample generation. The method first expands the number of samples with virtual sample generation methods such as the convolutional-network-based W-GAN and QR reconstruction weighted fusion, alleviating the insufficient generalization ability of the model under few-sample conditions; it then identifies the unmanned aerial vehicle image to be recognized with a fast DPM model, making full use of the modular characteristics of unmanned aerial vehicle images to ensure recognition accuracy and robustness. In the virtual sample generation stage, the convolutional-network-based W-GAN enables the model to generate high-quality virtual unmanned aerial vehicle images, while the virtual images generated by the QR reconstruction weighted fusion method carry the feature distributions of two images, increasing the effective information and diversity of the samples without excessive distortion and thereby improving the generalization ability of the model; in the recognition stage, the fast DPM model fixes the feature map input to the root filter and the anchor point position, and adopts a component model that matches the modular features of the unmanned aerial vehicle, improving both running speed and accuracy.
In order that the objects, technical schemes and advantages of the present invention become more apparent, the schemes of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the described embodiments merely illustrate the invention rather than exhausting all embodiments, and do not limit the invention.
Owing to the wide variety of unmanned aerial vehicle types, the strong influence of the environment on imaging, and similar factors, the unmanned aerial vehicle identification task operates under few-sample conditions. A segment of unmanned aerial vehicle flight video on which the target cannot yet be correctly identified is acquired; after the processing described below, the recognizer can accurately identify the unmanned aerial vehicle. Fig. 1 is a schematic flow chart of the few-sample unmanned aerial vehicle image identification method based on virtual sample generation according to an embodiment of the present invention; the specific implementation steps are as follows:
s1, remotely shooting a section of short video flying by the unmanned aerial vehicle with the number of N from ground to air through the camera device, acquiring N unmanned aerial vehicle regions as positive samples, collecting small interference regions such as trees, buildings, clouds, birds, kites and balloons as negative samples by combining other related videos, and uniformly zooming to the size of 32 multiplied by 32 to be used as a training sample set.
The unmanned aerial vehicle positive sample is a rectangular area containing the unmanned aerial vehicle, which is intercepted by manual or other means, and the image shows a larger difference due to factors such as flight attitude, environment and the like. Various interference small areas are used as negative samples, and besides the interference of small objects in the air, partial areas of trees, buildings and clouds are easy to become the interference of the algorithm. All sample images are scaled to 32 x 32, so that the image quality is not influenced, and the subsequent algorithm can conveniently process.
S2, generating virtual samples with the convolutional-network-based W-GAN method: the GAN model is trained on the N positive sample images, and 3N virtual samples are randomly generated with the generator in the model.
The currently popular Generative Adversarial Network (GAN) is a reliable method for generating virtual samples. However, GANs are difficult to train, prone to mode collapse, and the quality of the generated images varies greatly. In order to generate better-quality positive sample pictures of the unmanned aerial vehicle, the method provides a convolutional-network-based W-GAN. The W-GAN consists of a generator G and a discriminator D, and the objective function is as follows:
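The standard W-GAN objective has the form

min_G max_{‖D‖_L ≤ 1} E_{x∼P_r}[D(x)] − E_{z∼P_z}[D(G(z))]

where P_r is the distribution of the real positive samples, P_z is the noise prior, and the discriminator (critic) D is constrained to be 1-Lipschitz.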
Fig. 2 is a schematic diagram of the convolutional-network-based W-GAN framework provided in an embodiment of the present invention; the generator and discriminator in the model have mutually symmetric convolutional network structures, which improves model performance and suits the low resolution of unmanned aerial vehicle images. The network structure of the generator G consists of 2 fully connected layers and 2 deconvolution layers, specifically:
layer 1 is a fully connected layer activated by a ReLU function, outputting a vector of size 1024;
layer 2 is a fully connected layer activated by a ReLU function, outputting a vector of size 8192, which is then reshaped into a tensor of size 8 × 8 × 128;
layer 3 is a deconvolution layer with a 4 × 4 kernel and a stride of 2, activated by a ReLU function, outputting a feature map of size 16 × 16 × 64;
layer 4 is a deconvolution layer with a 4 × 4 kernel and a stride of 2, activated by a Tanh function, outputting an image of size 32 × 32 × 3.
The network structure of the discriminator D consists of 2 convolution layers and 2 fully connected layers, specifically:
layer 1 is a convolution layer with a 4 × 4 kernel and a stride of 2, activated by a LeakyReLU function, outputting a feature map of size 16 × 16 × 128;
layer 2 is a convolution layer with a 4 × 4 kernel and a stride of 2, activated by a LeakyReLU function, outputting a feature map of size 8 × 8 × 256, which is then reshaped into a vector of size 16384;
layer 3 is a fully connected layer activated by a LeakyReLU function, outputting a vector of size 1024;
layer 4 is a fully connected layer without an activation function, outputting a vector of size 1.
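The following PyTorch sketch renders the generator and discriminator exactly as sized above; details the text does not specify (the noise dimension, a padding of 1 in every convolution, the LeakyReLU slope) are assumptions:

```python
import torch.nn as nn

NOISE_DIM = 100  # assumption: the noise dimension is not specified in the text

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(NOISE_DIM, 1024), nn.ReLU(),  # layer 1: FC + ReLU -> 1024
            nn.Linear(1024, 8192), nn.ReLU(),       # layer 2: FC + ReLU -> 8192
        )
        self.deconv = nn.Sequential(
            # layer 3: 4x4 deconv, stride 2: 8x8x128 -> 16x16x64
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            # layer 4: 4x4 deconv, stride 2: 16x16x64 -> 32x32x3
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 8, 8)  # reshape 8192 -> 8x8x128
        return self.deconv(h)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            # layer 1: 4x4 conv, stride 2: 32x32x3 -> 16x16x128
            nn.Conv2d(3, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            # layer 2: 4x4 conv, stride 2: 16x16x128 -> 8x8x256
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.fc = nn.Sequential(
            nn.Linear(16384, 1024), nn.LeakyReLU(0.2),  # layer 3: FC -> 1024
            nn.Linear(1024, 1),                         # layer 4: no activation
        )

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))  # 8x8x256 -> vector of 16384
```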
The convolutional-network-based W-GAN model is trained on the set of N positive samples; the trained generator is obtained and then used to randomly generate 3N virtual samples. Fig. 3(a) shows real image samples from the positive sample set according to an embodiment of the present invention; Fig. 3(b) shows virtual samples generated by the original W-GAN method, which clearly exhibit heavy noise and poor image quality; Fig. 3(c) shows virtual samples generated by the convolutional-network-based W-GAN method. The images generated by this method are much more similar to the original positive samples: the noise problem is resolved, the quality of the virtual images is improved, and the model has evidently learned the feature distribution of the original positive samples.
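One training iteration under the usual W-GAN recipe might look as follows; the clipping bound of 0.01 and the 5 critic steps per generator step are standard W-GAN settings assumed here rather than specified by the text, and NOISE_DIM refers to the sketch above:

```python
import torch

def wgan_step(G, D, real_batch, opt_g, opt_d, clip=0.01, n_critic=5):
    for _ in range(n_critic):                  # train the critic
        z = torch.randn(real_batch.size(0), NOISE_DIM)
        loss_d = D(G(z).detach()).mean() - D(real_batch).mean()
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        for p in D.parameters():               # enforce the Lipschitz constraint
            p.data.clamp_(-clip, clip)         # by weight clipping
    z = torch.randn(real_batch.size(0), NOISE_DIM)
    loss_g = -D(G(z)).mean()                   # train the generator
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

After training, the 3N virtual samples are drawn simply as G(torch.randn(3 * N, NOISE_DIM)).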
S3, dividing the N positive sample images into 0.5N groups of two positive sample images each, generating 4 images for each group of samples with the QR reconstruction weighted fusion method, and obtaining 2N virtual samples in total.
Fig. 4 is a schematic diagram of the QR reconstruction weighted fusion method provided in an embodiment of the present invention. The N positive sample images are divided into 0.5N groups of two images each, and QR reconstruction weighted fusion is performed on each group, specifically:
First, QR decomposition is performed on one of the images I_l to compute the unitary matrix Q_l and the upper triangular matrix R_l containing the image information. Multiplying the matrices Q and R recombines them into a complete information map identical to the original image, as follows:
I(Q, R) = Q · R
Owing to the structure of the upper triangular matrix R, the i-th column vector of the matrix Q contributes to the element values from the i-th column to the last column of the original image matrix without affecting the element values before the i-th column, so the completeness of the element values of the information map increases gradually toward the left. A reconstruction coefficient w can therefore be used to retain the first 32·w columns, determining the amount of information of Q and R that is kept. Because the information map obtained by QR decomposition is complete toward the left, the matrices Q_l, R_l and the reconstruction coefficient w give the left information map I, as shown below:
I(Q_l, R_l, w) = Q_l^(32×32w) · R_l^(32w×32)
where the superscripts denote the retained sub-matrices, i.e. the first 32w columns of Q_l and the first 32w rows of R_l. The 4 reconstruction coefficients 0.25, 0.5, 0.75 and 1.0 yield 4 corresponding left information maps.
Then, the other image I_r in the group is first mirror-flipped to obtain a symmetric image; the unitary matrix Q_r and the upper triangular matrix R_r are obtained in the same manner as in the previous step, the left information map of the symmetric image is computed, and mirror-flipping again yields the 4 right information maps I_G, as shown below:
I_G(Q_r, R_r, w) = G(Q_r^(32×32w) · R_r^(32w×32))
where G(·) denotes mirror-flipping a matrix.
Finally, the left and right information maps under the same reconstruction coefficient are weighted and fused, obtaining the fused virtual images I_w.
After QR reconstruction weighted fusion of the N positive sample images, 2N virtual samples are obtained. Fig. 5 shows the effect of the virtual samples generated by the QR reconstruction weighted fusion method according to an embodiment of the present invention.
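The per-group fusion can be sketched in NumPy for a single-channel 32 × 32 pair (a color image would be processed per channel); the equal fusion weights are an assumption, since the text specifies only weighted fusion under the same reconstruction coefficient:

```python
import numpy as np

def left_info_map(img, w):
    """Left information map: QR-decompose and keep the first 32*w columns/rows."""
    q, r = np.linalg.qr(img.astype(float))  # img: 32x32; Q unitary, R upper triangular
    k = int(round(img.shape[1] * w))
    return q[:, :k] @ r[:k, :]              # reconstruction, complete toward the left

def qr_fuse_pair(img_l, img_r, coeffs=(0.25, 0.5, 0.75, 1.0), alpha=0.5):
    virtual = []
    for w in coeffs:
        left = left_info_map(img_l, w)
        # right map: flip the second image, take its left map, flip back
        right = np.fliplr(left_info_map(np.fliplr(img_r), w))
        virtual.append(alpha * left + (1 - alpha) * right)  # assumed 0.5/0.5 weights
    return virtual                          # 4 fused virtual images per pair
```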
S4, for the N positive sample images, generating 4 images per sample with other virtual sample generation methods, obtaining 4N virtual samples in total.
S41, obtaining N virtual image samples from the N positive sample images with the mirror flipping method.
Mirror-flipping an image I yields a symmetric virtual image I′(x_i, y_j), as shown below (columns indexed 0 to 31 for the 32 × 32 image):
I′(x_i, y_j) = I(x_{31−i}, y_j)
The virtual image obtained by the mirror flipping method is the original image reflected about the y axis; owing to the symmetry of the unmanned aerial vehicle, it is very close to a real image.
S42, obtaining 3N virtual image samples from the N positive sample images with the resampling method, one per resampling coefficient.
To resample an image, it is first down-sampled with each of the 3 sampling coefficients 0.4, 0.6 and 0.8 to obtain reduced images, which are then up-sampled back with bilinear interpolation; the 3 resampled images obtained are virtual samples. The images obtained by the resampling method can be regarded as images captured when the unmanned aerial vehicle is farther away; they are also very close to real images and enhance the multi-scale character of the sample set.
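Both augmentations are a few lines with OpenCV; the down-sampling interpolation mode is an assumption, while the bilinear up-sampling back to 32 × 32 follows the text:

```python
import cv2

def mirror_flip(img):
    return img[:, ::-1].copy()              # reflect about the vertical axis

def resample(img, factors=(0.4, 0.6, 0.8)):
    h, w = img.shape[:2]                    # 32 x 32 in this method
    out = []
    for s in factors:
        small = cv2.resize(img, (int(w * s), int(h * s)),
                           interpolation=cv2.INTER_LINEAR)     # down-sample
        out.append(cv2.resize(small, (w, h),
                              interpolation=cv2.INTER_LINEAR)) # bilinear up-sample
    return out                              # 3 "farther away" virtual samples
```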
S5, the 9N generated virtual samples are used to improve the generalization ability of the recognition model: they are added to the training set as positive samples, and a fast DPM model is trained, yielding a recognizer that accurately recognizes unmanned aerial vehicle images under few-sample conditions.
The above virtual sample generation methods yield 9N virtual samples; combined with the original positive samples there are 10N positive samples in total, which together with the original negative samples form the new sample set.
DPM (Deformable Part Model) is a highly successful target detection algorithm that computes a matching score for the target from a root template and part templates and classifies accordingly. The DPM model first computes a multi-layer HOG feature pyramid of the input image and then finds the position with the maximum target score by sliding-window search. However, this scheme involves a large number of similar matching computations, making the algorithm time-consuming. Owing to the modular character of the unmanned aerial vehicle, the DPM idea of combining component matching with overall matching is well suited to the unmanned aerial vehicle recognition task. To further optimize the running speed of the algorithm, the invention provides a fast DPM model; Fig. 6 is a schematic diagram of the fast DPM model framework provided in an embodiment of the present invention, specifically:
s51, selecting 5 component filters of the DPM model according to obvious characteristics of four wings and a fuselage of the unmanned aerial vehicle.
And S52, calculating a two-layer HOG feature pyramid of the image, wherein the bottom-layer feature map is used as the input of the root filter, and the top-layer feature map is used as the input of the component filter.
S53, fixing the anchor point of the root filter to the image center (x_0, y_0); the response score of the image is formulated as shown below:
score(x_0, y_0) = R_0(x_0, y_0) + Σ_{i=1}^{5} max_{(x_i, y_i)} ( R_i(x_i, y_i) − d_i )
where R_0 is the root filter score, R_i is the score of the i-th component filter, and d_i is the distance between the root filter anchor point and the component detection point.
After the DPM model is trained, the trained unmanned aerial vehicle recognizer is obtained. The recognition process is as follows: an image is input and its two-layer HOG feature pyramid is computed; the root filter scores the bottom-layer feature map, while the 5 component filters slide over the top-layer feature map, computing the convolution response values and searching for the optimal positions. If the total response score of the image exceeds the threshold, the target is judged to be an unmanned aerial vehicle; otherwise it is judged to be a non-unmanned-aerial-vehicle target.
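The total response score of step S53 can be sketched as follows, assuming the root and component response maps have already been obtained by correlating the trained filters with the two HOG pyramid layers (HOG extraction and filter training are omitted), and assuming a linear distance penalty:

```python
import numpy as np

def total_response(r0, part_responses, anchor, penalty=1.0):
    """r0: root filter response map (bottom HOG layer);
    part_responses: 5 component response maps (top HOG layer);
    anchor: image center (x0, y0) where the root filter is fixed;
    penalty: assumed linear weight on the distance cost d_i."""
    x0, y0 = anchor
    score = r0[y0, x0]                      # root score at the fixed anchor
    for r_i in part_responses:
        ys, xs = np.mgrid[0:r_i.shape[0], 0:r_i.shape[1]]
        d = np.hypot(xs - x0, ys - y0)      # distance to the root anchor
        score += np.max(r_i - penalty * d)  # best deformed component placement
    return score

# A target is declared an unmanned aerial vehicle when total_response(...)
# exceeds the trained threshold.
```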
In conclusion, the combination of the virtual sample generation methods with the DPM-based recognition method is well suited to the few-sample unmanned aerial vehicle small-image recognition task and is highly practical. The convolutional-network-based W-GAN model can generate high-quality virtual unmanned aerial vehicle images; the virtual images generated by the QR reconstruction weighted fusion method carry the feature distributions of two images, increasing the effective information and diversity of the samples without excessive distortion and thereby improving the generalization ability of the model; and the fast DPM model fixes the feature map input to the root filter and the anchor point position, and adopts a component model that matches the modular features of the unmanned aerial vehicle, improving both running speed and accuracy.
The basic principles and main features of the few-sample unmanned aerial vehicle image recognition method have been described above. Those skilled in the art should understand that the above description of the embodiments is intended only to help understand the method and core idea of the present invention, not to limit it; any changes in specific implementation and application scope made according to the idea of the present application fall within the protection scope of the present invention.