Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle (UAV) aerial image deblurring method based on an improved DeblurGAN, which improves the precision of the deblurring algorithm and solves the problem of information loss in UAV image acquisition caused by complex weather environments. In order to achieve the above purpose, the invention adopts the following specific technical scheme:
Step 1, preparing a data set and carrying out synthetic blur processing on it;
Step 2, before algorithm processing, carrying out image preprocessing on the input data set;
Step 3, constructing the generator network and discriminator network framework of the improved algorithm, wherein the generator network comprises an encoder and a decoder; the encoder adopts a dual-branch network structure to extract feature maps, and the decoder restores a high-resolution image through layer-by-layer upsampling and convolution operations;
Step 4, setting an improved objective loss function for the generative adversarial network model;
Step 5, performing performance evaluation on the trained UAV aerial image deblurring method based on the improved DeblurGAN.
In step 1, an experimental dataset is prepared: artificial blur is added to the UAV aerial image dataset Uavid2020, and the Albumentations image data augmentation library is used to simulate the random wind conditions a UAV may encounter. The wind intensity is divided into three grades, (15, 30) weak wind, (30, 50) medium wind, and (50, 80) strong wind, and the wind direction is set to be random within the range of 0-360 degrees.
In step 2, image preprocessing is carried out on the input data set, extracting feature information of the image in the spatial domain and the frequency domain respectively. In the spatial domain, the gradients of the image in the horizontal and vertical directions are computed with an edge operator (the Sobel operator) to obtain spatial-domain features reflecting image edges and texture information. In the frequency domain, a Fast Fourier Transform (FFT) and frequency-domain centering are applied to the blurred image to obtain the distribution of image frequencies. The extracted spatial-domain and frequency-domain features are normalized and scaled to a preset interval to eliminate amplitude differences, and the frequency-domain features are further multiplied by a scaling factor to balance the weighting between the spatial-domain and frequency-domain features. The processed spatial-domain and frequency-domain features are then stacked along the channel dimension according to the set weights to form a multi-channel fused feature map.
In step 3, the generator network and discriminator network of the improved algorithm are constructed. For the encoder part of the generator network, a dual-branch network structure is adopted, realized by an FPN-MobileNet network and a Swin Transformer network respectively.
The first branch is a MobileNet network based on the FPN feature pyramid structure. It comprises an initial convolution layer that performs a 3×3 convolution on the input image and transforms the number of channels from 3 to a preset 128. Several FPN layers are designed, each consisting of a 3×3 convolution, batch normalization, and a ReLU activation function; layer-by-layer downsampling is realized through max pooling, generating multi-scale feature maps of shapes (H/2, W/2), (H/4, W/4), (H/8, W/8), and so on.
The second branch contains multi-level self-attention computation modules: the query matrix Q, key matrix K, and value matrix V are obtained through 1×1 convolutions, and the attention distribution is obtained using a scaling factor and the softmax function, realizing feature enhancement within local windows. The feature maps output at each scale of the second branch are normalized to the same number of channels by 1×1 convolutions and fused with the corresponding feature maps of the first branch through weighted fusion with learnable fusion weights.
For the decoder part of the generator network, layer-by-layer upsampling and convolution operations are adopted to recover high-resolution image features, with feature extraction and fusion modules of different scales arranged between the encoder and decoder.
In step 4, an improved objective loss function is set for the generative adversarial network model. The generator loss function is a weighted combination of the adversarial loss (L_gan), the perceptual loss (L_perceive), and the pixel loss (L_pixel).
The generator loss function is set as follows:
L_total = λ_gan · L_gan + λ_perceive · L_perceive + λ_pixel · L_pixel
Wherein λ_gan, λ_perceive, and λ_pixel represent the weight coefficients of the adversarial loss L_gan, the perceptual loss L_perceive, and the pixel loss L_pixel, respectively; the initial weight allocation is set to 2:3:5.
The discriminator loss function is set as follows:
L_D = −E_{x∼P_data}[log D(x)] − E_{z∼P_z}[log(1 − D(G(z)))]
The purpose of the generator is to produce realistic samples, so that the generated samples are as indistinguishable from real samples as possible to the discriminator. The first term of the discriminator loss function measures the discriminator's output on the real sample x, and the second term focuses on the discriminator's output on the generated sample G(z) and tries to drive D(G(z)) toward zero. Ideally, D(x) should be close to 1, indicating that the real sample is judged true by the discriminator, and D(G(z)) should be close to 0, indicating that the generated sample is judged false.
In step 5, the improved generative adversarial network algorithm undergoes model training and experimental evaluation. The evaluation indices of the model are the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). PSNR rapidly evaluates the global pixel error between two images by computing their mean square error, while SSIM computes local similarity block by block with a sliding window and takes the global average, effectively distinguishing structural changes in image content. The two are combined to evaluate image quality comprehensively.
The invention provides a UAV aerial image deblurring method based on an improved DeblurGAN algorithm, which has the following beneficial effects:
Aiming at the problem of blurring in UAV aerial images, the method improves the generative adversarial network algorithm: edge texture information and frequency-domain information of the blurred image are extracted in the spatial domain and the frequency domain respectively and dynamically weighted and fused by a squeeze-and-excitation module. The preprocessed image is then input into the designed dual-branch network structure, where feature information is extracted by an FPN-MobileNet network and a Swin Transformer network respectively, strengthening the fusion of multi-scale features and facilitating the recovery of a high-resolution clear image. The method achieves a good effect in restoring blurred UAV aerial images.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the embodiment of the invention discloses a UAV aerial image deblurring method based on an improved DeblurGAN, which comprises the following specific steps:
Step 1, preparing an experimental data set and synthesizing blurred images based on the UAV aerial image data set Uavid2020;
Step 2, preprocessing the input image, extracting feature information of the image in the spatial domain and the frequency domain respectively;
Step 3, constructing the improved DeblurGAN deblurring algorithm, adopting in the generator network a dual-branch structure combining an FPN-MobileNet network and a Swin Transformer network;
Step 4, performing model training on the improved DeblurGAN deblurring algorithm;
Step 5, saving the best-performing model after training, and further evaluating the effectiveness and accuracy of the improved algorithm on the input test set images.
In step 1: publicly available UAV aerial photography data sets contain few blurred images, so conditions such as high-altitude airflow and wind encountered by a UAV during aerial operation are simulated through a synthetic blur effect. The Uavid2020 data set contains 4K high-resolution images captured mainly in two environments, urban streetscapes and suburban roads, offering rich scene variation and the challenge of moving-object recognition. This data set is selected for synthesizing blurred images, and the Albumentations image data augmentation library is used to simulate random situations the UAV may encounter, parameterized mainly by wind_strength (wind intensity), blur_prob (blur probability), distortion_prob (distortion probability), and a random horizontal displacement dx = int(self.wind_strength * w * (2 * np.random.rand() - 1)) used to simulate the resulting blur artifacts.
In the synthetic blur data set, wind intensity is divided into three grades, (15, 30) weak wind, (30, 50) medium wind, and (50, 80) strong wind, and the wind direction is set randomly within 0-360 degrees. Sufficient data can be obtained through different parameter settings to meet the experimental requirements. The data set is then organized so that pairs of blurred and clear images are obtained as algorithm input.
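A minimal sketch of this blur-synthesis step, assuming a hypothetical helper synth_wind_blur rather than the exact implementation: a linear motion-blur kernel whose length is drawn from one of the three wind-strength grades and whose angle is random in [0, 360) degrees, applied with OpenCV.

import cv2
import numpy as np

WIND_GRADES = {"weak": (15, 30), "medium": (30, 50), "strong": (50, 80)}

def synth_wind_blur(img: np.ndarray, grade: str = "medium") -> np.ndarray:
    lo, hi = WIND_GRADES[grade]
    length = np.random.randint(lo, hi)      # wind strength -> kernel length
    angle = np.random.uniform(0.0, 360.0)   # random wind direction

    # Build a normalized linear motion kernel along the chosen direction.
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0 / length
    rot = cv2.getRotationMatrix2D((length / 2 - 0.5, length / 2 - 0.5), angle, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (length, length))
    kernel /= max(kernel.sum(), 1e-8)

    return cv2.filter2D(img, -1, kernel)

# Example: create one blurred/clear training pair from a Uavid2020 frame.
# clear = cv2.imread("uavid_frame.png")
# blurred = synth_wind_blur(clear, grade="strong")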
In step 2, the input image is preprocessed, i.e. the feature information of the image is extracted in the spatial domain and the frequency domain respectively. The image is first read in grayscale form (cv2.IMREAD_GRAYSCALE), then the gradients of the image in the x and y directions are computed with the Sobel edge detection method, grad_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3) and grad_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3), after which the gradients are normalized to the [0, 1] range.
In the frequency domain, the image is first converted to floating-point type (np.float32), then the amplitude spectrum of the image is computed via the Fast Fourier Transform (FFT), magnitude_spectrum = 20 * np.log(np.abs(fshift) + 1e-10), after which the amplitude spectrum is normalized to the [0, 1] range; the frequency-domain features are further multiplied by a scaling factor scale_factor = 0.5 to balance the weights.
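Combining the two preceding paragraphs, a runnable sketch of the preprocessing, assuming OpenCV and NumPy; the function name extract_features and the min-max normalization are illustrative choices.

import cv2
import numpy as np

def extract_features(path: str, scale_factor: float = 0.5):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # Spatial domain: Sobel gradients in x and y, normalized to [0, 1].
    grad_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    grad_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    norm = lambda a: (a - a.min()) / (a.max() - a.min() + 1e-10)
    grad_x, grad_y = norm(np.abs(grad_x)), norm(np.abs(grad_y))

    # Frequency domain: FFT, centering, log-amplitude spectrum,
    # normalization, then the 0.5 scaling factor balancing the two domains.
    fshift = np.fft.fftshift(np.fft.fft2(img))
    magnitude_spectrum = 20 * np.log(np.abs(fshift) + 1e-10)
    freq_feature = norm(magnitude_spectrum) * scale_factor
    return grad_x, grad_y, freq_feature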
The two extracted spatial-domain gradients are combined into two channels and the frequency-domain feature is expanded into a single channel, after which channel-wise feature fusion is performed, fused_features = torch.cat([grad_x, grad_y, freq_feature], dim=0), giving a feature map of shape [3, H, W]. A squeeze-and-excitation module (SE, Squeeze-and-Excitation Module) is then introduced: the module adjusts the channel features through a simple gating mechanism with a sigmoid activation function and learns the nonlinear relations between channels; the spatial dimension is compressed to 1×1 and passed through two fully connected layers, and finally the module re-weights the frequency-domain and spatial-domain channels according to the excitation result, realizing dynamic adjustment of the features.
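A minimal sketch of this squeeze-and-excitation re-weighting applied to the 3-channel fused map; the class name SEModule and the reduction choice are assumptions, not the exact implementation.

import torch
import torch.nn as nn

class SEModule(nn.Module):
    def __init__(self, channels: int = 3, reduction: int = 1):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: spatial dims -> 1x1
        self.fc = nn.Sequential(                     # excitation: two FC layers
            nn.Linear(channels, max(channels // reduction, 1)),
            nn.ReLU(inplace=True),
            nn.Linear(max(channels // reduction, 1), channels),
            nn.Sigmoid(),                            # gating in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight each channel

# fused = torch.stack([grad_x, grad_y, freq_feature]).unsqueeze(0)  # [1, 3, H, W]
# weighted = SEModule()(fused)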
In step 3, the improved DeblurGAN deblurring algorithm is constructed; the specific network structure is shown in fig. 2. The first branch combines an FPN (Feature Pyramid Network) structure with a MobileNet network to extract feature maps at multiple scales. The input is processed by a 3×3 convolution (Convolution), batch normalization (BatchNorm), and a ReLU activation function to produce a 128-channel feature map. The FPN layers perform feature extraction at 5 scales (from large to small), each scale consisting of a convolution block built from a 3×3 convolution, batch normalization, and ReLU activation; the output of each FPN layer undergoes a 2×2 max pooling (Max Pooling) operation to gradually reduce the spatial size of the feature map. The final output is 5 feature maps of shapes (B, 128, H/2, W/2), (B, 128, H/4, W/4), (B, 128, H/8, W/8), (B, 128, H/16, W/16), and (B, 128, H/32, W/32);
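A compact sketch of this first branch as described (initial 3×3 convolution to 128 channels, five conv-BN-ReLU stages each followed by 2×2 max pooling); the layer count per stage and class names are assumptions.

import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FPNBranch(nn.Module):
    def __init__(self, in_ch: int = 3, ch: int = 128, levels: int = 5):
        super().__init__()
        self.stem = conv_block(in_ch, ch)
        self.stages = nn.ModuleList(conv_block(ch, ch) for _ in range(levels))
        self.pool = nn.MaxPool2d(2)

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = self.pool(stage(x))   # halve spatial size: H/2 ... H/32
            feats.append(x)
        return feats                  # (B,128,H/2,W/2) ... (B,128,H/32,W/32)

# feats = FPNBranch()(torch.randn(1, 3, 256, 256))  # 5 multi-scale maps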
The second branch uses the Swin Transformer encoder structure to capture long-range dependencies and global context with a windowed self-attention mechanism. Each layer consists of a Patch Partition and a Swin Transformer Block; the Patch Partition module partitions the input image into non-overlapping patches, each of which is treated as a token;
The Swin Transformer Block structure is shown in fig. 3. It receives an input feature tensor Z ∈ R^(B×C×H×W), where B is the batch size, C the number of channels, and H, W the spatial dimensions of the image, and divides the input feature map into non-overlapping windows of fixed size M×M, generating windowed features W_l. The left part of the structure is the multi-head attention module of the regular window configuration, the W-MSA block (Window-based Multi-head Self-Attention): it generates the query Q, key K, and value V matrices by linear projection, computes the in-window attention scores and the weighted aggregation of features, and outputs features Z_W-MSA, which are added element-wise to the original input to complete the first residual connection Z_l = Z + Z_W-MSA. Layer normalization (LN, Layer Normalization) over the channel dimension is applied to Z_l, Z_normal = LayerNorm(Z_l), stabilizing the numerical distribution. The right part of the structure is the multi-head attention module of the shifted window configuration, the SW-MSA block (Shifted-Window MSA), which realizes cross-window information interaction: it performs a window-shift operation on the normalized features, repartitions the windows and computes attention, and outputs features Z_SW-MSA, which are added element-wise to complete the second residual connection Z_(l+1) = Z_normal + Z_SW-MSA. An MLP (Multi-Layer Perceptron) channel-wise feedforward sub-network then applies a two-stage nonlinear transformation to Z_(l+1), namely linear projection + GELU activation followed by linear projection + Dropout, keeping the output feature dimension consistent with the input. Finally, a last channel normalization is applied to the MLP output, X_out = LayerNorm(X_MLP), producing complete features for subsequent network layers.
The Swin Transformer Block adopts a shifted-window partitioning scheme, calculated as follows:
Ẑ_l = W-MSA(LN(Z_(l−1))) + Z_(l−1)
Z_l = MLP(LN(Ẑ_l)) + Ẑ_l
Ẑ_(l+1) = SW-MSA(LN(Z_l)) + Z_l
Z_(l+1) = MLP(LN(Ẑ_(l+1))) + Ẑ_(l+1)
Where Z_(l−1) denotes the tokens entering the l-th Block, Ẑ_l and Z_l denote the output features of the W-MSA module and the MLP module of the l-th Block respectively, and Ẑ_(l+1) and Z_(l+1) denote the output features of the SW-MSA module and its subsequent MLP module. In the code implementation, the window size is set to window_size = 7 and the number of attention heads to num_heads = 3; the first-stage attention residual is obtained by adding the original input X to the W-MSA-processed features, X_l = X + X_W-MSA, and the second-stage MLP residual undergoes LN processing twice, fusing the feature information reinforced before and after, which enhances hierarchical local attention and improves the output features' attention to global information.
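A minimal sketch of window-based multi-head self-attention (W-MSA) with window_size = 7 and num_heads = 3 as stated above; the relative position bias and the cyclic shift of SW-MSA are omitted for brevity, and the default dim = 96 is an assumption.

import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim: int = 96, window: int = 7, heads: int = 3):
        super().__init__()
        assert dim % heads == 0
        self.w, self.h, self.scale = window, heads, (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)   # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) with H and W divisible by the window size.
        B, H, W, C = x.shape
        w = self.w
        # Partition into non-overlapping w x w windows -> (nW*B, w*w, C).
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)
        qkv = self.qkv(x).reshape(-1, w * w, 3, self.h, C // self.h)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each: (nW*B, h, w*w, C/h)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # scaled dot-product
        x = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(-1, w * w, C)
        x = self.proj(x)
        # Reverse the window partition back to (B, H, W, C).
        x = x.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, H, W, C)

# y = WindowAttention()(torch.randn(1, 56, 56, 96))  # 56 = 8 windows of 7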
The two branches of the dual-branch network structure each produce feature maps of different scales, which are fused at the feature layers of corresponding size; the features of different scales are then normalized to 128 channels by a 1×1 convolution module, realizing the fusion of the multi-scale feature maps, and finally the restored high-resolution image is output.
In step 4, the preprocessed images are divided into training, validation, and test sets at a ratio of 7:2:1 and input to the model for training. In the training program, the input images are cropped to 256×256 and data augmentation operations, including random cropping and geometric transformations, are applied to enlarge the data and improve the generalization capability of the model. The number of training epochs is set to 200, the number of training batches per epoch train_batches_per_epoch to 880, the number of validation batches per epoch val_batches_per_epoch to 440, and the batch size to 16; the optimizer is Adam with an initial learning rate of 0.001, and linear learning-rate decay starts from epoch 50. When the loss function converges during training and the image similarity value stabilizes, the current model is judged satisfactory and can be saved for subsequent testing;
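A sketch of the optimizer and learning-rate schedule just described (Adam, initial lr 0.001, 200 epochs, linear decay from epoch 50); the stand-in generator module and the exact decay shape are assumptions.

import torch
import torch.nn as nn

EPOCHS, DECAY_START = 200, 50
generator = nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the generator network

optimizer = torch.optim.Adam(generator.parameters(), lr=0.001)

def linear_decay(epoch: int) -> float:
    # Constant lr for the first 50 epochs, then linear decay toward zero.
    if epoch < DECAY_START:
        return 1.0
    return max(0.0, 1.0 - (epoch - DECAY_START) / (EPOCHS - DECAY_START))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=linear_decay)

# Per epoch: 880 training batches and 440 validation batches at batch size 16,
# then scheduler.step() to advance the decay.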
The loss function of the improved DeblurGAN deblurring algorithm is divided into a generator loss function and a discriminator loss function. The generator loss comprises the adversarial loss L_gan, the perceptual loss L_perceive, and the pixel loss L_pixel, calculated as follows:
L_gan = (1/B) Σ_(b=1)^(B) −log D(G(z_b))
L_perceive = (1/(C·H·W)) Σ ‖φ(I_truth) − φ(I_false)‖²
L_pixel = (1/(H·W)) Σ_(i=1)^(H) Σ_(j=1)^(W) (I_truth(i,j) − I_false(i,j))²
Wherein B represents the total number of samples in a mini-batch in one forward or backward propagation, i.e. the batch size batch_size; b is the index of a sample within the batch; D(x) denotes the discriminator output for a real sample x; H, W, C denote the height and width of the image and the number of channels respectively; φ(x) denotes the feature tensor obtained at a network layer; I_truth denotes the real (target) sample image and I_false the image produced by the generator. In this way both the pixel difference and the perceptual content difference between the real and generated images are obtained, continuously optimizing the training of the model.
The loss functions are weighted and combined according to the set proportions, as follows:
L_total = 0.2 · L_gan + 0.3 · L_perceive + 0.5 · L_pixel
In a specific experiment, different weighting coefficients may lead to different training results, and the weight proportions may be adjusted continuously according to the training effect so that the trained model performs best.
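A hedged sketch of the weighted generator loss with the 0.2/0.3/0.5 coefficients; feat_real/feat_fake stand in for the features φ(x) from an assumed feature extractor (e.g. a truncated VGG), and the MSE form of the pixel term is likewise an assumption.

import torch
import torch.nn.functional as F

def generator_loss(d_fake: torch.Tensor,
                   feat_real: torch.Tensor, feat_fake: torch.Tensor,
                   img_real: torch.Tensor, img_fake: torch.Tensor) -> torch.Tensor:
    l_gan = -torch.log(d_fake + 1e-10).mean()        # adversarial term
    l_perceive = F.mse_loss(feat_fake, feat_real)    # perceptual term on phi(x)
    l_pixel = F.mse_loss(img_fake, img_real)         # pixel term
    return 0.2 * l_gan + 0.3 * l_perceive + 0.5 * l_pixel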
The objective loss function of the GAN-based generative adversarial network can be expressed by the following equation:
min_G max_D V(D, G) = E_(x∼P_data(x))[log D(x)] + E_(z∼P_z(z))[log(1 − D(G(z)))]
Wherein, for the input samples, a batch of samples x_1, x_2, ..., x_N is drawn from the real data distribution P_data and a batch of noise samples z_1, z_2, ..., z_M from the noise distribution P_z. D(x) denotes the discriminator output for the real sample x, judging whether the sample comes from the real data; when D(x) → 1, log D(x) → 0 and the loss approaches its minimum. The first term of the equation measures the discriminator output on the real sample x, and the second term focuses on the discriminator output on the generated sample and tries to drive D(G(z)) down: when D(G(z)) → 0, log(1 − D(G(z))) → 0 and the loss approaches its minimum.
In step 5, the evaluation indices of the model are the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), with the following formulas:
MSE = (1/(H·W)) Σ_(i=1)^(H) Σ_(j=1)^(W) (I_truth(i,j) − I_false(i,j))²
PSNR = 10 · log₁₀(MAX_I² / MSE)
SSIM(x, y) = [(2μ_x μ_y + C_1)(2σ_xy + C_2)] / [(μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)]
Where MAX_I represents the maximum pixel value of the image (typically 255), MSE the mean square error, H, W the height and width of the image, I_truth and I_false the real sample image and the image generated by the generator network respectively, and (i, j) a pixel position in the image; μ_x, μ_y denote the means of images x and y, σ_x, σ_y the standard deviations, σ_xy the covariance, and C_1, C_2 constants that prevent the denominator from being zero. Combining PSNR and SSIM evaluates the images more comprehensively and better reflects the training effect of the model.
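A sketch of this evaluation using scikit-image's reference implementations of both metrics; loading two grayscale uint8 images and the file names are assumptions.

import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

truth = cv2.imread("clear.png", cv2.IMREAD_GRAYSCALE)
restored = cv2.imread("deblurred.png", cv2.IMREAD_GRAYSCALE)

psnr = peak_signal_noise_ratio(truth, restored, data_range=255)
ssim = structural_similarity(truth, restored, data_range=255)  # sliding-window local similarity, averaged
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")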
The invention is not limited to the embodiments described above. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the invention; many variations can be made by those skilled in the art without departing from the spirit of the invention and the scope of the claims, all of which fall within the protection scope of the invention.