Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle (UAV) aerial image deblurring method based on an improved DeblurGAN, which improves the precision of the deblurring algorithm and solves the problem of information loss in UAV image acquisition caused by complex weather environments. In order to achieve the above purpose, the invention adopts the following specific technical scheme:
Step 1, preparing a data set and carrying out synthetic blur processing on it;
Step 2, before algorithm processing, carrying out image preprocessing on the input data set;
Step 3, constructing the generator network and discriminator network framework of the improved algorithm, wherein the generator network comprises an encoder and a decoder; the encoder adopts a dual-branch network structure to extract feature maps, and the decoder restores a high-resolution image through layer-by-layer upsampling and convolution operations;
Step 4, setting an improved objective loss function for the generative adversarial network model;
Step 5, performing performance evaluation on the trained UAV aerial image deblurring method based on the improved DeblurGAN.
In step 1, an experimental dataset is prepared: artificial blur is added to the UAV aerial image dataset Uavid2020, and the Albumentations image data augmentation library is used to simulate the random wind conditions a UAV may encounter. The wind intensity is divided into three grades, (15, 30) weak wind, (30, 50) medium wind, and (50, 80) strong wind, and the wind direction is set to be random within the range of 0-360 degrees.
In step 2, image preprocessing is carried out on the input data set, extracting feature information of the image in the spatial domain and the frequency domain respectively. In the spatial domain, the gradients of the image in the horizontal and vertical directions are computed with an edge operator (the Sobel operator) to obtain spatial-domain features reflecting image edges and texture information. In the frequency domain, a Fast Fourier Transform (FFT) and frequency-domain centering are applied to the blurred image to obtain the distribution of image frequencies. The extracted spatial-domain and frequency-domain features are normalized and scaled to a preset interval to eliminate amplitude differences, and the frequency-domain features are further multiplied by a scaling factor to balance the weighting between the spatial-domain and frequency-domain features. The processed spatial-domain and frequency-domain features are then stacked along the channel dimension according to the set weights to form a multi-channel fused feature map.
In step 3, the generator network and discriminator network of the improved algorithm are constructed. For the encoder part of the generator network, a dual-branch network structure is adopted, realized by an FPN-MobileNet network and a Swin Transformer network respectively.
The first branch is a MobileNet network based on the FPN feature pyramid structure. It comprises an initial convolution layer that performs a 3×3 convolution on the input image and transforms the number of channels from 3 to a preset 128. Several FPN layers are designed, each consisting of a 3×3 convolution, batch normalization, and a ReLU activation function; layer-by-layer downsampling is realized through max pooling, generating multi-scale feature maps of shapes (H/2, W/2), (H/4, W/4), (H/8, W/8), and so on.
The second branch contains multi-level self-attention computation modules: the query matrix Q, key matrix K, and value matrix V are obtained through 1×1 convolutions, and the attention distribution is obtained using a scaling factor and the softmax function, realizing feature enhancement within local windows. The feature maps output at each scale of the second branch are normalized to the same number of channels by 1×1 convolutions and fused with the corresponding feature maps of the first branch through weighted fusion with learnable fusion weights.
For the decoder part of the generator network, layer-by-layer upsampling and convolution operations are adopted to recover high-resolution image features, with feature extraction and fusion modules of different scales arranged between the encoder and decoder.
In step 4, an improved objective loss function is set for the generative adversarial network model. The generator loss function is a weighted combination of the adversarial loss (L_gan), the perceptual loss (L_perceive), and the pixel loss (L_pixel).
The generator loss function is set as follows:
L_total = λ_gan · L_gan + λ_perceive · L_perceive + λ_pixel · L_pixel
Wherein λ_gan, λ_perceive, and λ_pixel represent the weight coefficients of the adversarial loss L_gan, the perceptual loss L_perceive, and the pixel loss L_pixel, respectively; the initial weight allocation is set to 2:3:5.
The discriminator loss function is set as follows:
L_D = −E_{x∼P_data}[log D(x)] − E_{z∼P_z}[log(1 − D(G(z)))]
The purpose of the generator is to produce realistic samples, so that the generated samples are as indistinguishable from real samples as possible to the discriminator. The first term of the discriminator loss function measures the discriminator's output on the real sample x, and the second term focuses on the discriminator's output on the generated sample G(z) and tries to drive D(G(z)) toward zero. Ideally, D(x) should be close to 1, indicating that the real sample is judged true by the discriminator, and D(G(z)) should be close to 0, indicating that the generated sample is judged false.
In step 5, the improved generative adversarial network algorithm undergoes model training and experimental evaluation. The evaluation indices of the model are the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). PSNR rapidly evaluates the global pixel error between two images by computing their mean square error, while SSIM computes local similarity block by block with a sliding window and takes the global average, effectively distinguishing structural changes in image content. The two are combined to evaluate image quality comprehensively.
The invention provides a UAV aerial image deblurring method based on an improved DeblurGAN algorithm, which has the following beneficial effects:
Aiming at the problem of blurring in UAV aerial images, the method improves the generative adversarial network algorithm: edge texture information and frequency-domain information of the blurred image are extracted in the spatial domain and the frequency domain respectively and dynamically weighted and fused by a squeeze-and-excitation module. The preprocessed image is then input into the designed dual-branch network structure, where feature information is extracted by an FPN-MobileNet network and a Swin Transformer network respectively, strengthening the fusion of multi-scale features and facilitating the recovery of a high-resolution clear image. The method achieves a good effect in restoring blurred UAV aerial images.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the embodiment of the invention discloses a UAV aerial image deblurring method based on an improved DeblurGAN, which comprises the following specific steps:
Step 1, preparing an experimental data set and synthesizing blurred images based on the UAV aerial image data set Uavid2020;
Step 2, preprocessing the input image, extracting feature information of the image in the spatial domain and the frequency domain respectively;
Step 3, constructing the improved DeblurGAN deblurring algorithm, adopting in the generator network a dual-branch structure combining an FPN-MobileNet network and a Swin Transformer network;
Step 4, performing model training on the improved DeblurGAN deblurring algorithm;
Step 5, saving the best-performing model after training, and further evaluating the effectiveness and accuracy of the improved algorithm on the input test set images.
In step 1: publicly available UAV aerial photography data sets contain few blurred images, so conditions such as high-altitude airflow and wind encountered by a UAV during aerial operation are simulated through a synthetic blur effect. The Uavid2020 data set contains 4K high-resolution images captured mainly in two environments, urban streetscapes and suburban roads, offering rich scene variation and the challenge of moving-object recognition. This data set is selected for synthesizing blurred images, and the Albumentations image data augmentation library is used to simulate random situations the UAV may encounter, parameterized mainly by wind_strength (wind intensity), blur_prob (blur probability), distortion_prob (distortion probability), and a random horizontal displacement dx = int(self.wind_strength * w * (2 * np.random.rand() - 1)) used to simulate the resulting blur artifacts.
In the synthetic blur data set, wind intensity is divided into three grades, (15, 30) weak wind, (30, 50) medium wind, and (50, 80) strong wind, and the wind direction is set randomly within 0-360 degrees. Sufficient data can be obtained through different parameter settings to meet the experimental requirements. The data set is then organized so that pairs of blurred and clear images are obtained as algorithm input.
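A minimal sketch of this blur-synthesis step, assuming a hypothetical helper synth_wind_blur rather than the exact implementation: a linear motion-blur kernel whose length is drawn from one of the three wind-strength grades and whose angle is random in [0, 360) degrees, applied with OpenCV.

import cv2
import numpy as np

WIND_GRADES = {"weak": (15, 30), "medium": (30, 50), "strong": (50, 80)}

def synth_wind_blur(img: np.ndarray, grade: str = "medium") -> np.ndarray:
    lo, hi = WIND_GRADES[grade]
    length = np.random.randint(lo, hi)      # wind strength -> kernel length
    angle = np.random.uniform(0.0, 360.0)   # random wind direction

    # Build a normalized linear motion kernel along the chosen direction.
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0 / length
    rot = cv2.getRotationMatrix2D((length / 2 - 0.5, length / 2 - 0.5), angle, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (length, length))
    kernel /= max(kernel.sum(), 1e-8)

    return cv2.filter2D(img, -1, kernel)

# Example: create one blurred/clear training pair from a Uavid2020 frame.
# clear = cv2.imread("uavid_frame.png")
# blurred = synth_wind_blur(clear, grade="strong")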
In step 2, the input image is preprocessed, i.e. the feature information of the image is extracted in the spatial domain and the frequency domain respectively. The image is first read in grayscale form (cv2.IMREAD_GRAYSCALE), then the gradients of the image in the x and y directions are computed with the Sobel edge detection method, grad_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3) and grad_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3), after which the gradients are normalized to the [0, 1] range.
In the frequency domain, the image is first converted to floating-point type (np.float32), then the amplitude spectrum of the image is computed via the Fast Fourier Transform (FFT), magnitude_spectrum = 20 * np.log(np.abs(fshift) + 1e-10), after which the amplitude spectrum is normalized to the [0, 1] range; the frequency-domain features are further multiplied by a scaling factor scale_factor = 0.5 to balance the weights.
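Combining the two preceding paragraphs, a runnable sketch of the preprocessing, assuming OpenCV and NumPy; the function name extract_features and the min-max normalization are illustrative choices.

import cv2
import numpy as np

def extract_features(path: str, scale_factor: float = 0.5):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # Spatial domain: Sobel gradients in x and y, normalized to [0, 1].
    grad_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    grad_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    norm = lambda a: (a - a.min()) / (a.max() - a.min() + 1e-10)
    grad_x, grad_y = norm(np.abs(grad_x)), norm(np.abs(grad_y))

    # Frequency domain: FFT, centering, log-amplitude spectrum,
    # normalization, then the 0.5 scaling factor balancing the two domains.
    fshift = np.fft.fftshift(np.fft.fft2(img))
    magnitude_spectrum = 20 * np.log(np.abs(fshift) + 1e-10)
    freq_feature = norm(magnitude_spectrum) * scale_factor
    return grad_x, grad_y, freq_feature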
The two extracted spatial-domain gradients are combined into two channels and the frequency-domain feature is expanded into a single channel, after which channel-wise feature fusion is performed, fused_features = torch.cat([grad_x, grad_y, freq_feature], dim=0), giving a feature map of shape [3, H, W]. A squeeze-and-excitation module (SE, Squeeze-and-Excitation Module) is then introduced: the module adjusts the channel features through a simple gating mechanism with a sigmoid activation function and learns the nonlinear relations between channels; the spatial dimension is compressed to 1×1 and passed through two fully connected layers, and finally the module re-weights the frequency-domain and spatial-domain channels according to the excitation result, realizing dynamic adjustment of the features.
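A minimal sketch of this squeeze-and-excitation re-weighting applied to the 3-channel fused map; the class name SEModule and the reduction choice are assumptions, not the exact implementation.

import torch
import torch.nn as nn

class SEModule(nn.Module):
    def __init__(self, channels: int = 3, reduction: int = 1):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: spatial dims -> 1x1
        self.fc = nn.Sequential(                     # excitation: two FC layers
            nn.Linear(channels, max(channels // reduction, 1)),
            nn.ReLU(inplace=True),
            nn.Linear(max(channels // reduction, 1), channels),
            nn.Sigmoid(),                            # gating in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight each channel

# fused = torch.stack([grad_x, grad_y, freq_feature]).unsqueeze(0)  # [1, 3, H, W]
# weighted = SEModule()(fused)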
In step 3, the improved DeblurGAN deblurring algorithm is constructed; the specific network structure is shown in fig. 2. The first branch combines an FPN (Feature Pyramid Network) structure with a MobileNet network to extract feature maps at multiple scales. The input is processed by a 3×3 convolution (Convolution), batch normalization (BatchNorm), and a ReLU activation function to produce a 128-channel feature map. The FPN layers perform feature extraction at 5 scales (from large to small), each scale consisting of a convolution block built from a 3×3 convolution, batch normalization, and ReLU activation; the output of each FPN layer undergoes a 2×2 max pooling (Max Pooling) operation to gradually reduce the spatial size of the feature map. The final output is 5 feature maps of shapes (B, 128, H/2, W/2), (B, 128, H/4, W/4), (B, 128, H/8, W/8), (B, 128, H/16, W/16), and (B, 128, H/32, W/32);
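A compact sketch of this first branch as described (initial 3×3 convolution to 128 channels, five conv-BN-ReLU stages each followed by 2×2 max pooling); the layer count per stage and class names are assumptions.

import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FPNBranch(nn.Module):
    def __init__(self, in_ch: int = 3, ch: int = 128, levels: int = 5):
        super().__init__()
        self.stem = conv_block(in_ch, ch)
        self.stages = nn.ModuleList(conv_block(ch, ch) for _ in range(levels))
        self.pool = nn.MaxPool2d(2)

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = self.pool(stage(x))   # halve spatial size: H/2 ... H/32
            feats.append(x)
        return feats                  # (B,128,H/2,W/2) ... (B,128,H/32,W/32)

# feats = FPNBranch()(torch.randn(1, 3, 256, 256))  # 5 multi-scale maps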
The second branch uses the Swin Transformer encoder structure to capture long-range dependencies and global context with a windowed self-attention mechanism. Each layer consists of a Patch Partition and a Swin Transformer Block; the Patch Partition module partitions the input image into non-overlapping patches, each of which is treated as a token;
The Swin Transformer Block structure is shown in fig. 3. It receives an input feature tensor Z ∈ R^(B×C×H×W), where B is the batch size, C the number of channels, and H, W the spatial dimensions of the image, and divides the input feature map into non-overlapping windows of fixed size M×M, generating windowed features W_l. The left part of the structure is the multi-head attention module of the regular window configuration, the W-MSA block (Window-based Multi-head Self-Attention): it generates the query Q, key K, and value V matrices by linear projection, computes the in-window attention scores and the weighted aggregation of features, and outputs features Z_W-MSA, which are added element-wise to the original input to complete the first residual connection Z_l = Z + Z_W-MSA. Layer normalization (LN, Layer Normalization) over the channel dimension is applied to Z_l, Z_normal = LayerNorm(Z_l), stabilizing the numerical distribution. The right part of the structure is the multi-head attention module of the shifted window configuration, the SW-MSA block (Shifted-Window MSA), which realizes cross-window information interaction: it performs a window-shift operation on the normalized features, repartitions the windows and computes attention, and outputs features Z_SW-MSA, which are added element-wise to complete the second residual connection Z_(l+1) = Z_normal + Z_SW-MSA. An MLP (Multi-Layer Perceptron) channel-wise feedforward sub-network then applies a two-stage nonlinear transformation to Z_(l+1), namely linear projection + GELU activation followed by linear projection + Dropout, keeping the output feature dimension consistent with the input. Finally, a last channel normalization is applied to the MLP output, X_out = LayerNorm(X_MLP), producing complete features for subsequent network layers.
The Swin Transformer Block adopts a shifted-window partitioning scheme, calculated as follows:
Ẑ_l = W-MSA(LN(Z_(l−1))) + Z_(l−1)
Z_l = MLP(LN(Ẑ_l)) + Ẑ_l
Ẑ_(l+1) = SW-MSA(LN(Z_l)) + Z_l
Z_(l+1) = MLP(LN(Ẑ_(l+1))) + Ẑ_(l+1)
Where Z_(l−1) denotes the tokens entering the l-th Block, Ẑ_l and Z_l denote the output features of the W-MSA module and the MLP module of the l-th Block respectively, and Ẑ_(l+1) and Z_(l+1) denote the output features of the SW-MSA module and its subsequent MLP module. In the code implementation, the window size is set to window_size = 7 and the number of attention heads to num_heads = 3; the first-stage attention residual is obtained by adding the original input X to the W-MSA-processed features, X_l = X + X_W-MSA, and the second-stage MLP residual undergoes LN processing twice, fusing the feature information reinforced before and after, which enhances hierarchical local attention and improves the output features' attention to global information.
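A minimal sketch of window-based multi-head self-attention (W-MSA) with window_size = 7 and num_heads = 3 as stated above; the relative position bias and the cyclic shift of SW-MSA are omitted for brevity, and the default dim = 96 is an assumption.

import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim: int = 96, window: int = 7, heads: int = 3):
        super().__init__()
        assert dim % heads == 0
        self.w, self.h, self.scale = window, heads, (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)   # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) with H and W divisible by the window size.
        B, H, W, C = x.shape
        w = self.w
        # Partition into non-overlapping w x w windows -> (nW*B, w*w, C).
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)
        qkv = self.qkv(x).reshape(-1, w * w, 3, self.h, C // self.h)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each: (nW*B, h, w*w, C/h)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # scaled dot-product
        x = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(-1, w * w, C)
        x = self.proj(x)
        # Reverse the window partition back to (B, H, W, C).
        x = x.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, H, W, C)

# y = WindowAttention()(torch.randn(1, 56, 56, 96))  # 56 = 8 windows of 7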
The two branches of the dual-branch network structure each produce feature maps of different scales, which are fused at the feature layers of corresponding size; the features of different scales are then normalized to 128 channels by a 1×1 convolution module, realizing the fusion of the multi-scale feature maps, and finally the restored high-resolution image is output.
In step 4, the preprocessed images are divided into training, validation, and test sets at a ratio of 7:2:1 and input to the model for training. In the training program, the input images are cropped to 256×256 and data augmentation operations, including random cropping and geometric transformations, are applied to enlarge the data and improve the generalization capability of the model. The number of training epochs is set to 200, the number of training batches per epoch train_batches_per_epoch to 880, the number of validation batches per epoch val_batches_per_epoch to 440, and the batch size to 16; the optimizer is Adam with an initial learning rate of 0.001, and linear learning-rate decay starts from epoch 50. When the loss function converges during training and the image similarity value stabilizes, the current model is judged satisfactory and can be saved for subsequent testing;
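A sketch of the optimizer and learning-rate schedule just described (Adam, initial lr 0.001, 200 epochs, linear decay from epoch 50); the stand-in generator module and the exact decay shape are assumptions.

import torch
import torch.nn as nn

EPOCHS, DECAY_START = 200, 50
generator = nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the generator network

optimizer = torch.optim.Adam(generator.parameters(), lr=0.001)

def linear_decay(epoch: int) -> float:
    # Constant lr for the first 50 epochs, then linear decay toward zero.
    if epoch < DECAY_START:
        return 1.0
    return max(0.0, 1.0 - (epoch - DECAY_START) / (EPOCHS - DECAY_START))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=linear_decay)

# Per epoch: 880 training batches and 440 validation batches at batch size 16,
# then scheduler.step() to advance the decay.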
The loss function of the improved DeblurGAN deblurring algorithm is divided into a generator loss function and a discriminator loss function. The generator loss comprises the adversarial loss L_gan, the perceptual loss L_perceive, and the pixel loss L_pixel, calculated as follows:
L_gan = (1/B) Σ_(b=1)^(B) −log D(G(z_b))
L_perceive = (1/(C·H·W)) Σ ‖φ(I_truth) − φ(I_false)‖²
L_pixel = (1/(H·W)) Σ_(i=1)^(H) Σ_(j=1)^(W) (I_truth(i,j) − I_false(i,j))²
Wherein B represents the total number of samples in a mini-batch in one forward or backward propagation, i.e. the batch size batch_size; b is the index of a sample within the batch; D(x) denotes the discriminator output for a real sample x; H, W, C denote the height and width of the image and the number of channels respectively; φ(x) denotes the feature tensor obtained at a network layer; I_truth denotes the real (target) sample image and I_false the image produced by the generator. In this way both the pixel difference and the perceptual content difference between the real and generated images are obtained, continuously optimizing the training of the model.
The loss functions are weighted and combined according to the set proportions, as follows:
L_total = 0.2 · L_gan + 0.3 · L_perceive + 0.5 · L_pixel
In a specific experiment, different weighting coefficients may lead to different training results, and the weight proportions may be adjusted continuously according to the training effect so that the trained model performs best.
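A hedged sketch of the weighted generator loss with the 0.2/0.3/0.5 coefficients; feat_real/feat_fake stand in for the features φ(x) from an assumed feature extractor (e.g. a truncated VGG), and the MSE form of the pixel term is likewise an assumption.

import torch
import torch.nn.functional as F

def generator_loss(d_fake: torch.Tensor,
                   feat_real: torch.Tensor, feat_fake: torch.Tensor,
                   img_real: torch.Tensor, img_fake: torch.Tensor) -> torch.Tensor:
    l_gan = -torch.log(d_fake + 1e-10).mean()        # adversarial term
    l_perceive = F.mse_loss(feat_fake, feat_real)    # perceptual term on phi(x)
    l_pixel = F.mse_loss(img_fake, img_real)         # pixel term
    return 0.2 * l_gan + 0.3 * l_perceive + 0.5 * l_pixel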
The objective loss function of the GAN-based generative adversarial network can be expressed by the following equation:
min_G max_D V(D, G) = E_(x∼P_data(x))[log D(x)] + E_(z∼P_z(z))[log(1 − D(G(z)))]
Wherein, for the input samples, a batch of samples x_1, x_2, ..., x_N is drawn from the real data distribution P_data and a batch of noise samples z_1, z_2, ..., z_M from the noise distribution P_z. D(x) denotes the discriminator output for the real sample x, judging whether the sample comes from the real data; when D(x) → 1, log D(x) → 0 and the loss approaches its minimum. The first term of the equation measures the discriminator output on the real sample x, and the second term focuses on the discriminator output on the generated sample and tries to drive D(G(z)) down: when D(G(z)) → 0, log(1 − D(G(z))) → 0 and the loss approaches its minimum.
In step 5, the evaluation indices of the model are the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), with the following formulas:
MSE = (1/(H·W)) Σ_(i=1)^(H) Σ_(j=1)^(W) (I_truth(i,j) − I_false(i,j))²
PSNR = 10 · log₁₀(MAX_I² / MSE)
SSIM(x, y) = [(2μ_x μ_y + C_1)(2σ_xy + C_2)] / [(μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)]
Where MAX_I represents the maximum pixel value of the image (typically 255), MSE the mean square error, H, W the height and width of the image, I_truth and I_false the real sample image and the image generated by the generator network respectively, and (i, j) a pixel position in the image; μ_x, μ_y denote the means of images x and y, σ_x, σ_y the standard deviations, σ_xy the covariance, and C_1, C_2 constants that prevent the denominator from being zero. Combining PSNR and SSIM evaluates the images more comprehensively and better reflects the training effect of the model.
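A sketch of this evaluation using scikit-image's reference implementations of both metrics; loading two grayscale uint8 images and the file names are assumptions.

import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

truth = cv2.imread("clear.png", cv2.IMREAD_GRAYSCALE)
restored = cv2.imread("deblurred.png", cv2.IMREAD_GRAYSCALE)

psnr = peak_signal_noise_ratio(truth, restored, data_range=255)
ssim = structural_similarity(truth, restored, data_range=255)  # sliding-window local similarity, averaged
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")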
The invention is not limited to the embodiments described above. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the invention; many variations can be made by those skilled in the art without departing from the spirit of the invention and the scope of the claims, all of which fall within the protection scope of the invention.