
CN116563144A - Dynamic attention-based intensive LSTM residual network denoising method - Google Patents


Info

Publication number
CN116563144A
Authority
CN
China
Prior art keywords
attention
lstm
denoising
residual
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310434790.3A
Other languages
Chinese (zh)
Inventor
尹海涛
田诚浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310434790.3A
Publication of CN116563144A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 - Recurrent networks, e.g. Hopfield networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A dense LSTM residual network denoising method based on dynamic attention comprises constructing a training data set, and preprocessing the training data set; constructing a denoising model by using a dynamic attention mechanism and a dense LSTM residual error network; setting super parameters and loss functions of a network denoising model, adding different levels of noise to a training data set, and training to obtain a trained network model; and (3) carrying out image denoising test by adopting the trained network model, and evaluating denoising effect by using the structural similarity and the peak signal-to-noise ratio. The method has the beneficial effects of improving the noise reduction performance and the imaging quality, effectively improving the image quality, and being beneficial to understanding and analyzing tasks for high-level images such as subsequent image classification, segmentation, identification and tracking.

Description

A Dense LSTM Residual Network Denoising Method Based on Dynamic Attention

Technical Field

The invention belongs to the field of computer vision and image processing, and in particular relates to a dense LSTM residual network denoising method based on dynamic attention.

Background Art

In recent years, with the continuous development of the information society, images have become a main channel through which people receive information. During acquisition and transmission, digital images are affected by the environment, equipment, human factors and so on, which inevitably introduce noise, reducing image quality and readability. Image denoising is one of the most important ways to improve image quality and has been applied in many fields over the years. For example, in the medical field, noisy medical images are denoised to improve clarity and help medical staff better judge a patient's condition. In aerospace and related fields, interference from electronic equipment during imaging often leaves images contaminated with Gaussian white noise, salt-and-pepper noise and the like, which must be removed by denoising. Hand-held mobile devices and dashboard cameras often produce noise when shooting at night because of poor light and environmental conditions, degrading image quality and complicating subsequent image analysis and processing; denoising also facilitates downstream tasks such as image classification, segmentation and recognition.

Image denoising is a classic problem in image processing and an important preprocessing step in computer vision. Traditional denoising algorithms rely on strong prior model assumptions and manual parameter tuning, involve complex optimization problems, and cost considerable time and effort. With the continuous progress of deep learning theory in recent years, researchers have successfully applied it to image denoising and achieved excellent results: deep neural networks extract feature information from images effectively through convolution, activation and related operations, contributing greatly to denoising research. At present, attention mechanisms are often used when building convolutional neural networks and are widely applied in natural language understanding, image denoising, recognition and other fields. Although an attention mechanism can improve the denoising performance of a convolutional neural network, the position of the attention module within the network must be set manually, and its effect varies with position; effectively improving network performance through attention therefore requires repeated experiments on different network structures, which greatly increases labor cost.

Summary of the Invention

To solve the above technical problems, the present invention provides a dense LSTM residual network denoising method based on dynamic attention. By considering spatial attention weights and inter-channel attention weights, the network is better guided in denoising, and the denoising quality of the image is guaranteed. By designing the LSTM structure, the dependence between global information and local information is taken into account; the method significantly improves noise-reduction performance and imaging quality.

A dense LSTM residual network denoising method based on dynamic attention, characterized by comprising the following steps:

Step S1: construct a training data set, and perform preprocessing operations on the data set;

Step S2: establish a denoising network model composed of a dynamic attention mechanism and a dense LSTM residual network;

Step S3: set the hyperparameters and the loss function of the denoising network model;

Step S4: add different levels of noise to the training data set, and train the network to obtain the trained denoising network model;

Step S5: input the test set into the trained denoising network model to obtain denoised images, and evaluate them using the structural similarity and the peak signal-to-noise ratio (substitute the image to be evaluated, i.e. the denoised image output by the network model, together with the original image, into the corresponding formulas); put the denoising network model that passes the evaluation into use: input the image to be processed and output the denoised image.

Further, in step S1, preprocessing the training data set includes the following steps:

S1.1) Select training samples from the training data set as the original training set, where the images in the training samples are all noise-free images of the same size;

S1.2) Scale the images in the training data set by factors of 1, 0.9, 0.8 and 0.7, and segment each scaled image with a sliding fixed-size window;

S1.3) Apply augmentation operations to the segmented images; the augmentation operations include: up-down flip, 90° rotation, 180° rotation, 270° rotation, 90° rotation after up-down flip, 180° rotation after up-down flip, and 270° rotation after up-down flip.
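For illustration only, the seven augmentations listed in step S1.3, together with the original patch, can be sketched in NumPy (the function name `augment` is ours, not part of the invention):

```python
import numpy as np

def augment(patch):
    # The original patch plus the seven variants of step S1.3:
    # up-down flip, 90/180/270-degree rotations, and each rotation
    # applied after the up-down flip.
    flipped = np.flipud(patch)
    return [
        patch,
        np.rot90(patch, 1),
        np.rot90(patch, 2),
        np.rot90(patch, 3),
        flipped,
        np.rot90(flipped, 1),
        np.rot90(flipped, 2),
        np.rot90(flipped, 3),
    ]
```

Applied to every window extracted in S1.2, this multiplies the number of training patches by eight.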

Further, in step S2, the denoising network model is composed of convolution layers, LReLU activations and 8 dynamic attention modules, where each dynamic attention module includes front-end and back-end residual units, a non-attention branch, an attention branch and a weight allocation branch, and adopts an LSTM structure internally; the convolution kernel sizes used inside the network model are 3×3 and 5×5.

Further, the front-end and back-end residual units are residual modules, each composed of 2 convolution layers with 3×3 kernels and an LReLU activation function. Let Y be the input; the mathematical expression is:

X_0 = ψ(W_2 * ψ(W_1 * Y + b_1) + b_2)

In the above formula, W_1 and W_2 denote 3×3 convolution layers with 64 output channels, b_1 and b_2 are biases, ψ denotes the LReLU activation function, and X_0 denotes the extracted shallow features;
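A minimal NumPy sketch of this two-convolution residual unit, assuming a naive zero-padded "same" convolution and an LReLU slope of 0.01 (the slope is not stated in the text; all function names are ours):

```python
import numpy as np

def lrelu(x, alpha=0.01):
    # LReLU activation psi; the slope alpha is an assumed value.
    return np.where(x > 0, x, alpha * x)

def conv2d(x, w, b):
    # Naive 'same' convolution: x is (H, W, Cin), w is (k, k, Cin, Cout).
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.tensordot(xp[i:i+k, j:j+k, :], w, axes=3) + b
    return out

def residual_unit(Y, W1, b1, W2, b2):
    # X0 = psi(W2 * psi(W1 * Y + b1) + b2)
    return lrelu(conv2d(lrelu(conv2d(Y, W1, b1)), W2, b2))
```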

Define X_{i-1} as the input of the dynamic attention module; the front-end feature extraction part consists of one residual module and an LReLU activation function, with the corresponding mathematical expression:

X̃_{i-1} = ψ(E(X_{i-1}))

where E denotes the residual module; the output X̃_{i-1} is then fed into the attention branch and the non-attention branch respectively.

Further, in the non-attention branch, 2 residual units and 1 LSTM module form the backbone, where each residual unit contains one 3×3 convolution layer and one 5×5 convolution layer, and the LSTM module is a recurrent network structure composed of a residual module, an LSTM unit, a convolution layer and an attention mask layer. The mathematical expressions of the LSTM unit are as follows:

i_t = σ(W_xi * X_t + W_hi * H_{t-1} + W_ci ⊙ C_{t-1} + b_i)

f_t = σ(W_xf * X_t + W_hf * H_{t-1} + W_cf ⊙ C_{t-1} + b_f)

C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(W_xc * X_t + W_hc * H_{t-1} + b_c)

o_t = σ(W_xo * X_t + W_ho * H_{t-1} + W_co ⊙ C_t + b_o)

H_t = o_t ⊙ tanh(C_t)

Here the LSTM consists of the input gate i_t, the forget gate f_t, the output gate o_t, the long-term memory state C_t and the short-term memory state H_t; σ and tanh denote the Sigmoid and Tanh activation functions respectively, W_jk denotes the convolution operation from j to k, ⊙ denotes element-wise matrix multiplication, and b_j denotes the bias of the j-th gate; X_t denotes the feature map obtained from the residual module, C_t encodes the features passed to the next LSTM, and H_t serves as the output of the current LSTM unit and the input to the next LSTM unit.
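The five gate equations can be sketched as one NumPy update step. For brevity this illustrative cell replaces the convolutions W * X with dense matrix products while keeping the peephole terms W_c ⊙ C element-wise; all parameter names are ours:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(Xt, Ht_1, Ct_1, p):
    # Peephole LSTM step following the gate equations above;
    # p maps parameter names (Wxi, Whi, Wci, bi, ...) to arrays.
    i = sigmoid(Xt @ p["Wxi"] + Ht_1 @ p["Whi"] + p["Wci"] * Ct_1 + p["bi"])
    f = sigmoid(Xt @ p["Wxf"] + Ht_1 @ p["Whf"] + p["Wcf"] * Ct_1 + p["bf"])
    C = f * Ct_1 + i * np.tanh(Xt @ p["Wxc"] + Ht_1 @ p["Whc"] + p["bc"])
    o = sigmoid(Xt @ p["Wxo"] + Ht_1 @ p["Who"] + p["Wco"] * C + p["bo"])
    H = o * np.tanh(C)
    return H, C
```

Because the output gate is a sigmoid and tanh is bounded, the short-term state H always stays inside (-1, 1).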

Further, the attention branch is composed of a backbone network B_i and a mask part M_i, expressed as:

F_att^i(X) = M_i(X) ⊙ B_i(X)

In the above formula, F_att denotes the attention branch, i denotes the i-th dynamic attention module, the backbone network B_i contains 2 residual modules, and the mask part M_i contains dilated convolution, a channel attention module, a down-sampling operation, an up-sampling operation and a Sigmoid function.

Further, the weight allocation branch is composed of a spatial attention module and a channel attention module. The spatial attention module includes a pair of max-pooling and average-pooling operations: the feature maps obtained from the two pooling operations are concatenated into one large feature map, spatial weights are then obtained through a Sigmoid activation function and fed into the channel attention module. The channel attention module is composed of an average pooling layer, two linear layers and a Softmax activation function, and finally produces the dynamic attention weight distribution used for the non-attention branch and the attention branch.
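A rough NumPy sketch of this weight allocation branch, under two simplifications of ours: the concatenation and fusion of the two pooled maps is reduced to their mean, and V1, V2 (with a tanh between them, also an assumption) stand in for the two linear layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def weight_allocation(F, V1, V2):
    # F: (H, W, C) feature map; V1, V2 are hypothetical linear layers.
    # Spatial attention: channel-wise max pooling and average pooling,
    # fused here by a simple mean instead of concatenation + convolution.
    pooled = (F.max(axis=2) + F.mean(axis=2)) / 2.0
    spatial = sigmoid(pooled)                    # (H, W) spatial weights
    weighted = F * spatial[..., None]
    # Channel attention: global average pooling, two linear layers,
    # softmax over the two branches (non-attention vs attention).
    g = weighted.mean(axis=(0, 1))               # (C,)
    branch_w = softmax(np.tanh(g @ V1) @ V2)     # (2,) dynamic weights
    return spatial, branch_w
```

The softmax guarantees that the two branch weights are positive and sum to one, so the module behaves as a learned, input-dependent mixing coefficient between the two branches.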

Further, in step S3, the hyperparameters of the denoising network model include the batch size, the initial learning rate, the number of iterations and the learning-rate decay strategy; the loss function is the mean squared error, whose mathematical expression is:

L(Θ) = (1/2N) Σ_{i=1}^{N} ||R(y_i; Θ) - (y_i - x_i)||^2

where Θ denotes the parameters of the dynamic-attention dense LSTM residual network, R(y_i; Θ) is the residual image learned by the network, y_i is the noisy image, x_i denotes the clean image, and N is the number of training samples. The Train400 data set is used as the training data set; in each epoch, 40 image patches of size 128×128 are randomly extracted as samples, and the Adam optimizer is used for training. The initial learning rate is set to 1×10^-4 and decays by a factor of 0.2 every 10 epochs; the network is trained for 100 epochs in total.
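The residual-learning mean-squared-error objective described above can be sketched directly (illustrative; the function name is ours, and the network here is represented only by its predicted residual maps):

```python
import numpy as np

def residual_loss(R_pred, y_noisy, x_clean):
    # L(Theta) = 1/(2N) * sum_i ||R(y_i; Theta) - (y_i - x_i)||^2
    # The network predicts the residual (noise) map; the target
    # residual is the difference between noisy and clean images.
    N = len(R_pred)
    return sum(np.sum((R - (y - x)) ** 2)
               for R, y, x in zip(R_pred, y_noisy, x_clean)) / (2.0 * N)
```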

Further, in step S4, the training method of the denoising network model is:

S4.1) Add Gaussian white noise with noise levels of 15, 25 and 50 to the images in the original training set;
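Adding white Gaussian noise of level sigma to an image in the [0, 255] range can be sketched as follows (clipping back to the valid range is our assumption; some pipelines omit it):

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng=None):
    # Add zero-mean Gaussian white noise of standard deviation sigma
    # (15, 25 or 50 in step S4.1) and clip back to the valid range.
    rng = rng or np.random.default_rng(0)
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 255.0)
```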

S4.2) Input the noise-added training images into the network model for training, obtain the trained denoising network model, and save it. The specific training steps are implemented by running the code in PyCharm.

Further, in step S5, the structural similarity is calculated as:

SSIM(x, y) = ((2 μ_x μ_y + d_1)(2 σ_xy + d_2)) / ((μ_x^2 + μ_y^2 + d_1)(σ_x^2 + σ_y^2 + d_2))

where x and y are the two images, μ_x and μ_y are their means, σ_x^2 and σ_y^2 are their variances, and σ_xy is the covariance of x and y; d_1 = (k_1 L)^2 and d_2 = (k_2 L)^2, where L is the dynamic range of the pixels, k_1 = 0.01 and k_2 = 0.03;
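A single-window (global) SSIM following the formula above can be sketched in NumPy; note that SSIM in practice is usually computed over local windows and averaged, which this sketch omits:

```python
import numpy as np

def ssim(x, y, L=255.0, k1=0.01, k2=0.03):
    # Global SSIM: (2*mu_x*mu_y + d1)(2*sigma_xy + d2) /
    # ((mu_x^2 + mu_y^2 + d1)(sigma_x^2 + sigma_y^2 + d2))
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    d1, d2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + d1) * (2 * cov + d2)) / \
           ((mx ** 2 + my ** 2 + d1) * (vx + vy + d2))
```

By construction, an image compared with itself scores exactly 1.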

The peak signal-to-noise ratio is calculated as:

PSNR = 10 · log_10(Q^2 / MSE),  MSE = (1/(MN)) Σ_{i=1}^{M} Σ_{j=1}^{N} (x(i,j) - y(i,j))^2

where x(i,j) and y(i,j) denote the pixel values at corresponding positions in the initial image x and the denoised image y respectively, M and N are the image dimensions, and Q denotes the maximum gray value in the image.
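Following the formula above, the PSNR between an original image and its denoised output can be sketched as (Q = 255 for 8-bit images):

```python
import numpy as np

def psnr(x, y, Q=255.0):
    # PSNR = 10 * log10(Q^2 / MSE), MSE averaged over all pixels.
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(Q ** 2 / mse)
```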

The beneficial effects achieved by the present invention are:

(1) By considering spatial attention weights and inter-channel attention weights, the network is better guided in denoising, and the denoising quality of the image is guaranteed.

(2) By designing the LSTM structure, the dependence between global information and local information is taken into account; the method significantly improves noise-reduction performance and imaging quality.

(3) The method benefits subsequent high-level image understanding and analysis tasks such as image classification, segmentation, recognition and tracking.

Brief Description of the Drawings

Fig. 1 is a flow chart of the method in an embodiment of the present invention.

Fig. 2 is a schematic diagram of the denoising network model in an embodiment of the present invention.

Fig. 3 is a schematic diagram of the dynamic attention module of the network in an embodiment of the present invention.

Fig. 4 is a table of PSNR values of the denoising results of various algorithms in an embodiment of the present invention.

Fig. 5 is a table of SSIM values of the denoising results of various algorithms in an embodiment of the present invention.

Detailed Description of the Embodiments

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.

As shown in Fig. 1, a dense LSTM residual network denoising algorithm based on dynamic attention includes the following steps:

Step S1: Construct a training data set and perform preprocessing operations on the data set.

In step S1, the preprocessing operation includes the following steps:

S1.1) Select training samples from the training data set as the original training set, where the images in the training samples are all noise-free images of the same size. In this embodiment, the training set consists of 400 images, each 180 pixels high and 180 pixels wide.

S1.2) Scale the images in the training data set by factors of 1, 0.9, 0.8 and 0.7, and segment each scaled image with a sliding fixed-size window.

S1.3) Apply augmentation operations to the segmented images; the augmentation operations include: up-down flip, 90° rotation, 180° rotation, 270° rotation, 90° rotation after up-down flip, 180° rotation after up-down flip, and 270° rotation after up-down flip.

S1.4) Add Gaussian white noise with noise levels of 15, 25 and 50 to the images in the data set.

Step S2: Build the denoising algorithm composed of the dynamic attention mechanism and the dense LSTM residual network.

As shown in Fig. 2, the denoising network model is composed of convolution operations, LReLU activations and 8 dynamic attention modules, where each dynamic attention module includes spatial attention, channel attention, residual modules, concatenation layers and so on, and adopts an LSTM structure internally; the convolution kernel sizes used inside the network model are 3×3 and 5×5.

As shown in Fig. 3, the dynamic attention module is composed of front-end and back-end residual units, a non-attention branch, an attention branch and a weight allocation branch. The front-end and back-end residual units are residual modules, each composed of 2 convolution layers with 3×3 kernels and an LReLU activation function. Let Y be the input; the mathematical expression is:

X_0 = ψ(W_2 * ψ(W_1 * Y + b_1) + b_2)

In the above formula, W_1 and W_2 denote 3×3 convolution layers with 64 output channels, b_1 and b_2 are biases, ψ denotes the LReLU activation function, and X_0 denotes the extracted shallow features.

Let X_{i-1} denote the input of the dynamic attention module. The front-end feature extraction part consists of one residual module and an LReLU activation function, with the corresponding mathematical expression:

X̃_{i-1} = ψ(E(X_{i-1}))

where E denotes the residual module; the output X̃_{i-1} is then fed into the attention branch and the non-attention branch respectively.

The non-attention branch consists of 2 residual units and 1 LSTM module forming the backbone, where each residual unit contains a 3×3 convolution layer and a 5×5 convolution layer, and the LSTM module is a recurrent network structure composed of a residual module, an LSTM unit, a convolution layer and an attention mask layer. The mathematical expressions of the LSTM unit are as follows:

i_t = σ(W_xi * X_t + W_hi * H_{t-1} + W_ci ⊙ C_{t-1} + b_i)

f_t = σ(W_xf * X_t + W_hf * H_{t-1} + W_cf ⊙ C_{t-1} + b_f)

C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(W_xc * X_t + W_hc * H_{t-1} + b_c)

o_t = σ(W_xo * X_t + W_ho * H_{t-1} + W_co ⊙ C_t + b_o)

H_t = o_t ⊙ tanh(C_t)

Here the LSTM consists of the input gate i_t, the forget gate f_t, the output gate o_t, the long-term memory state C_t and the short-term memory state H_t. σ and tanh denote the Sigmoid and Tanh activation functions respectively, W_jk denotes the convolution operation from j to k, ⊙ denotes element-wise matrix multiplication, and b_j denotes the bias of the j-th gate. X_t denotes the feature map obtained from the residual module, C_t encodes the features passed to the next LSTM, and H_t serves as the output of the current LSTM unit and the input to the next LSTM unit.

The attention branch is composed of a backbone network B_i and a mask part M_i, which can be expressed as:

F_att^i(X) = M_i(X) ⊙ B_i(X)

In the above formula, the backbone network B_i contains 2 residual modules, and the mask part M_i contains dilated convolution, a channel attention module, a down-sampling operation, an up-sampling operation and a Sigmoid function.

The weight allocation branch is composed of a spatial attention module and a channel attention module. The spatial attention module includes a pair of max-pooling and average-pooling operations: the feature maps obtained from the two pooling operations are concatenated into one large feature map, spatial weights are then obtained through a Sigmoid activation function and fed into the channel attention module. The channel attention module is composed of an average pooling layer, two linear layers and a Softmax activation function, and finally produces the dynamic attention weight distribution used for the non-attention branch and the attention branch.

Step S3: Set the hyperparameters and the loss function of the denoising network model.

The hyperparameters of the network model include the batch size, the initial learning rate, the number of iterations and the learning-rate decay strategy.

The loss function is the mean squared error, whose mathematical expression is:

L(Θ) = (1/2N) Σ_{i=1}^{N} ||R(y_i; Θ) - (y_i - x_i)||^2

where Θ denotes the parameters of the dynamic-attention dense LSTM residual network, R(y_i; Θ) is the residual image learned by the network, y_i is the noisy image, x_i denotes the clean image, and N is the number of training samples. The Train400 data set is used as the training data set; in each epoch, 40 image patches of size 128×128 are randomly extracted as samples, and the Adam optimizer is used for training. The initial learning rate is set to 1×10^-4 and decays by a factor of 0.2 every 10 epochs; the network is trained for 100 epochs in total.

Step S4: Add different levels of noise to the training data set, and train the network to obtain the trained network model.

In step S4, the training method of the denoising network model is:

S4.1) Add Gaussian white noise with noise levels of 15, 25 and 50 to the images in the original training set;

S4.2) Input the noise-added training images into the network model for training, obtain the trained denoising network model, and save it.

Step S5: Input the test set into the network to obtain denoised images, and evaluate them using the structural similarity and the peak signal-to-noise ratio.

The test sets include the Set12 and BSD68 grayscale data sets; they are input into the trained network model for image denoising.

The structural similarity is calculated as:

SSIM(x, y) = ((2 μ_x μ_y + d_1)(2 σ_xy + d_2)) / ((μ_x^2 + μ_y^2 + d_1)(σ_x^2 + σ_y^2 + d_2))

where x and y are the two images, μ_x and μ_y are their means, σ_x^2 and σ_y^2 are their variances, and σ_xy is the covariance of x and y; d_1 = (k_1 L)^2 and d_2 = (k_2 L)^2, where L is the dynamic range of the pixels, k_1 = 0.01 and k_2 = 0.03.

The peak signal-to-noise ratio is calculated as:

PSNR = 10 · log_10(Q^2 / MSE),  MSE = (1/(MN)) Σ_{i=1}^{M} Σ_{j=1}^{N} (x(i,j) - y(i,j))^2

where x(i,j) and y(i,j) denote the pixel values at corresponding positions in the initial image x and the denoised image y respectively, M and N are the image dimensions, and Q denotes the maximum gray value in the image.

Figs. 4 and 5 give the PSNR and SSIM results of various algorithms on different data sets. The method achieves good PSNR results under all three noise levels. On the Set12 test set, at noise levels 15 and 50 the method reaches average PSNR values of 33.05 dB and 27.59 dB respectively, each 0.20 dB higher than the FFDNet algorithm. At noise level 25, the PSNR of the proposed algorithm on Set12 is slightly lower than that of FDnCNN. At noise levels 15 and 25, the SSIM results of the method are better than those of the other algorithms.

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; any equivalent modification or change made by a person of ordinary skill in the art according to the disclosure of the present invention shall fall within the protection scope recorded in the claims.

Claims (10)

1. A dynamic attention-based intensive LSTM residual network denoising method is characterized in that: the method comprises the following steps:
step S1, constructing a training data set and preprocessing the data set;
step S2, establishing a denoising network model composed of a dynamic attention mechanism and a dense LSTM residual network;
step S3, setting the hyperparameters and loss function of the denoising network model;
step S4, adding noise of different levels to the training data set and training the network to obtain a trained denoising network model;
step S5, inputting a test set into the trained denoising network model to obtain denoised images, and evaluating the denoised images using the structural similarity and the peak signal-to-noise ratio; putting the denoising network model that passes the evaluation into use, inputting an image to be processed, and outputting the denoised image.
2. The dynamic attention-based dense LSTM residual network denoising method according to claim 1, wherein in step S1 the preprocessing of the training data set comprises the following steps:
step S1.1), selecting training samples from the training data set as the original training set, wherein the images in the training samples are noise-free images of the same size;
step S1.2), scaling the images in the training data set by factors of 1, 0.9, 0.8 and 0.7 respectively, and extracting image blocks from each scaled image with a fixed-size sliding segmentation window;
step S1.3), performing augmentation operations on the extracted image blocks, namely: flipping the image up-down, rotating it by 90, 180 or 270 degrees, and rotating it by 90, 180 or 270 degrees after an up-down flip.
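The seven augmentation operations of step S1.3, together with the original patch, produce the eight orientations of a square image (the dihedral group of the square). A minimal NumPy sketch, where the function name and the 4×4 patch are only illustrative:

```python
import numpy as np

def augment(patch):
    """Return the 8 orientations used in step S1.3: the original patch,
    its up-down flip, rotations by 90/180/270 degrees, and the three
    rotations applied after the up-down flip."""
    flipped = np.flipud(patch)
    variants = [patch, flipped]
    for k in (1, 2, 3):                      # 90, 180, 270 degrees
        variants.append(np.rot90(patch, k))
        variants.append(np.rot90(flipped, k))
    return variants

patch = np.arange(16, dtype=np.float32).reshape(4, 4)
augmented = augment(patch)
print(len(augmented))  # 8 orientations per patch
```

Applied to every extracted image block, this multiplies the effective training set size by eight without collecting new data.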
3. The dynamic attention-based dense LSTM residual network denoising method according to claim 2, wherein in step S2 the denoising network model is composed of convolution layers, LReLU activations and 8 dynamic attention modules; each dynamic attention module comprises front-end and back-end residual units, a non-attention branch, an attention branch and a weight-distribution branch, and adopts an LSTM structure internally; the convolution kernels used inside the network model are of sizes 3×3 and 5×5.
4. The dynamic attention-based dense LSTM residual network denoising method according to claim 3, wherein the residual units at the front end and the back end are residual modules, each composed of 2 convolution layers with 3×3 kernels and an LReLU activation function; taking Y as input, the mathematical expression is:
X_0 = ψ(W_2 * ψ(W_1 * Y + b_1) + b_2)
where W_1 and W_2 denote 3×3 convolution layers with 64 output channels, b_1 and b_2 are biases, ψ denotes the LReLU activation function, and X_0 denotes the extracted shallow features;
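A minimal PyTorch sketch of the shallow-feature expression above. The 64-channel 3×3 convolutions follow the claim; the single-channel input and the LReLU slope of 0.2 are assumptions not stated in the text:

```python
import torch
import torch.nn as nn

class ShallowFeature(nn.Module):
    """Sketch of the front-end unit of claim 4:
    X0 = LReLU(W2 * LReLU(W1 * Y + b1) + b2)."""
    def __init__(self, in_ch=1, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, ch, 3, padding=1)  # W1, b1
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)     # W2, b2
        self.act = nn.LeakyReLU(0.2)                     # psi (slope assumed)

    def forward(self, y):
        return self.act(self.conv2(self.act(self.conv1(y))))

x0 = ShallowFeature()(torch.randn(1, 1, 64, 64))
print(x0.shape)
```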
Let X_{i-1} denote the input of the dynamic attention module; the front-end feature extraction part consists of 1 residual module and an LReLU activation function, with the corresponding mathematical expression:
where E denotes the residual module; the result is then fed into the attention branch and the non-attention branch respectively.
5. The dynamic attention-based dense LSTM residual network denoising method according to claim 3, wherein the non-attention branch consists of a main network composed of 2 residual units and 1 LSTM module; each residual unit comprises 1 3×3 convolution layer and 1 5×5 convolution layer, and the LSTM module is a recurrent network structure composed of residual units, LSTM units, convolution layers and an attention mask layer; the LSTM unit is expressed as:
i_t = σ(W_xi * X_t + W_hi * H_{t-1} + W_ci ⊙ C_{t-1} + b_i)
f_t = σ(W_xf * X_t + W_hf * H_{t-1} + W_cf ⊙ C_{t-1} + b_f)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(W_xc * X_t + W_hc * H_{t-1} + b_c)
o_t = σ(W_xo * X_t + W_ho * H_{t-1} + W_co ⊙ C_t + b_o)
H_t = o_t ⊙ tanh(C_t)
where the LSTM consists of the input gate i_t, the forget gate f_t, the output gate o_t, the long-term memory state C_t and the short-term memory state H_t; σ and tanh denote the Sigmoid and Tanh activation functions respectively; W_jk denotes the convolution from j to k, ⊙ denotes the Hadamard (element-wise) product, and b_j denotes the bias of the j-th gate; X_t denotes the feature map obtained from the residual block, C_t passes the encoded features on to the next LSTM unit, and H_t is the output of the current LSTM unit and the input of the next LSTM unit.
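The five equations above describe a convolutional LSTM cell with peephole connections (the W_c· ⊙ C terms). A PyTorch sketch, assuming 64 channels and a 3×3 kernel, and fusing the four gate convolutions into one layer for brevity:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Sketch of the convolutional LSTM cell of claim 5, where '*' is
    convolution and '⊙' the element-wise product.  The channel count,
    kernel size, and fused gate convolution are assumptions."""
    def __init__(self, ch=64, k=3):
        super().__init__()
        # one convolution over [X_t, H_{t-1}] producing the four gate
        # pre-activations (input, forget, candidate, output)
        self.conv = nn.Conv2d(2 * ch, 4 * ch, k, padding=k // 2)
        # per-channel peephole weights W_ci, W_cf, W_co
        self.w_ci = nn.Parameter(torch.zeros(1, ch, 1, 1))
        self.w_cf = nn.Parameter(torch.zeros(1, ch, 1, 1))
        self.w_co = nn.Parameter(torch.zeros(1, ch, 1, 1))

    def forward(self, x, h, c):
        ai, af, ag, ao = self.conv(torch.cat([x, h], 1)).chunk(4, 1)
        i = torch.sigmoid(ai + self.w_ci * c)        # input gate
        f = torch.sigmoid(af + self.w_cf * c)        # forget gate
        c_new = f * c + i * torch.tanh(ag)           # long-term memory C_t
        o = torch.sigmoid(ao + self.w_co * c_new)    # output gate
        h_new = o * torch.tanh(c_new)                # short-term memory H_t
        return h_new, c_new

cell = ConvLSTMCell()
x = torch.randn(1, 64, 32, 32)
h = torch.zeros(1, 64, 32, 32)
c = torch.zeros(1, 64, 32, 32)
h1, c1 = cell(x, h, c)
print(h1.shape, c1.shape)
```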
6. The dynamic attention-based dense LSTM residual network denoising method according to claim 3, wherein the attention branch is composed of a backbone network B_i and a mask portion M_i, expressed as:
where f_att denotes the attention branch and i denotes the i-th dynamic attention module; the backbone network B_i comprises 2 residual modules, and the mask portion M_i comprises a dilated convolution, a channel attention module, a downsampling operation, an upsampling operation and a Sigmoid function.
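Claim 6 names the components of the attention branch, but its combining formula is not recoverable from the text. The sketch below therefore assumes the residual-attention form (1 + M_i) ⊙ B_i; all layer widths and the simplified backbone/mask contents are illustrative:

```python
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    """Illustrative sketch of the attention branch of claim 6: a
    backbone B and a mask M built from downsampling, a dilated
    convolution, upsampling, and a Sigmoid.  The combination
    (1 + M) ⊙ B is an assumption."""
    def __init__(self, ch=64):
        super().__init__()
        self.backbone = nn.Sequential(              # stand-in for 2 residual modules
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
        )
        self.mask = nn.Sequential(
            nn.MaxPool2d(2),                        # downsampling
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2),  # dilated convolution
            nn.LeakyReLU(0.2),
            nn.Upsample(scale_factor=2, mode='bilinear',
                        align_corners=False),       # upsampling
            nn.Conv2d(ch, ch, 1),
            nn.Sigmoid(),                           # mask values in (0, 1)
        )

    def forward(self, x):
        return (1 + self.mask(x)) * self.backbone(x)

out = AttentionBranch()(torch.randn(1, 64, 32, 32))
print(out.shape)
```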
7. The dynamic attention-based dense LSTM residual network denoising method according to claim 3, wherein the weight-distribution branch consists of a spatial attention module and a channel attention module; the spatial attention module applies a maximum pooling operation and an average pooling operation, concatenates the two pooled feature maps into one feature map, and obtains the spatial weights through a Sigmoid activation function, which are then input to the channel attention module; the channel attention module consists of an average pooling layer, two linear layers and a Softmax activation function; finally, the output dynamic attention weights are distributed to the non-attention branch and the attention branch.
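A sketch of the weight-distribution branch described above: spatial attention from channel-wise max and mean maps, followed by channel attention via global average pooling, two linear layers and Softmax. The 7×7 kernel and the reduction ratio of 4 are assumptions not given in the claim:

```python
import torch
import torch.nn as nn

class WeightAllocation(nn.Module):
    """Sketch of the weight-distribution branch of claim 7."""
    def __init__(self, ch=64, r=4):
        super().__init__()
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3),
                                     nn.Sigmoid())
        self.channel = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                     nn.Linear(ch // r, ch),
                                     nn.Softmax(dim=1))

    def forward(self, x):
        # spatial weights from the concatenated max- and mean-pooled maps
        s = self.spatial(torch.cat([x.max(1, keepdim=True).values,
                                    x.mean(1, keepdim=True)], 1))
        x = x * s
        # channel weights from the globally averaged features
        w = self.channel(x.mean(dim=(2, 3)))
        return x * w[:, :, None, None]

out = WeightAllocation()(torch.randn(1, 64, 32, 32))
print(out.shape)
```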
8. The dynamic attention-based dense LSTM residual network denoising method according to claim 3, wherein in step S3 the hyperparameters of the denoising network model comprise the batch size, the initial learning rate, the number of iterations and the learning rate decay strategy; the loss function is the mean square error, with the mathematical expression:
L(Θ) = (1 / 2N) Σ_{i=1..N} ||R(y_i; Θ) − (y_i − x_i)||²
where Θ denotes the parameters of the dynamic-attention dense LSTM residual network, R(y_i; Θ) is the residual image learned by the network, y_i is a noise image, x_i denotes a clean image, and N is the number of training samples; the Train400 data set is adopted as the training data set and the network hyperparameters are set as follows: 40 image blocks of size 128×128 are randomly extracted as samples in each round, training uses the Adam optimizer, the initial learning rate is set to 1×10⁻⁴ and multiplied by 0.2 every 10 rounds, and the network is trained for 100 rounds in total.
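The training configuration of claim 8 can be sketched in PyTorch as follows. The one-layer `net` and the synthetic batch are stand-ins for the full network and the Train400 patches, and the residual-learning mean squared error is the standard DnCNN-style loss consistent with the symbols defined above:

```python
import torch
import torch.nn as nn

net = nn.Conv2d(1, 1, 3, padding=1)           # stand-in for the full network
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)   # initial lr 1e-4
# learning rate multiplied by 0.2 every 10 rounds, 100 rounds in total
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)

x = torch.rand(40, 1, 128, 128)               # 40 clean 128x128 patches per round
y = x + 0.1 * torch.randn_like(x)             # noisy observations
residual = net(y)                             # R(y; Θ): predicted noise
loss = 0.5 * ((residual - (y - x)) ** 2).sum() / x.shape[0]
loss.backward()
optimizer.step()
scheduler.step()
print(float(loss))
```

In a real run the inner three lines would sit inside a loop over 100 rounds of randomly sampled patches.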
9. The dynamic attention-based dense LSTM residual network denoising method according to claim 8, wherein in step S4 the denoising network model is trained as follows:
step S4.1), adding Gaussian white noise with noise levels of 15, 25 and 50 respectively to the images in the original training set;
step S4.2), inputting the noise-added training images into the network model for training, thereby obtaining a trained denoising network model, and saving the trained denoising network model.
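Step S4.1 adds white Gaussian noise at standard deviations 15, 25 and 50 on the 0–255 gray scale. A minimal sketch (function name and seed are illustrative):

```python
import numpy as np

def add_gaussian_noise(clean, sigma, rng=None):
    """Add white Gaussian noise with standard deviation `sigma`
    (on the 0-255 gray scale, as in step S4.1) to a clean image."""
    rng = np.random.default_rng(rng)
    return clean + sigma * rng.standard_normal(clean.shape)

clean = np.full((64, 64), 128.0)
for sigma in (15, 25, 50):
    noisy = add_gaussian_noise(clean, sigma, rng=0)
    print(sigma, float(np.std(noisy - clean)))   # sample std close to sigma
```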
10. The dynamic attention-based dense LSTM residual network denoising method according to claim 9, wherein in step S5 the calculation formula of the structural similarity is:
SSIM(x, y) = [(2 μ_x μ_y + d_1)(2 σ_xy + d_2)] / [(μ_x² + μ_y² + d_1)(σ_x² + σ_y² + d_2)]
where x and y are the two images; μ_x and μ_y are their means; σ_x² and σ_y² are their variances; σ_xy is the covariance of x and y; d_1 = (k_1 L)², d_2 = (k_2 L)², where L is the dynamic range of the pixel values; k_1 = 0.01, k_2 = 0.03;
The peak signal-to-noise ratio is calculated as:
PSNR = 10 · log_10(Q² / MSE), with MSE = (1 / (M·N)) Σ_{i,j} [x(i, j) − y(i, j)]²
where x(i, j) and y(i, j) denote the pixel values at corresponding positions in the original image x and the denoised image y respectively, M × N is the image size, and Q denotes the maximum gray value in the image.
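The two quality metrics of claim 10 can be sketched as follows. Note that SSIM is normally computed over local windows and averaged; the single-window variant below, using the constants d_1 = (k_1 L)² and d_2 = (k_2 L)², is a simplification:

```python
import numpy as np

def psnr(x, y, q=255.0):
    """Peak signal-to-noise ratio of claim 10: 10*log10(Q^2 / MSE)."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return 10.0 * np.log10(q ** 2 / mse)

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM with the constants of claim 10; real SSIM
    averages this quantity over local windows."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    d1, d2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + d1) * (2 * cov + d2)) / \
           ((mx ** 2 + my ** 2 + d1) * (vx + vy + d2))

x = np.full((8, 8), 100.0)
y = x + 5.0                       # constant error of 5 -> MSE = 25
print(round(psnr(x, y), 2))       # 10*log10(255^2/25) ≈ 34.15
```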
CN202310434790.3A 2023-04-21 2023-04-21 Dynamic attention-based intensive LSTM residual network denoising method Pending CN116563144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310434790.3A CN116563144A (en) 2023-04-21 2023-04-21 Dynamic attention-based intensive LSTM residual network denoising method

Publications (1)

Publication Number Publication Date
CN116563144A true CN116563144A (en) 2023-08-08

Family

ID=87502864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310434790.3A Pending CN116563144A (en) 2023-04-21 2023-04-21 Dynamic attention-based intensive LSTM residual network denoising method

Country Status (1)

Country Link
CN (1) CN116563144A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112336353A (en) * 2020-11-04 2021-02-09 西安科技大学 A Multi-level Attention Grading Method Based on Schulte Grid and LSTM
CN113191321A (en) * 2021-05-21 2021-07-30 电子科技大学 Optical fiber distributed seismic wave signal noise reduction method based on generation countermeasure network
CN114445299A (en) * 2022-01-28 2022-05-06 南京邮电大学 Double-residual denoising method based on attention allocation mechanism
US20220309618A1 (en) * 2021-03-19 2022-09-29 Micron Technology, Inc. Building units for machine learning models for denoising images and systems and methods for using same
CN115205142A (en) * 2022-06-15 2022-10-18 北京理工大学 A method and device for image restoration based on distortion estimation
US20220384035A1 (en) * 2021-05-26 2022-12-01 International Business Machines Corporation Automatic condition diagnosis using an attention-guided framework


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218422A (en) * 2023-09-12 2023-12-12 北京国科恒通科技股份有限公司 Power grid image recognition method and system based on machine learning
CN117218422B (en) * 2023-09-12 2024-04-16 北京国科恒通科技股份有限公司 Power grid image recognition method and system based on machine learning
WO2025123659A1 (en) * 2023-12-12 2025-06-19 中国电子科技集团公司第十研究所 Speech noise reduction method based on deep learning

Similar Documents

Publication Publication Date Title
CN111242862B (en) Multi-scale fusion parallel dense residual convolution neural network image denoising method
Guan et al. DiffWater: Underwater image enhancement based on conditional denoising diffusion probabilistic model
CN114445299B (en) A double residual denoising method based on attention allocation mechanism
CN103049892B (en) Non-local image denoising method based on similar block matrix rank minimization
Lu et al. Underwater image enhancement method based on denoising diffusion probabilistic model
CN113610719A (en) Attention and dense connection residual block convolution kernel neural network image denoising method
CN112183742A (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN107230196A (en) Infrared and visible light image fusion method based on non-down sampling contourlet and target confidence level
CN105260998A (en) MCMC sampling and threshold low-rank approximation-based image de-noising method
CN113420794A (en) Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning
CN116563144A (en) Dynamic attention-based intensive LSTM residual network denoising method
CN118114031B (en) Radio waveform prediction method and system based on machine learning
CN105894469A (en) De-noising method based on external block autoencoding learning and internal block clustering
CN112634171A (en) Image defogging method based on Bayes convolutional neural network and storage medium
CN104657951A (en) Multiplicative noise removal method for image
CN114549379A (en) Infrared and visible light image fusion method under non-down sampling shear wave transform domain
CN119251062A (en) Multi-focus image fusion method based on multi-layer semantics and multi-scale self-attention
Li et al. An end-to-end system for unmanned aerial vehicle high-resolution remote sensing image haze removal algorithm using convolution neural network
CN116452455A (en) Hyperspectral image denoising method based on low-rank sparse coding model
CN117196963A (en) Point cloud denoising method based on noise reduction self-encoder
CN113222953B (en) Natural image enhancement method based on depth gamma transformation
CN120031839A (en) Photovoltaic panel surface dirt detection and analysis system applied to UAV platform
CN114897883A (en) Infrared and visible light image fusion method based on ResNet50 and double-pyramid
Luo et al. A fast denoising fusion network using internal and external priors
CN115171180B (en) Facial expression recognition method based on spatial attention and deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination