Disclosure of Invention
The technical problem of the invention is mainly solved by the following technical scheme:
an image compressed sensing reconstruction method based on Attention multi-feature fusion is characterized by comprising the following steps:
step 1, collecting image data, measuring the image data based on a neural network and outputting measured data;
step 2, outputting an initial reconstruction image after performing initial reconstruction on the measurement signal by adopting a deconvolution neural network;
step 3, processing the initial reconstructed image from step 2 with a three-channel network and outputting multi-scale feature information;
step 4, weighting the different output features through an Attention mechanism to obtain the final reconstructed image data.
In the foregoing image compressed sensing reconstruction method based on Attention multi-feature fusion, the step 1 specifically includes:
step 1.1, selecting an image of size n × n and converting it into a gray-scale image;
step 1.2, measuring with a full convolutional neural network: the input data are measured simultaneously using m convolution kernels of size B × B × 1, where m is determined by the measurement rate and B is the size of the sampling-layer convolution kernel, with B set to 32; it should be noted that the convolution operation of this step uses no bias and no Pad zero padding, and the convolution stride is likewise set to B;
step 1.3, after measurement by the convolution layer, measurement data of size (n/B) × (n/B) × m are obtained.
In the above image compressed sensing reconstruction method based on Attention multi-feature fusion, step 1.2 specifically includes:
the measurement using the convolutional neural network imitates the traditional measurement mode y_i = Φ_B × x_i: each row of the measurement matrix Φ_B can be regarded as one convolution kernel. The measurement matrix Φ_B has m = MR × B × B rows (MR being the measurement rate), so m measurement points are obtained. In the experiment, the size of the convolution kernels is set to B × B and the stride is also set to B, which is equivalent to non-overlapping sampling; each convolution kernel thus outputs one measurement value. When the measurement rate MR is 0.1 and B is set to 32, m, the number of convolution kernels in the measurement layer, should be 102.
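The measurement step above can be sketched in NumPy. The image size n = 96 and the random Φ_B below are illustrative assumptions; the sketch shows that a stride-B convolution with m kernels of size B × B (no bias, no padding) produces exactly the block-wise measurements y_i = Φ_B x_i:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 96, 32                       # image size (assumed), block/kernel size
m = round(0.1 * B * B)              # measurement rate 0.1 -> m = 102 kernels

image = rng.standard_normal((n, n))
Phi_B = rng.standard_normal((m, B * B))   # each row = one flattened B x B kernel

# Convolution view: m kernels of size B x B, stride B, no bias, no padding.
out = np.empty((n // B, n // B, m))
for i in range(0, n, B):
    for j in range(0, n, B):
        block = image[i:i + B, j:j + B].ravel()
        out[i // B, j // B] = Phi_B @ block      # y_i = Phi_B x_i

# Matrix view: the same measurements as block-wise products y_i = Phi_B x_i.
blocks = image.reshape(n // B, B, n // B, B).swapaxes(1, 2).reshape(-1, B * B)
y = blocks @ Phi_B.T
assert np.allclose(out.reshape(-1, m), y)
print(out.shape)                    # (n/B) x (n/B) x m measurement tensor
```

Because the stride equals the kernel size, the sampling is non-overlapping, which is why the convolution and the traditional block-matrix measurement coincide.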
In the foregoing image compressed sensing reconstruction method based on Attention multi-feature fusion, the step 2 specifically includes:
step 2.1, performing initial reconstruction of the sampled signal with a deconvolution neural network; in the deconvolution layer, the size of the convolution kernel is set to 32 × 32 and the stride is also 32;
step 2.2, after the deconvolution operation of step 2.1, an n × n initial reconstructed image is obtained.
In the foregoing image compressed sensing reconstruction method based on Attention multi-feature fusion, step 2.1 specifically includes:
the deconvolution uses the transposed kernel of the convolution process and is equivalent to the inverse of the convolution. After convolution measurement, the resolution of the image becomes low and part of the information is missing. The deconvolution is intended to restore the original image as far as possible; complete restoration is impossible, and the important point is to obtain a size matching the original image. The output tensor (image) size of a deconvolution is O = (N − 1) × S + K − 2P, where N is the input image size, O is the output image size, S is the stride, K is the size of the convolution kernel, and P is the edge padding. Since no Pad operation is performed after measurement, P = 0; to obtain the same size as the original image, S = 32 and K = 32. Thus, after deconvolution, an initial reconstructed image with the same size as the original image is obtained.
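The output-size formula can be verified with a small sketch; N = n/B, S = K = 32 and P = 0 follow the text, while n = 96 is an illustrative assumption:

```python
def deconv_out_size(N, S, K, P):
    """Output size of a transposed convolution: O = (N - 1) * S + K - 2 * P."""
    return (N - 1) * S + K - 2 * P

n, B = 96, 32                  # illustrative image size, measurement block size
N = n // B                     # measurement map is (n/B) x (n/B)
O = deconv_out_size(N, S=B, K=B, P=0)
assert O == n                  # the deconvolution restores the original n x n size
```

With stride and kernel size both equal to B and no padding, O = (N − 1)B + B = NB = n for any n divisible by B, so the initial reconstruction always matches the original size.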
In the foregoing image compressed sensing reconstruction method based on Attention multi-feature fusion, in step 3, dilated convolutions with different dilation rates form three parallel channels, a residual block is added to each channel to form a residual network, and feature information at several different scales is obtained. The three parallel channels are a left channel, a middle channel and a right channel; each channel has four residual blocks, and each residual block has three layers, but the dilation rate of each channel is different, specifically:
the leftmost lane has a dilation rate of 1, which is a common 3 x 3 convolution kernel. The first layer of each residual block is 64 convolution kernels of size 3 × 3; the second layer is 32 convolution kernels with the size of 3 x 3; the third layer is 1 convolution kernel with the size of 3 multiplied by 3; the step size of all convolutional layers of the part is 1 and the activation function is ReLU. In order to keep the dimension of the output data the same as the dimension of the input data, Pad operation is performed and there is no Pool operation, and then Pad is 1.
The middle channel is again four residual blocks, each having three convolutional layers. The expansion rate of each convolution kernel is 2, and 5 × 5 convolution kernels are obtained by performing expansion on 3 × 3 convolution kernels. The first layer is 64 convolution kernels; the second layer is 32 convolution kernels; the third layer is a convolution kernel of 1; the step size of all convolutional layers of the part is 1 and the activation function is ReLU. In order to keep the dimension of the output data the same as the dimension of the input data, Pad operation is performed and there is no Pool operation, and then Pad is 2.
The number of convolution layers of the right channel and other channels is the same as that of convolution kernels, but the rate of expanding convolution is 3, namely 7 multiplied by 7 convolution kernels are obtained. Therefore, to keep the output data dimensions the same as the input data dimensions, Pad is now complemented by 3. After each convolution, the ReLU is used as the activation function.
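The relation between dilation rate, effective kernel size, and the Pad value that preserves the spatial size can be sketched as follows; k_eff = k + (k − 1)(d − 1) is the standard dilated-convolution identity, and the helper names are ours:

```python
def effective_kernel(k, d):
    """Effective size of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def same_pad(k, d):
    """Padding that keeps output size equal to input size at stride 1."""
    return (effective_kernel(k, d) - 1) // 2

for d in (1, 2, 3):
    print(d, effective_kernel(3, d), same_pad(3, d))
# dilation 1 -> 3x3, Pad 1; dilation 2 -> 5x5, Pad 2; dilation 3 -> 7x7, Pad 3
```

This reproduces the per-channel settings of the text: the three channels use Pad = 1, 2, 3 for dilation rates 1, 2, 3 respectively.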
In the above image compressed sensing reconstruction method based on Attention multi-feature fusion, step 4 specifically includes:
after multi-scale feature information has been extracted by the multiple channels, fusing the feature information of each channel directly treats the channels equally, which hinders the representation capability of the CNN. In fact, the information of each channel contributes in a different proportion to image reconstruction. This per-channel proportion is not a fixed setting; it is learned through network training. The features extracted from the multiple convolution channels are output as X = [x_1, …, x_c, …, x_C], i.e. C feature maps of size n × n. Channel statistics z are obtained by a global pooling operation; the c-th component is given by z_c = (1/(n × n)) Σ_i Σ_j x_c(i, j). We then introduce a gating mechanism to learn the nonlinear interaction between the channels; here we choose a simple sigmoid function as the gate: s = f(W_A R(W_B z)), where f(·) and R(·) are the sigmoid and ReLU activation functions and W_A and W_B are convolution weights. Finally, we obtain the channel statistics s used to rescale the input X: each feature map is rescaled as x̂_c = s_c · x_c, where s_c is the scale factor of channel c; that is, the feature information of each channel is adaptively rescaled by the Attention mechanism.
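A minimal NumPy sketch of this channel-attention rescaling; the channel count, spatial size, and reduction factor are illustrative assumptions, and the learned weights W_A and W_B are modeled here as plain random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
C, n, r = 8, 4, 2                        # channels, spatial size, reduction (assumed)
X = rng.standard_normal((C, n, n))       # C feature maps of size n x n
W_B = rng.standard_normal((C // r, C))   # squeeze weights (stand-in for learned ones)
W_A = rng.standard_normal((C, C // r))   # excite weights (stand-in for learned ones)

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
relu = lambda t: np.maximum(t, 0.0)

z = X.mean(axis=(1, 2))              # global average pooling -> channel statistics z
s = sigmoid(W_A @ relu(W_B @ z))     # gating: s = f(W_A R(W_B z)), values in (0, 1)
X_rescaled = s[:, None, None] * X    # per-channel rescaling: x_hat_c = s_c * x_c

assert X_rescaled.shape == X.shape
```

In the trained network the gate amplifies channels that help reconstruction and suppresses those that do not; here the weights are random, so only the shapes and data flow are demonstrated.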
Therefore, the invention has the following advantages: 1. convolution measurement replaces the traditional block-based measurement mode and eliminates the blocking effect produced after image reconstruction; 2. multi-scale information in the image is extracted through a multi-channel network of dilated convolutions with different dilation rates; a residual block is added to each channel to form a residual network, and a large amount of low-frequency information is bypassed through multiple skip connections in the residual network, so that the network focuses on learning high-frequency information, improving the image reconstruction quality; 3. an Attention mechanism is added so that the feature information between channels is adaptively rescaled, enhancing the discrimination capability of the network; the multi-scale information is then fused through Concat, so that the information extracted from the image is more comprehensive.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
Example (b):
the method mainly comprises four steps: the first step is the sampling stage of the image, in which the input data are measured with a full convolutional neural network; the second step is the initial reconstruction stage, in which a deconvolution network performs initial reconstruction of the measured data; the third step acquires multi-scale feature information, using dilated convolutions with different dilation rates so that dilated convolution kernels with different receptive fields capture information of different scales in the image, yielding features at several scales; and the fourth step weights the different output features through an Attention mechanism and then fuses the feature information of the several scales to complete the final image reconstruction.
Step 1: the image measurement stage.
1-1) selecting an image with the size of n multiplied by n, and converting the image into a gray scale image.
1-2) Measurements are performed with a full convolutional neural network. The measurement using the convolutional neural network imitates the traditional measurement mode y_i = Φ_B × x_i: each row of the measurement matrix Φ_B can be regarded as one convolution kernel. The measurement matrix Φ_B has m = MR × B × B rows (MR being the measurement rate), so m measurement points are obtained. In the experiment, the size of the convolution kernels is set to B × B and the stride is also set to B, which is equivalent to non-overlapping sampling; each convolution kernel thus outputs one measurement value. When the measurement rate MR is 0.1 and B is set to 32, m, the number of convolution kernels in the measurement layer, should be 102. Note that the convolution operation of this step uses no bias and no Pad zero padding, and the convolution stride is likewise set to B.
1-3) After measurement by the convolution layer, measurement data of size (n/B) × (n/B) × m are obtained.
Step 2: an initial reconstruction phase.
2-1) The sampled signal is initially reconstructed with a deconvolution neural network. The deconvolution uses the transposed kernel of the convolution process and is equivalent to the inverse of the convolution. After convolution measurement, the resolution of the image becomes low and part of the information is missing. The deconvolution is intended to restore the original image as far as possible; complete restoration is impossible, and the important point is to obtain a size matching the original image. During measurement, the convolution kernel size is set to B and the stride is also B; likewise, when the size and stride of the deconvolution kernel are both set to B, an initial reconstruction with the same size as the original image is obtained.
2-2) After the deconvolution operation of 2-1), an n × n initial reconstructed image is obtained.
Step 3: acquiring multi-scale feature information. Dilated convolutions with different dilation rates form three parallel channels, and a residual block is added to each channel to form a residual network, yielding feature information at several different scales. Each channel of the three-parallel-channel network has four residual blocks, and each residual block has three layers, but the dilation rate differs for each channel.
The leftmost channel has a dilation rate of 1, i.e. ordinary 3 × 3 convolution kernels. The first layer of each residual block has 64 convolution kernels of size 3 × 3; the second layer has 32 convolution kernels of size 3 × 3; the third layer has 1 convolution kernel of size 3 × 3. The stride of all convolutional layers in this part is 1 and the activation function is ReLU. To keep the dimension of the output data the same as that of the input data, a Pad operation is performed and there is no Pool operation; here Pad = 1.
The middle channel likewise has four residual blocks, each with three convolutional layers. The dilation rate of each convolution kernel is 2, so dilating the 3 × 3 kernels yields effective 5 × 5 kernels. The first layer has 64 convolution kernels; the second layer has 32 convolution kernels; the third layer has 1 convolution kernel. The stride of all convolutional layers in this part is 1 and the activation function is ReLU. To keep the dimension of the output data the same as that of the input data, a Pad operation is performed and there is no Pool operation; here Pad = 2.
The right channel has the same number of convolutional layers and convolution kernels as the other channels, but its dilation rate is 3, yielding effective 7 × 7 convolution kernels. Therefore, to keep the output data dimensions the same as the input data dimensions, Pad = 3 here. After each convolution, ReLU is used as the activation function.
After the three parallel channels of dilated convolutions with different rates, feature information at several different scales is obtained; a residual block is added to each channel to form a residual network, and a large amount of low-frequency information is bypassed through multiple skip connections in the residual network, so that the network focuses on learning high-frequency information.
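A toy single-channel sketch of one residual block with dilated "same"-size convolutions and the identity skip connection; the kernel counts and sizes here are illustrative, not the 64/32/1 configuration described above:

```python
import numpy as np

def conv2d(x, k, dilation=1):
    """'Same'-size 2-D convolution of a single-channel map with one k x k kernel."""
    kd = k.shape[0]
    pad = dilation * (kd - 1) // 2          # Pad = d for a 3 x 3 kernel
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for a in range(kd):                     # sum shifted, weighted copies of x
        for b in range(kd):
            da, db = a * dilation, b * dilation
            out += k[a, b] * xp[da:da + x.shape[0], db:db + x.shape[1]]
    return out

def residual_block(x, kernels, dilation=1):
    """Stacked conv layers with ReLU, plus the identity skip: out = x + F(x)."""
    h = x
    for i, k in enumerate(kernels):
        h = conv2d(h, k, dilation)
        if i < len(kernels) - 1:            # ReLU between layers
            h = np.maximum(h, 0.0)
    return x + h                            # skip connection carries low-frequency content

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
kernels = [0.1 * rng.standard_normal((3, 3)) for _ in range(3)]  # 3-layer block
y = residual_block(x, kernels, dilation=2)  # middle channel: dilation rate 2
assert y.shape == x.shape                   # spatial size is preserved
```

The `x + F(x)` form is what lets the skip connections bypass low-frequency information, so the stacked convolutions need only model the high-frequency residual.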
Step 4: after obtaining the multiple pieces of feature information, the different output features are weighted through an Attention mechanism and then concatenated (Concat) to fuse the multi-scale information. If, after multi-scale feature information is extracted from the multiple channels, the feature information of each channel is fused directly, the channels are treated equally, which hinders the representation capability of the CNN. In fact, the information of each channel contributes in a different proportion to image reconstruction. The per-channel proportion is not a fixed setting but is learned through network training: the proportion of channel information that benefits image reconstruction is amplified, the proportion of channel information that does not is reduced, and the weights between them are learned automatically. The figure shows the color change of each channel after the Attention mechanism, i.e. the feature information between the channels is adaptively rescaled by the Attention mechanism, further enhancing the discrimination capability of the network. The useful information in the features is thus better utilized; the feature information of the several different scales is then fused, and 1 convolution kernel of size 3 × 3 is applied to obtain the final reconstructed image.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.