Disclosure of Invention
The technical problem of the invention is mainly solved by the following technical scheme:
an image compressed sensing reconstruction method based on Attention multi-feature fusion is characterized by comprising the following steps:
step 1, collecting image data, measuring the image data based on a neural network and outputting measured data;
step 2, outputting an initial reconstruction image after performing initial reconstruction on the measurement signal by adopting a deconvolution neural network;
step 3, processing the initial reconstructed image from step 2 with a three-channel network and outputting multi-scale feature information;
step 4, weighting the different output features through an Attention mechanism to obtain the final reconstructed image data.
In the foregoing image compressed sensing reconstruction method based on Attention multi-feature fusion, the step 1 specifically includes:
step 1.1, selecting an image of size n × n and converting it into a gray-scale image;
step 1.2, measuring with a full convolutional neural network: the input data are measured simultaneously using m convolution kernels of size B × B × 1, where m is determined by the measurement rate and B is the size of the sampling-layer convolution kernel, with B set to 32; it should be noted that the convolution operation of this step uses no bias and no Pad zero padding, and the convolution stride is likewise set to B;
step 1.3, after measurement by the convolution layer, measurement data of size (n/B) × (n/B) × m are obtained.
In the above image compressed sensing reconstruction method based on Attention multi-feature fusion, step 1.2 specifically includes:
the measurement using the convolutional neural network imitates the traditional measurement mode y_i = Φ_B × x_i: each row of the measurement matrix Φ_B can be regarded as one convolution kernel. The measurement matrix Φ_B has m = MR × B × B rows (MR being the measurement rate), so m measurement points are obtained. In the experiment, the size of the convolution kernels is set to B × B and the stride is also set to B, which is equivalent to non-overlapping sampling; each convolution kernel thus outputs one measurement value. When the measurement rate MR is 0.1 and B is set to 32, m, the number of convolution kernels in the measurement layer, should be 102.
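The measurement step above can be sketched in NumPy. The image size n = 96 and the random Φ_B below are illustrative assumptions; the sketch shows that a stride-B convolution with m kernels of size B × B (no bias, no padding) produces exactly the block-wise measurements y_i = Φ_B x_i:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 96, 32                       # image size (assumed), block/kernel size
m = round(0.1 * B * B)              # measurement rate 0.1 -> m = 102 kernels

image = rng.standard_normal((n, n))
Phi_B = rng.standard_normal((m, B * B))   # each row = one flattened B x B kernel

# Convolution view: m kernels of size B x B, stride B, no bias, no padding.
out = np.empty((n // B, n // B, m))
for i in range(0, n, B):
    for j in range(0, n, B):
        block = image[i:i + B, j:j + B].ravel()
        out[i // B, j // B] = Phi_B @ block      # y_i = Phi_B x_i

# Matrix view: the same measurements as block-wise products y_i = Phi_B x_i.
blocks = image.reshape(n // B, B, n // B, B).swapaxes(1, 2).reshape(-1, B * B)
y = blocks @ Phi_B.T
assert np.allclose(out.reshape(-1, m), y)
print(out.shape)                    # (n/B) x (n/B) x m measurement tensor
```

Because the stride equals the kernel size, the sampling is non-overlapping, which is why the convolution and the traditional block-matrix measurement coincide.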
In the foregoing image compressed sensing reconstruction method based on Attention multi-feature fusion, the step 2 specifically includes:
step 2.1, performing initial reconstruction of the sampled signal with a deconvolution neural network; in the deconvolution layer, the size of the convolution kernel is set to 32 × 32 and the stride is also 32;
step 2.2, after the deconvolution operation of step 2.1, an n × n initial reconstructed image is obtained.
In the foregoing image compressed sensing reconstruction method based on Attention multi-feature fusion, step 2.1 specifically includes:
the deconvolution uses the transposed kernel of the convolution process and is equivalent to the inverse of the convolution. After convolution measurement, the resolution of the image becomes low and part of the information is missing. The deconvolution is intended to restore the original image as far as possible; complete restoration is impossible, and the important point is to obtain a size matching the original image. The output tensor (image) size of a deconvolution is O = (N − 1) × S + K − 2P, where N is the input image size, O is the output image size, S is the stride, K is the size of the convolution kernel, and P is the edge padding. Since no Pad operation is performed after measurement, P = 0; to obtain the same size as the original image, S = 32 and K = 32. Thus, after deconvolution, an initial reconstructed image with the same size as the original image is obtained.
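The output-size formula can be verified with a small sketch; N = n/B, S = K = 32 and P = 0 follow the text, while n = 96 is an illustrative assumption:

```python
def deconv_out_size(N, S, K, P):
    """Output size of a transposed convolution: O = (N - 1) * S + K - 2 * P."""
    return (N - 1) * S + K - 2 * P

n, B = 96, 32                  # illustrative image size, measurement block size
N = n // B                     # measurement map is (n/B) x (n/B)
O = deconv_out_size(N, S=B, K=B, P=0)
assert O == n                  # the deconvolution restores the original n x n size
```

With stride and kernel size both equal to B and no padding, O = (N − 1)B + B = NB = n for any n divisible by B, so the initial reconstruction always matches the original size.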
In the foregoing image compressed sensing reconstruction method based on Attention multi-feature fusion, in step 3, dilated convolutions with different dilation rates form three parallel channels, a residual block is added to each channel to form a residual network, and feature information at several different scales is obtained. The three parallel channels are a left channel, a middle channel and a right channel; each channel has four residual blocks, and each residual block has three layers, but the dilation rate of each channel is different, specifically:
the leftmost lane has a dilation rate of 1, which is a common 3 x 3 convolution kernel. The first layer of each residual block is 64 convolution kernels of size 3 × 3; the second layer is 32 convolution kernels with the size of 3 x 3; the third layer is 1 convolution kernel with the size of 3 multiplied by 3; the step size of all convolutional layers of the part is 1 and the activation function is ReLU. In order to keep the dimension of the output data the same as the dimension of the input data, Pad operation is performed and there is no Pool operation, and then Pad is 1.
The middle channel is again four residual blocks, each having three convolutional layers. The expansion rate of each convolution kernel is 2, and 5 × 5 convolution kernels are obtained by performing expansion on 3 × 3 convolution kernels. The first layer is 64 convolution kernels; the second layer is 32 convolution kernels; the third layer is a convolution kernel of 1; the step size of all convolutional layers of the part is 1 and the activation function is ReLU. In order to keep the dimension of the output data the same as the dimension of the input data, Pad operation is performed and there is no Pool operation, and then Pad is 2.
The number of convolution layers of the right channel and other channels is the same as that of convolution kernels, but the rate of expanding convolution is 3, namely 7 multiplied by 7 convolution kernels are obtained. Therefore, to keep the output data dimensions the same as the input data dimensions, Pad is now complemented by 3. After each convolution, the ReLU is used as the activation function.
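The relation between dilation rate, effective kernel size, and the Pad value that preserves the spatial size can be sketched as follows; k_eff = k + (k − 1)(d − 1) is the standard dilated-convolution identity, and the helper names are ours:

```python
def effective_kernel(k, d):
    """Effective size of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def same_pad(k, d):
    """Padding that keeps output size equal to input size at stride 1."""
    return (effective_kernel(k, d) - 1) // 2

for d in (1, 2, 3):
    print(d, effective_kernel(3, d), same_pad(3, d))
# dilation 1 -> 3x3, Pad 1; dilation 2 -> 5x5, Pad 2; dilation 3 -> 7x7, Pad 3
```

This reproduces the per-channel settings of the text: the three channels use Pad = 1, 2, 3 for dilation rates 1, 2, 3 respectively.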
In the above image compressed sensing reconstruction method based on Attention multi-feature fusion, step 4 specifically includes:
after multi-scale feature information has been extracted by the multiple channels, fusing the feature information of each channel directly treats the channels equally, which hinders the representation capability of the CNN. In fact, the information of each channel contributes in a different proportion to image reconstruction. This per-channel proportion is not a fixed setting; it is learned through network training. The features extracted from the multiple convolution channels are output as X = [x_1, …, x_c, …, x_C], i.e. C feature maps of size n × n. Channel statistics z are obtained by a global pooling operation; the c-th component is given by z_c = (1/(n × n)) Σ_i Σ_j x_c(i, j). We then introduce a gating mechanism to learn the nonlinear interaction between the channels; here we choose a simple sigmoid function as the gate: s = f(W_A R(W_B z)), where f(·) and R(·) are the sigmoid and ReLU activation functions and W_A and W_B are convolution weights. Finally, we obtain the channel statistics s used to rescale the input X: each feature map is rescaled as x̂_c = s_c · x_c, where s_c is the scale factor of channel c; that is, the feature information of each channel is adaptively rescaled by the Attention mechanism.
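A minimal NumPy sketch of this channel-attention rescaling; the channel count, spatial size, and reduction factor are illustrative assumptions, and the learned weights W_A and W_B are modeled here as plain random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
C, n, r = 8, 4, 2                        # channels, spatial size, reduction (assumed)
X = rng.standard_normal((C, n, n))       # C feature maps of size n x n
W_B = rng.standard_normal((C // r, C))   # squeeze weights (stand-in for learned ones)
W_A = rng.standard_normal((C, C // r))   # excite weights (stand-in for learned ones)

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
relu = lambda t: np.maximum(t, 0.0)

z = X.mean(axis=(1, 2))              # global average pooling -> channel statistics z
s = sigmoid(W_A @ relu(W_B @ z))     # gating: s = f(W_A R(W_B z)), values in (0, 1)
X_rescaled = s[:, None, None] * X    # per-channel rescaling: x_hat_c = s_c * x_c

assert X_rescaled.shape == X.shape
```

In the trained network the gate amplifies channels that help reconstruction and suppresses those that do not; here the weights are random, so only the shapes and data flow are demonstrated.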
Therefore, the invention has the following advantages: 1. convolution measurement replaces the traditional block-based measurement mode and eliminates the blocking effect produced after image reconstruction; 2. multi-scale information in the image is extracted through a multi-channel network of dilated convolutions with different dilation rates; a residual block is added to each channel to form a residual network, and a large amount of low-frequency information is bypassed through multiple skip connections in the residual network, so that the network focuses on learning high-frequency information, improving the image reconstruction quality; 3. an Attention mechanism is added so that the feature information between channels is adaptively rescaled, enhancing the discrimination capability of the network; the multi-scale information is then fused through Concat, so that the information extracted from the image is more comprehensive.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
Example (b):
the method mainly comprises four steps: the first step is the sampling stage of the image, in which the input data are measured with a full convolutional neural network; the second step is the initial reconstruction stage, in which a deconvolution network performs initial reconstruction of the measured data; the third step acquires multi-scale feature information, using dilated convolutions with different dilation rates so that dilated convolution kernels with different receptive fields capture information of different scales in the image, yielding features at several scales; and the fourth step weights the different output features through an Attention mechanism and then fuses the feature information of the several scales to complete the final image reconstruction.
Step 1: the image measurement stage.
1-1) selecting an image with the size of n multiplied by n, and converting the image into a gray scale image.
1-2) Measurements are performed with a full convolutional neural network. The measurement using the convolutional neural network imitates the traditional measurement mode y_i = Φ_B × x_i: each row of the measurement matrix Φ_B can be regarded as one convolution kernel. The measurement matrix Φ_B has m = MR × B × B rows (MR being the measurement rate), so m measurement points are obtained. In the experiment, the size of the convolution kernels is set to B × B and the stride is also set to B, which is equivalent to non-overlapping sampling; each convolution kernel thus outputs one measurement value. When the measurement rate MR is 0.1 and B is set to 32, m, the number of convolution kernels in the measurement layer, should be 102. Note that the convolution operation of this step uses no bias and no Pad zero padding, and the convolution stride is likewise set to B.
1-3) After measurement by the convolution layer, measurement data of size (n/B) × (n/B) × m are obtained.
Step 2: an initial reconstruction phase.
2-1) The sampled signal is initially reconstructed with a deconvolution neural network. The deconvolution uses the transposed kernel of the convolution process and is equivalent to the inverse of the convolution. After convolution measurement, the resolution of the image becomes low and part of the information is missing. The deconvolution is intended to restore the original image as far as possible; complete restoration is impossible, and the important point is to obtain a size matching the original image. During measurement, the convolution kernel size is set to B and the stride is also B; likewise, when the size and stride of the deconvolution kernel are both set to B, an initial reconstruction with the same size as the original image is obtained.
2-2) After the deconvolution operation of 2-1), an n × n initial reconstructed image is obtained.
Step 3: acquiring multi-scale feature information. Dilated convolutions with different dilation rates form three parallel channels, and a residual block is added to each channel to form a residual network, yielding feature information at several different scales. Each channel of the three-parallel-channel network has four residual blocks, and each residual block has three layers, but the dilation rate differs for each channel.
The leftmost channel has a dilation rate of 1, i.e. ordinary 3 × 3 convolution kernels. The first layer of each residual block has 64 convolution kernels of size 3 × 3; the second layer has 32 convolution kernels of size 3 × 3; the third layer has 1 convolution kernel of size 3 × 3. The stride of all convolutional layers in this part is 1 and the activation function is ReLU. To keep the dimension of the output data the same as that of the input data, a Pad operation is performed and there is no Pool operation; here Pad = 1.
The middle channel likewise has four residual blocks, each with three convolutional layers. The dilation rate of each convolution kernel is 2, so dilating the 3 × 3 kernels yields effective 5 × 5 kernels. The first layer has 64 convolution kernels; the second layer has 32 convolution kernels; the third layer has 1 convolution kernel. The stride of all convolutional layers in this part is 1 and the activation function is ReLU. To keep the dimension of the output data the same as that of the input data, a Pad operation is performed and there is no Pool operation; here Pad = 2.
The right channel has the same number of convolutional layers and convolution kernels as the other channels, but its dilation rate is 3, yielding effective 7 × 7 convolution kernels. Therefore, to keep the output data dimensions the same as the input data dimensions, Pad = 3 here. After each convolution, ReLU is used as the activation function.
After the three parallel channels of dilated convolutions with different rates, feature information at several different scales is obtained; a residual block is added to each channel to form a residual network, and a large amount of low-frequency information is bypassed through multiple skip connections in the residual network, so that the network focuses on learning high-frequency information.
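A toy single-channel sketch of one residual block with dilated "same"-size convolutions and the identity skip connection; the kernel counts and sizes here are illustrative, not the 64/32/1 configuration described above:

```python
import numpy as np

def conv2d(x, k, dilation=1):
    """'Same'-size 2-D convolution of a single-channel map with one k x k kernel."""
    kd = k.shape[0]
    pad = dilation * (kd - 1) // 2          # Pad = d for a 3 x 3 kernel
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for a in range(kd):                     # sum shifted, weighted copies of x
        for b in range(kd):
            da, db = a * dilation, b * dilation
            out += k[a, b] * xp[da:da + x.shape[0], db:db + x.shape[1]]
    return out

def residual_block(x, kernels, dilation=1):
    """Stacked conv layers with ReLU, plus the identity skip: out = x + F(x)."""
    h = x
    for i, k in enumerate(kernels):
        h = conv2d(h, k, dilation)
        if i < len(kernels) - 1:            # ReLU between layers
            h = np.maximum(h, 0.0)
    return x + h                            # skip connection carries low-frequency content

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
kernels = [0.1 * rng.standard_normal((3, 3)) for _ in range(3)]  # 3-layer block
y = residual_block(x, kernels, dilation=2)  # middle channel: dilation rate 2
assert y.shape == x.shape                   # spatial size is preserved
```

The `x + F(x)` form is what lets the skip connections bypass low-frequency information, so the stacked convolutions need only model the high-frequency residual.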
Step 4: after obtaining the multiple pieces of feature information, the different output features are weighted through an Attention mechanism and then concatenated (Concat) to fuse the multi-scale information. If, after multi-scale feature information is extracted from the multiple channels, the feature information of each channel is fused directly, the channels are treated equally, which hinders the representation capability of the CNN. In fact, the information of each channel contributes in a different proportion to image reconstruction. The per-channel proportion is not a fixed setting but is learned through network training: the proportion of channel information that benefits image reconstruction is amplified, the proportion of channel information that does not is reduced, and the weights between them are learned automatically. The figure shows the color change of each channel after the Attention mechanism, i.e. the feature information between the channels is adaptively rescaled by the Attention mechanism, further enhancing the discrimination capability of the network. The useful information in the features is thus better utilized; the feature information of the several different scales is then fused, and 1 convolution kernel of size 3 × 3 is applied to obtain the final reconstructed image.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.