CN112257796A

CN112257796A - Image integration method of convolutional neural network based on selective characteristic connection

Info

Publication number: CN112257796A
Application number: CN202011174153.XA
Authority: CN
Inventors: 汪澜; 贾丹丹
Original assignee: Liaoning Technical University
Current assignee: Liaoning Technical University
Priority date: 2020-10-28
Filing date: 2020-10-28
Publication date: 2021-01-22
Anticipated expiration: 2040-10-28
Also published as: CN112257796B

Abstract

The invention discloses an image integration method of a convolutional neural network based on selective characteristic connection, which comprises the following steps: respectively solving the average characteristics of the low-layer characteristics and the high-layer characteristics; subtracting the average characteristic of the low-level characteristic from the average characteristic of the high-level characteristic to obtain the score of the key characteristic graph; scaling the average characteristic of the high-level characteristic; performing Softmax normalization processing to obtain a characteristic Z; and carrying out maximum value normalization processing on the characteristic Z to obtain the attention score. The image integration method of the convolutional neural network based on selective feature connection can better integrate feature map information based on a high-low layer feature fusion mode of selective feature connection, more effectively utilize learned features and does not increase the number of parameters. The structure of the convolutional neural network is optimized, the performance of the network is improved, the method is particularly significant to the shallow convolutional neural network, and the shallow convolutional neural network is applied to more fields.

Description

Image integration method of convolutional neural network based on selective characteristic connection

Technical Field

The invention belongs to the technical field of convolutional neural networks, and particularly relates to an image integration method of a convolutional neural network based on selective characteristic connection.

Background

In recent years, the research of network architecture has attracted much attention. Today, many excellent network architectures are proposed in succession. Google lenet constructs a 22-layer convolutional neural network, but it reduces the number of parameters from 6000 to 400 ten thousand by using the inclusion model. VGGNet demonstrates that increasing the depth of the network using a very small convolution filter can effectively boost the effectiveness of the model. However, increasing the depth of the network cannot simply stack the network layers one upon the other. Adding more layers in the appropriate depth model may result in higher training errors due to the problems of gradient disappearance and gradient explosion, which make deep networks difficult to train. High way Networks propose an efficient method of using bypass (bypass) and gate units (gating units) to train an end-to-end network with more than 100 layers. Bypass is considered a key factor in training these very deep networks. ResNet further attests to this view, it has added identity maps (identity maps) as a bypass in the network, and by using residual blocks (residual blocks), ResNet has made a breakthrough advance in many challenging tasks (image recognition, localization, and detection, etc.).

A novel visualization technique enables an in-depth understanding of the characteristics of the middle layer of the convolutional neural network and the operation of the classifier. In fact, the feature maps of different levels extract information of different levels of the input image. The lower layer features extract more detailed information, while the higher layer features extract more semantic information, the higher layer semantic information being closer to the last layer with class labels. In many computer vision tasks, combining high-level information and low-level information can effectively improve experimental performance.

At present, a Convolutional Neural Network (CNN) is used as an important branch of deep learning, a hardware basis required by the CNN as a main research direction is gradually matured, and as hardware technology is more and more perfect, deep learning algorithms are more and more diversified, bottom layer languages such as C language and C + + cannot meet a plurality of deep learning research requirements, and a plurality of more convenient and more flexible deep learning development frameworks such as tensflow, Caffe, thano, Keras, and torch are generated. The appearance of visualization technology can deeply analyze each layer of characteristics of the convolutional neural network, wherein the high-layer characteristics contain more semantic information, and the low-layer characteristics contain more detailed information, so that the integration of the high-layer information and the low-layer information to improve the experimental performance is an important research direction of the convolutional neural network in many computer vision tasks.

In a convolutional neural network, high-level and low-level feature fusion is an effective way for improving network performance, however, low-level features have the problems of background confusion and semantic ambiguity, and direct fusion of high-level and low-level features may cause confusion and semantic ambiguity of the fused features, resulting in poor network performance.

Disclosure of Invention

Based on the defects of the prior art, the technical problem to be solved by the invention is to provide an image integration method of a convolutional neural network based on selective feature connection, which is used for fusing a low-level feature with a high-level feature after the low-level feature is processed through a selective feature connection mechanism, so that the network performance is improved.

In order to solve the above technical problem, the present invention provides an image integration method based on a convolutional neural network with selective feature connection, which includes the following steps:

step 1: respectively solving the average characteristics of the low-layer characteristics and the high-layer characteristics;

step 2: subtracting the average characteristic of the low-level characteristic from the average characteristic of the high-level characteristic obtained in the step 1 to obtain a score of a key characteristic graph;

and step 3: scaling the average characteristic of the high-level characteristic;

and 4, step 4: respectively carrying out Softmax normalization processing on the scores of the key feature graphs obtained in the step 2 and the results of the scaling processing in the step 3 to obtain features Z;

and 5: and carrying out maximum value normalization processing on the characteristic Z to obtain the attention score.

Optionally, in step 1, the average characteristics of the low-level features are as follows:

the average features of the high-level features are as follows:

wherein Am ∈ R^F×G×1，Bm∈R^F×G×1The value of A0 at spatial location (i, j, c) corresponds to A0_i,j,cAnd B corresponds to a value of B at spatial location (i, j, c)_i,j,c，C₁Number of features representing lower layers, C₂Representing the number of high-level features.

Optionally, in step 2, the scores of the key feature maps are as follows:

P＝Bm-Am。

further, in step 3, the scaling process is performed on the average feature of the high-level features as follows:

D＝Bm*n

wherein

Further, in step 4, the scores of the key feature maps obtained in step 2 and the results of the scaling processing in step 3 are subjected to Softmax normalization processing, respectively, as follows:

the resulting characteristic Z is as follows:

Z＝S^P-S^D。

therefore, the image integration method based on the convolutional neural network with the selective characteristic connection has the following beneficial effects:

the high-low layer feature fusion mode based on selective feature connection can better integrate feature map information, more effectively utilize learned features and can not increase the number of parameters. The structure of the convolutional neural network is optimized, the performance of the network is improved, the method is particularly significant to the shallow convolutional neural network, and the shallow convolutional neural network is applied to more fields.

The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical means of the present invention more clearly understood, the present invention may be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more clearly understood, the following detailed description is given in conjunction with the preferred embodiments, together with the accompanying drawings.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments will be briefly described below.

Fig. 1 is a diagram of a CNN network architecture for selective feature connection;

FIG. 2 is a high-low level feature direct fusion map;

FIG. 3 is a diagram of high-level and low-level feature additive fusion;

FIG. 4 is a diagram of a process for selective feature computation.

Detailed Description

Other aspects, features and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which form a part of this specification, and which illustrate, by way of example, the principles of the invention. In the referenced drawings, the same or similar components in different drawings are denoted by the same reference numerals.

The invention applies a general network architecture Selective Feature Connection Mechanism (SFCM) to connect convolutional neural network features of different layers. Different layer features contain different information, higher layer features always contain more semantic information, and lower layer features contain more detail information, however, the lower layer features are affected by the background, which causes background confusion and semantic ambiguity. Combining the high-level and low-level features directly, which can cause background clutter and semantic ambiguity, SFCM effectively overcomes this drawback. It uses human visual recognition mechanisms whereby low-level features are selectively connected to high-level features through feature selectors generated from high-level features, which can be employed in many network architectures.

The classical convolutional neural network consists of an input layer, a convolutional layer, a pooling layer, a full-link layer and an output layer, wherein the extracted convolutional features are from a low layer to a high layer, and finally, the final output result is obtained from the high-layer features. In order to improve the performance of the neural network, the invention uses the residual error structure of the ResNet model for reference, and obtains the final output result after fusing the high-level and low-level characteristics through a selective characteristic connection mechanism, wherein the network structure diagram is shown in figure 1:

the selective characteristic operation process will be described in detail below.

The existing method for fusing features of high and low layers generally combines feature maps of different layers directly, and the combined features are shown in formula (1):

O＝[A,B] (1)

wherein

Low-level features representing convolutional neural networks, C₁Representing the number of low-level features, and G and F represent the width and height of the feature map. And C₂Represents the number of high-level features,

represents a high-level feature of a convolutional neural network,

representing a combination of features. The whole process is shown in fig. 2.

However, the combined features obtained by directly combining the feature maps sharply increase the parameters of the full-link layer, so the method of fusing the high-level features and the low-level features of the present invention is to add the high-level features and the low-level features, as shown in formula (2):

O＝A1+B (2)

wherein

Representing the low-level features of a convolutional neural network,

representing the transformed features of the lower level features,

represents a high-level feature of a convolutional neural network,

representing the combined features, the process is shown in fig. 3.

However, directly connecting the lower and upper layer features does not fully exploit the lower and upper layer information complementary properties. The high-level features contain more semantic information and the low-level features contain more detailed information. Combining the high-level and low-level features directly may cause background clutter and semantic ambiguity due to the introduction of too much detailed information. The present invention proposes a Selective Feature Connection Mechanism (SFCM) by referring to the human visual recognition mechanism. An attention score is assigned to each element on the low-level feature map that represents the importance of the element on the low-level feature map.

First, average features Am and Bm of a low-level feature and a high-level feature are obtained, respectively, and the average feature of the low-level feature is shown in formula (3):

the average feature of the high-level features is shown in equation (4):

wherein Am ∈ R^F×G×1，Bm∈R^F×G×1The value of A0 at spatial location (i, j, c) corresponds to A0_i,j,cAnd B corresponds to a value of B at spatial location (i, j, c)_i,j,c。

Because the superficial network extracts texture and detail features, the deep network extracts outline, shape and strongest features, and the superficial network comprises more features and also extracts key features, however, the deeper the layer number is, the more representative the extracted features are, and the more prominent the key features are. Thus, the average feature of the lower layer is subtracted from the average feature of the upper layer to obtain the score P of the key feature map, as shown in equation (5):

P＝Bm-Am (5)

the average feature Bm of the high-level features is scaled to obtain D, as shown in equation (6):

D＝Bm*n (6)

wherein

Performing Softmax normalization on P and D respectively, as shown in formula (7) and formula (8):

thus, the characteristic Z can be obtained as shown in equation (9). It represents the degree of importance of the corresponding position of each element of the low-level features.

Z＝S^P-S^D (9)

The attention score M can be obtained by performing maximum normalization processing on the feature Z, that is, the feature selector is obtained, as shown in formula (10):

wherein M is the same as R^F×G×1，M_i,jIs the final score at position (i, j). The learned attention score represents the importance of the corresponding position of each element of the low-level features. Thus, multiplying the low-level features by the attention score may screen out important features of the low-level features. Thus, the new low-layer feature As can be obtained from equation (11):

the new low-level features are augmented to a1 for fusing the high-level features. And (4) fusing the high-low layer features, and calculating a fusion coefficient L. If the average score of each of the feature maps a1 and B is E and F, it is determined from equation (12) and equation (13).

From this, a fusion coefficient L can be calculated as shown in equation (14)

The final combined features are then as shown in equation (15):

O＝L*A1+B (15)

the whole selective characteristic operation process is shown in fig. 4.

As can be seen from the feature selector M, the feature selector can enhance the salient region of the low-level feature and suppress the background region of the low-level feature. With SFCM, most pixels in the low-level feature map are suppressed. Therefore, on the premise of not damaging the semantic expression capability of the high-level features, more detailed information of the salient regions of the low-level feature map is added, the expression capability of the features is further enhanced, and better performance is obtained.

The method builds two convolutional neural networks to carry out image classification experiments on data sets cifar10 and cifar100, and firstly builds a 9-layer convolutional neural network model which comprises an input layer, 3 convolutional layers, 3 pooling layers, 1 full-connection layer (a feature extraction layer) and an output layer (a Softmax layer). And an 11-layer convolutional neural network model is also built and comprises an input layer, 5 convolutional layers, 3 pooling layers, 1 full-connection layer (feature extraction layer) and an output layer.

The results of the experiment are shown in Table 1

Table 1: image recognition rates on datasets cifar10 and cifar100

As can be seen from table 1, the direct fusion of the high-level features and the low-level features may cause a decrease in the image recognition rate of the neural network, and the convolutional neural network based on the selective feature connection mechanism may ensure an increase in the image recognition rate, and compared with the conventional convolutional neural network, the convolutional neural network based on the selective feature connection mechanism may improve the accuracy by 0.9% in the cifar10 data set and 1.4% in the cifar100 data set, which proves the effectiveness and superiority of the selective feature connection mechanism.

While the foregoing is directed to the preferred embodiment of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims

1. an image integration method based on the convolutional neural network of selective feature connection, is characterized in that, comprises the following steps:

Step 1: Find the average features of low-level features and high-level features respectively;

Step 2: The average feature of the high-level feature obtained in step 1 is subtracted from the average feature of the low-level feature to obtain the score of the key feature map;

Step 3: Scaling the average feature of the high-level features;

Step 4: Perform Softmax normalization on the score of the key feature map obtained in step 2 and the result of scaling processing in step 3 to obtain feature Z;

Step 5: Perform maximum normalization on feature Z to get attention score.

2. The image integration method based on the convolutional neural network of selective feature connection as claimed in claim 1, is characterized in that, in step 1, the average feature of low-level feature is as follows:

The average features of the high-level features are as follows:

Where Am∈R ^F×G×1 , Bm∈R ^F×G×1 , A0 corresponds to A0 _i,j,c at the spatial position (i,j,c), and B is at the spatial position (i,j) ,c) corresponds to B _i,j,c , C ₁ represents the number of low-level features, and C ₂ represents the number of high-level features.

3. the image integration method based on the convolutional neural network of selective feature connection as claimed in claim 2, is characterized in that, in step 2, the score of key feature map is as follows:

P=Bm-Am.

4. the image integration method based on the convolutional neural network of selective feature connection as claimed in claim 2 or 3, is characterized in that, in step 3, the average feature of high-level feature is scaled as follows:

D=Bm*n

in

5. the image integration method based on the convolutional neural network of selective feature connection as claimed in claim 1, is characterized in that, in step 4, the score of the key feature map obtained in step 2 and the result of step 3 scaling processing The Softmax normalization processing is performed as follows:

The resulting feature Z is as follows:

Z=S ^P - S ^D .

6. The image integration method based on the convolutional neural network of selective feature connection as claimed in claim 5, is characterized in that, in step 5, pay attention to the score as follows:

where M∈R ^F×G×1 , M _i,j is the final score at position (i,j).