Image detection method based on improved FCOS network
Technical Field
The invention belongs to the technical field of artificial intelligence and image detection, and particularly relates to an image detection method based on an improved FCOS network.
Background
Object detection is an important technology in the field of computer vision, the main goal of which is to accurately identify and locate target objects of interest in an image or video. With the rapid development of artificial intelligence and deep learning, target detection technology has also made great progress. Before the rise of deep learning, object detection relied primarily on traditional computer vision methods: features were extracted from the image by feature engineering, and objects were classified and located with conventional machine learning algorithms (e.g., SVM, decision tree, etc.). However, due to the diversity and complexity of targets, conventional approaches often fail to handle complex scenes and large-scale data efficiently. With the rise of deep learning technology, in particular Convolutional Neural Networks (CNNs), target detection has been revolutionized. The excellent performance of CNNs enables the computer to automatically learn high-level features from the image, greatly improving the accuracy and efficiency of target detection. Among them, R-CNN (Region-based Convolutional Neural Networks), a two-stage network proposed by Ross Girshick et al. in 2014, was a milestone in applying the CNN method to the target detection problem and laid the foundation for subsequent development. With the development of the R-CNN series of algorithms such as Faster R-CNN and Mask R-CNN, this family has been widely applied to image detection and segmentation. These two-stage target detection algorithms have achieved considerable success in image detection, but they have higher computational complexity, slower detection speed and larger computational resource consumption, and therefore require higher hardware configurations.
Disclosure of Invention
The invention aims to solve the problems of two-stage network models, such as low image detection speed, difficult feature extraction and high computational resource consumption, and provides an image detection method based on an improved FCOS network. In this method, SCConv convolution is adopted: through internal communication of features it enlarges the convolution receptive field and thereby increases the diversity of the output features, and through the self-calibration operation it adaptively builds long-range spatial and inter-channel dependencies around each spatial position, helping the CNN generate feature representations with more discriminative power and richer information. In addition, under limited computing capability, computing resources are allocated to the more important tasks, which alleviates the problem of information overload and lets the model focus on the information most critical to the current task.
The technical scheme for realizing the aim of the invention is as follows:
an image detection method based on an improved FCOS network, comprising the steps of:
1) First, construct a data set for training and testing; the data set is an MRI-T2 image data set of the human lumbar intervertebral disc, and is divided into a train set, a val set and a test set at a ratio of 8:1:1;
2) Resize the input images in the data set to a fixed size of 768x768 pixels, and annotate the images in the data set in the COCO data format;
3) Perform data enhancement on all input images, including flipping and scaling, and preprocess the enhanced images with the top-hat transformation and gray-level stretching preprocessing techniques;
4) Adopt the general-purpose target detection platform MMDetection for detection; MMDetection is a deep-learning-based target detection framework with which a target detection network can be built quickly. First, the COCO dataset code needs to be modified: the 80 categories in the COCO dataset code are replaced by the 2 categories of the data set, normal and diseased, and the names of these categories are then added to the initialization file;
5) Stochastic gradient descent (SGD) is adopted to optimize the training process, with an initial learning rate of 0.005 and a momentum of 0.9;
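A minimal configuration sketch covering steps 4) and 5), assuming MMDetection's Python config style; the file paths and field layout below are illustrative assumptions, not the exact configuration files of this embodiment:

```python
# Illustrative MMDetection-style config fragment (assumed layout):
# the 80 COCO classes are replaced by the 2 classes of this data set,
# and SGD is used with the learning rate and momentum stated in step 5).
classes = ('normal', 'diseased')

data = dict(
    train=dict(classes=classes, ann_file='annotations/train.json', img_prefix='train/'),
    val=dict(classes=classes, ann_file='annotations/val.json', img_prefix='val/'),
    test=dict(classes=classes, ann_file='annotations/test.json', img_prefix='test/'))

model = dict(bbox_head=dict(num_classes=2))   # 2 classes instead of 80

optimizer = dict(type='SGD', lr=0.005, momentum=0.9)
```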
6) The image preprocessed in step 3) is used as the input of the network model;
7) The backbone performs a convolution operation on the input image with a convolution kernel size of 7x7 and a stride of 2, followed by max pooling with a kernel size of 3x3 and a stride of 2, to obtain the output result C1;
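A minimal PyTorch sketch of the stem described in step 7); the 3-to-64 channel mapping and the BatchNorm/ReLU layers are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Stem of the backbone: 7x7 conv with stride 2, then 3x3 max pooling with stride 2.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

x = torch.randn(1, 3, 768, 768)   # fixed input size from step 2)
c1 = stem(x)                      # C1: (1, 64, 192, 192)
```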
8) C1 is sent to a first self-calibration convolution module SCConv_1 to obtain an output result C2;
9) C2 is sent to a second self-calibration convolution module SCConv_2, and an output result C3 is obtained;
10) C3 is sent to a third self-calibration convolution module SCConv_3 to obtain an output result C4;
11) C4 is sent to a fourth self-calibration convolution module SCConv_4 to obtain an output result C5;
12) C3, C4 and C5 are each processed by an SE Attention module: global average pooling serves as the squeeze operation, and two FC layers form a bottleneck structure that models the correlation among channels and outputs one weight per input feature channel. The feature dimension is first reduced to 1/r of the input, then restored to the original dimension by a second FC layer after ReLU activation; compared with directly using one FC layer, this fits the complex correlation among channels better while greatly reducing the number of parameters and the amount of computation. Normalized weights between 0 and 1 are then obtained through a Sigmoid gate, and finally these weights are applied to the features of each channel through a Scale operation. The size and number of channels before and after the SE module are unchanged, and the outputs S3, S4 and S5 are obtained respectively;
13) S3, S4 and S5 are sent to an FPN module; the FPN generates P3, P4 and P5 from S3, S4 and S5 output by SE Attention respectively, P6 is obtained from P5 through a convolution layer with a kernel size of 3x3 and a stride of 2, and P7 is obtained from P6 through another convolution layer with a kernel size of 3x3 and a stride of 2;
14) Before detection and classification, a loss function needs to be set. The network has three output branches, namely classification, regression and center-ness, so the loss consists of three parts: the classification loss Lcls, the localization loss Lreg and the center-ness loss Lctrness, calculated as in the following formula:
L(\{p_{x,y}\},\{t_{x,y}\},\{s_{x,y}\}) = \frac{1}{N_{pos}}\sum_{x,y} L_{cls}(p_{x,y}, c^{*}_{x,y}) + \frac{\lambda}{N_{pos}}\sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y}>0\}} L_{reg}(t_{x,y}, t^{*}_{x,y}) + \frac{1}{N_{pos}}\sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y}>0\}} L_{ctrness}(s_{x,y}, s^{*}_{x,y})

where N_{pos} is the number of positive samples and \lambda is a balance weight; p_{x,y} represents the score for each category predicted at the feature-map point (x, y); c^{*}_{x,y} represents the true class label corresponding to the feature-map point (x, y); the indicator \mathbb{1}_{\{c^{*}_{x,y}>0\}} takes the value 1 when the feature-map point (x, y) is matched as a positive sample and 0 otherwise; t_{x,y} represents the target bounding-box information predicted at the feature-map point (x, y); t^{*}_{x,y} represents the real target bounding-box information corresponding to the feature-map point (x, y); s_{x,y} represents the center-ness predicted at the feature-map point (x, y); and s^{*}_{x,y} represents the true center-ness corresponding to the feature-map point (x, y);
15) The P3-P7 obtained in step 13) are sent to the detection head; P3-P7 share one detection head. The detection head has three sub-branches in total: Classification, Regression and Center-ness, where Regression and Center-ness are two small branches on the same branch. The Classification branch and the Regression/Center-ness branch each first pass through a combination module of 4 Conv2d+GN+ReLU blocks, and then through a convolution layer with a kernel size of 3x3 and a stride of 1 to obtain the final prediction result.
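A minimal sketch of the shared detection head in step 15), assuming 256 feature channels and Group Normalization with 32 groups (values not specified above); each branch uses 4 Conv2d+GN+ReLU blocks followed by a 3x3, stride-1 convolution:

```python
import torch.nn as nn

def conv_gn_relu_stack(channels=256, num_convs=4, num_groups=32):
    # 4 x (Conv2d + GN + ReLU) combination module described in step 15).
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(channels, channels, 3, stride=1, padding=1),
                   nn.GroupNorm(num_groups, channels),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class DetectionHead(nn.Module):
    """Head shared by P3-P7 (channel width and group count are assumptions)."""
    def __init__(self, channels=256, num_classes=2):
        super().__init__()
        self.cls_tower = conv_gn_relu_stack(channels)
        self.reg_tower = conv_gn_relu_stack(channels)
        self.cls_pred = nn.Conv2d(channels, num_classes, 3, stride=1, padding=1)
        self.reg_pred = nn.Conv2d(channels, 4, 3, stride=1, padding=1)
        self.ctr_pred = nn.Conv2d(channels, 1, 3, stride=1, padding=1)  # Center-ness

    def forward(self, feat):
        cls_feat = self.cls_tower(feat)
        reg_feat = self.reg_tower(feat)   # Regression and Center-ness share this tower
        return self.cls_pred(cls_feat), self.reg_pred(reg_feat), self.ctr_pred(reg_feat)
```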
The technical scheme is realized by an anchor-free FCOS network model, SCConv self-calibration convolution modules and an SE Attention mechanism module. The SCConv convolution enlarges the convolution receptive field through the internal communication of features, thereby further enhancing the diversity of the output features. SE Attention can better exploit the dynamic relationships between feature channels.
The technical scheme has the advantages or beneficial effects that:
The target detection method provided by this technical scheme is built on the recent high-performing FCOS network and uses SCConv convolution. Unlike standard convolution, which fuses spatial-domain and channel-dimension information with small kernels (e.g., 3x3), SCConv adaptively establishes long-range spatial and inter-channel dependencies around each spatial position through the self-calibration operation, so it helps the CNN generate feature representations with more discriminative power and richer information. The SE Attention mechanism module used in this technical scheme allocates computing resources to the more important tasks under limited computing capability, alleviates the problem of information overload, and lets the model focus on the information most critical to the current task.
Drawings
FIG. 1 is a network block diagram of an embodiment;
FIG. 2 is a block diagram of SCConv in an embodiment;
FIG. 3 is an SC module configuration in an embodiment;
FIG. 4 is a diagram of a module of an attention mechanism in an embodiment;
FIG. 5 is a diagram of a test head structure in an embodiment;
FIG. 6 is a flow diagram of network reasoning of an embodiment;
FIG. 7 is an original image of a spinal MRI in an embodiment;
Fig. 8 is a graph of spinal MRI test results of an embodiment.
Detailed Description
The present invention will now be further illustrated, but not limited, by the following figures and examples.
Examples:
referring to fig. 6, an image detection method based on an improved FCOS network includes the steps of:
1) First, construct a data set for training and testing; the data set is a non-public human lumbar intervertebral disc MRI-T2 image data set collected from the Internet, as shown in FIG. 7, containing 470 images in total, which are divided into a train set (376 images), a val set (47 images) and a test set (47 images) at a ratio of 8:1:1;
2) Resize the input images in the data set to a fixed size of 768x768 pixels, and annotate the images in the data set in the COCO data format;
3) Perform data enhancement on all input images, including flipping and scaling, and preprocess the enhanced images with the top-hat transformation and gray-level stretching preprocessing techniques;
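A minimal sketch of the preprocessing in step 3), assuming OpenCV; the structuring-element size, the stretch range and the choice of adding the top-hat result back to the image are illustrative assumptions:

```python
import cv2
import numpy as np

def preprocess(gray):
    """Top-hat transform plus gray-level stretching (parameter values are assumptions)."""
    # Top-hat: original minus its morphological opening, highlighting bright details.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)
    enhanced = cv2.add(gray, tophat)   # one common way of applying the top-hat result

    # Linear gray-level stretching to the full 0-255 range.
    lo, hi = enhanced.min(), enhanced.max()
    stretched = (enhanced.astype(np.float32) - lo) / max(float(hi - lo), 1.0) * 255.0
    return stretched.astype(np.uint8)

img = cv2.imread('lumbar_mri.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file name
img = cv2.resize(img, (768, 768))                          # fixed size from step 2)
out = preprocess(img)
```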
4) Adopt the target detection platform MMDetection for detection: first modify the COCO dataset code, replacing the 80 categories in it with the 2 categories of the data set, normal and diseased, and then add the names of these categories to the initialization file;
5) Optimize the training process with stochastic gradient descent (SGD), with an initial learning rate of 0.005 and a momentum of 0.9;
6) The image preprocessed in step 3) is used as the input of the network model; the structure of the model is shown in FIG. 1;
7) The backbone performs a convolution operation on the input image with a convolution kernel size of 7x7 and a stride of 2, followed by max pooling with a kernel size of 3x3 and a stride of 2, to obtain the output result C1;
8) C1 is fed into the first self-calibration convolution module SCConv_1 to obtain the output result C2; the SCConv module is shown in FIG. 2, where FIG. 2(a) is the original structure and FIG. 2(b) is the modified structure of this embodiment, and the internal structure of the SC module is shown in FIG. 3;
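The exact wiring of the modified SCConv in FIG. 2(b) and FIG. 3 is only fully defined by the drawings; the sketch below follows the original self-calibrated convolution design (channel split, a down-sample/up-sample calibration path gated by a Sigmoid, and a plain convolution path) as an assumed reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCConv(nn.Module):
    """Self-calibrated convolution sketch (original SCConv design; the modified
    structure of FIG. 2(b) may differ). All hyper-parameters are assumptions."""
    def __init__(self, channels, pooling_r=4):
        super().__init__()
        half = channels // 2
        self.k1 = nn.Sequential(nn.Conv2d(half, half, 3, padding=1, bias=False),
                                nn.BatchNorm2d(half))      # plain path
        self.k2 = nn.Sequential(nn.AvgPool2d(pooling_r, pooling_r),
                                nn.Conv2d(half, half, 3, padding=1, bias=False),
                                nn.BatchNorm2d(half))      # down-sampled calibration path
        self.k3 = nn.Sequential(nn.Conv2d(half, half, 3, padding=1, bias=False),
                                nn.BatchNorm2d(half))
        self.k4 = nn.Sequential(nn.Conv2d(half, half, 3, padding=1, bias=False),
                                nn.BatchNorm2d(half))

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)
        # Self-calibration: gate K3(x1) with a Sigmoid of x1 plus the up-sampled K2 branch.
        gate = torch.sigmoid(x1 + F.interpolate(self.k2(x1), x1.shape[2:]))
        y1 = self.k4(self.k3(x1) * gate)
        y2 = self.k1(x2)
        return torch.cat([y1, y2], dim=1)

y = SCConv(64)(torch.randn(1, 64, 96, 96))   # shape preserved: (1, 64, 96, 96)
```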
9) C2 is sent to a second self-calibration convolution module SCConv _2, and an output result C3 is obtained;
10) C3 is sent to a third self-calibration convolution module SCConv_3 to obtain an output result C4;
11) C4 is sent to a fourth self-calibration convolution module SCConv_4 to obtain an output result C5;
12) The structure of the SE Attention module of this embodiment is shown in FIG. 4. Global average pooling is used as the squeeze operation, and then two FC layers form a bottleneck structure to model the correlation among channels and output one weight per input feature channel. The feature dimension is first reduced to 1/r of the input, and after ReLU activation it is restored to the original dimension by a second FC layer; the advantage of this design is that it has more nonlinearity than directly using one FC layer, so it can better fit the complex correlation among channels while greatly reducing the number of parameters and the amount of computation. Normalized weights between 0 and 1 are then obtained through a Sigmoid gate, and finally these weights are applied to the features of each channel through a Scale operation. The size and number of channels before and after the module are unchanged, and the outputs S3, S4 and S5 are obtained respectively after processing by the SE Attention modules;
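A minimal PyTorch sketch of the SE Attention module described in step 12); the reduction ratio r = 16 and the 2048-channel example are assumed values:

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Squeeze-and-Excitation block as described in step 12) (r = 16 assumed)."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # global average pooling
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // r),          # reduce to 1/r of the input
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),          # restore the original dimension
            nn.Sigmoid())                                # normalized weights in (0, 1)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                     # Scale: size and channels unchanged

s5 = SEAttention(2048)(torch.randn(1, 2048, 24, 24))     # e.g. C5 -> S5 (channels assumed)
```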
13) S3, S4 and S5 are sent to an FPN module; the FPN generates P3, P4 and P5 from S3, S4 and S5 output by SE Attention respectively, P6 is obtained from P5 through a convolution layer with a kernel size of 3x3 and a stride of 2, and P7 is obtained from P6 through another convolution layer with a kernel size of 3x3 and a stride of 2;
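A minimal sketch of the extra pyramid levels in step 13): P6 is produced from P5, and P7 from P6, each by a 3x3 convolution with stride 2; the 256-channel width is an assumed value:

```python
import torch
import torch.nn as nn

# Extra pyramid levels of step 13). The 256-channel width is an assumption.
p6_conv = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1)
p7_conv = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1)

p5 = torch.randn(1, 256, 24, 24)   # e.g. P5 for a 768x768 input (stride 32)
p6 = p6_conv(p5)                   # (1, 256, 12, 12)
p7 = p7_conv(p6)                   # (1, 256, 6, 6)
```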
14) Before detection and classification, a loss function needs to be set. The network has three output branches, namely classification, regression and center-ness, so the loss consists of three parts: the classification loss Lcls, the localization loss Lreg and the center-ness loss Lctrness, calculated as in the following formula:
L(\{p_{x,y}\},\{t_{x,y}\},\{s_{x,y}\}) = \frac{1}{N_{pos}}\sum_{x,y} L_{cls}(p_{x,y}, c^{*}_{x,y}) + \frac{\lambda}{N_{pos}}\sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y}>0\}} L_{reg}(t_{x,y}, t^{*}_{x,y}) + \frac{1}{N_{pos}}\sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y}>0\}} L_{ctrness}(s_{x,y}, s^{*}_{x,y})

where N_{pos} is the number of positive samples and \lambda is a balance weight; p_{x,y} represents the score for each category predicted at the feature-map point (x, y); c^{*}_{x,y} represents the true class label corresponding to the feature-map point (x, y); the indicator \mathbb{1}_{\{c^{*}_{x,y}>0\}} takes the value 1 when the feature-map point (x, y) is matched as a positive sample and 0 otherwise; t_{x,y} represents the target bounding-box information predicted at the feature-map point (x, y); t^{*}_{x,y} represents the real target bounding-box information corresponding to the feature-map point (x, y); s_{x,y} represents the center-ness predicted at the feature-map point (x, y); and s^{*}_{x,y} represents the true center-ness corresponding to the feature-map point (x, y);
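For reference, in the original FCOS formulation the true center-ness at a positive location is computed from the distances l*, t*, r*, b* of that location to the four sides of its ground-truth box; it is assumed here that this embodiment follows the same definition:

s^{*}_{x,y} = \sqrt{\frac{\min(l^{*},\,r^{*})}{\max(l^{*},\,r^{*})} \times \frac{\min(t^{*},\,b^{*})}{\max(t^{*},\,b^{*})}}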
15) The P3-P7 obtained in step 13) are sent to the detection head; P3-P7 share one detection head, whose structure is shown in FIG. 5. The detection head has three sub-branches in total: Classification, Regression and Center-ness, where Regression and Center-ness are two small branches on the same branch. The Classification branch and the Regression/Center-ness branch each first pass through a combination module of 4 Conv2d+GN+ReLU blocks, and then through a convolution layer with a kernel size of 3x3 and a stride of 1 to obtain the final prediction result, as shown in FIG. 8;
16) The test set is tested with the trained network model, and the test results of the original FCOS network are compared with those of the present method. It can be seen that the method of this embodiment achieves a significantly higher detection accuracy than the original FCOS network.