CN116306813B - A method based on YOLOX lightweight and network optimization
- Publication number: CN116306813B
- Application number: CN202310212335.9A
- Authority: CN (China)
- Prior art keywords: model, yolox, network, improved, training
- Legal status: Active (the listed status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method based on YOLOX lightweighting and network optimization, comprising the following steps: S1, preparing the data set required for training in a target detection task; S2, training the original YOLOX neural network model on the data set, and recording and evaluating the model's performance indexes; S3, pruning the original YOLOX neural network model to generate a pruned YOLOX network model; S4, improving the YOLOX network model and training it on the data set; S5, pruning the improved YOLOX network model; and S6, verifying and analyzing the pruned, improved YOLOX network: if it meets the performance requirements, it is used to detect and analyze targets; otherwise, the improved model is adjusted until the requirements are met. The invention achieves higher detection precision and speed in target detection, is easier to deploy and integrate in practical application scenarios, and makes model inference more efficient and stable.
Description
Technical Field
The invention belongs to the technical field of target detection in images, and particularly relates to a method based on YOLOX lightweighting and network optimization.
Background
The object detection problem is to determine the location of the objects in a given image and the class to which each object belongs (i.e., object localization and object classification). Object detection is now applied in agriculture, medicine, automated production and other fields. It is mainly performed with deep learning, and deep-learning-based detectors fall into two classes. The first class is two-stage detectors based on candidate regions, including R-CNN, SPP-Net, Fast R-CNN and Faster R-CNN.
Because these two-stage algorithms must first generate a large number of candidate regions and then classify and localize the objects in each of them, detection is time-consuming. This process requires substantial computational resources and time, and it is especially slow on high-resolution images and in complex scenes. The second class is single-stage detectors, which sample densely over the image and perform detection directly at the sampling points, so no large set of candidate regions has to be generated. They are therefore faster and suitable for real-time target detection, which makes them more popular in practical applications.
YOLOX is an excellent single-stage detection algorithm: it directly regresses object-class probabilities and position coordinates and runs fast. However, using YOLOX directly for detection raises unavoidable problems. The YOLOX model is large and its inference is slow, so it needs to be made lightweight; at the same time, the original precision should be preserved, so detection precision must be maintained through network optimization after lightweighting.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method based on YOLOX lightweighting and network optimization. It addresses two problems: the limits on deploying detection models, and the loss of precision observed when a lightweighted model is evaluated. The method achieves higher detection precision and speed, is easier to deploy and integrate in practical application scenarios, and makes model inference more efficient and stable.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
A method based on YOLOX lightweighting and network optimization comprises the following steps:
S1, in a target detection task, preparing the data set required for training. The data set consists of original images captured in different scenes and under different illumination conditions, and these images are preprocessed so that the model can detect targets accurately in a variety of environments;
S2, training the original YOLOX neural network model on the data set, and recording and evaluating the model's performance indexes;
S3, pruning the original YOLOX neural network model to generate a pruned YOLOX network model;
S4, improving the YOLOX network model and training it on the data set;
S5, pruning the improved YOLOX network model and adjusting it further to achieve higher detection precision and speed;
S6, verifying and analyzing the pruned, improved YOLOX network: if it meets the performance requirements, it is used to detect and analyze targets; if not, the improved model is adjusted until the requirements are met.
In step S1, the data set required for training is prepared through the following steps:
(1) Data collection: the PASCAL VOC data set is downloaded from its official website. It contains the JPEGImages, ImageSets and Annotations directories: JPEGImages holds the image files, ImageSets holds the train.txt, trainval.txt and val.txt split files for each class, and Annotations holds the XML annotation file for each image;
(2) Data preprocessing: the collected original images are preprocessed to make them suitable for model training. First, each image is resized to a specified size to ease subsequent processing; for the chosen detection task this is 416x416 pixels. Next, color images are converted to grayscale, which reduces the complexity of storing and processing the data and cuts training time and computation. Finally, pixel values are scaled to the range 0 to 1, which keeps the data numerically stable during processing and mitigates exploding and vanishing gradients during training.
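The three preprocessing operations in item (2) can be sketched in a few lines of numpy. This is a minimal, hypothetical illustration, not the patent's code: nearest-neighbour resizing stands in for a proper image library (production pipelines usually use bilinear resizing), and the ITU-R BT.601 luma weights used for the grayscale conversion are an assumption, since the method does not name a conversion formula.

```python
import numpy as np

TARGET_SIZE = 416  # square input size stated in the method


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an HxWxC image to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows][:, cols]


def preprocess(img_rgb: np.ndarray) -> np.ndarray:
    """Resize an HxWx3 uint8 RGB image, convert it to grayscale, scale to [0, 1]."""
    resized = resize_nearest(img_rgb, TARGET_SIZE).astype(np.float32)
    # BT.601 luma weights (assumption: the source does not specify them)
    luma = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = resized @ luma  # (416, 416, 3) @ (3,) -> (416, 416)
    # Divide by 255 and clip away tiny float rounding above 1.0
    return np.clip(gray / 255.0, 0.0, 1.0)


image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
x = preprocess(image)
print(x.shape, float(x.min()), float(x.max()))
```

Note that YOLOX backbones normally take 3-channel input, so a single gray channel would have to be replicated or the first convolution adapted; the method does not say which.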
Step S2 trains the original YOLOX neural network model on the data set through the following steps:
(1) The data set preprocessed in step S1 is used;
(2) The data set is split into a training set and a validation set in the ratio 8:2;
(3) The preprocessed data set is fed into the original YOLOX neural network model, and the network's predicted value pred and the true value gt are passed to the loss function L to obtain the loss value:

$$\mathrm{Loss} = L(\mathrm{pred}, \mathrm{gt})$$

where L is the loss function, pred is the network's predicted output and gt is the ground truth. The network parameters are optimized with respect to L by gradient descent. Letting θ denote the current network parameters, the update rule is:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)$$

where η is the learning rate, ∇_θ L(θ_t) is the gradient of the loss function L with respect to the parameters θ, θ_t is the parameter value at time step t, and θ_{t+1} is the parameter value at time step t+1. Repeating this update over many iterations optimizes the network and improves the accuracy and speed of target detection;
(4) After each round of parameter updates, the model is checked on the validation set to verify its generalization ability. Specifically, the validation set is fed into the YOLOX network, and the loss between the predicted and true results, i.e. the validation loss, is computed. Let the validation set contain N samples, and for the i-th sample let p_i be the predicted box and t_i the true box. The validation loss L is computed from the following quantities:
S, the number of prediction boxes per grid cell; C, the number of target classes; the predicted and true values of the c-th class in the j-th grid cell of the i-th sample; the predicted and true indicators of whether the j-th grid cell of the i-th sample contains a target; the predicted and true confidence values in the j-th grid cell of the i-th sample; Pos_ij, the index set of the prediction boxes in the j-th grid cell of the i-th sample that have the largest intersection over union (IoU) with the true box; and two weight coefficients that balance the contributions of grid cells that contain an object and grid cells that do not;
The validation loss indicates how well the current model performs; if it is still high, training continues until a preset stopping condition is reached;
(5) After every two iterations, the images in the data set are fed into the YOLOX detection network being trained to obtain the model's precision; the optimized YOLOX detection network achieves higher detection precision and speed;
(6) Repeating the above steps until training is finished.
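The training procedure of step S2 (8:2 split, gradient-descent update θ_{t+1} = θ_t − η∇_θL(θ_t), validation after each update, and a stopping condition) can be illustrated on a toy problem. This is a hedged sketch only: a linear model with mean-squared error stands in for YOLOX and its loss L, and the patience-based stopping rule is a hypothetical choice, because the source only says "a preset stopping condition".

```python
import numpy as np


def split_8_2(n_samples: int, seed: int = 0):
    """Shuffle sample indices and split them 8:2 into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(n_samples * 0.8)
    return idx[:n_train], idx[n_train:]


def mse_loss(theta, x, y):
    """Stand-in for L(pred, gt): mean squared error of pred = x @ theta."""
    return float(np.mean((x @ theta - y) ** 2))


def mse_grad(theta, x, y):
    """Gradient of mse_loss with respect to theta."""
    return 2.0 * x.T @ (x @ theta - y) / len(y)


def train(x, y, lr=0.1, max_steps=1000, patience=10, seed=0):
    train_idx, val_idx = split_8_2(len(y), seed)
    theta = np.zeros(x.shape[1])
    best_val, best_theta, bad_steps = np.inf, theta.copy(), 0
    for _ in range(max_steps):
        # Gradient-descent update: theta_{t+1} = theta_t - eta * grad L(theta_t)
        theta = theta - lr * mse_grad(theta, x[train_idx], y[train_idx])
        # Validation loss after each parameter update
        val = mse_loss(theta, x[val_idx], y[val_idx])
        if val < best_val - 1e-9:
            best_val, best_theta, bad_steps = val, theta.copy(), 0
        else:
            bad_steps += 1
            if bad_steps >= patience:  # hypothetical stopping condition
                break
    return best_theta, best_val
```

For example, with the 21143 images used in the embodiment, `split_8_2(21143)` yields 16914 training and 4229 validation indices.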
In step S3, the network trained in step S2 is pruned through the following steps:
(1) For each network layer, the sensitivity of the model's forward propagation to that layer is computed: for a given input, the partial derivative of the change in output with respect to the layer's weights gives the layer's sensitivity. A larger sensitivity means the layer has a greater influence on the output, and such layers should be given priority consideration when pruning;
(2) The weights of each layer to be pruned are sorted;
(3) A threshold is determined for each layer to be pruned from the sorted weights and the pruning rate;
(4) Weights below the threshold are removed from the network, and weights above it are kept;
(5) The new model parameters and weights are saved to generate the pruned YOLOX network model.
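Steps (2) to (5) can be sketched as magnitude pruning with a per-layer threshold taken from the sorted weights. This is a minimal sketch under assumptions: weights are compared by absolute value, and the pruning rate is read as the fraction of a layer's weights to remove (the source does not define it further). The sensitivity analysis of step (1) that selects the layers is not shown; the layer dictionary is a hypothetical stand-in for a model's parameters.

```python
import numpy as np


def prune_layer(weights: np.ndarray, prune_rate: float):
    """Zero the smallest-magnitude fraction `prune_rate` of one layer's weights."""
    flat = np.sort(np.abs(weights).ravel())       # step (2): sort by magnitude
    k = int(prune_rate * flat.size)               # how many weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = flat[k - 1]                       # step (3): threshold from the rate
    mask = np.abs(weights) > threshold            # step (4): keep weights above it
    # Ties at the threshold are removed too, so slightly more than k may go.
    return weights * mask, mask


def prune_model(layers: dict, prune_rate: float) -> dict:
    """Prune every selected layer and return the new weights (step (5))."""
    return {name: prune_layer(w, prune_rate)[0] for name, w in layers.items()}


w = np.array([[0.1, -0.5], [0.9, -0.05]])
pruned, mask = prune_layer(w, 0.5)
print(pruned)  # the two smallest-magnitude weights are zeroed
```

Zeroing weights in place keeps tensor shapes unchanged; actually shrinking the model file or FLOPs additionally requires sparse storage or removing whole channels.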
Step S4 improves the YOLOX network model through the following steps:
(1) A channel-spatial attention module, CBAM, is inserted between the YOLOX backbone and the feature-enhancement layer (the PAFPN neck);
CBAM implements channel-spatial attention, can effectively improve model precision, and consists of two parts: channel attention and spatial attention;
In the channel attention part, average pooling and max pooling are applied to the input feature map to aggregate its spatial information, producing the average-pooled feature F_avg and the max-pooled feature F_max. Each feature is passed through a shared network, the two outputs are summed element-wise, and the merged feature is passed through a sigmoid activation to output the channel attention map Mc;
(2) The original BCE cross-entropy loss function is replaced with the VariFocal Loss function. This replacement requires modifying the model's output layer. VariFocal Loss introduces an adjustable exponent γ and modifies the weighting term in the loss formula so that learning focuses more on hard samples, so the output layer must be adapted accordingly. In the improved YOLOX network model, the output layer generally contains a classification branch, which classifies each target, and a regression branch, which regresses each target's position. To fit the VariFocal Loss computation, the output of the classification branch is first passed through a sigmoid function to turn it into predicted probabilities. Because VariFocal Loss changes only the classification branch, the regression branch is computed as before;
(3) The improved YOLOX network model is trained;
(4) The trained model is pruned, and the performance indexes of the pruned model are evaluated;
The attention module extracts more useful information along the spatial or channel dimension of the input features and weights the features accordingly. This strengthens the network's perception along the spatial and channel dimensions, lets it concentrate on the informative parts of the input features, and yields better detection precision.
Replacing the cross-entropy term of the original loss function with VariFocal Loss reweights positive and negative samples and speeds up model convergence.
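The VariFocal Loss substitution can be illustrated with a small numpy function. This sketch follows the formulation published for VarifocalNet (positives weighted by their target score q, negatives down-weighted by α·p^γ) with that paper's default α = 0.75 and γ = 2.0; those values, and the summation over all predictions, are assumptions, not settings stated in this method. As the text says, the sigmoid is applied to the classification-branch outputs before the loss.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def varifocal_loss(logits, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """VariFocal Loss summed over all predictions.

    logits: raw classification-branch outputs (sigmoid is applied here).
    q: targets; q > 0 for positives (an IoU-aware score in VarifocalNet),
       q == 0 for negatives.
    """
    p = np.clip(sigmoid(np.asarray(logits, dtype=float)), eps, 1.0 - eps)
    q = np.asarray(q, dtype=float)
    # Plain binary cross-entropy against the (soft) target q
    bce = -(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))
    # Positives are weighted by q; easy negatives are suppressed by alpha * p**gamma
    weight = np.where(q > 0, q, alpha * p ** gamma)
    return float(np.sum(weight * bce))


print(varifocal_loss([0.0, 2.0, -3.0], [1.0, 0.8, 0.0]))
```

Compared with BCE, a confidently rejected negative (small p) contributes almost nothing, which is how the loss shifts attention toward hard samples.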
Step S6 comprises the following steps:
First, the performance of the improved YOLOX network model is evaluated on unseen data. If the model does not meet the performance requirements, it is adjusted and improved: the training process is further optimized by tuning hyperparameters such as the learning rate and batch size, and the adjusted model is trained and validated again. This process may need several iterations until a model that meets the performance requirements is obtained. Finally, the improved, pruned model is used to detect and analyze targets. Target detection means detecting the position and class of targets in an image or video; deploying the pruned model enables faster and more accurate detection and thus improves efficiency and accuracy in practical applications.
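The adjust-and-retrain loop of step S6 can be sketched as a search over learning-rate and batch-size settings that stops at the first configuration meeting the requirement. This is a hypothetical illustration: `train_and_evaluate` is an assumed callback that retrains the pruned model and returns its mAP on unseen data, and the grid-search order is a design choice, not something the method specifies.

```python
from itertools import product


def tune_until_met(train_and_evaluate, required_map, learning_rates, batch_sizes):
    """Retrain with (learning rate, batch size) pairs until mAP meets the requirement.

    train_and_evaluate: hypothetical callback (lr, batch_size) -> mAP on unseen data.
    Returns (lr, batch_size, mAP) for the first configuration that meets the
    requirement, or the best configuration tried if none does.
    """
    best = None
    for lr, bs in product(learning_rates, batch_sizes):
        score = train_and_evaluate(lr, bs)
        if best is None or score > best[2]:
            best = (lr, bs, score)
        if score >= required_map:
            return lr, bs, score
    return best
```

Returning the best configuration when nothing qualifies lets the caller decide whether to widen the search or revisit the architecture.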
Experiments show that the proposed method achieves higher detection precision and speed in target detection tasks. Compared with traditional target detection methods, it greatly improves detection speed while maintaining high precision, and therefore has high practicality and economic benefit. The beneficial effects of the invention are as follows:
The invention provides a method based on YOLOX lightweighting and network optimization in which a channel-spatial attention module is inserted at the junction between the feature-extraction layer and the feature-enhancement layer, strengthening feature extraction for targets of different scales and suppressing interference from redundant information. Meanwhile, VariFocal Loss replaces the cross-entropy term in the original loss function: it assigns weights to positive and negative samples and controls each one's share of the total loss, so that the model focuses on hard-to-separate samples during training, which addresses the imbalance between sample classes.
Although the optimization strategies above effectively improve target detection accuracy, further optimization is needed for end-to-end real-time target detection on mobile devices. To this end, the invention uses a pruning strategy to compress the model, reducing its computation and enabling end-to-end real-time target detection on mobile devices.
However, pruning can affect model accuracy and lower detection accuracy. The invention therefore verifies and analyzes the pruned model: if it meets the performance requirements, it is used to detect and analyze targets; otherwise, the improved model is adjusted until the requirements are met. This keeps the model lightweight while maintaining high detection accuracy, and solves the drop in evaluated performance that the original YOLOX model shows after pruning.
In summary, the invention combines several optimization strategies, namely lightweighting, network optimization, loss-function improvement and pruning, to effectively improve end-to-end real-time target detection on mobile devices. It addresses the problems the original YOLOX has when deployed on embedded devices: large model size, a high number of floating-point operations, poor real-time performance, and reduced accuracy after pruning.
Drawings
Fig. 1 is a flow chart of a method based on YOLOX light weight and network optimization according to the present invention.
Fig. 2 is a flow diagram of YOLOX network training.
FIG. 3 is a schematic diagram of a model architecture based on YOLOX network optimization.
FIG. 4 is a flow chart based on YOLOX model training.
Fig. 5 is a block diagram of the CBAM attention mechanism.
FIG. 6 is a schematic diagram of accuracy of dataset detection based on a modified YOLOX network model.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in Figs. 1-3, the invention provides a target detection method based on YOLOX lightweighting, comprising the following steps:
S1, preparing the data set required for training in a target detection task. The data set contains target samples of different classes as well as original images of targets in different scenes and under different illumination conditions, which are preprocessed so that the model can detect targets accurately in a variety of environments;
The quality and quantity of the data set are critical to the model's training and performance. A data set relevant to the target detection task therefore needs to be selected, preprocessed and cleaned for subsequent model training and evaluation.
S2, training an original YOLOX neural network model on the data set, and recording and evaluating performance indexes of the model;
The original detection model is trained on the data set and its performance indexes are recorded, namely accuracy, recall and precision. Accuracy is the proportion of all samples that the model predicts correctly; recall is the proportion of actual positive samples that the model detects; and precision is the proportion of the model's positive predictions that are actually positive. The learning rate and weight decay are also tuned during training to improve model performance;
S3, pruning the original YOLOX neural network model to generate a pruned YOLOX network model;
The original network model is pruned according to a preset pruning strategy, reducing its parameters and computation to make it lightweight. The pruned model is retrained and its performance indexes are evaluated to verify the effect of pruning on model performance. In a target detection task, pruning removes redundant network structures, which reduces computation and speeds up model inference;
S4, improving the YOLOX network model, training it on the data set, and evaluating the model's detection-precision metrics;
To improve the performance of the detection model, a channel-spatial attention module, CBAM, is inserted between the backbone and the feature-enhancement layer of the original lightweight model, so that the model automatically attends to features important for target detection when processing images, improving the network's focus and robustness to interference. The original BCE cross-entropy loss function is replaced with VariFocal Loss to improve the model's classification performance and its ability to recognize hard samples: VariFocal Loss introduces adjustable key parameters into the loss function so that the model pays more attention to hard-to-classify samples. The improved model is trained on the data set, and its performance indexes are recorded and evaluated;
S5, pruning the improved YOLOX network model and adjusting it further to achieve higher detection precision and speed. In a target detection task, making the model lightweight and efficiently deployable is important because it directly affects the model's speed and precision. Pruning the improved network model further reduces its parameters and computation, enabling fast and efficient deployment; the pruned model is also better suited to resource-constrained scenarios such as embedded devices, which improves its generality and portability;
S6, verifying and analyzing the improved, pruned model to ensure that it meets the performance requirements of the target detection task; if it does not, the improved model is adjusted until the requirements are met.
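The accuracy, recall and precision indexes recorded in S2 can be computed from prediction counts as follows. This is a minimal sketch that assumes the true-positive, false-positive, false-negative and (where defined) true-negative counts have already been obtained by matching detections to ground truth, for example with an IoU threshold; that matching step is not shown. In object detection, true negatives are usually undefined, so accuracy is mainly meaningful for classification-style evaluation.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int = 0) -> dict:
    """Accuracy, precision and recall from confusion counts."""
    total = tp + fp + fn + tn
    # Accuracy: share of all samples predicted correctly
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of positive predictions that are truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were detected
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


print(detection_metrics(tp=8, fp=2, fn=4, tn=6))
```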
In S4, the YOLOX network model is improved through the following steps:
S41, to strengthen the improved YOLOX network model's feature extraction for targets of different scales and suppress interference from redundant information, a channel-spatial attention module, CBAM, is inserted between the backbone and the feature-enhancement layer. CBAM adaptively learns the importance of channels and spatial positions in the feature map and reweights the feature map accordingly, so the network focuses on the feature information that contributes to target recognition and localization.
S42, to address sample-class imbalance, the original BCE cross-entropy loss function is replaced with VariFocal Loss. VariFocal Loss assigns different weights to positive and negative samples, so that the model focuses on hard-to-classify samples, sample classes are better balanced during training, and classification accuracy improves.
S43, training the improved YOLOX network model to optimize the performance of the network model. In the training process, the network continuously adjusts the weight and the bias through a back propagation algorithm to minimize the value of the loss function, thereby improving the classification and positioning capability of the model.
S44, to reduce model size and computation and enable end-to-end real-time target detection on mobile devices, the trained model is pruned. Pruning removes unnecessary connections and parameters, reducing the model's size and computation while maintaining its accuracy. The pruned model is retrained and its performance indexes are evaluated to ensure that it still detects targets with high precision.
In step S43, the network model is trained through the following steps:
S431, collecting the data set and annotating the original data;
S432, splitting the images into a training set and a validation set in the ratio 8:2;
S433, feeding the training set into the YOLOX neural network, passing the network's predicted values and the true values to the loss function to obtain the loss value, and updating the network parameters by gradient descent;
S434, after each round of parameter updates, feeding the validation set into the YOLOX network and computing the validation loss;
S435, after every two iterations, feeding the images in the data set into the trained model to obtain the model's precision;
S436, repeating the steps above; training ends after 300 epochs, by which point the model has converged.
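The schedule in S433 to S436 (train each epoch, validate after each round, measure precision every two epochs, stop after 300 epochs) can be written as a small driver loop. This is a hedged sketch: `train_one_epoch`, `validate` and `measure_precision` are hypothetical callbacks standing in for the real YOLOX training, validation-loss and precision routines, and "iteration" is read here as one epoch.

```python
def run_schedule(train_one_epoch, validate, measure_precision,
                 epochs=300, eval_every=2):
    """Drive training for a fixed number of epochs and log metrics.

    Returns a list of (epoch, validation_loss, precision_or_None) tuples.
    """
    history = []
    for epoch in range(1, epochs + 1):
        train_one_epoch(epoch)                    # S433: gradient-descent updates
        val_loss = validate(epoch)                # S434: validation loss each round
        precision = None
        if epoch % eval_every == 0:               # S435: precision every two epochs
            precision = measure_precision(epoch)
        history.append((epoch, val_loss, precision))
    return history                                # S436: stop after `epochs` rounds
```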
The pruning of the network model in step S3 comprises the following steps:
S31, determining the network layers to be pruned. Before pruning, the network model is analyzed and evaluated to decide which layers to prune, taking into account each layer's number of parameters, amount of computation, and contribution to the model's overall performance. In this way the model's computation and size can be reduced as much as possible without an excessive drop in detection accuracy.
S32, sorting the weights of the layers to be pruned, which is one of the key steps of pruning. Before pruning, the parameters in the network are sorted to determine which contribute less to network performance and which contribute more, and the layers with smaller contributions are then selected for pruning.
S33, determining the threshold from the weights. By sorting the weights of the network layers and applying a threshold, unnecessary or redundant parts of the network can be identified and then pruned.
S34, pruning the parts below the threshold to obtain new parameters. Once the pruning threshold is determined, network parameters below it are pruned; that is, the network's weights are compressed by removing redundant and unnecessary parameters, which reduces network storage and computation.
S35, saving the new model parameters and weights to generate the pruned model. The pruned model has fewer parameters and correspondingly less computation, which noticeably improves inference speed and efficiency and enables real-time target detection in resource-constrained scenarios such as embedded devices.
The invention is further described in connection with the relevant background art and the implementation steps:
In step S1, images from public data sets are used, namely the VOC2007 and VOC2012 data sets, comprising 21143 training images and their corresponding XML files.
In step S2, the performance evaluation indexes, including model accuracy (mAP) and model size in parameters (M), are recorded as the criteria for subsequent performance evaluation.
YOLOX is a recent target detection algorithm in the YOLO series. It surpasses the detection precision of earlier YOLO models and achieves highly competitive end-to-end inference speed. However, when YOLOX is deployed on embedded devices, its large model size, high number of floating-point operations and poor real-time performance cause problems. To solve these problems while avoiding the unnecessary energy consumption of pre-training a new model, a lightweighting method for YOLOX is proposed.
(1) Pruning the YOLOX network model
The network model obtained by YOLOX training is read, and its weights and structure are saved. With the model size at 3797 KB, the convolutional layers to be pruned are determined as the C3_p4, C3_n3, reduce_conv1 and bu_conv2 layers in the feature enhancement layer. The weights in each of these layers are sorted by size, and the maximum weight value multiplied by the set pruning rate (40%) is used as the pruning weight threshold. Neuron weights below the threshold are reset to 0, while those above it are retained. The new parameters and weights are saved, and a new, pruned model structure is generated.
Applying light pruning directly to the original model compresses it and improves detection speed, and deploying it onto the embedded board poses no difficulty, but the model's performance evaluation results drop sharply. The network model therefore needs to be refined and improved: an attention mechanism is introduced and, at the same time, the loss function is replaced.
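The thresholding rule of steps S33 to S35 can be sketched in plain Python, with each layer's weights held in a flat list. The layer names come from the text above; the dictionary layout, the function names and the use of absolute weight values (the text says only "maximum weight value") are assumptions made for illustration.

```python
from typing import Dict, List

# Layers named in the text; the flat-list layout below is an illustrative assumption.
PRUNE_LAYERS = ["C3_p4", "C3_n3", "reduce_conv1", "bu_conv2"]


def prune_layer(weights: List[float], rate: float) -> List[float]:
    """Zero every weight whose magnitude is below rate * (largest magnitude in the layer)."""
    if not weights:
        return []
    ranked = sorted((abs(w) for w in weights), reverse=True)  # S33: rank weights in the layer
    threshold = rate * ranked[0]                              # S33: max weight x pruning rate
    return [w if abs(w) >= threshold else 0.0 for w in weights]  # S34: zero the weak ones


def prune_model(state: Dict[str, List[float]], layers: List[str],
                rate: float = 0.4) -> Dict[str, List[float]]:
    """Return a new state dict (S35) in which only the selected layers are pruned."""
    return {name: prune_layer(ws, rate) if name in layers else list(ws)
            for name, ws in state.items()}


def count_nonzero(state: Dict[str, List[float]]) -> int:
    return sum(1 for ws in state.values() for w in ws if w != 0.0)


# Toy example with hypothetical weights.
state = {"C3_p4": [0.9, -0.2, 0.5, 0.1], "stem": [0.05, 0.3]}
pruned = prune_model(state, PRUNE_LAYERS, rate=0.4)
print(pruned["C3_p4"])                              # [0.9, 0.0, 0.5, 0.0]
print(count_nonzero(state), count_nonzero(pruned))  # 6 4
```

In a real checkpoint the same rule would be applied to each selected layer's weight tensors. Zeroed weights only shrink storage or computation once the model is stored in a sparse format or the zeroed structures are physically removed.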
(2) Improving and optimizing the YOLOX network model
In step S41, as shown in FIG. 5, the convolutional block attention module (CBAM) combines a channel attention mechanism with a spatial attention mechanism. The channel attention first applies average pooling and max pooling to the input feature map to aggregate its spatial information, producing the average-pooled feature Favg and the max-pooled feature Fmax. Both features are passed through a shared network layer, the two outputs are summed element-wise, and the combined feature is passed through a Sigmoid activation function to output the channel attention map Mc. The spatial attention applies average pooling and max pooling along the channel axis, compressing the feature map in the channel dimension; the two resulting maps are concatenated along the channel dimension to form an effective feature map, which is then passed through a 7×7 convolutional layer. Finally, the spatial attention map Ms is obtained through a Sigmoid function.
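A dependency-free sketch of CBAM as described above, on a feature map stored as nested lists of shape [C][H][W]: channel attention pools over H×W and feeds both descriptors through one shared two-layer MLP, while spatial attention pools along the channel axis and applies a 7×7 convolution. The element-wise refinement F' = Mc·F, F'' = Ms·F' follows the original CBAM design; the reduction ratio, the random weights and all names are illustrative assumptions, and a practical model would use a deep learning framework instead.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Tensor3 = List[Matrix]  # feature map laid out as [channel][row][column]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(f: Tensor3, w1: Matrix, w2: Matrix) -> List[float]:
    """Mc = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))): one weight per channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]

    def shared_mlp(v: List[float]) -> List[float]:
        # Same two-layer MLP (C -> C/r -> C, ReLU in between, no bias) for both inputs.
        hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
        return [sum(a * b for a, b in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg), shared_mlp(mx))]


def spatial_attention(f: Tensor3, kernel: Tensor3) -> Matrix:
    """Ms = sigmoid(conv7x7([AvgPool_c(F); MaxPool_c(F)])): one weight per pixel."""
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg = [[sum(f[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(f[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    stacked = [avg, mx]  # concatenation along the channel axis: 2 x H x W
    size = len(kernel[0])
    pad = size // 2  # "same" zero padding keeps the H x W resolution
    out: Matrix = []
    for i in range(h):
        row = []
        for j in range(w):
            s = 0.0
            for m in range(2):
                for di in range(size):
                    for dj in range(size):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            s += kernel[m][di][dj] * stacked[m][y][x]
            row.append(sigmoid(s))
        out.append(row)
    return out


def cbam(f: Tensor3, w1: Matrix, w2: Matrix, kernel: Tensor3) -> Tensor3:
    """F' = Mc * F (per channel), then F'' = Ms * F' (per pixel)."""
    mc = channel_attention(f, w1, w2)
    f1 = [[[mc[k] * v for v in row] for row in ch] for k, ch in enumerate(f)]
    ms = spatial_attention(f1, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in f1]


# Toy run: C=4 channels, reduction ratio r=2, 5x5 map, random (untrained) weights.
rng = random.Random(0)
C, R, H, W = 4, 2, 5, 5
feat = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(C // R)]
w2 = [[rng.gauss(0, 0.5) for _ in range(C // R)] for _ in range(C)]
kernel = [[[rng.gauss(0, 0.1) for _ in range(7)] for _ in range(7)] for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))  # 4 5 5
```

Because both attention maps lie in (0, 1), CBAM can only rescale features, never amplify them; the learned weights decide which channels and locations are suppressed least.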
Fig. 4 shows the improved YOLOX network structure. The output of the Cross Stage Partial Network (CSPNet) layers is connected to the input of the CBAM channel attention module, and the output of the CBAM spatial attention module is connected to the feature enhancement layer.
To introduce the attention mechanism, a channel-spatial attention module is inserted at the junction of the feature extraction and feature enhancement layers. It screens out effective features along the channel and spatial dimensions, suppresses irrelevant features, strengthens the representational power of the features and improves the model's recognition accuracy.
In step S43, the cross-entropy loss used in YOLOX object detection suffers from an extreme imbalance between the target classes and the background class. Focal loss can effectively mitigate this imbalance. The Focal loss formula is as follows:
where p is the predicted probability of the target class, in the range [0, 1]; y is the ground-truth sample class, taking the value 1 (positive) or -1 (negative); α is an adjustable scale factor; (1-p)^β is the target modulation factor and p^β is the background modulation factor. The two modulation factors reduce the contribution of easy samples and increase the importance of misdetected (hard) samples. Focal loss thus uses weighting to address class imbalance during training.
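Assuming the standard binary Focal loss form, written with the symbols defined below (β is the focusing exponent; the background weight 1 - α follows the original Focal loss formulation and is an assumption here):

```latex
\mathrm{FL}(p, y) =
\begin{cases}
  -\alpha\,(1-p)^{\beta}\,\log(p), & y = 1,\\
  -(1-\alpha)\,p^{\beta}\,\log(1-p), & y = -1.
\end{cases}
```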
Focal loss treats positive and negative samples in the same way, yet in actual detection the contribution of positive samples is more important. Focal loss is therefore improved further: Varifocal loss is based on binary cross-entropy and borrows the weighting approach of Focal loss to handle class imbalance during training. Its formula is:
where p is the predicted target score and q is the target score. For the ground-truth class, q of a positive sample is set to the IoU between the predicted box and the ground-truth box; otherwise it is set to 0. For the background class, the target q is 0 for all classes. As the equation shows, Varifocal loss scales negative samples by the factor p^β but does not apply this down-weighting to positive samples, which highlights the contribution of positive samples.
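Assuming the formulation of the original Varifocal loss (VarifocalNet) paper, with α and β playing the same roles as in Focal loss and q the target score defined below:

```latex
\mathrm{VFL}(p, q) =
\begin{cases}
  -q\,\bigl(q\log(p) + (1-q)\log(1-p)\bigr), & q > 0,\\
  -\alpha\,p^{\beta}\,\log(1-p), & q = 0.
\end{cases}
```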
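Both losses can be sketched per prediction in plain Python to make the contrast concrete. The defaults α = 0.25, β = 2 (Focal loss) and α = 0.75, β = 2 (Varifocal loss) are the values reported in the respective original papers and are assumptions here, since the text gives no values; the `EPS` clamp is likewise an implementation choice.

```python
import math

EPS = 1e-12  # keeps log() finite when p is exactly 0 or 1


def binary_cross_entropy(p: float, t: float) -> float:
    """BCE between predicted probability p and (possibly soft) target t in [0, 1]."""
    return -(t * math.log(max(p, EPS)) + (1.0 - t) * math.log(max(1.0 - p, EPS)))


def focal_loss(p: float, y: int, alpha: float = 0.25, beta: float = 2.0) -> float:
    """Binary Focal loss; y = 1 for a positive sample, y = -1 for a negative one."""
    if y == 1:
        return -alpha * (1.0 - p) ** beta * math.log(max(p, EPS))
    return -(1.0 - alpha) * p ** beta * math.log(max(1.0 - p, EPS))


def varifocal_loss(p: float, q: float, alpha: float = 0.75, beta: float = 2.0) -> float:
    """Varifocal loss; q is the IoU-based target score (q = 0 for negatives)."""
    if q > 0:
        # Positives: BCE against the soft target q, weighted by q, with no p**beta factor.
        return q * binary_cross_entropy(p, q)
    # Negatives: down-weighted by alpha * p**beta, as in Focal loss.
    return -alpha * p ** beta * math.log(max(1.0 - p, EPS))


# Share of the plain BCE loss that an easy negative (p = 0.1) keeps under Varifocal loss.
print(round(varifocal_loss(0.1, 0.0) / binary_cross_entropy(0.1, 0.0), 4))  # 0.0075
```

With these defaults an easy negative keeps only 0.75% of its plain BCE loss, whereas a positive keeps its full BCE against q, scaled by q, so well-localized positives (high IoU) carry more weight in training.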
(3) Improved object detection network model
The invention comprises three parts: an attention mechanism, a loss function and model pruning. An attention mechanism is introduced between the backbone feature extraction layer and the feature enhancement layer, allowing the network to focus on the salient features of its input and achieve better detection accuracy. In the loss function, the BCE (binary cross-entropy) loss is replaced by the Varifocal loss, which increases attention to hard samples in the data set and balances the samples. The improved object detection network model is then pruned, making the YOLOX network model lightweight and solving the problems of large model size, heavy floating-point computation and poor real-time performance that arise when the original YOLOX network is deployed on embedded devices (as shown in FIG. 6).
In summary, the invention mainly solves two technical problems: the large model size, heavy floating-point computation and poor real-time performance of the existing YOLOX when deployed on embedded devices, and the drop in evaluation performance after the existing YOLOX model is pruned.
The above disclosure is only an example of the present invention and is not intended to limit the scope of the claims; those skilled in the art can understand the procedures for implementing the above examples and make equivalent changes according to the claims of the present invention.
Claims (5)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310212335.9A CN116306813B (en) | 2023-03-07 | 2023-03-07 | A method based on YOLOX lightweight and network optimization |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116306813A (en) | 2023-06-23 |
| CN116306813B (en) | 2025-08-12 |
Family
ID=86821771
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310212335.9A Active CN116306813B (en) | 2023-03-07 | 2023-03-07 | A method based on YOLOX lightweight and network optimization |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116306813B (en) |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117036427A (en) * | 2023-08-11 | 2023-11-10 | 江苏颂泽科技有限公司 | Industrial printed matter image registration method and device based on lightweight network |
| CN117237599A (en) * | 2023-08-25 | 2023-12-15 | 中银金融科技有限公司 | Image target detection method and device |
| CN117314840A (en) * | 2023-09-12 | 2023-12-29 | 中国科学院空间应用工程与技术中心 | Methods, systems, storage media and equipment for detecting small impact craters on the surface of extraterrestrial objects |
| CN117197841B (en) * | 2023-09-22 | 2025-10-28 | 深圳市天双科技有限公司 | Pedestrian detection method and system for marine vessels |
| CN118762160A (en) * | 2024-06-03 | 2024-10-11 | 徐州华东机械有限公司 | Foreign body detection method for lightweight belt conveyor based on MO-YOLOX network |
| CN118468968B (en) * | 2024-07-12 | 2024-09-17 | 杭州字节方舟科技有限公司 | Deep neural network compression method based on joint dynamic pruning |
| CN119478620B (en) * | 2024-07-29 | 2025-11-28 | 广东工业大学 | Target detection method based on improvement YOLOv n |
| CN119622456A (en) * | 2024-11-21 | 2025-03-14 | 吉林大学 | A method for training end-to-end autonomous driving policies |
| CN120632840B (en) * | 2025-08-15 | 2025-11-21 | 浙江大学滨江研究院 | A Model Fingerprint Injection and Verification Method and Device Based on Side Branch Networks |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115393690A (en) * | 2022-09-02 | 2022-11-25 | 西安工业大学 | Light neural network air-to-ground observation multi-target identification method |
| CN115471667A (en) * | 2022-09-08 | 2022-12-13 | 重庆邮电大学 | Lightweight target detection method for improving YOLOX network structure |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112396181A (en) * | 2020-12-31 | 2021-02-23 | 之江实验室 | Automatic pruning method and platform for general compression architecture of convolutional neural network |
| CN114898171B (en) * | 2022-04-07 | 2023-09-22 | 中国科学院光电技术研究所 | A real-time target detection method suitable for embedded platforms |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116306813A (en) | 2023-06-23 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN116306813B (en) | A method based on YOLOX lightweight and network optimization | |
| CN118711000B (en) | Bearing surface defect detection method and system based on improved YOLOv10 | |
| CN111160176B (en) | Fusion feature-based ground radar target classification method for one-dimensional convolutional neural network | |
| CN116258941B (en) | Lightweight improvement method of yolox target detection based on Android platform | |
| CN114758288A (en) | A kind of distribution network engineering safety management and control detection method and device | |
| CN113609904B (en) | Single-target tracking algorithm based on dynamic global information modeling and twin network | |
| CN115311502A (en) | A small sample scene classification method for remote sensing images based on multi-scale dual-stream architecture | |
| CN117708771B (en) | ITSOBP-based comprehensive transmission device fault prediction algorithm | |
| CN117058552A (en) | Lightweight pest detection method based on improved YOLOv7 and RKNPU2 | |
| CN114627467A (en) | Rice growth period identification method and system based on improved neural network | |
| CN118397427A (en) | A strawberry fruit recognition method based on improved YOLOv5s | |
| CN113205103A (en) | A Lightweight Tattoo Detection Method | |
| CN116778311A (en) | An underwater target detection method based on improved Faster R-CNN | |
| CN118429329A (en) | Road disease identification method and device based on RD-YOLO network | |
| CN119152502A (en) | Landscape plant image semantic segmentation method based on weak supervision | |
| CN119445348A (en) | An improved YOLOv8 fish image recognition method based on transfer learning | |
| CN117315380A (en) | Deep learning-based pneumonia CT image classification method and system | |
| CN114821182A (en) | Rice growth stage image recognition method | |
| CN118396958A (en) | Defect detection method for crystalline silicon component of solar cell | |
| CN114972845A (en) | Two-stage target intelligent detection algorithm and system based on meta-learning | |
| Zhao et al. | Neural network based on convolution and self-attention fusion mechanism for plant leaves disease recognition | |
| CN117371511A (en) | Training method, device, equipment and storage medium for image classification model | |
| CN114863485A (en) | Cross-domain pedestrian re-identification method and system based on deep mutual learning | |
| CN120047818A (en) | TEADISEASELITENET-based lightweight tea disease target detection method | |
| CN118247813A (en) | A person re-identification method based on adaptive optimization network structure |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |