
CN116306813B - A method based on YOLOX lightweight and network optimization - Google Patents

A method based on YOLOX lightweight and network optimization

Info

Publication number
CN116306813B
CN116306813B CN202310212335.9A CN202310212335A
Authority
CN
China
Prior art keywords
model
yolox
network
improved
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310212335.9A
Other languages
Chinese (zh)
Other versions
CN116306813A (en)
Inventor
张文博
马梓益
姬红兵
李林
臧博
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310212335.9A priority Critical patent/CN116306813B/en
Publication of CN116306813A publication Critical patent/CN116306813A/en
Application granted granted Critical
Publication of CN116306813B publication Critical patent/CN116306813B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method based on YOLOX lightweighting and network optimization, which comprises the following steps: S1, preparing the data set required for training in a target detection task; S2, training the original YOLOX neural network model on the data set, and recording and evaluating the performance indexes of the model; S3, performing a pruning operation on the original YOLOX neural network model to generate a pruned improved YOLOX network model; S4, training the pruned improved YOLOX network model on the data set; S5, performing a pruning operation on the improved YOLOX network model; S6, verifying and analyzing the pruned improved YOLOX network; if the performance requirements can be met, detecting and analyzing the target, and if the performance requirements cannot be met, adjusting the improved model until the performance requirements are met. The invention achieves higher detection precision and speed in target detection, is easier to deploy and integrate in actual application scenarios, and also makes the reasoning process of the model more efficient and stable.

Description

YOLOX-based lightweight and network optimization method
Technical Field
The invention belongs to the technical field of target detection in images, and particularly relates to a method based on YOLOX lightweighting and network optimization.
Background
The object detection problem is to determine the location of objects in a given image and the class to which each object belongs (i.e., object localization and object classification). Today, object detection technology is applied in agriculture, medical treatment, automated production, and other fields. Object detection mainly adopts deep learning methods, and deep-learning-based object detection methods fall into two main categories. One category is the candidate-region-based two-stage detection algorithms, including R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, etc.
Because these two-stage algorithms must first generate a large number of candidate regions and then classify and localize the objects within them, object detection is time-consuming. This process requires significant computational resources and time, especially for high-resolution images and complex scenes, so detection is relatively slow. Compared with a two-stage algorithm, a single-stage algorithm only needs to sample densely over the image and then performs target detection directly on the sampling points; since a large number of candidate regions need not be generated, the detection speed is higher and the method is suitable for real-time target detection. Single-stage algorithms are therefore more popular in practical applications.
YOLOX is an excellent single-stage target detection algorithm that regresses the class probability and position coordinates of objects at high speed. However, detecting objects directly with YOLOX raises some unavoidable problems: the YOLOX model is large and its inference speed is low. The YOLOX model therefore needs to be made lightweight while keeping the original precision without loss, and after lightweighting the detection precision needs to be maintained through network optimization.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a method based on YOLOX lightweighting and network optimization, which solves the problem of restricted model deployment in target detection and the problem of reduced precision in model performance evaluation after lightweighting. The method has higher detection precision and speed in target detection, is easier to deploy and integrate in actual application scenarios, and also makes the reasoning process of the model more efficient and stable.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
A method based on YOLOX lightweighting and network optimization comprises the following steps:
S1, in a target detection task, preparing the data set required for training, wherein the data set selects original images under different scenes and different illumination conditions, and preprocessing of the original images ensures that the model can realize accurate target detection in various environments;
S2, training an original YOLOX neural network model on the data set, and recording and evaluating performance indexes of the model;
S3, a pruning operation is carried out on the original YOLOX neural network model, and a pruned improved YOLOX network model is generated;
S4, training on the data set to generate a pruned improved YOLOX network model;
S5, a pruning operation is carried out on the improved YOLOX network model, and further adjustment is carried out so as to achieve higher detection precision and speed;
And S6, verifying and analyzing the improved YOLOX network after pruning, if the performance requirement can be met, detecting and analyzing the target, and if the performance requirement cannot be met, adjusting the improved model until the performance requirement is met.
In step S1, a data set required for training is prepared, including the steps of:
(1) Collecting data, namely collecting the PASCAL VOC data set from the official open data set website, comprising JPEGImages, ImageSets and Annotations, wherein JPEGImages contains the training images, ImageSets contains the train.txt, trainval.txt and val.txt files of each type, and Annotations contains the xml files of each class;
(2) Data preprocessing, which preprocesses the collected original images to make them suitable for model training. First, the original images are resized to the specified size for subsequent processing; the selected target detection input size is 416x416 pixels. Next, the color images are converted to grayscale images, which reduces the complexity of data storage and processing and reduces the time and computation of model training. Finally, the pixel values in the images are scaled to between 0 and 1, which makes the data more stable during processing and reduces the gradient explosion and gradient vanishing problems during training.
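By way of illustration only, the following is a minimal sketch of these three preprocessing steps (resizing to 416x416, grayscale conversion, scaling pixel values to [0, 1]); the function name and the use of the OpenCV and NumPy libraries are assumptions of this sketch and are not part of the original disclosure.

    import cv2
    import numpy as np

    def preprocess_image(path, size=(416, 416)):
        """Resize, convert to grayscale, and scale one image to [0, 1]."""
        img = cv2.imread(path)                       # read BGR uint8 image
        img = cv2.resize(img, size)                  # adjust to the specified input size
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # color image -> grayscale image
        img = img.astype(np.float32) / 255.0         # scale pixel values to [0, 1]
        return img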
The step S2 trains an original YOLOX neural network model on a data set, and comprises the following steps:
(1) Adopting the data set preprocessed in the step S1;
(2) Dividing the data set into a training set and a verification set according to the proportion of 8:2;
(3) Inputting the preprocessed data set into an original YOLOX neural network model, inputting a predicted value pred and a true value gt of the network into a loss function L, and obtaining a loss value through the following formula
Loss=L(pred,gt)
Wherein, L represents a loss function, pred represents a predicted value of network output, gt represents a true value, optimizing network parameters according to the loss function L, updating the neural network parameters by using a gradient descent method, and setting the current neural network parameters as theta, wherein the updating formula is as follows:
θt+1 = θt − η·∇θL(θt)
wherein η represents the learning rate, ∇θL represents the gradient of the loss function L with respect to the parameter θ, θt represents the parameter value at the t-th time step, and θt+1 represents the parameter value at the (t+1)-th time step; the neural network parameters are updated through multiple iterations, so that the network performance is optimized and the accuracy and speed of target detection are improved;
(4) After a round of parameter updating, the model needs to be checked on a validation set to verify its generalization ability. Specifically, the validation set is input into the YOLOX network, and the loss between the predicted results and the real results, i.e. the validation loss, is calculated. Let the validation set size be N, the predicted boxes of the i-th sample be pi and the real boxes be ti; the validation loss L is accumulated over all samples and grid cells,
where S is the number of prediction boxes per grid cell and C is the number of target categories; for the j-th cell of the i-th sample, the predicted and true values of the c-th class, the predicted and true indicators of whether a target exists, and the predicted and true confidences are compared; posij denotes the index set of the prediction box with the largest intersection-over-union with the real box in the j-th cell of the i-th sample, and two weight coefficients balance the cells in which an object exists against the cells in which no object exists;
The performance of the current model can be evaluated by calculating the loss degree of the verification set, and if the loss degree is higher, training is required to be continued until a preset stopping condition is reached;
(5) Every two iterations, the pictures in the data set are input into the optimized YOLOX target detection network for training, and the accuracy of the model is obtained; the optimized YOLOX target detection network has higher detection precision and a higher detection speed;
(6) Repeating the above steps until training is finished.
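For illustration, the following is a minimal PyTorch-style sketch of the loop described in steps (3)-(6): computing Loss = L(pred, gt), updating the parameters by gradient descent, and checking the validation loss after each round. The model, loss function, and data loaders are placeholders assumed for the sketch, not components defined by the invention.

    import torch

    def train_yolox(model, loss_fn, train_loader, val_loader, epochs=300, lr=1e-3):
        """Sketch of training with gradient descent and periodic validation."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # theta_{t+1} = theta_t - eta * grad(L)
        for epoch in range(epochs):
            model.train()
            for images, gt in train_loader:
                pred = model(images)
                loss = loss_fn(pred, gt)          # Loss = L(pred, gt)
                optimizer.zero_grad()
                loss.backward()                   # gradient of L with respect to theta
                optimizer.step()                  # parameter update
            model.eval()
            with torch.no_grad():                 # validation loss gauges generalization
                val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader) / len(val_loader)
            print(f"epoch {epoch}: validation loss = {val_loss:.4f}")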
In the step S3, pruning operation is performed on the network trained in the step S2, including the following steps:
(1) For each network layer, calculating the sensitivity degree of forward propagation of one network layer on the model, and under the condition of given input, calculating the partial derivative of the output variable quantity on the weight of the layer so as to obtain the sensitivity of the layer on the output, wherein the larger the sensitivity is, the larger the influence of the layer on the output is, and the priority should be given when pruning;
(2) Respectively carrying out weight sequencing on the network layers to be pruned;
(3) Determining a threshold according to the weight sequencing result and the pruning rate in each pruning layer;
(4) Rejecting weights lower than a threshold in a network, and reserving weights higher than the threshold;
(5) And saving the new model parameters and weights to generate a pruned improved YOLOX network model.
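A minimal sketch of the weight-ranking and threshold steps above, assuming a PyTorch convolution layer: the weight magnitudes of one layer are sorted, a threshold is taken from the ranking and the pruning rate, and weights below the threshold are removed (set to zero) while weights above it are retained. The function name and the default pruning rate are assumptions of the sketch.

    import torch

    def prune_layer_weights(module, prune_rate=0.4):
        """Zero out the smallest-magnitude weights of one layer."""
        with torch.no_grad():
            w = module.weight.abs().flatten()
            k = int(w.numel() * prune_rate)               # number of weights to remove
            if k > 0:
                threshold = torch.sort(w).values[k - 1]   # threshold from the weight ranking and pruning rate
                mask = (module.weight.abs() > threshold)  # keep only weights above the threshold
                module.weight.mul_(mask.float())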
The step S4 improves YOLOX the network model, including the steps of:
(1) Inserting a channel-space attention mechanism CBAM module between the yolox backbone layer and the data enhancement layer channels;
The CBAM module is an implementation of a channel-space attention mechanism, can effectively improve the precision of a model, and mainly comprises two parts, namely channel attention and space attention;
Firstly, through the channel attention mechanism, average pooling and maximum pooling operations are carried out on the input feature map to aggregate its spatial information, generating the average-pooled feature Favg and the max-pooled feature Fmax; these pass through a shared network layer, and after the shared network is applied to each feature, the average-pooled feature and the max-pooled feature are summed element-wise, and the combined feature is passed through a Sigmoid activation function to output the channel attention map Mc;
(2) Replacing the original BCE cross entropy loss function with the VariFocalLoss loss function. When replacing the BCE cross entropy loss function with the VariFocalLoss loss function, the output layer of the model needs to be modified: the VariFocalLoss loss function introduces a learnable exponent γ and modifies the weight adjustment term in the loss calculation, so that more emphasis is placed on learning hard samples, and the output layer therefore needs to be modified accordingly. In the improved YOLOX network model, the output layer generally includes a classification branch and a regression branch; the classification branch classifies each target, and the regression branch regresses the position information of each target. To adapt to the calculation of the VariFocalLoss loss function, the output of the classification branch is first processed by the sigmoid function and then converted into a prediction probability; since the VariFocalLoss loss function only modifies the classification branch, the calculation of the regression branch does not need to be changed;
(3) Training the improved YOLOX network model;
(4) Performing pruning operation on the trained model, and evaluating performance indexes of the model after pruning;
The attention mechanism module mines more useful information along the spatial and channel dimensions of the input features and applies weighting, enhancing the perceptibility of the feature space and channel dimensions, giving the network the ability to focus on its input features and obtaining better detection precision.
By using VariFocalLoss to replace the cross entropy in the original loss function, the positive and negative sample weights can be improved, and the model convergence speed can be increased.
The step S6 comprises the following steps;
Firstly, the performance of the improved YOLOX network model is evaluated on unseen data. If the model cannot meet the performance requirements, it is adjusted and improved: the training process is further optimized by tuning the training learning rate and the batch-size hyperparameters. After the model is adjusted, training and verification are carried out again; this process needs to iterate several times until a model that meets the performance requirements is obtained. Finally, if the improved pruned model meets the performance requirements, detection analysis is carried out on the target. Target detection refers to detecting the position and category of targets in an image or video; by deploying the pruned model, faster and more accurate target detection is achieved, improving efficiency and accuracy in practical applications.
Experiments prove that the method provided by the invention has higher detection precision and speed in the target detection task. Compared with the traditional target detection method, the method provided by the invention can greatly improve the detection speed on the premise of keeping high precision. Therefore, the invention has higher practicability and economic benefit. The invention has the beneficial effects that:
The invention provides a method based on YOLOX lightweighting and network optimization, in which a channel-spatial attention mechanism module is inserted at the junction of the feature extraction and enhancement layers so as to enhance the feature extraction capability for targets of different scales and suppress the interference of redundant information. Meanwhile, VariFocalLoss is used to replace the cross entropy in the original loss function; a weight value is added to the positive and negative samples, and the share of the positive and negative samples in the total loss value is controlled, so that the model focuses more on hard samples during training and the problem of unbalanced sample classes is solved.
Although the optimization strategy in the present invention has effectively improved the accuracy of target detection, further optimization is still needed to achieve end-to-end real-time target detection on mobile devices. In order to solve the problem, the invention uses a pruning strategy to compress the model volume, reduces the calculation amount of the model and realizes the end-to-end real-time target detection on the mobile equipment.
However, the pruning operation may have an influence on the model accuracy, so that the detection accuracy is lowered. Therefore, the invention verifies and analyzes the model after pruning, if the performance requirement can be met, the target is detected and analyzed, and if the performance requirement cannot be met, the improved model is adjusted until the performance requirement is met. Therefore, the light weight of the model can be ensured, and meanwhile, the higher target detection precision can be maintained, and the problem that the evaluation performance of the original YOLOX model after pruning is reduced is solved.
Therefore, the invention comprehensively applies various optimization strategies such as light weight, network optimization, loss function improvement, pruning and the like, effectively improves the end-to-end real-time target detection performance on the mobile equipment, and solves the problems that the original YOLOX still has large model volume, high floating point number operation amount, poor instantaneity and reduced accuracy after pruning when deployed on the embedded equipment.
Drawings
Fig. 1 is a flow chart of the method based on YOLOX lightweighting and network optimization according to the present invention.
Fig. 2 is a flow diagram of YOLOX network training.
FIG. 3 is a schematic diagram of a model architecture based on YOLOX network optimization.
FIG. 4 is a flow chart based on YOLOX model training.
Fig. 5 is a block diagram of the CBAM attention mechanism.
FIG. 6 is a schematic diagram of accuracy of dataset detection based on a modified YOLOX network model.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1-3, the invention provides a target detection method based on YOLOX lightweighting, which comprises the following steps:
s1, preparing a data set required during training in a target detection task, wherein the data set comprises target samples of different categories, and meanwhile, taking original images of targets in different scenes and under different illumination conditions into consideration, and ensuring that the model can realize accurate target detection in various environments by preprocessing the original images;
the quality and number of datasets are critical to the training and performance of the model. Thus, a dataset associated with the object detection task needs to be selected and preprocessed and cleaned for subsequent model training and evaluation.
S2, training an original YOLOX neural network model on the data set, and recording and evaluating performance indexes of the model;
The original target detection model is trained using the data set, and the performance indexes of the model are recorded; the performance indexes are the accuracy, the recall rate and the precision: the accuracy represents the proportion of all samples correctly predicted by the model, the recall rate represents the proportion of true positive samples correctly detected by the model, and the precision represents the proportion of predicted positive samples that are actually positive; meanwhile, the learning rate and weight decay during training are adjusted to improve the performance of the model;
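For clarity, a small illustrative sketch of these three indexes computed from detection counts (true/false positives and negatives); the function and its inputs are assumptions of the sketch, not part of the original disclosure.

    def detection_metrics(tp, fp, fn, tn):
        """Precision, recall and accuracy from counts of true/false positives/negatives."""
        precision = tp / (tp + fp)                   # fraction of predicted positives that are correct
        recall = tp / (tp + fn)                      # fraction of actual positives that are detected
        accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction of all samples predicted correctly
        return precision, recall, accuracy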
S3, performing a pruning operation on the original YOLOX neural network model, and generating a pruned improved YOLOX network model;
The method comprises the steps of carrying out pruning operation on an original network model based on a preset pruning strategy to reduce parameters and calculated amount in the model so as to realize light weight, carrying out training again on the pruned model, evaluating performance indexes of the pruned model so as to verify the influence of the pruning operation on the performance of the model, and carrying out pruning in a target detection task to remove some redundant network structures so as to reduce calculated amount and improve the reasoning speed of the model;
S4, training the improved YOLOX network model on the data set, and evaluating the performance indexes of the model such as detection precision;
In order to improve the performance of the target detection model, a channel-space attention mechanism CBAM module is inserted between a main layer and a data enhancement layer of the original lightweight model, so that the model can automatically pay attention to important features related to target detection when processing images, and the attention degree and the anti-interference capability of a network are improved. The original BCE cross entropy loss function is replaced by VariFocal Loss loss function, so that the classification performance of the model and the identification capability of difficult samples are improved. VariFocal Loss introducing variable key parameters into the loss function, so that the model is more concerned with samples which are difficult to classify, thereby improving the classification performance of the model, training the improved model on a data set, and recording and evaluating the performance index of the model;
S5, a pruning operation is performed on the improved YOLOX network model, and the improved model is further adjusted to achieve higher detection precision and speed. In a target detection task, the lightweighting and efficient deployment of the model are of great importance, because they directly affect the speed and precision of the model. The improved network model is pruned, further reducing the parameters and computation of the model and enabling fast and efficient deployment; at the same time, the pruned model is better suited to resource-limited scenarios such as embedded devices, improving the universality and portability of the model;
and S6, verifying and analyzing the improved pruned model to ensure that the improved pruned model can meet the performance requirements of the target detection task, and if the improved pruned model cannot meet the requirements, adjusting the improved model until the performance requirements are met.
The process of improving the YOLOX network model in step S4 includes the following steps:
s41, in order to enhance the feature extraction capability of the improved YOLOX network model on targets with different scales and inhibit the interference of redundant information, a means of inserting a channel-space attention mechanism CBAM module between a trunk layer and a data enhancement layer channel is adopted. The CBAM module can adaptively learn the importance of channels and spaces in the feature map and weight and adjust the feature map accordingly, so that the network is more concerned with feature information contributing to target identification and positioning.
S42, in order to solve the problem of sample class imbalance, the original BCE cross entropy loss function is replaced with the VariFocalLoss loss function. VariFocalLoss can add different weights to positive and negative samples, so that the model pays more attention to samples that are difficult to classify, the sample classes are better balanced during training, and the classification accuracy of the model is improved.
S43, training the improved YOLOX network model to optimize the performance of the network model. In the training process, the network continuously adjusts the weight and the bias through a back propagation algorithm to minimize the value of the loss function, thereby improving the classification and positioning capability of the model.
S44, in order to reduce the model volume and the calculation amount and facilitate the end-to-end real-time target detection on the mobile device, a means of performing pruning operation on the trained model is adopted. The pruning operation may remove unnecessary connections and parameters, thereby reducing the volume and computation of the model while maintaining the accuracy of the model. The pruned model needs to be trained again and the performance index of the pruned model is evaluated so as to ensure that the pruned model can still maintain high-precision target detection capability.
In the step S43, the process of training the network model comprises the following steps:
S431, collecting a data set, and calibrating the original data set;
S432, dividing the pictures into a training set and a validation set at a ratio of 8:2;
S433, inputting the data set into the YOLOX neural network, inputting the predicted value and the true value of the network into the loss function, obtaining the loss value, and updating the neural network parameters by gradient descent;
S434, after each round of parameter updating, inputting the validation set into the YOLOX network for verification, and calculating the loss degree of the validation set;
S435, after every two iterations, inputting the pictures in the data set into the trained model to obtain the accuracy of the model;
S436, repeating the above steps until the epoch count reaches 300, at which point training ends and the model has converged.
The pruning process for the network model in the step S3 comprises the following steps:
And S31, determining a network layer to be pruned. Before model pruning, the network model needs to be analyzed and evaluated to determine which network layers are pruned. This process requires consideration of the number of parameters, the amount of computation, and the degree of contribution to the overall performance of the model for each network layer in the model. Therefore, the calculated amount and the volume of the model can be reduced to the greatest extent, and meanwhile, the detection accuracy of the model is ensured not to be excessively reduced.
And S32, carrying out weight sorting on the network layer to be pruned, wherein the weight sorting of the network layer is one of important steps of pruning operation. Before pruning, all parameters in the network need to be ordered to determine which parameters contribute less to the performance of the network and which parameters contribute more to the performance, and then the network layer with less performance contribution is selected for pruning.
S33, determining the threshold value according to the weight. By weighting the network layers and thresholding, unnecessary or redundant portions of the network can be identified and pruning can then be performed.
And S34, pruning the part lower than the threshold value to obtain new parameters. After determining the pruning threshold, pruning is performed on network parameters below the threshold. Specifically, the weights in the network are compressed, redundant and unnecessary parameters are removed, and therefore the purposes of reducing network storage and calculation amount are achieved.
And S35, storing new model parameters and weights, and generating a pruned model. The number of the model parameters after pruning is reduced, the calculated amount is correspondingly reduced, the reasoning speed and efficiency of the model can be obviously improved, and the real-time target detection can be realized in the scene with limited resources such as embedded equipment.
The invention is further described in connection with the relevant background art and the implementation steps:
In the step S1, pictures from common data sets are selected, namely the VOC2007 and VOC2012 data sets, comprising 21143 training images and the corresponding xml files.
In the step S2, performance evaluation indexes including model accuracy mAP and model size parameters (M) are recorded as criteria for subsequent performance evaluation.
YOLOX is the latest target detection algorithm of the YOLO series; it not only achieves detection precision exceeding the previous YOLO series, but also reaches a highly competitive end-to-end inference speed. However, when YOLOX is deployed on embedded devices, problems such as large model volume, a high floating-point operation count, and poor real-time performance remain. To solve these problems and avoid the unnecessary energy consumption caused by model pre-training, the lightweighting method of YOLOX is provided.
(1) Pruning YOLOX network model
The network model obtained by YOLOX training is read, and the model weights and structure are saved; with a model size of 3797KB, the convolution layers to be pruned are determined as the C3_p4 layer, C3_n3 layer, reduce_conv1 layer and bu_conv2 layer in the data enhancement layer. The weights in each layer are sorted by magnitude, and the maximum weight value multiplied by the set pruning rate (40%) is taken as the pruning weight threshold; neuron weights below the threshold are reset to 0 and weights above the threshold are retained; the new parameters and weights are saved, and a new pruned model structure is generated;
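A sketch of this concrete pruning rule, assuming PyTorch modules: the threshold is the maximum weight magnitude of a layer multiplied by the pruning rate (40%), applied to the named layers; the layer names are taken from the text, while the name-matching logic and the function signature are assumptions of the sketch.

    import torch

    def prune_by_max_ratio(model, layer_keys=("C3_p4", "C3_n3", "reduce_conv1", "bu_conv2"), rate=0.4):
        """Threshold = max |w| * pruning rate; weights below the threshold are reset to 0."""
        with torch.no_grad():
            for name, module in model.named_modules():
                if any(key in name for key in layer_keys) and getattr(module, "weight", None) is not None:
                    w = module.weight
                    threshold = w.abs().max() * rate         # max weight value x set pruning rate (40%)
                    w.mul_((w.abs() >= threshold).float())   # weights below threshold -> 0, others kept
        return model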
Directly applying lightweight pruning to the original model can compress the model and improve the detection speed, and the pruned model can be deployed to the embedded board without difficulty, but the evaluated performance of the model drops sharply. The network model therefore needs to be perfected and improved by introducing an attention mechanism and replacing the loss function;
(2) Improving and optimizing the YOLOX network model
In step S41, as shown in FIG. 5, the fused convolutional block attention mechanism CBAM is a combination of the channel attention mechanism and the spatial attention mechanism. Firstly, through the channel attention mechanism, average pooling and maximum pooling operations are carried out on the input feature map to aggregate its spatial information, generating the average-pooled feature Favg and the max-pooled feature Fmax; a shared network is applied to each feature through a shared network layer, the average-pooled and max-pooled features are summed element-wise, and the combined feature is passed through a Sigmoid activation function to output the channel attention map Mc. The spatial attention applies average pooling and maximum pooling to the feature map along the channel axis, compressing it in the channel dimension; the two feature maps are concatenated in the channel dimension to generate an effective feature map, which then passes through a 7×7 convolution layer. Finally, the spatial attention map Ms is obtained through a Sigmoid function.
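A minimal PyTorch-style sketch of the CBAM block as described (channel attention followed by spatial attention with a 7x7 convolution); the reduction ratio of the shared network and the exact module layout are assumptions beyond the text.

    import torch
    import torch.nn as nn

    class CBAM(nn.Module):
        """Channel attention (avg/max pooling -> shared network -> Sigmoid) then
        spatial attention (channel-wise avg/max -> concat -> 7x7 conv -> Sigmoid)."""

        def __init__(self, channels, reduction=16):
            super().__init__()
            hidden = max(channels // reduction, 1)
            self.shared_mlp = nn.Sequential(              # shared network applied to both pooled features
                nn.Conv2d(channels, hidden, 1, bias=False),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, channels, 1, bias=False),
            )
            self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

        def forward(self, x):
            # channel attention map Mc
            avg = self.shared_mlp(torch.mean(x, dim=(2, 3), keepdim=True))   # F_avg branch
            mx = self.shared_mlp(torch.amax(x, dim=(2, 3), keepdim=True))    # F_max branch
            mc = torch.sigmoid(avg + mx)                                     # element-wise sum -> Sigmoid
            x = x * mc
            # spatial attention map Ms
            avg_s = torch.mean(x, dim=1, keepdim=True)                       # pooling along the channel axis
            max_s = torch.amax(x, dim=1, keepdim=True)
            ms = torch.sigmoid(self.spatial_conv(torch.cat([avg_s, max_s], dim=1)))  # 7x7 conv -> Sigmoid
            return x * ms

In the improved network, such a block would sit between the backbone feature extraction output and the data enhancement (neck) input, as described above.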
As shown in fig. 4, the network structure of YOLOX is improved: at the cross-stage partial network CSPNet layer, the input of the channel attention mechanism of CBAM is connected, and the output of the spatial attention mechanism of CBAM is connected to the data enhancement layer.
The attention mechanism is introduced, a channel-space attention mechanism module is inserted at the joint of the feature extraction and enhancement layers, effective features are screened out from the channel dimension and the space dimension respectively, irrelevant features are restrained, the expression capability of the features is enhanced, and the recognition accuracy of the model is improved.
In step S43, in YOLOX target detection, the cross entropy loss function has a problem of extreme imbalance between the target class and the background class. The problem of unbalance between the target class and the background class can be effectively solved by using the Focal loss. The Focal loss formula is as follows:
FL(p, y) = −α·(1−p)^β·log(p) when y = 1, and −(1−α)·p^β·log(1−p) when y = −1,
wherein p is the prediction probability of the target class, with range [0, 1]; y is the true positive/negative sample class, taking the value 1 or −1; α is an adjustable scale factor; (1−p)^β is the modulation factor for the target class and p^β is the modulation factor for the background class; the two modulation factors reduce the contribution of easy samples and increase the importance of misdetected samples. The Focal loss can solve the problem of class imbalance during training through this weighting method.
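A direct transcription of this formula as code (a sketch; the default values of α and β are assumptions):

    import torch

    def focal_loss(p, y, alpha=0.25, beta=2.0):
        """Focal loss for prediction probability p in [0, 1] and label y in {1, -1}."""
        p = p.clamp(1e-6, 1 - 1e-6)                           # numerical stability
        pos = -alpha * (1 - p).pow(beta) * torch.log(p)       # target (positive) class term
        neg = -(1 - alpha) * p.pow(beta) * torch.log(1 - p)   # background (negative) class term
        return torch.where(y == 1, pos, neg)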
The Focal loss treats positive and negative samples in the same way, whereas in actual detection the contribution of positive samples is more important; the Focal loss is therefore further improved. Varifocal loss is based on binary cross entropy and draws on the Focal loss weighting approach to handle the class imbalance problem during training. The binary cross entropy formula is:
BCE(p, q) = −(q·log(p) + (1−q)·log(1−p))
wherein p is the predicted value representing the target score and q is the classification target; for the target class, the q value of a positive sample is set to the IoU between the predicted box and the ground-truth box, and otherwise is set to 0; for the background class, the target q value of all classes is 0. On this basis, Varifocal loss weights the binary cross entropy asymmetrically (following the VarifocalNet formulation):
VFL(p, q) = −q·(q·log(p) + (1−q)·log(1−p)) when q > 0, and −α·p^β·log(1−p) when q = 0,
so that Varifocal loss uses the p^β scaling factor to down-weight negative samples but not positive samples, which highlights the contribution of positive samples.
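A minimal sketch of this asymmetric weighting, following the VarifocalNet formulation assumed above; the logits, the IoU-based target q, and the default values of α and β are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def varifocal_loss(pred_logits, q, alpha=0.75, beta=2.0):
        """Positives (q > 0, q = IoU with the ground-truth box) are weighted by q itself;
        negatives (q = 0) are down-weighted by alpha * p**beta."""
        p = torch.sigmoid(pred_logits)                        # classification branch -> prediction probability
        bce = F.binary_cross_entropy_with_logits(pred_logits, q, reduction="none")
        weight = torch.where(q > 0, q, alpha * p.pow(beta))   # p**beta scaling applied to negatives only
        return (weight * bce).sum()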
(3) Improved target detection network model
The invention relates to three parts: an attention mechanism, a loss function, and model pruning. An attention mechanism is introduced between the backbone feature extraction layer and the data enhancement layer, giving the network the ability to focus on its input features and obtaining better detection precision. In the loss function part, the BCE cross entropy loss function is replaced with the VariFocalLoss function, which increases the attention paid to difficult samples in the data set and balances the samples. The improved target detection network model is pruned, realizing the lightweighting of the YOLOX network model and solving the problems of large model volume, high floating-point operation count, and poor real-time performance when the original YOLOX network is deployed on embedded devices (as shown in FIG. 6).
In summary, the invention mainly solves the technical problems of large model volume, high floating point number operation and poor real-time performance existing in the prior YOLOX when deployed on embedded equipment, and the problem of reduced evaluation performance after pruning of the prior YOLOX model;
The above disclosure is only an example of the present invention and it is not intended to limit the scope of the claims, and those skilled in the art will understand the procedures for implementing the above examples and make equivalent changes according to the claims of the present invention.

Claims (5)

1. A method based on YOLOX lightweighting and network optimization, characterized by comprising the following steps:
S1: in the target detection task, prepare the data set required for training; the data set selects original images of the target under different scenes and different illumination conditions, and the original images are preprocessed to ensure that the model can achieve accurate target detection in various environments;
S2: train the original YOLOX neural network model on the data set, and record and evaluate the performance indexes of the model;
S3: perform a pruning operation on the original YOLOX neural network model to generate a pruned improved YOLOX network model;
S4: train the pruned improved YOLOX network model on the data set;
S5: perform a pruning operation on the improved YOLOX network model and make further adjustments to achieve higher detection precision and speed;
S6: verify and analyze the pruned improved YOLOX network; if the performance requirements can be met, perform detection analysis on the target; if the performance requirements cannot be met, adjust the improved model until the performance requirements are met;
wherein step S4 improves the YOLOX network model through the following steps:
(1) insert a channel-spatial attention mechanism CBAM module between the YOLOX backbone layer and the data enhancement layer channels; first, through the channel attention mechanism, average pooling and maximum pooling operations are applied to the input feature map to aggregate its spatial information; the generated average-pooled feature Favg and max-pooled feature Fmax are passed through a shared network layer, the shared network is applied to each feature, the average-pooled and max-pooled features are summed element-wise, and the merged feature is passed through a Sigmoid activation function to output the channel attention map Mc; spatial attention applies average pooling and maximum pooling to the feature map along the channel axis, compressing it in the channel dimension; the two feature maps are concatenated along the channel dimension to generate an effective feature map, which then passes through a 7×7 convolution layer; finally, the spatial attention map Ms is obtained through a Sigmoid function;
(2) replace the original BCE cross-entropy loss function with the VariFocalLoss loss function; in this replacement the output layer of the model needs to be modified: VariFocalLoss introduces a learnable exponent γ and modifies the weight adjustment term in the loss calculation so that more emphasis is placed on learning hard samples; in the improved YOLOX network model the output layer generally comprises a classification branch and a regression branch; the classification branch classifies each target, and the regression branch regresses the position information of each target; the output of the classification branch is first processed by the sigmoid function and then converted into a prediction probability, from which the VariFocalLoss is calculated;
(3) train the improved YOLOX network model;
(4) perform a pruning operation on the trained model and evaluate the performance indexes of the pruned model.
2. The method based on YOLOX lightweighting and network optimization according to claim 1, wherein in step S1 preparing the data set required for training comprises the following steps:
(1) data collection: collect the PASCAL VOC data set, including JPEGImages, ImageSets and Annotations, from the official open data set website; JPEGImages contains the training images, ImageSets contains the train.txt, trainval.txt and val.txt files of each type, and Annotations contains the xml files of each class;
(2) data preprocessing: preprocess the collected original images to make them suitable for model training; first, resize the original images to the specified size for subsequent processing; next, convert the color images to grayscale images; finally, scale the pixel values in the images to between 0 and 1.
3. The method based on YOLOX lightweighting and network optimization according to claim 1, wherein step S2, training the original YOLOX neural network model on the data set, comprises the following steps:
(1) adopt the data set preprocessed in step S1;
(2) divide the data set into a training set and a validation set;
(3) input the preprocessed data set into the original YOLOX neural network model; the predicted value pred of the network and the true value gt are input into the loss function L, and the loss value is obtained through
Loss = L(pred, gt)
where L denotes the loss function, pred the predicted value output by the network, and gt the true value; the network parameters are optimized according to the loss function L, and the neural network parameters are updated by gradient descent; with the current neural network parameters denoted θ, the update formula is
θt+1 = θt − η·∇θL(θt)
where η denotes the learning rate, ∇θL the gradient of the loss function L with respect to the parameter θ, θt the parameter value at the t-th time step, and θt+1 the parameter value at the (t+1)-th time step; the neural network parameters are updated over multiple iterations;
(4) after each round of parameter updates, test the model on the validation set to verify its generalization ability; input the validation set into the YOLOX network and compute the loss between the predicted results and the true results, i.e. the validation loss; with the validation set size N, the predicted boxes of the i-th sample pi and the true boxes ti, the validation loss L is accumulated over all samples and grid cells, where S is the number of prediction boxes per grid cell and C is the number of target categories; for the j-th cell of the i-th sample, the predicted and true values of the c-th class, the predicted and true indicators of whether a target exists, and the predicted and true confidences are compared; posij denotes the index set of the prediction box with the largest intersection-over-union with the true box in the j-th cell of the i-th sample, and two weight coefficients balance the cells that contain a target against the cells that do not;
(5) every two iterations, input the images in the data set into the optimized YOLOX target detection network for training to obtain the accuracy of the model;
(6) repeat the above steps until training ends.
4. The method based on YOLOX lightweighting and network optimization according to claim 1, wherein in step S3 the pruning operation performed on the network trained in step S2 comprises the following steps:
(1) determine the layers to be pruned according to the importance index of each network layer; for each network layer, compute its sensitivity in the forward pass of the model: given an input, compute the partial derivative of the change in the output with respect to the weights of the layer, thereby obtaining the sensitivity of the layer to the output; the greater the sensitivity, the greater the influence of the layer on the output and the higher its priority during pruning;
(2) sort the weights of each network layer to be pruned;
(3) determine a threshold according to the weight ranking result and the pruning rate in each pruned layer;
(4) remove the weights below the threshold in the network and retain the weights above the threshold;
(5) save the new model parameters and weights to generate the pruned improved YOLOX network model.
5. The method based on YOLOX lightweighting and network optimization according to claim 1, wherein step S6 comprises the following steps:
first, evaluate the performance of the improved YOLOX network model on unseen data; if the model cannot meet the performance requirements, adjust and improve it by tuning the training learning rate and batch-size hyperparameters to further optimize the training process; after adjusting the model, retrain and revalidate; this process is iterated until a model that meets the performance requirements is obtained; finally, if the improved pruned model meets the performance requirements, perform detection analysis on the target.
CN202310212335.9A 2023-03-07 2023-03-07 A method based on YOLOX lightweight and network optimization Active CN116306813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310212335.9A CN116306813B (en) 2023-03-07 2023-03-07 A method based on YOLOX lightweight and network optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310212335.9A CN116306813B (en) 2023-03-07 2023-03-07 A method based on YOLOX lightweight and network optimization

Publications (2)

Publication Number Publication Date
CN116306813A CN116306813A (en) 2023-06-23
CN116306813B true CN116306813B (en) 2025-08-12

Family

ID=86821771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310212335.9A Active CN116306813B (en) 2023-03-07 2023-03-07 A method based on YOLOX lightweight and network optimization

Country Status (1)

Country Link
CN (1) CN116306813B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036427A (en) * 2023-08-11 2023-11-10 江苏颂泽科技有限公司 Industrial printed matter image registration method and device based on lightweight network
CN117237599A (en) * 2023-08-25 2023-12-15 中银金融科技有限公司 Image target detection method and device
CN117314840A (en) * 2023-09-12 2023-12-29 中国科学院空间应用工程与技术中心 Methods, systems, storage media and equipment for detecting small impact craters on the surface of extraterrestrial objects
CN117197841B (en) * 2023-09-22 2025-10-28 深圳市天双科技有限公司 Pedestrian detection method and system for marine vessels
CN118762160A (en) * 2024-06-03 2024-10-11 徐州华东机械有限公司 Foreign body detection method for lightweight belt conveyor based on MO-YOLOX network
CN118468968B (en) * 2024-07-12 2024-09-17 杭州字节方舟科技有限公司 Deep neural network compression method based on joint dynamic pruning
CN119478620B (en) * 2024-07-29 2025-11-28 广东工业大学 Target detection method based on improvement YOLOv n
CN119622456A (en) * 2024-11-21 2025-03-14 吉林大学 A method for training end-to-end autonomous driving policies
CN120632840B (en) * 2025-08-15 2025-11-21 浙江大学滨江研究院 A Model Fingerprint Injection and Verification Method and Device Based on Side Branch Networks

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393690A (en) * 2022-09-02 2022-11-25 西安工业大学 Light neural network air-to-ground observation multi-target identification method
CN115471667A (en) * 2022-09-08 2022-12-13 重庆邮电大学 Lightweight target detection method for improving YOLOX network structure

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396181A (en) * 2020-12-31 2021-02-23 之江实验室 Automatic pruning method and platform for general compression architecture of convolutional neural network
CN114898171B (en) * 2022-04-07 2023-09-22 中国科学院光电技术研究所 A real-time target detection method suitable for embedded platforms

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393690A (en) * 2022-09-02 2022-11-25 西安工业大学 Light neural network air-to-ground observation multi-target identification method
CN115471667A (en) * 2022-09-08 2022-12-13 重庆邮电大学 Lightweight target detection method for improving YOLOX network structure

Also Published As

Publication number Publication date
CN116306813A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN116306813B (en) A method based on YOLOX lightweight and network optimization
CN118711000B (en) Bearing surface defect detection method and system based on improved YOLOv10
CN111160176B (en) Fusion feature-based ground radar target classification method for one-dimensional convolutional neural network
CN116258941B (en) Lightweight improvement method of yolox target detection based on Android platform
CN114758288A (en) A kind of distribution network engineering safety management and control detection method and device
CN113609904B (en) Single-target tracking algorithm based on dynamic global information modeling and twin network
CN115311502A (en) A small sample scene classification method for remote sensing images based on multi-scale dual-stream architecture
CN117708771B (en) ITSOBP-based comprehensive transmission device fault prediction algorithm
CN117058552A (en) Lightweight pest detection method based on improved YOLOv7 and RKNPU2
CN114627467A (en) Rice growth period identification method and system based on improved neural network
CN118397427A (en) A strawberry fruit recognition method based on improved YOLOv5s
CN113205103A (en) A Lightweight Tattoo Detection Method
CN116778311A (en) An underwater target detection method based on improved Faster R-CNN
CN118429329A (en) Road disease identification method and device based on RD-YOLO network
CN119152502A (en) Landscape plant image semantic segmentation method based on weak supervision
CN119445348A (en) An improved YOLOv8 fish image recognition method based on transfer learning
CN117315380A (en) Deep learning-based pneumonia CT image classification method and system
CN114821182A (en) Rice growth stage image recognition method
CN118396958A (en) Defect detection method for crystalline silicon component of solar cell
CN114972845A (en) Two-stage target intelligent detection algorithm and system based on meta-learning
Zhao et al. Neural network based on convolution and self-attention fusion mechanism for plant leaves disease recognition
CN117371511A (en) Training method, device, equipment and storage medium for image classification model
CN114863485A (en) Cross-domain pedestrian re-identification method and system based on deep mutual learning
CN120047818A (en) TEADISEASELITENET-based lightweight tea disease target detection method
CN118247813A (en) A person re-identification method based on adaptive optimization network structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant