
CN116306813B - A method based on YOLOX lightweight and network optimization - Google Patents

A method based on YOLOX lightweight and network optimization

Info

Publication number
CN116306813B
CN116306813B CN202310212335.9A CN202310212335A
Authority
CN
China
Prior art keywords
model
yolox
network
improved
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310212335.9A
Other languages
Chinese (zh)
Other versions
CN116306813A (en)
Inventor
张文博
马梓益
姬红兵
李林
臧博
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310212335.9A priority Critical patent/CN116306813B/en
Publication of CN116306813A publication Critical patent/CN116306813A/en
Application granted granted Critical
Publication of CN116306813B publication Critical patent/CN116306813B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method based on YOLOX lightweighting and network optimization, which comprises the following steps: S1, preparing the data set required for training in a target detection task; S2, training the original YOLOX neural network model on the data set, and recording and evaluating the performance indexes of the model; S3, performing a pruning operation on the original YOLOX neural network model to generate a pruned improved YOLOX network model; S4, training the pruned improved YOLOX network model on the data set; S5, performing a pruning operation on the improved YOLOX network model; S6, verifying and analyzing the pruned improved YOLOX network; if the performance requirements can be met, detecting and analyzing the target, and if the performance requirements cannot be met, adjusting the improved model until the performance requirements are met. The invention achieves higher detection precision and speed in target detection, is easier to deploy and integrate in actual application scenarios, and also makes the reasoning process of the model more efficient and stable.

Description

YOLOX-based lightweight and network optimization method
Technical Field
The invention belongs to the technical field of target detection in images, and particularly relates to a method based on YOLOX lightweighting and network optimization.
Background
The object detection problem is to determine the location of objects in a given image and the class to which each object belongs (i.e., object localization and object classification). Today, object detection technology is applied in agriculture, medical treatment, automated production, and other fields. Object detection mainly adopts deep learning methods, and deep-learning-based object detection methods fall into two main categories. One category is the candidate-region-based two-stage detection algorithms, including R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, etc.
Because these two-stage algorithms must first generate a large number of candidate regions and then classify and localize the objects within them, object detection is time-consuming. This process requires significant computational resources and time, especially for high-resolution images and complex scenes, so detection is relatively slow. Compared with a two-stage algorithm, a single-stage algorithm only needs to sample densely over the image and then performs target detection directly on the sampling points; since a large number of candidate regions need not be generated, the detection speed is higher and the method is suitable for real-time target detection. Single-stage algorithms are therefore more popular in practical applications.
YOLOX is an excellent single-stage target detection algorithm that regresses the class probability and position coordinates of objects at high speed. However, detecting objects directly with YOLOX raises some unavoidable problems: the YOLOX model is large and its inference speed is low. The YOLOX model therefore needs to be made lightweight while keeping the original precision without loss, and after lightweighting the detection precision needs to be maintained through network optimization.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a method based on YOLOX lightweighting and network optimization, which solves the problem of restricted model deployment in target detection and the problem of reduced precision in model performance evaluation after lightweighting. The method has higher detection precision and speed in target detection, is easier to deploy and integrate in actual application scenarios, and also makes the reasoning process of the model more efficient and stable.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
A method based on YOLOX lightweighting and network optimization comprises the following steps:
S1, in a target detection task, preparing the data set required for training, wherein the data set selects original images under different scenes and different illumination conditions, and preprocessing of the original images ensures that the model can realize accurate target detection in various environments;
S2, training an original YOLOX neural network model on the data set, and recording and evaluating performance indexes of the model;
S3, a pruning operation is carried out on the original YOLOX neural network model, and a pruned improved YOLOX network model is generated;
S4, training on the data set to generate a pruned improved YOLOX network model;
S5, a pruning operation is carried out on the improved YOLOX network model, and further adjustment is carried out so as to achieve higher detection precision and speed;
And S6, verifying and analyzing the improved YOLOX network after pruning, if the performance requirement can be met, detecting and analyzing the target, and if the performance requirement cannot be met, adjusting the improved model until the performance requirement is met.
In step S1, a data set required for training is prepared, including the steps of:
(1) Collecting data, namely collecting the PASCAL VOC data set from the official open data set website, comprising JPEGImages, ImageSets and Annotations, wherein JPEGImages contains the training images, ImageSets contains the train.txt, trainval.txt and val.txt files of each type, and Annotations contains the xml files of each class;
(2) Data preprocessing, which preprocesses the collected original images to make them suitable for model training. First, the original images are resized to the specified size for subsequent processing; the selected target detection input size is 416x416 pixels. Next, the color images are converted to grayscale images, which reduces the complexity of data storage and processing and reduces the time and computation of model training. Finally, the pixel values in the images are scaled to between 0 and 1, which makes the data more stable during processing and reduces the gradient explosion and gradient vanishing problems during training.
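By way of illustration only, the following is a minimal sketch of these three preprocessing steps (resizing to 416x416, grayscale conversion, scaling pixel values to [0, 1]); the function name and the use of the OpenCV and NumPy libraries are assumptions of this sketch and are not part of the original disclosure.

    import cv2
    import numpy as np

    def preprocess_image(path, size=(416, 416)):
        """Resize, convert to grayscale, and scale one image to [0, 1]."""
        img = cv2.imread(path)                       # read BGR uint8 image
        img = cv2.resize(img, size)                  # adjust to the specified input size
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # color image -> grayscale image
        img = img.astype(np.float32) / 255.0         # scale pixel values to [0, 1]
        return img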
The step S2 trains an original YOLOX neural network model on a data set, and comprises the following steps:
(1) Adopting the data set preprocessed in the step S1;
(2) Dividing the data set into a training set and a verification set according to the proportion of 8:2;
(3) Inputting the preprocessed data set into an original YOLOX neural network model, inputting a predicted value pred and a true value gt of the network into a loss function L, and obtaining a loss value through the following formula
Loss=L(pred,gt)
Wherein, L represents a loss function, pred represents a predicted value of network output, gt represents a true value, optimizing network parameters according to the loss function L, updating the neural network parameters by using a gradient descent method, and setting the current neural network parameters as theta, wherein the updating formula is as follows:
θt+1 = θt − η·∇θL(θt)
wherein η represents the learning rate, ∇θL represents the gradient of the loss function L with respect to the parameter θ, θt represents the parameter value at the t-th time step, and θt+1 represents the parameter value at the (t+1)-th time step; the neural network parameters are updated through multiple iterations, so that the network performance is optimized and the accuracy and speed of target detection are improved;
(4) After a round of parameter updating, the model needs to be checked on a validation set to verify its generalization ability. Specifically, the validation set is input into the YOLOX network, and the loss between the predicted results and the real results, i.e. the validation loss, is calculated. Let the validation set size be N, the predicted boxes of the i-th sample be pi and the real boxes be ti; the validation loss L is accumulated over all samples and grid cells,
where S is the number of prediction boxes per grid cell and C is the number of target categories; for the j-th cell of the i-th sample, the predicted and true values of the c-th class, the predicted and true indicators of whether a target exists, and the predicted and true confidences are compared; posij denotes the index set of the prediction box with the largest intersection-over-union with the real box in the j-th cell of the i-th sample, and two weight coefficients balance the cells in which an object exists against the cells in which no object exists;
The performance of the current model can be evaluated by calculating the loss degree of the verification set, and if the loss degree is higher, training is required to be continued until a preset stopping condition is reached;
(5) Every two iterations, the pictures in the data set are input into the optimized YOLOX target detection network for training, and the accuracy of the model is obtained; the optimized YOLOX target detection network has higher detection precision and a higher detection speed;
(6) Repeating the above steps until training is finished.
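For illustration, the following is a minimal PyTorch-style sketch of the loop described in steps (3)-(6): computing Loss = L(pred, gt), updating the parameters by gradient descent, and checking the validation loss after each round. The model, loss function, and data loaders are placeholders assumed for the sketch, not components defined by the invention.

    import torch

    def train_yolox(model, loss_fn, train_loader, val_loader, epochs=300, lr=1e-3):
        """Sketch of training with gradient descent and periodic validation."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # theta_{t+1} = theta_t - eta * grad(L)
        for epoch in range(epochs):
            model.train()
            for images, gt in train_loader:
                pred = model(images)
                loss = loss_fn(pred, gt)          # Loss = L(pred, gt)
                optimizer.zero_grad()
                loss.backward()                   # gradient of L with respect to theta
                optimizer.step()                  # parameter update
            model.eval()
            with torch.no_grad():                 # validation loss gauges generalization
                val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader) / len(val_loader)
            print(f"epoch {epoch}: validation loss = {val_loss:.4f}")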
In the step S3, pruning operation is performed on the network trained in the step S2, including the following steps:
(1) For each network layer, calculating the sensitivity degree of forward propagation of one network layer on the model, and under the condition of given input, calculating the partial derivative of the output variable quantity on the weight of the layer so as to obtain the sensitivity of the layer on the output, wherein the larger the sensitivity is, the larger the influence of the layer on the output is, and the priority should be given when pruning;
(2) Respectively carrying out weight sequencing on the network layers to be pruned;
(3) Determining a threshold according to the weight sequencing result and the pruning rate in each pruning layer;
(4) Rejecting weights lower than a threshold in a network, and reserving weights higher than the threshold;
(5) And saving the new model parameters and weights to generate a pruned improved YOLOX network model.
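A minimal sketch of the weight-ranking and threshold steps above, assuming a PyTorch convolution layer: the weight magnitudes of one layer are sorted, a threshold is taken from the ranking and the pruning rate, and weights below the threshold are removed (set to zero) while weights above it are retained. The function name and the default pruning rate are assumptions of the sketch.

    import torch

    def prune_layer_weights(module, prune_rate=0.4):
        """Zero out the smallest-magnitude weights of one layer."""
        with torch.no_grad():
            w = module.weight.abs().flatten()
            k = int(w.numel() * prune_rate)               # number of weights to remove
            if k > 0:
                threshold = torch.sort(w).values[k - 1]   # threshold from the weight ranking and pruning rate
                mask = (module.weight.abs() > threshold)  # keep only weights above the threshold
                module.weight.mul_(mask.float())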
The step S4 improves YOLOX the network model, including the steps of:
(1) Inserting a channel-space attention mechanism CBAM module between the yolox backbone layer and the data enhancement layer channels;
The CBAM module is an implementation of a channel-space attention mechanism, can effectively improve the precision of a model, and mainly comprises two parts, namely channel attention and space attention;
Firstly, through the channel attention mechanism, average pooling and maximum pooling operations are carried out on the input feature map to aggregate its spatial information, generating the average-pooled feature Favg and the max-pooled feature Fmax; these pass through a shared network layer, and after the shared network is applied to each feature, the average-pooled feature and the max-pooled feature are summed element-wise, and the combined feature is passed through a Sigmoid activation function to output the channel attention map Mc;
(2) Replacing the original BCE cross entropy loss function with the VariFocalLoss loss function. When replacing the BCE cross entropy loss function with the VariFocalLoss loss function, the output layer of the model needs to be modified: the VariFocalLoss loss function introduces a learnable exponent γ and modifies the weight adjustment term in the loss calculation, so that more emphasis is placed on learning hard samples, and the output layer therefore needs to be modified accordingly. In the improved YOLOX network model, the output layer generally includes a classification branch and a regression branch; the classification branch classifies each target, and the regression branch regresses the position information of each target. To adapt to the calculation of the VariFocalLoss loss function, the output of the classification branch is first processed by the sigmoid function and then converted into a prediction probability; since the VariFocalLoss loss function only modifies the classification branch, the calculation of the regression branch does not need to be changed;
(3) Training the improved YOLOX network model;
(4) Performing pruning operation on the trained model, and evaluating performance indexes of the model after pruning;
The attention mechanism module mines more useful information along the spatial and channel dimensions of the input features and applies weighting, enhancing the perceptibility of the feature space and channel dimensions, giving the network the ability to focus on its input features and obtaining better detection precision.
By using VariFocalLoss to replace the cross entropy in the original loss function, the positive and negative sample weights can be improved, and the model convergence speed can be increased.
The step S6 comprises the following steps;
Firstly, the performance of the improved YOLOX network model is evaluated on unseen data. If the model cannot meet the performance requirements, it is adjusted and improved: the training process is further optimized by tuning the training learning rate and the batch-size hyperparameters. After the model is adjusted, training and verification are carried out again; this process needs to iterate several times until a model that meets the performance requirements is obtained. Finally, if the improved pruned model meets the performance requirements, detection analysis is carried out on the target. Target detection refers to detecting the position and category of targets in an image or video; by deploying the pruned model, faster and more accurate target detection is achieved, improving efficiency and accuracy in practical applications.
Experiments prove that the method provided by the invention has higher detection precision and speed in the target detection task. Compared with the traditional target detection method, the method provided by the invention can greatly improve the detection speed on the premise of keeping high precision. Therefore, the invention has higher practicability and economic benefit. The invention has the beneficial effects that:
The invention provides a method based on YOLOX lightweighting and network optimization, in which a channel-spatial attention mechanism module is inserted at the junction of the feature extraction and enhancement layers so as to enhance the feature extraction capability for targets of different scales and suppress the interference of redundant information. Meanwhile, VariFocalLoss is used to replace the cross entropy in the original loss function; a weight value is added to the positive and negative samples, and the share of the positive and negative samples in the total loss value is controlled, so that the model focuses more on hard samples during training and the problem of unbalanced sample classes is solved.
Although the optimization strategy in the present invention has effectively improved the accuracy of target detection, further optimization is still needed to achieve end-to-end real-time target detection on mobile devices. In order to solve the problem, the invention uses a pruning strategy to compress the model volume, reduces the calculation amount of the model and realizes the end-to-end real-time target detection on the mobile equipment.
However, the pruning operation may have an influence on the model accuracy, so that the detection accuracy is lowered. Therefore, the invention verifies and analyzes the model after pruning, if the performance requirement can be met, the target is detected and analyzed, and if the performance requirement cannot be met, the improved model is adjusted until the performance requirement is met. Therefore, the light weight of the model can be ensured, and meanwhile, the higher target detection precision can be maintained, and the problem that the evaluation performance of the original YOLOX model after pruning is reduced is solved.
Therefore, the invention comprehensively applies various optimization strategies such as light weight, network optimization, loss function improvement, pruning and the like, effectively improves the end-to-end real-time target detection performance on the mobile equipment, and solves the problems that the original YOLOX still has large model volume, high floating point number operation amount, poor instantaneity and reduced accuracy after pruning when deployed on the embedded equipment.
Drawings
Fig. 1 is a flow chart of the method based on YOLOX lightweighting and network optimization according to the present invention.
Fig. 2 is a flow diagram of YOLOX network training.
FIG. 3 is a schematic diagram of a model architecture based on YOLOX network optimization.
FIG. 4 is a flow chart based on YOLOX model training.
Fig. 5 is a block diagram of the CBAM attention mechanism.
FIG. 6 is a schematic diagram of accuracy of dataset detection based on a modified YOLOX network model.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1-3, the invention provides a target detection method based on YOLOX lightweighting, which comprises the following steps:
s1, preparing a data set required during training in a target detection task, wherein the data set comprises target samples of different categories, and meanwhile, taking original images of targets in different scenes and under different illumination conditions into consideration, and ensuring that the model can realize accurate target detection in various environments by preprocessing the original images;
the quality and number of datasets are critical to the training and performance of the model. Thus, a dataset associated with the object detection task needs to be selected and preprocessed and cleaned for subsequent model training and evaluation.
S2, training an original YOLOX neural network model on the data set, and recording and evaluating performance indexes of the model;
The original target detection model is trained using the data set, and the performance indexes of the model are recorded; the performance indexes are the accuracy, the recall rate and the precision: the accuracy represents the proportion of all samples correctly predicted by the model, the recall rate represents the proportion of true positive samples correctly detected by the model, and the precision represents the proportion of predicted positive samples that are actually positive; meanwhile, the learning rate and weight decay during training are adjusted to improve the performance of the model;
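For clarity, a small illustrative sketch of these three indexes computed from detection counts (true/false positives and negatives); the function and its inputs are assumptions of the sketch, not part of the original disclosure.

    def detection_metrics(tp, fp, fn, tn):
        """Precision, recall and accuracy from counts of true/false positives/negatives."""
        precision = tp / (tp + fp)                   # fraction of predicted positives that are correct
        recall = tp / (tp + fn)                      # fraction of actual positives that are detected
        accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction of all samples predicted correctly
        return precision, recall, accuracy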
S3, performing a pruning operation on the original YOLOX neural network model, and generating a pruned improved YOLOX network model;
The method comprises the steps of carrying out pruning operation on an original network model based on a preset pruning strategy to reduce parameters and calculated amount in the model so as to realize light weight, carrying out training again on the pruned model, evaluating performance indexes of the pruned model so as to verify the influence of the pruning operation on the performance of the model, and carrying out pruning in a target detection task to remove some redundant network structures so as to reduce calculated amount and improve the reasoning speed of the model;
S4, training the improved YOLOX network model on the data set, and evaluating the performance indexes of the model such as detection precision;
In order to improve the performance of the target detection model, a channel-space attention mechanism CBAM module is inserted between a main layer and a data enhancement layer of the original lightweight model, so that the model can automatically pay attention to important features related to target detection when processing images, and the attention degree and the anti-interference capability of a network are improved. The original BCE cross entropy loss function is replaced by VariFocal Loss loss function, so that the classification performance of the model and the identification capability of difficult samples are improved. VariFocal Loss introducing variable key parameters into the loss function, so that the model is more concerned with samples which are difficult to classify, thereby improving the classification performance of the model, training the improved model on a data set, and recording and evaluating the performance index of the model;
S5, a pruning operation is performed on the improved YOLOX network model, and the improved model is further adjusted to achieve higher detection precision and speed. In a target detection task, the lightweighting and efficient deployment of the model are of great importance, because they directly affect the speed and precision of the model. The improved network model is pruned, further reducing the parameters and computation of the model and enabling fast and efficient deployment; at the same time, the pruned model is better suited to resource-limited scenarios such as embedded devices, improving the universality and portability of the model;
and S6, verifying and analyzing the improved pruned model to ensure that the improved pruned model can meet the performance requirements of the target detection task, and if the improved pruned model cannot meet the requirements, adjusting the improved model until the performance requirements are met.
The process of improving the YOLOX network model in step S4 includes the following steps:
s41, in order to enhance the feature extraction capability of the improved YOLOX network model on targets with different scales and inhibit the interference of redundant information, a means of inserting a channel-space attention mechanism CBAM module between a trunk layer and a data enhancement layer channel is adopted. The CBAM module can adaptively learn the importance of channels and spaces in the feature map and weight and adjust the feature map accordingly, so that the network is more concerned with feature information contributing to target identification and positioning.
S42, in order to solve the problem of sample class imbalance, the original BCE cross entropy loss function is replaced with the VariFocalLoss loss function. VariFocalLoss can add different weights to positive and negative samples, so that the model pays more attention to samples that are difficult to classify, the sample classes are better balanced during training, and the classification accuracy of the model is improved.
S43, training the improved YOLOX network model to optimize the performance of the network model. In the training process, the network continuously adjusts the weight and the bias through a back propagation algorithm to minimize the value of the loss function, thereby improving the classification and positioning capability of the model.
S44, in order to reduce the model volume and the calculation amount and facilitate the end-to-end real-time target detection on the mobile device, a means of performing pruning operation on the trained model is adopted. The pruning operation may remove unnecessary connections and parameters, thereby reducing the volume and computation of the model while maintaining the accuracy of the model. The pruned model needs to be trained again and the performance index of the pruned model is evaluated so as to ensure that the pruned model can still maintain high-precision target detection capability.
In the step S43, the process of training the network model comprises the following steps:
S431, collecting a data set, and calibrating the original data set;
S432, dividing the pictures into a training set and a validation set at a ratio of 8:2;
S433, inputting the data set into the YOLOX neural network, inputting the predicted value and the true value of the network into the loss function, obtaining the loss value, and updating the neural network parameters by gradient descent;
S434, after each round of parameter updating, inputting the validation set into the YOLOX network for verification, and calculating the loss degree of the validation set;
S435, after every two iterations, inputting the pictures in the data set into the trained model to obtain the accuracy of the model;
S436, repeating the above steps until the epoch count reaches 300, at which point training ends and the model has converged.
The pruning process for the network model in the step S3 comprises the following steps:
And S31, determining a network layer to be pruned. Before model pruning, the network model needs to be analyzed and evaluated to determine which network layers are pruned. This process requires consideration of the number of parameters, the amount of computation, and the degree of contribution to the overall performance of the model for each network layer in the model. Therefore, the calculated amount and the volume of the model can be reduced to the greatest extent, and meanwhile, the detection accuracy of the model is ensured not to be excessively reduced.
And S32, carrying out weight sorting on the network layer to be pruned, wherein the weight sorting of the network layer is one of important steps of pruning operation. Before pruning, all parameters in the network need to be ordered to determine which parameters contribute less to the performance of the network and which parameters contribute more to the performance, and then the network layer with less performance contribution is selected for pruning.
S33, determining the threshold value according to the weight. By weighting the network layers and thresholding, unnecessary or redundant portions of the network can be identified and pruning can then be performed.
And S34, pruning the part lower than the threshold value to obtain new parameters. After determining the pruning threshold, pruning is performed on network parameters below the threshold. Specifically, the weights in the network are compressed, redundant and unnecessary parameters are removed, and therefore the purposes of reducing network storage and calculation amount are achieved.
And S35, storing new model parameters and weights, and generating a pruned model. The number of the model parameters after pruning is reduced, the calculated amount is correspondingly reduced, the reasoning speed and efficiency of the model can be obviously improved, and the real-time target detection can be realized in the scene with limited resources such as embedded equipment.
The invention is further described in connection with the relevant background art and the implementation steps:
In the step S1, pictures from common data sets are selected, namely the VOC2007 and VOC2012 data sets, comprising 21143 training images and the corresponding xml files.
In the step S2, performance evaluation indexes including model accuracy mAP and model size parameters (M) are recorded as criteria for subsequent performance evaluation.
YOLOX is the latest target detection algorithm of the YOLO series; it not only achieves detection precision exceeding the previous YOLO series, but also reaches a highly competitive end-to-end inference speed. However, when YOLOX is deployed on embedded devices, problems such as large model volume, a high floating-point operation count, and poor real-time performance remain. To solve these problems and avoid the unnecessary energy consumption caused by model pre-training, the lightweighting method of YOLOX is provided.
(1) Pruning YOLOX network model
The network model obtained by YOLOX training is read, and the model weights and structure are saved; with a model size of 3797KB, the convolution layers to be pruned are determined as the C3_p4 layer, C3_n3 layer, reduce_conv1 layer and bu_conv2 layer in the data enhancement layer. The weights in each layer are sorted by magnitude, and the maximum weight value multiplied by the set pruning rate (40%) is taken as the pruning weight threshold; neuron weights below the threshold are reset to 0 and weights above the threshold are retained; the new parameters and weights are saved, and a new pruned model structure is generated;
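A sketch of this concrete pruning rule, assuming PyTorch modules: the threshold is the maximum weight magnitude of a layer multiplied by the pruning rate (40%), applied to the named layers; the layer names are taken from the text, while the name-matching logic and the function signature are assumptions of the sketch.

    import torch

    def prune_by_max_ratio(model, layer_keys=("C3_p4", "C3_n3", "reduce_conv1", "bu_conv2"), rate=0.4):
        """Threshold = max |w| * pruning rate; weights below the threshold are reset to 0."""
        with torch.no_grad():
            for name, module in model.named_modules():
                if any(key in name for key in layer_keys) and getattr(module, "weight", None) is not None:
                    w = module.weight
                    threshold = w.abs().max() * rate         # max weight value x set pruning rate (40%)
                    w.mul_((w.abs() >= threshold).float())   # weights below threshold -> 0, others kept
        return model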
Directly applying lightweight pruning to the original model can compress the model and improve the detection speed, and the pruned model can be deployed to the embedded board without difficulty, but the evaluated performance of the model drops sharply. The network model therefore needs to be perfected and improved by introducing an attention mechanism and replacing the loss function;
(2) Improving and optimizing the YOLOX network model
In step S41, as shown in FIG. 5, the fused convolutional block attention mechanism CBAM is a combination of the channel attention mechanism and the spatial attention mechanism. Firstly, through the channel attention mechanism, average pooling and maximum pooling operations are carried out on the input feature map to aggregate its spatial information, generating the average-pooled feature Favg and the max-pooled feature Fmax; a shared network is applied to each feature through a shared network layer, the average-pooled and max-pooled features are summed element-wise, and the combined feature is passed through a Sigmoid activation function to output the channel attention map Mc. The spatial attention applies average pooling and maximum pooling to the feature map along the channel axis, compressing it in the channel dimension; the two feature maps are concatenated in the channel dimension to generate an effective feature map, which then passes through a 7×7 convolution layer. Finally, the spatial attention map Ms is obtained through a Sigmoid function.
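A minimal PyTorch-style sketch of the CBAM block as described (channel attention followed by spatial attention with a 7x7 convolution); the reduction ratio of the shared network and the exact module layout are assumptions beyond the text.

    import torch
    import torch.nn as nn

    class CBAM(nn.Module):
        """Channel attention (avg/max pooling -> shared network -> Sigmoid) then
        spatial attention (channel-wise avg/max -> concat -> 7x7 conv -> Sigmoid)."""

        def __init__(self, channels, reduction=16):
            super().__init__()
            hidden = max(channels // reduction, 1)
            self.shared_mlp = nn.Sequential(              # shared network applied to both pooled features
                nn.Conv2d(channels, hidden, 1, bias=False),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, channels, 1, bias=False),
            )
            self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

        def forward(self, x):
            # channel attention map Mc
            avg = self.shared_mlp(torch.mean(x, dim=(2, 3), keepdim=True))   # F_avg branch
            mx = self.shared_mlp(torch.amax(x, dim=(2, 3), keepdim=True))    # F_max branch
            mc = torch.sigmoid(avg + mx)                                     # element-wise sum -> Sigmoid
            x = x * mc
            # spatial attention map Ms
            avg_s = torch.mean(x, dim=1, keepdim=True)                       # pooling along the channel axis
            max_s = torch.amax(x, dim=1, keepdim=True)
            ms = torch.sigmoid(self.spatial_conv(torch.cat([avg_s, max_s], dim=1)))  # 7x7 conv -> Sigmoid
            return x * ms

In the improved network, such a block would sit between the backbone feature extraction output and the data enhancement (neck) input, as described above.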
As shown in fig. 4, the network structure of YOLOX is improved: at the cross-stage partial network CSPNet layer, the input of the channel attention mechanism of CBAM is connected, and the output of the spatial attention mechanism of CBAM is connected to the data enhancement layer.
The attention mechanism is introduced, a channel-space attention mechanism module is inserted at the joint of the feature extraction and enhancement layers, effective features are screened out from the channel dimension and the space dimension respectively, irrelevant features are restrained, the expression capability of the features is enhanced, and the recognition accuracy of the model is improved.
In step S43, in YOLOX target detection, the cross entropy loss function has a problem of extreme imbalance between the target class and the background class. The problem of unbalance between the target class and the background class can be effectively solved by using the Focal loss. The Focal loss formula is as follows:
FL(p, y) = −α·(1−p)^β·log(p) when y = 1, and −(1−α)·p^β·log(1−p) when y = −1,
wherein p is the prediction probability of the target class, with range [0, 1]; y is the true positive/negative sample class, taking the value 1 or −1; α is an adjustable scale factor; (1−p)^β is the modulation factor for the target class and p^β is the modulation factor for the background class; the two modulation factors reduce the contribution of easy samples and increase the importance of misdetected samples. The Focal loss can solve the problem of class imbalance during training through this weighting method.
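A direct transcription of this formula as code (a sketch; the default values of α and β are assumptions):

    import torch

    def focal_loss(p, y, alpha=0.25, beta=2.0):
        """Focal loss for prediction probability p in [0, 1] and label y in {1, -1}."""
        p = p.clamp(1e-6, 1 - 1e-6)                           # numerical stability
        pos = -alpha * (1 - p).pow(beta) * torch.log(p)       # target (positive) class term
        neg = -(1 - alpha) * p.pow(beta) * torch.log(1 - p)   # background (negative) class term
        return torch.where(y == 1, pos, neg)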
The Focal loss treats positive and negative samples in the same way, whereas in actual detection the contribution of positive samples is more important; the Focal loss is therefore further improved. Varifocal loss is based on binary cross entropy and draws on the Focal loss weighting approach to handle the class imbalance problem during training. The binary cross entropy formula is:
BCE(p, q) = −(q·log(p) + (1−q)·log(1−p))
wherein p is the predicted value representing the target score and q is the classification target; for the target class, the q value of a positive sample is set to the IoU between the predicted box and the ground-truth box, and otherwise is set to 0; for the background class, the target q value of all classes is 0. On this basis, Varifocal loss weights the binary cross entropy asymmetrically (following the VarifocalNet formulation):
VFL(p, q) = −q·(q·log(p) + (1−q)·log(1−p)) when q > 0, and −α·p^β·log(1−p) when q = 0,
so that Varifocal loss uses the p^β scaling factor to down-weight negative samples but not positive samples, which highlights the contribution of positive samples.
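A minimal sketch of this asymmetric weighting, following the VarifocalNet formulation assumed above; the logits, the IoU-based target q, and the default values of α and β are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def varifocal_loss(pred_logits, q, alpha=0.75, beta=2.0):
        """Positives (q > 0, q = IoU with the ground-truth box) are weighted by q itself;
        negatives (q = 0) are down-weighted by alpha * p**beta."""
        p = torch.sigmoid(pred_logits)                        # classification branch -> prediction probability
        bce = F.binary_cross_entropy_with_logits(pred_logits, q, reduction="none")
        weight = torch.where(q > 0, q, alpha * p.pow(beta))   # p**beta scaling applied to negatives only
        return (weight * bce).sum()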
(3) Improved target detection network model
The invention relates to three parts: an attention mechanism, a loss function, and model pruning. An attention mechanism is introduced between the backbone feature extraction layer and the data enhancement layer, giving the network the ability to focus on its input features and obtaining better detection precision. In the loss function part, the BCE cross entropy loss function is replaced with the VariFocalLoss function, which increases the attention paid to difficult samples in the data set and balances the samples. The improved target detection network model is pruned, realizing the lightweighting of the YOLOX network model and solving the problems of large model volume, high floating-point operation count, and poor real-time performance when the original YOLOX network is deployed on embedded devices (as shown in FIG. 6).
In summary, the invention mainly solves the technical problems of large model volume, high floating point number operation and poor real-time performance existing in the prior YOLOX when deployed on embedded equipment, and the problem of reduced evaluation performance after pruning of the prior YOLOX model;
The above disclosure is only an example of the present invention and it is not intended to limit the scope of the claims, and those skilled in the art will understand the procedures for implementing the above examples and make equivalent changes according to the claims of the present invention.

Claims (5)

1. A method based on YOLOX lightweighting and network optimization, characterized by comprising the following steps:
S1: in the target detection task, prepare the data set required for training; the data set selects original images of the target under different scenes and different illumination conditions, and the original images are preprocessed to ensure that the model can achieve accurate target detection in various environments;
S2: train the original YOLOX neural network model on the data set, and record and evaluate the performance indexes of the model;
S3: perform a pruning operation on the original YOLOX neural network model to generate a pruned improved YOLOX network model;
S4: train the pruned improved YOLOX network model on the data set;
S5: perform a pruning operation on the improved YOLOX network model and make further adjustments to achieve higher detection precision and speed;
S6: verify and analyze the pruned improved YOLOX network; if the performance requirements can be met, perform detection analysis on the target; if the performance requirements cannot be met, adjust the improved model until the performance requirements are met;
wherein step S4 improves the YOLOX network model through the following steps:
(1) insert a channel-spatial attention mechanism CBAM module between the YOLOX backbone layer and the data enhancement layer channels; first, through the channel attention mechanism, average pooling and maximum pooling operations are applied to the input feature map to aggregate its spatial information; the generated average-pooled feature Favg and max-pooled feature Fmax are passed through a shared network layer, the shared network is applied to each feature, the average-pooled and max-pooled features are summed element-wise, and the merged feature is passed through a Sigmoid activation function to output the channel attention map Mc; spatial attention applies average pooling and maximum pooling to the feature map along the channel axis, compressing it in the channel dimension; the two feature maps are concatenated along the channel dimension to generate an effective feature map, which then passes through a 7×7 convolution layer; finally, the spatial attention map Ms is obtained through a Sigmoid function;
(2) replace the original BCE cross-entropy loss function with the VariFocalLoss loss function; in this replacement the output layer of the model needs to be modified: VariFocalLoss introduces a learnable exponent γ and modifies the weight adjustment term in the loss calculation so that more emphasis is placed on learning hard samples; in the improved YOLOX network model the output layer generally comprises a classification branch and a regression branch; the classification branch classifies each target, and the regression branch regresses the position information of each target; the output of the classification branch is first processed by the sigmoid function and then converted into a prediction probability, from which the VariFocalLoss is calculated;
(3) train the improved YOLOX network model;
(4) perform a pruning operation on the trained model and evaluate the performance indexes of the pruned model.
2. The method based on YOLOX lightweighting and network optimization according to claim 1, wherein in step S1 preparing the data set required for training comprises the following steps:
(1) data collection: collect the PASCAL VOC data set, including JPEGImages, ImageSets and Annotations, from the official open data set website; JPEGImages contains the training images, ImageSets contains the train.txt, trainval.txt and val.txt files of each type, and Annotations contains the xml files of each class;
(2) data preprocessing: preprocess the collected original images to make them suitable for model training; first, resize the original images to the specified size for subsequent processing; next, convert the color images to grayscale images; finally, scale the pixel values in the images to between 0 and 1.
3. The method based on YOLOX lightweighting and network optimization according to claim 1, wherein step S2, training the original YOLOX neural network model on the data set, comprises the following steps:
(1) adopt the data set preprocessed in step S1;
(2) divide the data set into a training set and a validation set;
(3) input the preprocessed data set into the original YOLOX neural network model; the predicted value pred of the network and the true value gt are input into the loss function L, and the loss value is obtained through
Loss = L(pred, gt)
where L denotes the loss function, pred the predicted value output by the network, and gt the true value; the network parameters are optimized according to the loss function L, and the neural network parameters are updated by gradient descent; with the current neural network parameters denoted θ, the update formula is
θt+1 = θt − η·∇θL(θt)
where η denotes the learning rate, ∇θL the gradient of the loss function L with respect to the parameter θ, θt the parameter value at the t-th time step, and θt+1 the parameter value at the (t+1)-th time step; the neural network parameters are updated over multiple iterations;
(4) after each round of parameter updates, test the model on the validation set to verify its generalization ability; input the validation set into the YOLOX network and compute the loss between the predicted results and the true results, i.e. the validation loss; with the validation set size N, the predicted boxes of the i-th sample pi and the true boxes ti, the validation loss L is accumulated over all samples and grid cells, where S is the number of prediction boxes per grid cell and C is the number of target categories; for the j-th cell of the i-th sample, the predicted and true values of the c-th class, the predicted and true indicators of whether a target exists, and the predicted and true confidences are compared; posij denotes the index set of the prediction box with the largest intersection-over-union with the true box in the j-th cell of the i-th sample, and two weight coefficients balance the cells that contain a target against the cells that do not;
(5) every two iterations, input the images in the data set into the optimized YOLOX target detection network for training to obtain the accuracy of the model;
(6) repeat the above steps until training ends.
4. The method based on YOLOX lightweighting and network optimization according to claim 1, wherein in step S3 the pruning operation performed on the network trained in step S2 comprises the following steps:
(1) determine the layers to be pruned according to the importance index of each network layer; for each network layer, compute its sensitivity in the forward pass of the model: given an input, compute the partial derivative of the change in the output with respect to the weights of the layer, thereby obtaining the sensitivity of the layer to the output; the greater the sensitivity, the greater the influence of the layer on the output and the higher its priority during pruning;
(2) sort the weights of each network layer to be pruned;
(3) determine a threshold according to the weight ranking result and the pruning rate in each pruned layer;
(4) remove the weights below the threshold in the network and retain the weights above the threshold;
(5) save the new model parameters and weights to generate the pruned improved YOLOX network model.
5. The method based on YOLOX lightweighting and network optimization according to claim 1, wherein step S6 comprises the following steps:
first, evaluate the performance of the improved YOLOX network model on unseen data; if the model cannot meet the performance requirements, adjust and improve it by tuning the training learning rate and batch-size hyperparameters to further optimize the training process; after adjusting the model, retrain and revalidate; this process is iterated until a model that meets the performance requirements is obtained; finally, if the improved pruned model meets the performance requirements, perform detection analysis on the target.
CN202310212335.9A 2023-03-07 2023-03-07 A method based on YOLOX lightweight and network optimization Active CN116306813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310212335.9A CN116306813B (en) 2023-03-07 2023-03-07 A method based on YOLOX lightweight and network optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310212335.9A CN116306813B (en) 2023-03-07 2023-03-07 A method based on YOLOX lightweight and network optimization

Publications (2)

Publication Number Publication Date
CN116306813A CN116306813A (en) 2023-06-23
CN116306813B true CN116306813B (en) 2025-08-12

Family

ID=86821771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310212335.9A Active CN116306813B (en) 2023-03-07 2023-03-07 A method based on YOLOX lightweight and network optimization

Country Status (1)

Country Link
CN (1) CN116306813B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036427A (en) * 2023-08-11 2023-11-10 江苏颂泽科技有限公司 Industrial printed matter image registration method and device based on lightweight network
CN117237599A (en) * 2023-08-25 2023-12-15 中银金融科技有限公司 Image target detection method and device
CN117314840A (en) * 2023-09-12 2023-12-29 中国科学院空间应用工程与技术中心 Methods, systems, storage media and equipment for detecting small impact craters on the surface of extraterrestrial objects
CN117197841B (en) * 2023-09-22 2025-10-28 深圳市天双科技有限公司 Pedestrian detection method and system for marine vessels
CN118762160A (en) * 2024-06-03 2024-10-11 徐州华东机械有限公司 Foreign body detection method for lightweight belt conveyor based on MO-YOLOX network
CN118468968B (en) * 2024-07-12 2024-09-17 杭州字节方舟科技有限公司 Deep neural network compression method based on joint dynamic pruning
CN119478620B (en) * 2024-07-29 2025-11-28 广东工业大学 Target detection method based on improvement YOLOv n
CN119622456A (en) * 2024-11-21 2025-03-14 吉林大学 A method for training end-to-end autonomous driving policies
CN120632840B (en) * 2025-08-15 2025-11-21 浙江大学滨江研究院 A Model Fingerprint Injection and Verification Method and Device Based on Side Branch Networks

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393690A (en) * 2022-09-02 2022-11-25 西安工业大学 Light neural network air-to-ground observation multi-target identification method
CN115471667A (en) * 2022-09-08 2022-12-13 重庆邮电大学 Lightweight target detection method for improving YOLOX network structure

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396181A (en) * 2020-12-31 2021-02-23 之江实验室 Automatic pruning method and platform for general compression architecture of convolutional neural network
CN114898171B (en) * 2022-04-07 2023-09-22 中国科学院光电技术研究所 A real-time target detection method suitable for embedded platforms

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393690A (en) * 2022-09-02 2022-11-25 西安工业大学 Light neural network air-to-ground observation multi-target identification method
CN115471667A (en) * 2022-09-08 2022-12-13 重庆邮电大学 Lightweight target detection method for improving YOLOX network structure

Also Published As

Publication number Publication date
CN116306813A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN116306813B (en) A method based on YOLOX lightweight and network optimization
CN118711000B (en) Bearing surface defect detection method and system based on improved YOLOv10
CN111160176B (en) Fusion feature-based ground radar target classification method for one-dimensional convolutional neural network
CN116258941B (en) Lightweight improvement method of yolox target detection based on Android platform
CN114758288A (en) A kind of distribution network engineering safety management and control detection method and device
CN113609904B (en) Single-target tracking algorithm based on dynamic global information modeling and twin network
CN115311502A (en) A small sample scene classification method for remote sensing images based on multi-scale dual-stream architecture
CN117708771B (en) ITSOBP-based comprehensive transmission device fault prediction algorithm
CN117058552A (en) Lightweight pest detection method based on improved YOLOv7 and RKNPU2
CN114627467A (en) Rice growth period identification method and system based on improved neural network
CN118397427A (en) A strawberry fruit recognition method based on improved YOLOv5s
CN113205103A (en) A Lightweight Tattoo Detection Method
CN116778311A (en) An underwater target detection method based on improved Faster R-CNN
CN118429329A (en) Road disease identification method and device based on RD-YOLO network
CN119152502A (en) Landscape plant image semantic segmentation method based on weak supervision
CN119445348A (en) An improved YOLOv8 fish image recognition method based on transfer learning
CN117315380A (en) Deep learning-based pneumonia CT image classification method and system
CN114821182A (en) Rice growth stage image recognition method
CN118396958A (en) Defect detection method for crystalline silicon component of solar cell
CN114972845A (en) Two-stage target intelligent detection algorithm and system based on meta-learning
Zhao et al. Neural network based on convolution and self-attention fusion mechanism for plant leaves disease recognition
CN117371511A (en) Training method, device, equipment and storage medium for image classification model
CN114863485A (en) Cross-domain pedestrian re-identification method and system based on deep mutual learning
CN120047818A (en) TEADISEASELITENET-based lightweight tea disease target detection method
CN118247813A (en) A person re-identification method based on adaptive optimization network structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant