CN116306813A - Method based on YOLOX light weight and network optimization - Google Patents
Method based on YOLOX light weight and network optimization
- Publication number
- CN116306813A; CN202310212335.9A
- Authority
- CN
- China
- Prior art keywords
- model
- yolox
- network
- training
- pruning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical field
The invention belongs to the technical field of object detection in images, and in particular relates to a method based on YOLOX lightweighting and network optimization.
Background
The object detection problem is to determine where objects are located in a given image and which class each object belongs to (that is, object localization and object classification). Object detection is now widely applied in agriculture, medicine, and automated production. It relies mainly on deep learning, and deep-learning-based detection methods fall into two categories. One category is the two-stage detection algorithms based on candidate regions, including R-CNN, SPP-Net, Fast-RCNN, and Faster-RCNN.
Because these two-stage algorithms must first generate a large number of candidate regions and then classify and localize the objects in those regions, detection is comparatively slow. The process consumes substantial computing resources and time, especially for high-resolution images and complex scenes. A single-stage algorithm, by contrast, only needs to sample the image densely and perform detection directly on the sampled points, so it does not need to generate a large number of candidate regions; it is faster and suitable for real-time object detection. Single-stage algorithms are therefore more popular in practical applications.
YOLOX is a strong single-stage object detection algorithm that regresses object class probabilities and position coordinates, and it is fast. Using YOLOX directly for detection nevertheless raises some problems. The YOLOX model is large and its inference is comparatively slow, so it needs to be made lightweight. The lightweighted YOLOX network must also keep its original accuracy, so network optimization is needed after lightweighting to maintain detection accuracy.
Summary of the invention
To overcome the deficiencies of the prior art described above, the object of the present invention is to provide a method based on YOLOX lightweighting and network optimization. The method addresses the limitations of model deployment in object detection and the drop in accuracy observed when evaluating a lightweighted model. It achieves high detection accuracy and speed, is easier to deploy and integrate in real application scenarios, and makes model inference more efficient and stable.
To achieve the above object, the technical solution adopted by the present invention is:
A method based on YOLOX lightweighting and network optimization, comprising the following steps:
S1: for the object detection task, prepare the data set required for training; the data set consists of original images taken in different scenes and under different lighting conditions, and the original images are preprocessed so that the model can detect objects accurately in a variety of environments;
S2: train the original YOLOX neural network model on the data set, and record and evaluate the model's performance metrics;
S3: perform a pruning operation on the original YOLOX neural network model to generate the pruned, improved YOLOX network model;
S4: train the pruned, improved YOLOX network model on the data set;
S5: perform a pruning operation on the improved YOLOX network model and adjust it further to reach higher detection accuracy and speed;
S6: validate and analyze the pruned, improved YOLOX network; if it meets the performance requirements, use it to detect and analyze targets; if it does not, adjust the improved model until the performance requirements are met.
In step S1, preparing the data set required for training comprises the following steps:
(1) Data collection: the PASCAL VOC data set is collected from the official website of the open data set and comprises JPEGImages, ImageSets, and Annotations; JPEGImages contains the training images, ImageSets contains the train.txt, trainval.txt, and val.txt files for each class, and Annotations contains the xml files for each class;
(2) Data preprocessing: the collected original images are preprocessed to make them suitable for model training. First, each original image is resized to a specified size to simplify later processing; the size chosen for the detection images is 416x416 pixels. Next, the colour image is converted to a grayscale image, which reduces the complexity of data storage and processing and reduces the time and computation needed for training. Finally, the pixel values are scaled to the range 0 to 1, which makes the data more stable during processing and mitigates exploding and vanishing gradients during training.
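The three preprocessing operations in step (2) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the image is assumed to be a nested list of 8-bit RGB tuples, the resize uses nearest-neighbour sampling, and the grayscale weights are the common ITU-R BT.601 luma coefficients; the function names are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an H x W image stored as nested lists."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_grayscale(image):
    """Convert (R, G, B) pixels to a single intensity (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]


def normalize(gray):
    """Scale 8-bit intensities into [0, 1], clamping rounding overshoot."""
    return [[min(1.0, max(0.0, v / 255.0)) for v in row] for row in gray]


def preprocess(image, size=416):
    """Resize, convert to grayscale, then normalize, in the order of step (2)."""
    return normalize(to_grayscale(resize_nearest(image, size, size)))


# A tiny 2 x 2 RGB image, upsampled to 4 x 4 to keep the example small.
tiny = [[(255, 255, 255), (0, 0, 0)],
        [(255, 0, 0), (0, 0, 255)]]
out = preprocess(tiny, size=4)
```

Calling `preprocess(image)` with the default `size=416` produces the 416x416 single-channel input described in the text.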
In step S2, training the original YOLOX neural network model on the data set comprises the following steps:
(1) the data set preprocessed in step S1 is used;
(2) the data set is divided into a training set and a validation set in a ratio of 8:2;
(3) the preprocessed data set is fed into the original YOLOX neural network model, the network's predicted value pred and the ground-truth value gt are passed to the loss function L, and the loss value is obtained from the following formula:
$$\mathrm{Loss} = L(\mathrm{pred}, \mathrm{gt})$$
where L denotes the loss function, pred the prediction output by the network, and gt the ground truth. The network parameters are optimized according to the loss function L and updated by gradient descent. Letting the current network parameters be θ, the update formula is:
$$\theta_{t+1} = \theta_t - \eta \nabla_\theta L(\theta_t)$$
where η denotes the learning rate, $\nabla_\theta L(\theta_t)$ denotes the gradient of the loss function L with respect to the parameters θ, $\theta_t$ denotes the parameter value at time step t, and $\theta_{t+1}$ denotes the parameter value at time step t+1. The network parameters are updated over many iterations to optimize network performance and improve the accuracy and speed of object detection;
(4) after one round of parameter updates, the model is checked on the validation set to verify its generalization ability. Specifically, the validation set is fed into the YOLOX network and a loss between the predicted and true results, the validation loss, is computed. Let the validation set contain N samples, and let the predicted box of the i-th sample be $p_i$ and the ground-truth box be $t_i$; the validation loss L is then calculated as follows:
[formula not reproduced in this text]
where S is the number of predicted boxes per grid cell and C is the number of target classes. For the j-th grid cell of the i-th sample, the formula compares the predicted and true values of the c-th class, the predicted and true indicators of whether the cell contains a target, and the predicted and true confidence values. $pos_{ij}$ denotes the index set of the predicted boxes in the j-th grid cell of the i-th sample that have the largest intersection over union with the ground-truth box. Two weight coefficients balance the contribution of grid cells that contain a target against those that do not;
The validation loss is used to evaluate the current model; if the loss is high, training continues until a predetermined stopping condition is reached;
(5) after every two iterations, the images in the data set are fed into the optimized YOLOX object detection network for training and the model accuracy is obtained; the optimized YOLOX object detection network has higher detection accuracy and faster detection speed;
(6) the above steps are repeated until training ends.
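The training procedure above (an 8:2 split, gradient-descent updates, and a validation check after each round) can be sketched on a toy problem. The one-parameter model `pred = theta * x` and the mean-squared-error loss are stand-ins chosen for illustration only; they are not the YOLOX network or its loss, and all function names are hypothetical.

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle and split samples into training and validation sets (8:2)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def loss(theta, data):
    """Stand-in for Loss = L(pred, gt): mean squared error of pred = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def grad(theta, data):
    """Gradient of the stand-in loss with respect to theta."""
    return sum(2 * (theta * x - y) * x for x, y in data) / len(data)


def train(train_set, val_set, lr=0.01, max_epochs=200, tol=1e-6):
    """Gradient descent with a validation check after every update round."""
    theta = 0.0
    history = []
    for _ in range(max_epochs):
        # theta_{t+1} = theta_t - eta * grad L(theta_t)
        theta = theta - lr * grad(theta, train_set)
        val_loss = loss(theta, val_set)
        history.append(val_loss)
        if val_loss < tol:  # predetermined stopping condition (assumed)
            break
    return theta, history


# Synthetic data generated from y = 3x, so the ideal parameter is 3.0.
samples = [(float(x), 3.0 * x) for x in range(1, 11)]
train_set, val_set = split_dataset(samples)
theta, history = train(train_set, val_set)
```

In a real run the stopping condition would more likely be a fixed epoch budget or a patience counter on the validation loss; the tolerance used here is only a convenient choice for the toy problem.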
In step S3, performing the pruning operation on the network trained in step S2 comprises the following steps:
(1) the layers to be pruned are determined from an importance index of each network layer. For each layer, the sensitivity of the model's forward pass to that layer is computed: for a given input, the partial derivative of the change in output with respect to the layer's weights is calculated, which gives the layer's sensitivity to the output. The greater the sensitivity, the greater the layer's influence on the output, and the layer should be given priority when pruning;
(2) the weights of each layer to be pruned are sorted;
(3) a threshold is determined from the sorted weights of each pruned layer and the pruning rate;
(4) weights below the threshold are removed from the network and weights above the threshold are kept;
(5) the new model parameters and weights are saved, producing the pruned, improved YOLOX network model.
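Steps (2) to (5) above can be sketched as unstructured pruning. The sketch assumes that weights are ranked by absolute value (the text only says "below the threshold") and that a layer is a flat list of floats; the layer names and both function names are hypothetical. The layer-selection step (1) is represented only by the `layers_to_prune` argument.

```python
def prune_layer(weights, prune_rate):
    """Zero the smallest-magnitude fraction `prune_rate` of one layer's weights."""
    if not 0.0 <= prune_rate < 1.0:
        raise ValueError("prune_rate must be in [0, 1)")
    ranked = sorted(abs(w) for w in weights)      # step (2): sort the weights
    n_prune = int(len(ranked) * prune_rate)
    if n_prune == 0:
        return list(weights), 0.0
    threshold = ranked[n_prune - 1]               # step (3): derive the threshold
    # Step (4): drop weights at or below the threshold, keep the rest.
    # Weights tied with the threshold are dropped too, so ties can remove
    # slightly more than the requested fraction.
    pruned = [0.0 if abs(w) <= threshold else w for w in weights]
    return pruned, threshold


def prune_model(model, layers_to_prune, prune_rate):
    """Step (5): return a new weight dictionary with the chosen layers pruned."""
    new_model = {}
    for name, weights in model.items():
        if name in layers_to_prune:
            new_model[name], _ = prune_layer(weights, prune_rate)
        else:
            new_model[name] = list(weights)
    return new_model


# Hypothetical two-layer model; only "conv2" is selected for pruning.
model = {
    "conv1": [0.4, -0.2, 0.1],
    "conv2": [0.5, -0.05, 0.8, 0.01, -0.3, 0.02],
}
pruned_model = prune_model(model, layers_to_prune={"conv2"}, prune_rate=0.5)
```

Zeroed weights reduce storage only when the result is kept in a sparse format; reducing compute on ordinary hardware usually requires removing whole channels or filters, which this sketch does not attempt.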
In step S4, improving the YOLOX network model comprises the following steps:
(1) a channel-spatial attention mechanism CBAM module is inserted between the YOLOX backbone layer and the data enhancement layer;
The CBAM module is an implementation of the channel-spatial attention mechanism and can effectively improve model accuracy; it has two main parts, channel attention and spatial attention;
Channel attention is applied first. The input feature map is average-pooled and max-pooled separately to aggregate its spatial information, producing the average-pooled feature F_avg and the max-pooled feature F_max. Both are passed through a shared network layer; after the shared network has been applied to each, the two results are summed element-wise and the merged feature is passed through the sigmoid activation function to output the channel attention map M_c. Spatial attention then average-pools and max-pools the feature map along the channel axis, which compresses it in the channel dimension; the two resulting maps are concatenated along the channel dimension into one feature map, which passes through a 7×7 convolutional layer. Finally, the sigmoid function yields the spatial attention map M_s;
(2) the original BCE cross-entropy loss function is replaced with the VariFocalLoss loss function. Making this replacement requires modifying the model's output layer: VariFocalLoss introduces a learnable exponent γ and modifies the weight adjustment term in the loss formula so that training places more emphasis on hard samples, and the output layer must be adapted accordingly. In the improved YOLOX network model the output layer comprises a classification branch, which classifies each target, and a regression branch, which regresses each target's position. To compute VariFocalLoss, the prediction for each target in the classification branch is processed first: the classification output is passed through the sigmoid function to obtain a predicted probability, and VariFocalLoss is computed from that probability. Because VariFocalLoss changes only the classification branch, the computation in the regression branch is unchanged;
(3) the improved YOLOX network model is trained;
(4) a pruning operation is performed on the trained model, and the performance metrics of the pruned model are evaluated;
The attention mechanism module mines additional information along the spatial or channel dimensions of the input features and uses it to weight them. This strengthens the network's perception along both dimensions, lets it focus on the informative parts of its input features, and yields better detection accuracy.
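The channel and spatial attention computations described above can be sketched in plain Python on a small feature map stored as nested lists (channels × height × width). This is an illustrative sketch, not the patent's code: the shared network is assumed to be a two-layer perceptron with ReLU, the two attention maps are applied one after the other as in the usual CBAM design, and all weights and the kernel are made-up values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer perceptron with ReLU, shared by the avg- and max-pooled vectors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """M_c = sigmoid(MLP(F_avg) + MLP(F_max)): one weight per channel."""
    f_avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    f_max = [max(map(max, ch)) for ch in fmap]
    a, m = shared_mlp(f_avg, w1, w2), shared_mlp(f_max, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(fmap, kernel):
    """M_s = sigmoid(conv([avg over channels; max over channels])), zero padded."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[k][i][j] for k in range(c)) / c for j in range(w)]
           for i in range(h)]
    mx = [[max(fmap[k][i][j] for k in range(c)) for j in range(w)]
          for i in range(h)]
    planes = [avg, mx]                 # the two maps, stacked along channels
    ksize = len(kernel[0])
    pad = ksize // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for p in range(2):
                for di in range(ksize):
                    for dj in range(ksize):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[p][di][dj] * planes[p][y][x]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def cbam(fmap, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a C x H x W map."""
    mc = channel_attention(fmap, w1, w2)
    refined = [[[v * mc[k] for v in row] for row in ch]
               for k, ch in enumerate(fmap)]
    ms = spatial_attention(refined, kernel)
    return [[[refined[k][i][j] * ms[i][j] for j in range(len(ms[0]))]
             for i in range(len(ms))] for k in range(len(refined))]


# Two channels of size 3 x 3; reduction to one hidden unit; a 7 x 7 kernel.
fmap = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
        [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [-1.0]]
kernel = [[[0.01] * 7 for _ in range(7)] for _ in range(2)]
out = cbam(fmap, w1, w2, kernel)
```

Because both attention maps lie strictly between 0 and 1, every output value is a scaled-down copy of the corresponding input value; the module reweights features rather than creating new ones.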
Replacing the cross entropy in the original loss function with VariFocalLoss reweights positive and negative samples and speeds up model convergence.
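A per-sample sketch of the loss may help make the reweighting concrete. It follows the Varifocal Loss as it is commonly defined: for a positive sample with IoU-aware target score q > 0 the loss is a binary cross entropy weighted by q, and for a negative sample (q = 0) it is scaled by α·p^γ. This is an assumption about the exact form, since the text above does not give the formula; α and γ are treated here as fixed parameters with illustrative defaults, whereas the text describes γ as learnable.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def varifocal_loss(logit, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Varifocal-style loss for one classification output.

    logit: raw classification output before the sigmoid
    q:     target score, 0 for a negative sample and an IoU-like value
           in (0, 1] for a positive sample
    """
    p = sigmoid(logit)  # classification output -> predicted probability
    if q > 0:
        # Positive sample: cross entropy against the soft target q, weighted by q.
        return -q * (q * math.log(p + eps) + (1 - q) * math.log(1 - p + eps))
    # Negative sample: down-weighted by alpha * p**gamma, so confidently
    # rejected (easy) negatives contribute very little.
    return -alpha * (p ** gamma) * math.log(1 - p + eps)


easy_negative = varifocal_loss(logit=-4.0, q=0.0)
hard_negative = varifocal_loss(logit=2.0, q=0.0)
positive = varifocal_loss(logit=2.0, q=0.9)
```

The easy negative yields a loss several orders of magnitude smaller than the hard negative, which is the imbalance-handling behaviour the text attributes to the replacement.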
Step S6 comprises the following steps:
First, the performance of the improved YOLOX network model is evaluated on unseen data. If the model does not meet the performance requirements, it is adjusted: hyperparameters such as the training learning rate and batch size are tuned to further optimize the training process. After each adjustment the model is retrained and revalidated, and this is repeated over multiple iterations until a model that meets the performance requirements is obtained. Finally, once the pruned, improved model meets the performance requirements, it is used to detect and analyze targets. Object detection means locating and classifying targets in images or video; deploying the pruned, improved model gives faster and more accurate detection and improves efficiency and accuracy in practical applications.
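The adjust, retrain, and revalidate loop of step S6 can be sketched as a simple search over hyperparameter settings. Everything here is hypothetical scaffolding: `train_and_evaluate` stands for a full training run that returns a validation score, and `fake_train_and_evaluate` is a made-up scoring function used only so the sketch runs without a real model.

```python
def tune_until_satisfied(train_and_evaluate, candidates, target):
    """Try (learning rate, batch size) settings in order; stop when one meets target."""
    if not candidates:
        raise ValueError("at least one candidate setting is required")
    best = None
    for lr, batch_size in candidates:
        score = train_and_evaluate(lr, batch_size)   # retrain and revalidate
        if best is None or score > best[0]:
            best = (score, lr, batch_size)
        if score >= target:                          # performance requirement met
            return {"ok": True, "score": score, "lr": lr, "batch_size": batch_size}
    score, lr, batch_size = best
    return {"ok": False, "score": score, "lr": lr, "batch_size": batch_size}


def fake_train_and_evaluate(lr, batch_size):
    """Placeholder for a real training run; the score peaks at lr=0.01, batch 16."""
    return 0.9 - abs(lr - 0.01) * 10 - abs(batch_size - 16) * 0.005


candidates = [(0.1, 8), (0.05, 16), (0.01, 16)]
result = tune_until_satisfied(fake_train_and_evaluate, candidates, target=0.85)
```

When no setting reaches the target, the function returns the best one seen with `ok` set to `False`, which corresponds to the case where the model itself has to be adjusted further.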
Experimental verification shows that the proposed method achieves high detection accuracy and speed on object detection tasks. Compared with traditional object detection methods, it greatly increases detection speed while maintaining high accuracy. The invention therefore has high practical and economic value.
Beneficial effects of the present invention:
The invention provides a method based on YOLOX lightweighting and network optimization. The method inserts a channel-spatial attention mechanism module at the junction of the feature extraction and enhancement layers to strengthen feature extraction for targets of different scales and to suppress interference from redundant information. It also replaces the cross entropy in the original loss function with VariFocalLoss and assigns weights to positive and negative samples, controlling how much each contributes to the total loss, so that training focuses more on hard-to-classify samples and the problem of class imbalance is addressed.
Although these optimization strategies effectively improve detection accuracy, end-to-end real-time object detection on mobile devices requires further optimization. To this end, the invention uses a pruning strategy to compress the model and reduce its computation, enabling end-to-end real-time object detection on mobile devices.
Pruning, however, can affect model accuracy and reduce detection accuracy. The invention therefore validates and analyzes the model after pruning: if the performance requirements are met, the model is used to detect and analyze targets; if not, the improved model is adjusted until they are met. In this way the model stays lightweight while retaining high detection accuracy, which resolves the drop in evaluated performance seen after pruning the original YOLOX model.
The invention thus combines lightweighting, network optimization, an improved loss function, and pruning to improve end-to-end real-time object detection on mobile devices. It addresses the problems that remain when the original YOLOX is deployed on embedded devices: large model size, a high floating-point operation count, poor real-time performance, and reduced accuracy after pruning.
Description of drawings
Figure 1 is a schematic flowchart of the method based on YOLOX lightweighting and network optimization according to the present invention.
Figure 2 is a schematic flowchart of YOLOX network training.
Figure 3 is a schematic diagram of the model structure based on YOLOX network optimization.
Figure 4 is a flowchart of YOLOX-based model training.
Figure 5 is a structural diagram of the CBAM attention mechanism.
Figure 6 is a schematic diagram of the detection accuracy of the improved YOLOX network model on the data set.
Detailed description
To aid understanding, the present invention is described in further detail below with reference to the accompanying drawings.
As shown in Figures 1 to 3, the present invention proposes a lightweight object detection method based on YOLOX, comprising the following steps:
S1: for the object detection task, prepare the data set required for training; the data set should include target samples of different classes and original images of the targets in different scenes and under different lighting conditions, and the original images are preprocessed so that the model can detect objects accurately in a variety of environments;
The quality and quantity of the data set are critical to model training and performance. A data set relevant to the detection task must therefore be selected, then preprocessed and cleaned to support later model training and evaluation.
S2: train the original YOLOX neural network model on the data set, and record and evaluate the model's performance metrics;
The original object detection model is trained on the data set and its performance metrics are recorded. The metrics are precision, recall, and accuracy: precision is the proportion of predicted positive samples that are correct, recall is the proportion of actual positive samples that the model detects, and accuracy is the proportion of all samples that the model predicts correctly. The learning rate and weight decay are also adjusted during training to improve model performance. Batch normalization is used during network training: the input data in each batch is standardized, which speeds up convergence and improves the model's generalization ability;
S3: perform a pruning operation on the original YOLOX neural network model to generate the pruned, improved YOLOX network model;
Pruning reduces redundant parameters in the model and thereby makes it lightweight. Following a preset pruning strategy, the original network model is pruned to reduce its parameters and computation. The pruned model must be retrained and its performance metrics evaluated to verify the effect of pruning on model performance. In object detection tasks, pruning removes redundant network structure, which reduces computation and increases inference speed;
S4: train the improved YOLOX network model on the data set to raise detection accuracy, and evaluate the model's performance metrics;
To improve the performance of the detection model, a channel-spatial attention mechanism CBAM module is inserted between the backbone layer and the data enhancement layer of the lightweight model, so that the model automatically attends to the features that matter for detection when processing an image; this improves the network's focus and its robustness to interference. The original BCE cross-entropy loss function is replaced with the VariFocalLoss loss function to improve classification performance and the ability to distinguish hard and easy samples. VariFocalLoss introduces a variable focusing parameter into the loss function so that the model pays more attention to hard-to-classify samples, which improves classification performance. The improved model is trained on the data set, and its performance metrics are recorded and evaluated;
S5: perform a pruning operation on the improved YOLOX network model and adjust the improved model further to reach higher detection accuracy and speed. In object detection tasks, a lightweight model and efficient deployment are critical because they directly affect speed and accuracy. Pruning the improved network model further reduces its parameters and computation, allowing faster and more efficient deployment; the pruned model is also better suited to resource-constrained settings such as embedded devices, which improves its generality and portability;
S6: validate and analyze the pruned, improved model to ensure that it meets the performance requirements of the detection task; if it does not, adjust the improved model until the performance requirements are met.
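The three metrics named in step S2 above can be computed from counts of true and false positives and negatives. The sketch below works on binary labels, which is a simplifying assumption: in object detection a prediction normally counts as a true positive only if it matches a ground-truth box above some IoU threshold, and that matching step is not shown. The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # detected among actual positives
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Recording these values for the original model gives the baseline against which the pruned and improved models are compared in the later steps.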
The process of using the improved YOLOX network model in S4 comprises the following steps:
S41: to strengthen the improved YOLOX network model's feature extraction for targets of different scales and to suppress interference from redundant information, a channel-spatial attention mechanism CBAM module is inserted between the backbone layer and the data enhancement layer. The CBAM module adaptively learns the importance of the channels and spatial positions of the feature map and uses it to reweight the feature map, so that the network pays more attention to the feature information that contributes to target recognition and localization.
S42: to address class imbalance among samples, the original BCE cross-entropy loss function is replaced with the VariFocalLoss loss function. VariFocalLoss assigns different weights to positive and negative samples so that the model pays more attention to hard-to-classify samples, which balances the sample classes better during training and improves classification accuracy.
S43: to optimize the performance of the improved YOLOX network model, the model is trained. During training, the network continually adjusts its weights and biases through backpropagation to minimize the value of the loss function, which improves the model's classification and localization ability.
S44: to reduce model size and computation so that end-to-end real-time object detection can run on mobile devices, a pruning operation is performed on the trained model. Pruning removes unnecessary connections and parameters, reducing the model's size and computation while maintaining its accuracy. The pruned model must be retrained and its performance metrics evaluated to ensure that it still detects targets with high accuracy after pruning.
The process of training the network model in S43 comprises the following steps:
S431: Collect the dataset and annotate the original dataset;
S432: Split the images into a training set and a validation set at a ratio of 8:2;
S433: Feed the training set into the YOLOX neural network, pass the network's predictions and the ground-truth values to the loss function to compute the loss, and update the network parameters by gradient descent;
S434: After each round of parameter updates, feed the validation set into the YOLOX network for validation and compute the validation-set loss;
S435: After every two iterations, feed the images in the dataset into the trained model to obtain the model's accuracy;
S436: Repeat the above steps until training reaches 300 epochs, at which point the model has converged.
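Steps S432 to S436 can be sketched as a dataset split plus an epoch schedule. This is only an illustrative outline, not the patent's implementation: the function names `split_dataset` and `training_schedule` are hypothetical, the shuffle seed is arbitrary, and "every two iterations" in S435 is assumed here to mean every two epochs. The actual forward pass, loss computation, and gradient-descent update are left as named actions.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_dataset(samples: Sequence, train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle the samples and split them 8:2 into training and validation sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


def training_schedule(total_epochs: int = 300,
                      eval_every: int = 2) -> Iterator[Tuple[int, List[str]]]:
    """Yield (epoch, actions) pairs following S433 to S436.

    Every epoch trains and then validates; accuracy is measured only on
    every `eval_every`-th epoch (an assumed reading of S435).
    """
    for epoch in range(1, total_epochs + 1):
        actions = ["train", "validate"]
        if epoch % eval_every == 0:
            actions.append("evaluate_accuracy")
        yield epoch, actions


# Example: 100 placeholder image names split 80/20, then the first four epochs.
train_set, val_set = split_dataset([f"img_{i}.jpg" for i in range(100)])
for epoch, actions in training_schedule(total_epochs=4):
    print(epoch, actions)
```

Keeping the schedule separate from the training step makes the 300-epoch stopping rule and the evaluation interval easy to change without touching the model code.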
The process of pruning the network model in S3 comprises the following steps:
S31: Determine the network layers to prune. Before pruning, the network model is analyzed and evaluated to determine which layers can be pruned. This analysis considers factors such as each layer's parameter count, its computation, and its contribution to overall model performance, so that computation and model size are reduced as far as possible without an excessive drop in detection accuracy.
S32: Sort the weights of the layers to be pruned. Weight sorting is one of the key steps of pruning. Before pruning, all parameters in the network are sorted to determine which contribute little to network performance and which contribute a lot; the layers with the smaller contribution are then selected for pruning.
S33: Determine the threshold from the weight magnitudes. Sorting the layer weights and setting a threshold identifies the unnecessary or redundant parts of the network, which are then pruned.
S34: Prune the parts below the threshold to obtain the new parameters. Once the pruning threshold has been determined, the network parameters below it are pruned. Specifically, the weights in the network are compressed and redundant, unnecessary parameters are removed, which reduces the network's storage and computation.
S35: Save the new model parameters and weights and generate the pruned model. The pruned model has fewer parameters and correspondingly less computation, which can markedly improve inference speed and efficiency and also helps achieve real-time object detection in resource-constrained scenarios such as embedded devices.
The present invention is further described below with reference to the relevant background art and the implementation steps:
In step S1, images are taken from public datasets, namely VOC2007 and VOC2012, comprising 21143 training images and their corresponding XML files.
In step S2, the performance evaluation metrics are recorded, including model accuracy (mAP) and model size (Params, in M), and serve as the baseline for subsequent performance evaluation.
YOLOX is the latest object detection algorithm in the YOLO series. It not only surpasses the detection accuracy of earlier YOLO models but also achieves highly competitive end-to-end inference speed. However, when YOLOX is deployed on embedded devices, it suffers from a large model size, a high number of floating-point operations, and poor real-time performance. To solve these problems while avoiding the unnecessary energy consumption of model pre-training, a lightweight method for YOLOX is proposed.
(1) Pruning the YOLOX network model
The network model obtained by training YOLOX is loaded, and its weights and structure are saved; at this point the model size is 3797 KB. The convolutional layers to be pruned are then selected: the C3_p4, C3_n3, reduce_conv1, and bu_conv2 layers in the data enhancement layer. The weights in each of these layers are sorted by magnitude, and the largest weight multiplied by the set pruning rate (40%) is used as the pruning threshold. Neuron weights below the threshold are set to 0 and weights above it are kept; the new parameters and weights are then saved to generate the new, pruned model structure;
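The threshold rule in this paragraph (largest weight times the 40% pruning rate, per layer) can be illustrated with a short sketch. Several things here are assumptions rather than details from the patent: each layer is represented as a flat list of floats, "weight size" is read as absolute value, a weight exactly equal to the threshold is kept, and the helper names `prune_layer`, `prune_model`, and `sparsity` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Layer names taken from the text; everything else is illustrative.
TARGET_LAYERS = ("C3_p4", "C3_n3", "reduce_conv1", "bu_conv2")


def prune_layer(weights: Sequence[float],
                prune_rate: float = 0.4) -> Tuple[List[float], float]:
    """Zero every weight whose magnitude is below max|w| * prune_rate."""
    if not weights:
        return [], 0.0
    ranked = sorted((abs(w) for w in weights), reverse=True)  # sort by magnitude
    threshold = ranked[0] * prune_rate                        # largest weight * rate
    pruned = [w if abs(w) >= threshold else 0.0 for w in weights]
    return pruned, threshold


def prune_model(layers: Dict[str, Sequence[float]],
                targets: Sequence[str] = TARGET_LAYERS,
                prune_rate: float = 0.4) -> Dict[str, List[float]]:
    """Prune only the selected layers; copy all other layers unchanged."""
    result = {}
    for name, weights in layers.items():
        if name in targets:
            result[name], _ = prune_layer(weights, prune_rate)
        else:
            result[name] = list(weights)
    return result


def sparsity(weights: Sequence[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights) if weights else 0.0


# Example with made-up weights: the threshold is 1.0 * 0.4 = 0.4.
toy_model = {"stem": [0.01, 0.9], "C3_p4": [0.5, -1.0, 0.3, 0.39, -0.41]}
pruned_model = prune_model(toy_model)
print(pruned_model["C3_p4"], sparsity(pruned_model["C3_p4"]))
```

Because the threshold is relative to each layer's largest weight, the fraction of weights actually removed varies from layer to layer; a 40% rate does not mean 40% of the weights are zeroed.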
Pruning the original model directly does compress the model and increase detection speed, and the result can be downloaded to an embedded board without difficulty. However, it causes a sharp drop in the model's performance evaluation results, so the network model must first be refined and improved. For this reason, an attention mechanism is introduced and the loss function is replaced;
(2) Improving and optimizing the YOLOX network model
In step S41, as shown in Figure 5, the convolutional block attention module (CBAM) combines a channel attention mechanism with a spatial attention mechanism. The channel attention mechanism first applies average pooling and max pooling to the input feature map to aggregate its spatial information, producing the average-pooled feature Favg and the max-pooled feature Fmax. Both are passed through a shared network layer; after the shared network has been applied to each feature, the two results are summed element-wise, and the merged feature passes through a Sigmoid activation to give the channel attention map Mc. The spatial attention mechanism applies average pooling and max pooling along the channel axis, compressing the feature map in the channel dimension, concatenates the two resulting maps along the channel dimension into one effective feature map, and passes it through a 7×7 convolutional layer. Finally, a Sigmoid function yields the spatial attention map Ms.
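The two attention stages described above can be traced in a small dependency-free sketch. It is meant only to make the data flow concrete and is not the patent's implementation: the feature map is a nested list of shape C×H×W, the shared network is assumed to be a two-layer perceptron with ReLU and no bias terms, the convolution uses zero padding, and all function names and weight shapes are hypothetical.

```python
import math
from typing import List

Map2D = List[List[float]]
Feat = List[Map2D]  # C x H x W


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Map2D, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def shared_mlp(v: List[float], w1: Map2D, w2: Map2D) -> List[float]:
    """Shared network applied to Favg and Fmax (assumed: ReLU, no bias)."""
    hidden = [max(0.0, h) for h in matvec(w1, v)]
    return matvec(w2, hidden)


def channel_attention(feat: Feat, w1: Map2D, w2: Map2D) -> List[float]:
    """Mc: pool over H x W, run both vectors through the shared MLP, sum, sigmoid."""
    f_avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    f_max = [max(map(max, ch)) for ch in feat]
    a, m = shared_mlp(f_avg, w1, w2), shared_mlp(f_max, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(feat: Feat, kernel: List[Map2D]) -> Map2D:
    """Ms: pool over channels, stack the two maps, k x k conv, sigmoid."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(feat[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    maps, pad = [avg, mx], len(kernel[0]) // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for k in range(2):
                for di in range(-pad, pad + 1):
                    for dj in range(-pad, pad + 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                            acc += kernel[k][di + pad][dj + pad] * maps[k][ii][jj]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def cbam(feat: Feat, w1: Map2D, w2: Map2D, kernel: List[Map2D]) -> Feat:
    """Apply channel attention first, then spatial attention."""
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[k] * v for v in row] for row in ch] for k, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in refined]
```

Here `w1` has shape R×C and `w2` has shape C×R for some reduced width R, and `kernel` has shape 2×7×7 to match the 7×7 convolution in the text. In a real network these weights are learned during training.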
Figure 4 shows the improved YOLOX network structure. The cross-stage partial network (CSPNet) layer is connected to the input of CBAM's channel attention mechanism, and the output of CBAM's spatial attention mechanism is connected to the data enhancement layer.
With the attention mechanism introduced, a channel-spatial attention module sits at the junction between the feature extraction layer and the enhancement layer. It selects effective features along the channel and spatial dimensions, suppresses irrelevant features, strengthens feature representation, and improves the model's recognition accuracy.
In step S42, in YOLOX object detection, the cross-entropy loss function suffers from an extreme imbalance between the target classes and the background class. Focal loss effectively addresses this imbalance. The Focal loss formula is as follows:
$$\mathrm{FL}(p, y) = \begin{cases} -\alpha\,(1-p)^{\beta}\,\log(p), & y = 1 \\ -(1-\alpha)\,p^{\beta}\,\log(1-p), & \text{otherwise} \end{cases}$$
where p is the predicted probability of the target class, in the range [0, 1]; y is the ground-truth class of the sample, taking the value 1 (positive) or -1 (negative); α is an adjustable scaling factor; (1-p)^β is the modulating factor for the target class and p^β is the modulating factor for the background class. The two modulating factors reduce the contribution of easy samples and increase the importance of misclassified samples, which lets Focal loss address class imbalance during training through weighting.
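A per-sample version of Focal loss, following the symbol definitions above, can be sketched as below. It assumes the standard piecewise form of the loss with the exponent written β as in the text; the default values α = 0.25 and β = 2.0 and the clamping constant are illustrative choices, not values stated in the patent.

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.25, beta: float = 2.0) -> float:
    """Focal loss for one prediction.

    p     : predicted probability of the target class, in [0, 1]
    y     : ground-truth label, 1 for positive and -1 for negative
    alpha : adjustable scaling factor (default is an assumption)
    beta  : exponent of the modulating factor (default is an assumption)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite at p = 0 or p = 1
    if y == 1:
        # Target class: (1 - p)^beta shrinks the loss of confident, correct predictions.
        return -alpha * (1.0 - p) ** beta * math.log(p)
    # Background class: p^beta shrinks the loss of confidently rejected samples.
    return -(1.0 - alpha) * p ** beta * math.log(1.0 - p)


# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

Setting `beta=0` removes the modulating factors and leaves an α-weighted cross-entropy, which is a convenient way to see what the modulation adds.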
Focal loss treats positive and negative samples equally, whereas in actual detection the contribution of positive samples matters more. Focal loss is therefore improved further: Varifocal loss is built on binary cross-entropy and borrows the weighting scheme of Focal loss to handle class imbalance during training. Its formula is:
$$\mathrm{VFL}(p, q) = \begin{cases} -q\,\bigl(q\,\log(p) + (1-q)\,\log(1-p)\bigr), & q > 0 \\ -\alpha\,p^{\beta}\,\log(1-p), & q = 0 \end{cases}$$
where p is the predicted value, representing the target score, and q is the target value. For a positive sample, q for the ground-truth class is set to the IoU between the predicted box and the ground-truth box, and to 0 for every other class. For a background sample, the target q is 0 for all classes. As the formula shows, Varifocal loss applies the scaling factor p^β only to negative samples and does not scale positive samples, which emphasizes the contribution of positive samples.
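The asymmetric treatment of positive and negative samples can be made concrete with a per-sample sketch. It assumes the published form of Varifocal loss, with the exponent written β to match the text; the defaults α = 0.75 and β = 2.0 and the clamping constant are illustrative assumptions, not values given in the patent.

```python
import math


def varifocal_loss(p: float, q: float, alpha: float = 0.75, beta: float = 2.0) -> float:
    """Varifocal loss for one (prediction, target) pair.

    p : predicted target score, in [0, 1]
    q : target score; the IoU with the ground-truth box for a positive
        sample's true class, and 0 otherwise
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    if q > 0:
        # Positive sample: binary cross-entropy weighted by q, with no p^beta scaling.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: scaled down by alpha * p^beta.
    return -alpha * p ** beta * math.log(1.0 - p)


# A negative scored at 0.1 is heavily down-weighted compared with one scored at 0.9.
print(varifocal_loss(0.1, 0.0), varifocal_loss(0.9, 0.0))
# A positive with IoU target 0.8 is penalized for an under-confident score.
print(varifocal_loss(0.3, 0.8), varifocal_loss(0.8, 0.8))
```

Weighting the positive branch by q means that positives with a high IoU contribute more to the loss than poorly localized ones.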
(3) The improved object detection network model
The present invention involves three parts: the attention mechanism, the loss function, and model pruning. Introducing an attention mechanism between the backbone feature extraction layer and the data enhancement layer lets the network focus on the informative features of its input and yields better detection accuracy. In the loss function, replacing the BCE cross-entropy loss with the VariFocalLoss function increases the attention paid to hard samples in the dataset and balances the samples. Pruning the improved object detection network model makes the YOLOX network model lightweight and solves the problems of large model size, high floating-point operation count, and poor real-time performance that the original YOLOX network shows when deployed on embedded devices (as shown in Figure 6).
In summary, the present invention mainly solves two technical problems: first, the large model size, high floating-point operation count, and poor real-time performance of the original YOLOX when deployed on embedded devices; second, the drop in evaluated performance after the original YOLOX model is pruned;
The above discloses only one example of the present invention and of course cannot be used to limit the scope of its claims. Those of ordinary skill in the art will understand the process of implementing the above example, and equivalent changes made in accordance with the claims of the present invention remain within its scope.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310212335.9A CN116306813B (en) | 2023-03-07 | 2023-03-07 | A method based on YOLOX lightweight and network optimization |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116306813A true CN116306813A (en) | 2023-06-23 |
| CN116306813B CN116306813B (en) | 2025-08-12 |
Family
ID=86821771
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310212335.9A Active CN116306813B (en) | 2023-03-07 | 2023-03-07 | A method based on YOLOX lightweight and network optimization |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116306813B (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117036427A (en) * | 2023-08-11 | 2023-11-10 | 江苏颂泽科技有限公司 | Industrial printed matter image registration method and device based on lightweight network |
| CN117197841A (en) * | 2023-09-22 | 2023-12-08 | 深圳市天双科技有限公司 | Pedestrian detection method and system for maritime ships |
| CN117237599A (en) * | 2023-08-25 | 2023-12-15 | 中银金融科技有限公司 | Image target detection method and device |
| CN117314840A (en) * | 2023-09-12 | 2023-12-29 | 中国科学院空间应用工程与技术中心 | Methods, systems, storage media and equipment for detecting small impact craters on the surface of extraterrestrial objects |
| CN118468968A (en) * | 2024-07-12 | 2024-08-09 | 杭州字节方舟科技有限公司 | Deep neural network compression method based on joint dynamic pruning |
| CN118762160A (en) * | 2024-06-03 | 2024-10-11 | 徐州华东机械有限公司 | Foreign body detection method for lightweight belt conveyor based on MO-YOLOX network |
| CN119478620A (en) * | 2024-07-29 | 2025-02-18 | 广东工业大学 | A target detection method based on improved YOLOv5n |
| CN119622456A (en) * | 2024-11-21 | 2025-03-14 | 吉林大学 | A method for training end-to-end autonomous driving policies |
| CN120632840A (en) * | 2025-08-15 | 2025-09-12 | 浙江大学滨江研究院 | Model fingerprint injection and verification method and device based on side branch network |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022141754A1 (en) * | 2020-12-31 | 2022-07-07 | 之江实验室 | Automatic pruning method and platform for general compression architecture of convolutional neural network |
| CN114898171A (en) * | 2022-04-07 | 2022-08-12 | 中国科学院光电技术研究所 | A real-time target detection method suitable for embedded platform |
| CN115393690A (en) * | 2022-09-02 | 2022-11-25 | 西安工业大学 | Light neural network air-to-ground observation multi-target identification method |
| CN115471667A (en) * | 2022-09-08 | 2022-12-13 | 重庆邮电大学 | Lightweight target detection method for improving YOLOX network structure |
Non-Patent Citations (1)
| Title |
|---|
| 邵伟平;王兴;曹昭睿;白帆;: "基于MobileNet与YOLOv3的轻量化卷积神经网络设计", 计算机应用, no. 1, 10 July 2020 (2020-07-10) * |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117036427A (en) * | 2023-08-11 | 2023-11-10 | 江苏颂泽科技有限公司 | Industrial printed matter image registration method and device based on lightweight network |
| CN117237599A (en) * | 2023-08-25 | 2023-12-15 | 中银金融科技有限公司 | Image target detection method and device |
| CN117314840A (en) * | 2023-09-12 | 2023-12-29 | 中国科学院空间应用工程与技术中心 | Methods, systems, storage media and equipment for detecting small impact craters on the surface of extraterrestrial objects |
| CN117197841A (en) * | 2023-09-22 | 2023-12-08 | 深圳市天双科技有限公司 | Pedestrian detection method and system for maritime ships |
| CN118762160A (en) * | 2024-06-03 | 2024-10-11 | 徐州华东机械有限公司 | Foreign body detection method for lightweight belt conveyor based on MO-YOLOX network |
| CN118468968A (en) * | 2024-07-12 | 2024-08-09 | 杭州字节方舟科技有限公司 | Deep neural network compression method based on joint dynamic pruning |
| CN118468968B (en) * | 2024-07-12 | 2024-09-17 | 杭州字节方舟科技有限公司 | Deep neural network compression method based on joint dynamic pruning |
| CN119478620A (en) * | 2024-07-29 | 2025-02-18 | 广东工业大学 | A target detection method based on improved YOLOv5n |
| CN119622456A (en) * | 2024-11-21 | 2025-03-14 | 吉林大学 | A method for training end-to-end autonomous driving policies |
| CN120632840A (en) * | 2025-08-15 | 2025-09-12 | 浙江大学滨江研究院 | Model fingerprint injection and verification method and device based on side branch network |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116306813B (en) | 2025-08-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN116306813A (en) | Method based on YOLOX light weight and network optimization | |
| CN111160176B (en) | Fusion feature-based ground radar target classification method for one-dimensional convolutional neural network | |
| CN116258941B (en) | Lightweight improvement method of yolox target detection based on Android platform | |
| CN112541532B (en) | Target detection method based on dense connection structure | |
| CN111783841A (en) | Garbage classification method, system and medium based on transfer learning and model fusion | |
| CN117708771B (en) | ITSOBP-based comprehensive transmission device fault prediction algorithm | |
| CN115019173A (en) | Garbage identification and classification method based on ResNet50 | |
| CN111833322A (en) | A Garbage Multi-target Detection Method Based on Improved YOLOv3 | |
| CN114283320B (en) | Branch-free structure target detection method based on full convolution | |
| CN104463194A (en) | Driver-vehicle classification method and device | |
| CN112308825A (en) | A method for identification of crop leaf diseases based on SqueezeNet | |
| CN117315380B (en) | Deep learning-based pneumonia CT image classification method and system | |
| CN119152193B (en) | A YOLO target detection method and system based on differentiable architecture search | |
| CN117197524A (en) | Image classification method of lightweight network structure based on pruning | |
| CN118537653A (en) | Remote sensing image land utilization classification method based on residual error network | |
| CN116778311A (en) | An underwater target detection method based on improved Faster R-CNN | |
| CN109902697A (en) | Multi-target detection method, device and mobile terminal | |
| CN116680639A (en) | Deep-learning-based anomaly detection method for sensor data of deep-sea submersible | |
| Zhao et al. | Neural network based on convolution and self-attention fusion mechanism for plant leaves disease recognition | |
| CN118396958A (en) | Defect detection method for crystalline silicon component of solar cell | |
| CN112561054B (en) | Neural network filter pruning method based on batch characteristic heat map | |
| CN114863485A (en) | Cross-domain pedestrian re-identification method and system based on deep mutual learning | |
| CN120106597A (en) | Long-term prediction method and system for regional net carbon emissions based on multi-factor mixed determination | |
| CN115034314B (en) | System fault detection method and device, mobile terminal and storage medium | |
| CN118940050A (en) | A data element multi-feature fusion intelligent matching method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |