WO2024032010A1 - A real-time few-sample object detection method based on a transfer learning strategy
- Publication number
- WO2024032010A1 (application PCT/CN2023/086781)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- few
- detection
- sample
- model
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- the invention belongs to the field of image processing and relates to a real-time detection method of few-sample targets based on a transfer learning strategy.
- Object detection is one of the most important and fundamental tasks in computer vision.
- CNN Convolutional Neural Network
- visual Transformer with high detection performance.
- the excellent detection performance of these models is achieved at the expense of large amounts of data. Due to the complexity of the object and the large number of model parameters, the detection accuracy will drop rapidly when the amount of data is limited. Therefore, few-shot target detection has received more and more attention in recent years.
- the purpose of the method based on meta-learning strategy is to obtain the correlation between the current image and the few samples.
- although detection performance on few-sample classes improves, the feature extraction structure in the few-sample detection branch, the structure that relates input features to few-sample features, and the number of few-sample categories greatly increase the computational complexity of the model.
- the purpose of the method based on the transfer learning strategy is to enable the detection model that already has feature representation capabilities to be well adapted to the few-sample target.
- the purpose of the present invention is to provide a dual-path real-time object detection model that, based on the transfer learning strategy, uses Darknet-53 combined with Spatial Pyramid Pooling (SPP) as the backbone and a Feature Pyramid Network (FPN) as the neck, to extract image features and to provide semantic features at different scales, respectively.
- SPP Spatial Pyramid Pooling
- FPN Feature Pyramid Network
- the large-sample category detection branch is only used to detect large-sample category objects, while the few-sample category detection branch is used to detect all categories of objects.
- after the two branches output their detection results in parallel, the discriminator scans the two results and outputs the more appropriate one according to a metric criterion.
- the main reason for using the dual-path combination structure is that when the model is trained on a small number of samples, the detection accuracy of objects in the large sample category will degrade, and the few sample detection branch will have false positive bounding boxes that actually belong to the large sample category.
- the few-sample detection branch also learns the prediction differences of large-sample categories from the large-sample detection branch through knowledge distillation, thereby improving the generalization ability of the detection branch.
- the present invention proposes the Attentive DropBlock regularization method, based on feature responses, to guide the model to focus on the overall characteristics of the target, avoid being dominated by local salient features, and enhance the generalization ability of the model.
- a real-time detection method of few-sample targets based on transfer learning strategy including the following steps:
- S4 Fine-tune the few-sample category detection branch on the few-sample category data; use a new regularization method to guide the model to focus on the overall characteristics of the object during fine-tuning;
- the detection network model includes: a backbone network, Darknet-53 combined with Spatial Pyramid Pooling (SPP), which extracts image features; a detection neck network, composed of a Feature Pyramid Network (FPN), which provides semantic features of different scales to the detection head network; and a detection head network, a dual-path detection branch structure with a discriminator, in which the large-sample category detection branch detects only targets of the large-sample categories, the few-sample category detection branch detects targets of all categories, and the discriminator scans the results of the two branches in turn and obtains the final output according to a metric criterion.
- step S2 processing limited data by using random affine transformation, multi-scale image training strategy, MixUp data fusion strategy and Label Smoothing label processing strategy.
- step S3 the backbone network is initialized to the weights trained on the ImageNet data set, and the network model except the few-sample detection branch is trained from scratch using large-sample category data.
- $L_{box}$ is the sum of the GIoU loss function and the smooth L1 loss for coordinate regression;
- $L_{cls}$ and $L_{obj}$ are the Focal Loss function and the binary cross-entropy loss function, respectively.
- step S4 the model parameters of the main part of the detection model, the detection neck part and the large sample category detection branch part are frozen, and only the small sample category detection branch is fine-tuned.
- the loss function at this stage involves the coordinates of the predicted box, the target confidence, the classification result, and the discrepancy from the large-sample category detection branch.
- step S4 specifically includes the following steps:
- N represents the batch size
- l represents the absolute error function
- $\lambda$ is used to control the impact of the base class distillation loss on the model's gradient update
- $O_d(i, j)$ represents the discriminator output of a specific spatial grid cell.
- the new regularization method is the Attentive DropBlock algorithm, which has a dynamic coefficient $\gamma$, as shown below:
- the parameters keep_prob and block_size affect how often and over what range the feature map is set to zero
- $\sigma$ represents the sigmoid function, which is used to control the response range
- $\alpha$ represents the response amplification factor
- the Attentive DropBlock algorithm first determines whether the model is currently in the fine-tuning stage. If the model is fine-tuning, it obtains the channel response $f_C$ and the spatial response $f_S$ of the few-sample category detection branch; then it calculates the parameter $\gamma$ from the parameters keep_prob, block_size and $\alpha$; next, each spatial position of each channel feature is set to zero with a probability that follows a Bernoulli distribution with parameter $\gamma$; finally, with each zeroed position as the center, a mask block with a height and width of block_size is constructed, so as to regularize the model.
- step S5 train and test on the PASCAL VOC and MS COCO data sets
- the training set and the validation set are first merged into one set for training the detection model, and then its test set is selected for testing.
- the test evaluation standard adopts the Intersection over Union (IoU) threshold of 0.5
- the mean Average Precision (mAP) i.e. mAP@50
- the average number of frames per second (mean Frames Per Second, mFPS) of multiple different small sample collections represent the detection accuracy and speed of the detection model;
- mAP i.e. AP
- FPS frames per second
- step S5 stochastic gradient descent is used as the optimization method of the network model, the initial learning rate is $1 \times 10^{-3}$, and the mini-batch size is 16 on every data set; for the PASCAL VOC and MS COCO data sets, from-scratch training and fine-tuning of the detection model each run for 300 rounds, and the CosineLR learning rate schedule (from 0.001 to 0.00001) is used during training; during prediction, the height and width of the input image are fixed at 448×448; FPS is computed from the sum of the waiting time for each result and the time spent post-processing that result, and mFPS is the average FPS over the different few-sample sets.
- the present invention proposes an Attentive DropBlock regularization method based on feature response to guide the model to pay attention to the overall characteristics of the object, avoid over-fitting in the fine-tuning stage, avoid being dominated by local salient features, and enhance the generalization ability of the model. The present invention can achieve accurate detection of few-sample category objects with a comparatively small number of model parameters and can also detect the relevant targets in real time.
- Figure 1 is an overall flow chart of the model proposed by the present invention.
- Figure 2 is a visual comparison chart of DropBlock algorithm and Attentive DropBlock algorithm
- Figure 3 is a diagram showing the visual detection results of large sample and small sample category objects by the model proposed by the present invention.
- Figure 4 shows the response to the target and the visual detection results of the large-sample category detection branch and the few-sample category detection branch of the model proposed by the present invention.
- a real-time detection method of few-sample targets based on transfer learning strategy includes the following steps:
- the S1 specifically includes the following steps:
- the limited data is processed with random affine transformations, a multi-scale image training strategy (320, 352, 384, 416, 448, 480, 512, 544, 576 and 608), the MixUp data fusion strategy and the Label Smoothing label processing strategy, thereby improving the detection model's generalization to the samples.
- the backbone, detection neck and large-sample detection branch are frozen to maintain strong generalization ability, and only the few-sample detection branch, the SPP layer and its adjacent convolutional layers are trained.
- when only novel-class objects are used, many false positive bounding boxes are generated because objects in the two kinds of categories are similar, resulting in low detection accuracy. Therefore, we randomly sample K instances from the corresponding data for each large-sample category, so that the few-sample detection branch predicts objects of all categories.
- the large-sample category detection branch has strong generalization ability
- the few-sample detection branch should learn from this branch to obtain better generalization ability. Therefore, we establish the base class distillation loss $L_b$ between the two branches, and the calculation formula is as follows:
- the present invention proposes an Attentive DropBlock algorithm.
- This algorithm is affected not only by the parameters keep_prob and block_size but also by the model's response to semantic features.
- the DropBlock algorithm sets a constant coefficient for all locations within the feature map, as follows:
- $\gamma$ is a dynamic coefficient that relies on the feature map response extracted in the Attentive DropBlock algorithm.
- for a feature map $F \in \mathbb{R}^{B \times C \times H \times W}$, the global max pooling function applied to each channel feature yields the response $f_C \in \mathbb{R}^{B \times C \times 1 \times 1}$
- the global average pooling function applied at each spatial coordinate yields the response $f_S \in \mathbb{R}^{B \times 1 \times H \times W}$. Therefore, the calculation formula of $\gamma$ in the Attentive DropBlock algorithm is as follows:
- $\sigma$ represents the sigmoid function used to control the response range
- $\alpha$ represents the response amplification factor
- the Attentive DropBlock algorithm first determines whether the model is currently in the fine-tuning stage. If the model is fine-tuning, it obtains the channel response $f_C$ and the spatial response $f_S$ of the few-sample category detection branch. After the parameter $\gamma$ is calculated from the two responses, keep_prob, block_size and $\alpha$, each spatial position of each channel feature is set to zero with a probability that follows a Bernoulli distribution with parameter $\gamma$. Finally, with each zeroed position as the center, a mask block with a height and width of block_size is constructed to regularize the model.
- Figure 2 shows the difference between DropBlock and Attentive DropBlock. It can be observed that the $\gamma$ value in Attentive DropBlock is related to the target response. Feature maps that contain more target response have higher $\gamma$ values, which means that the detection model can better avoid being dominated by locally obvious features and pays more attention to less obvious features during training, thereby obtaining better few-sample target detection accuracy.
- in S5, for the PASCAL VOC data set, three different data splits are built, each with 15 categories as large-sample categories and the remaining 5 categories as few-sample categories (the first few-sample set contains bird, bus, cow, motorbike and sofa; the second contains aeroplane, bottle, cow, horse and sofa; the third contains boat, cat, motorbike, sheep and sofa); for the MS COCO data set, the 20 categories shared with the PASCAL VOC data set are the few-sample categories, and the remaining 60 categories are the large-sample categories.
- the present invention uses stochastic gradient descent as the optimization method of the network model, the initial learning rate is $1 \times 10^{-3}$, and the mini-batch size is 16 on every data set. For these two data sets, from-scratch training and fine-tuning each run for 300 rounds, and the CosineLR learning rate schedule (from 0.001 to 0.00001) is used during training.
- the height and width of the input image are fixed at 448×448.
- the present invention compares the detection accuracy and detection speed of various few-sample target detection models proposed in recent years on the PASCAL VOC 2007 and MS COCO 2014 data sets.
- the detection model of the present invention was evaluated on the challenging PASCAL VOC 2007 and MS COCO 2014 data sets according to the evaluation criteria specified in the PASCAL VOC and MS COCO data.
- These two benchmark data contain training sets, validation sets and test sets.
- the PASCAL VOC 2007 data set contains 20 target categories
- the MS COCO 2014 data set contains 80 categories.
- the present invention first combines the PASCAL VOC 2007 and PASCAL VOC 2012 training sets and verification sets into one set for training the detection model, and selects the PASCAL VOC 2007 test set for testing.
- the test evaluation standard uses the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 (i.e. mAP@50) and the mean Frames Per Second (mFPS) over multiple different few-sample sets to represent the detection accuracy and speed of the detection model.
- mAP mean Average Precision
- IoU Intersection over Union
- mFPS mean Frames Per Second
- the present invention only uses the MS COCO 2014 training set for training, and uses its verification set for verification in the test phase, using the mAP (i.e. AP) of IoU from 0.5 to 0.95 (interval is 0.05) and the number of transmission frames per second (Frames Per Second, FPS) represents the detection accuracy and speed of the detection model.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
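The present invention relates to a real-time few-sample object detection method based on a transfer learning strategy and belongs to the field of image processing. The method comprises the following steps: S1: construct the detection network model; S2: preprocess the input data; S3: train the object detection model from scratch on large-sample category data; S4: fine-tune the few-sample category detection branch on few-sample category data, using a new regularization method during fine-tuning to guide the model to attend to the overall features of an object; S5: train the detection model on the training set and evaluate it on the test set. The invention avoids overfitting during the fine-tuning stage, avoids domination by locally salient features, and strengthens the generalization ability of the model. It achieves accurate detection of few-sample category objects with a comparatively small number of model parameters and detects the relevant targets in real time.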
Description
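The present invention belongs to the field of image processing and relates to a real-time few-sample object detection method based on a transfer learning strategy.
Object detection is one of the most important and fundamental tasks in computer vision. Many detectors based on Convolutional Neural Networks (CNN) or vision Transformers achieve high detection performance. However, that performance comes at the cost of large amounts of data. Because objects are complex and the models have a very large number of parameters, detection accuracy drops quickly when data is limited. Few-sample object detection has therefore received increasing attention in recent years.
To cope with limited sample counts, several few-sample object detection models based on meta-learning or transfer learning strategies have been proposed. Meta-learning methods aim to capture the correlation between the current image and the few samples. Although they improve detection performance on few-sample classes, the feature extraction structure in the few-sample detection branch, the structure that relates input features to few-sample features, and the number of few-sample categories greatly increase the computational complexity of the model. Transfer learning methods aim to adapt a detection model that already has feature representation ability to few-sample targets. However, to raise detection accuracy, most of these methods build on two-stage detectors such as Faster RCNN or Cascade RCNN. Because the input images to these models are large and proposals must be generated in a Region Proposal Network (RPN), such detectors are slow at inference time.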
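Summary of the Invention
In view of this, the purpose of the present invention is to provide a dual-path real-time object detection model. Based on the transfer learning strategy, it uses Darknet-53 combined with a Spatial Pyramid Pooling (SPP) layer as the backbone and a Feature Pyramid Network (FPN) as the neck, which extract image features and provide semantic features at different scales, respectively. For the detection head, a dual-path detection branch with a discriminator is proposed: the large-sample category detection branch detects only large-sample category objects, while the few-sample category detection branch detects objects of all categories. After the two branches output their detection results in parallel, the discriminator scans both results and outputs the more appropriate one according to a metric criterion. The main reason for the dual-path structure is that, when the model is trained on few samples, detection accuracy on large-sample category objects degrades, and the few-sample detection branch produces false-positive bounding boxes for objects that actually belong to large-sample categories. In addition, the few-sample detection branch learns the prediction differences on large-sample categories from the large-sample detection branch through knowledge distillation, which improves that branch's generalization ability. Finally, to avoid overfitting during the fine-tuning stage, the invention proposes the Attentive DropBlock regularization method, based on feature responses, to guide the model to attend to the overall features of the target, avoid domination by locally salient features, and strengthen the generalization ability of the model.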
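To achieve the above purpose, the present invention provides the following technical solution:
A real-time few-sample object detection method based on a transfer learning strategy, comprising the following steps:
S1: construct the detection network model;
S2: preprocess the input data;
S3: train the object detection model from scratch on large-sample category data;
S4: fine-tune the few-sample category detection branch on few-sample category data; during fine-tuning, use a new regularization method to guide the model to attend to the overall features of an object;
S5: train the detection model on the training set, then evaluate it on the test set.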
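Further, the detection network model comprises: a backbone network, Darknet-53 combined with a Spatial Pyramid Pooling (SPP) layer, which extracts image features; a detection neck network, formed by a Feature Pyramid Network (FPN), which provides semantic features at different scales to the detection head network; and a detection head network, a dual-path detection branch structure with a discriminator, in which the large-sample category detection branch detects only targets of the large-sample categories, the few-sample category detection branch detects targets of all categories, and the discriminator scans the results of the two branches in turn and obtains the final output according to a metric criterion.
Further, the preprocessing in step S2 is specifically: processing the limited data with random affine transformations, a multi-scale image training strategy, the MixUp data fusion strategy, and the Label Smoothing label processing strategy.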
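Further, in step S3, the backbone network is initialized with weights trained on the ImageNet data set, and the network model, except for the few-sample detection branch, is trained from scratch on large-sample category data. The loss function at this stage covers the predicted box coordinates, the object confidence, and the classification result:
$$L_{\text{base training}} = L_{box} + L_{cls} + L_{obj} \tag{1}$$
where $L_{box}$ is the sum of the GIoU loss and the smooth L1 loss for coordinate regression, and $L_{cls}$ and $L_{obj}$ are the Focal Loss and the binary cross-entropy loss, respectively.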
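The box term of equation (1) can be illustrated with a minimal sketch. The function names, the `(x1, y1, x2, y2)` box layout, the `beta` threshold, and the choice to average the smooth L1 term over the four coordinates are all assumptions made for illustration; the text only states that the two losses are added.

```python
def giou_loss(box_p, box_g):
    """GIoU loss (1 - GIoU) for two boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area.
    """
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    union = area_p + area_g - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both boxes.
    cx1, cy1 = min(box_p[0], box_g[0]), min(box_p[1], box_g[1])
    cx2, cy2 = max(box_p[2], box_g[2]), max(box_p[3], box_g[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


def smooth_l1(box_p, box_g, beta=1.0):
    """Smooth L1 loss averaged over the four coordinates (averaging is assumed)."""
    total = 0.0
    for p, g in zip(box_p, box_g):
        d = abs(p - g)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(box_p)


def box_loss(box_p, box_g):
    """L_box as the plain sum of the GIoU loss and the smooth L1 loss."""
    return giou_loss(box_p, box_g) + smooth_l1(box_p, box_g)


def base_training_loss(l_box, l_cls, l_obj):
    """Equation (1): unweighted sum of the three loss terms."""
    return l_box + l_cls + l_obj
```

A perfectly predicted box gives `box_loss` of zero, while two disjoint boxes are penalized by both terms, which is the property GIoU adds over plain IoU.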
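Further, in step S4, the model parameters of the backbone, the detection neck, and the large-sample category detection branch are frozen, and only the few-sample category detection branch is fine-tuned. The loss function at this stage covers the predicted box coordinates, the object confidence, the classification result, and the discrepancy from the large-sample category detection branch.
Further, step S4 specifically comprises the following steps:
S41: establish a base-class distillation loss $L_b$ between the large-sample category detection branch and the few-sample detection branch, computed as follows:
where $N$ denotes the batch size, $l$ denotes the absolute error function, and the two branch-output terms denote the outputs for the $i$-th image of the large-sample detection branch and the few-sample category detection branch, respectively;
S42: the loss function for fine-tuning on few samples is:
$$L_{\text{few-shot tuning}} = L_{box} + 2L_{cls} + L_{obj} + \lambda \cdot L_b \tag{3}$$
where $\lambda$ controls how strongly the base-class distillation loss influences the model's gradient updates;
S43: add a discriminator after the large-sample category detection branch and the few-sample detection branch. The discriminator selects the maximum of the large-sample category detection branch result and the few-sample category detection branch result as the final output, according to the following metric criterion:
where $O_d(i,j)$ denotes the discriminator output at a specific spatial grid cell.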
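Steps S41 and S42 can be sketched as follows. This is only one plausible reading: the text defines $L_b$ through the batch size $N$ and an absolute error function $l$, so the sketch assumes $L_b$ is the per-image sum of absolute differences between the two branch outputs, averaged over the batch. The function names and the flat-list representation of each branch output are hypothetical.

```python
def base_class_distillation_loss(base_outputs, few_outputs):
    """Assumed form of L_b: batch mean of the summed absolute error.

    base_outputs[i] and few_outputs[i] are the flattened large-sample-category
    predictions of the two branches for image i (representation is assumed).
    """
    n = len(base_outputs)
    total = 0.0
    for out_b, out_f in zip(base_outputs, few_outputs):
        total += sum(abs(b - f) for b, f in zip(out_b, out_f))
    return total / n


def few_shot_tuning_loss(l_box, l_cls, l_obj, l_b, lam):
    """Equation (3): the classification term is doubled and L_b is scaled by lambda."""
    return l_box + 2.0 * l_cls + l_obj + lam * l_b
```

For example, with `lam=2.0` a distillation loss of 0.25 contributes 0.5 to the total, so `lam` directly sets how much the frozen large-sample branch steers the fine-tuned branch.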
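Further, the new regularization method is the Attentive DropBlock algorithm, which has a dynamic coefficient $\gamma$, as follows:
where the parameters keep_prob and block_size affect how often and over what extent the feature map is zeroed, $\sigma$ denotes the sigmoid function, which bounds the response range, and $\alpha$ denotes the response amplification factor.
Further, the Attentive DropBlock algorithm first checks whether the model is currently in the fine-tuning stage. If it is, the channel response $f_C$ and the spatial response $f_S$ of the few-sample category detection branch are obtained. Then, after $\gamma$ is computed from keep_prob, block_size, and $\alpha$, each spatial position of each channel feature is zeroed with a probability that follows a Bernoulli distribution with parameter $\gamma$. Finally, a mask block whose height and width equal block_size is built around each zeroed position, which regularizes the model.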
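Further, in step S5, training and testing are performed on the PASCAL VOC and MS COCO data sets;
For the PASCAL VOC data set, the training set and the validation set are first merged into a single set used to train the detection model, and the test set is then used for testing. The evaluation metrics are the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 (i.e. mAP@50), which represents detection accuracy, and the mean Frames Per Second (mFPS) over several different few-sample sets, which represents detection speed;
For the MS COCO data set, only its training set is used for training and its validation set is used for validation. The mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (i.e. AP) and the Frames Per Second (FPS) represent the detection accuracy and speed of the detection model.
Further, during training in step S5, stochastic gradient descent is used as the optimizer of the network model, the initial learning rate is $1 \times 10^{-3}$, and the mini-batch size is 16 on every data set. For the PASCAL VOC and MS COCO data sets, from-scratch training and fine-tuning of the detection model each run for 300 rounds, and the CosineLR learning rate schedule (from 0.001 to 0.00001) is used during training. At prediction time, the height and width of the input image are fixed at 448×448. FPS is computed from the sum of the waiting time to obtain each result and the time spent post-processing that result, and mFPS is the mean FPS over the different few-sample sets.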
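The learning rate schedule and the speed metric described above can be sketched as below. The standard cosine annealing formula is assumed for "CosineLR", the 300 rounds are treated as the schedule length, and all function names are hypothetical.

```python
import math


def cosine_lr(step, total_steps=300, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate at `step`, for 0 <= step <= total_steps."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def fps_from_times(wait_times, post_times):
    """Frames per second from per-image waiting and post-processing times (seconds)."""
    total = sum(wait_times) + sum(post_times)
    return len(wait_times) / total


def mean_fps(fps_values):
    """mFPS: plain average of the FPS measured on each few-sample set."""
    return sum(fps_values) / len(fps_values)
```

With these defaults the rate starts at 0.001, passes through roughly 0.000505 at the midpoint, and ends at 0.00001.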
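The beneficial effects of the present invention are as follows. The invention proposes the Attentive DropBlock regularization method, based on feature responses, to guide the model to attend to the overall features of an object. This avoids overfitting during the fine-tuning stage, avoids domination by locally salient features, and strengthens the generalization ability of the model. The invention achieves accurate detection of few-sample category objects with a comparatively small number of model parameters and detects the relevant targets in real time.
Other advantages, objectives, and features of the invention will be set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the following, or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by means of the following description.
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which: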
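Figure 1 is the overall flow chart of the model proposed by the present invention;
Figure 2 is a visual comparison of the DropBlock algorithm and the Attentive DropBlock algorithm;
Figure 3 shows visualized detection results of the proposed model on large-sample and few-sample category objects;
Figure 4 shows the responses to targets, and the visualized detection results, of the large-sample category detection branch and the few-sample category detection branch of the proposed model.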
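The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention may also be implemented or applied through other, different embodiments, and the details in this specification may be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention. Note that the drawings provided in the following embodiments illustrate the basic concept of the invention only schematically, and that the following embodiments and the features within them may be combined with one another where no conflict arises.
The drawings are for illustrative purposes only; they are schematic diagrams rather than pictures of actual objects and shall not be construed as limiting the invention. To better illustrate the embodiments, some components in the drawings may be omitted, enlarged, or reduced, and they do not represent the dimensions of an actual product. Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
Identical or similar reference numerals in the drawings of the embodiments correspond to identical or similar components. In the description of the invention, terms such as "upper", "lower", "left", "right", "front", and "rear" that indicate an orientation or positional relationship are based on the orientation or positional relationship shown in the drawings. They are used only to simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. The terms describing positional relationships in the drawings are therefore illustrative only and shall not be construed as limiting the invention; those of ordinary skill in the art can understand the specific meaning of these terms according to the specific circumstances.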
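Referring to Figures 1 to 4, a real-time few-sample object detection method based on a transfer learning strategy comprises the following steps:
S1: preprocess the input data;
S2: train the object detection model (except for the few-sample detection branch) from scratch on large-sample category data;
S3: fine-tune the few-sample category detection branch on few-sample category data;
S4: introduce a new regularization method in the fine-tuning stage to guide the model to attend to the overall features of an object;
S5: run experiments on the natural-image data sets PASCAL VOC 2007 and MS COCO 2014.
Optionally, S1 specifically comprises the following:
The limited data is processed with random affine transformations, a multi-scale image training strategy (320, 352, 384, 416, 448, 480, 512, 544, 576, and 608), the MixUp data fusion strategy, and the Label Smoothing label processing strategy, which improves the detection model's generalization to the samples.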
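Three of these preprocessing strategies can be sketched in a few lines. The scale list comes from the text; the Beta-distribution parameter, the smoothing factor `eps`, the flat-list image representation, and every function name are assumptions for illustration, since the text does not give these settings.

```python
import random

# Multi-scale training resolutions listed in the text (all multiples of 32).
TRAIN_SCALES = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]


def pick_train_scale(rng):
    """Draw one square input resolution for the next training batch."""
    return rng.choice(TRAIN_SCALES)


def sample_mixup_weight(rng, beta_param=1.5):
    """Draw the MixUp blending weight from Beta(beta_param, beta_param) (assumed)."""
    return rng.betavariate(beta_param, beta_param)


def mixup(image_a, image_b, lam):
    """Blend two same-sized images, given as flat lists of pixel values.

    In detection, the boxes of both images would normally be kept as targets.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]


def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: move eps of the probability mass to a uniform prior."""
    k = len(one_hot)
    return [y * (1.0 - eps) + eps / k for y in one_hot]
```

For instance, `smooth_labels([0, 1, 0, 0])` turns the hard target into `[0.025, 0.925, 0.025, 0.025]`, which still sums to one.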
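Optionally, in S2, so that the model acquires strong object representation ability, the entire network except for the few-sample detection branch is trained from scratch on large-sample category data. The loss function for training the whole network in the first stage is therefore:
$$L_{\text{base training}} = L_{box} + L_{cls} + L_{obj} \tag{1}$$
where $L_{box}$ is the sum of the GIoU loss and the smooth L1 loss for coordinate regression, and $L_{cls}$ and $L_{obj}$ are the Focal Loss and the binary cross-entropy loss, respectively.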
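Optionally, in S3, during the few-sample fine-tuning stage, the backbone, the detection neck, and the large-sample detection branch are frozen to preserve their strong generalization ability, and only the few-sample detection branch, the SPP layer, and its adjacent convolutional layers are trained. However, when only novel-class objects are used, objects of the two kinds of categories resemble each other, so many false-positive bounding boxes are generated and detection accuracy is low. We therefore randomly sample K instances from the corresponding data for each large-sample category, so that the few-sample detection branch predicts objects of all categories. Moreover, because the large-sample category detection branch generalizes well, the few-sample detection branch should learn from it to generalize better. We therefore establish a base-class distillation loss $L_b$ between the two branches, computed as follows:
where $N$ denotes the batch size, $l$ is the sum of absolute errors, and the two branch-output terms denote the outputs for the $i$-th image of the large-sample detection branch and the few-sample detection branch, respectively. The loss function for fine-tuning on few samples can therefore be summarized as:
$$L_{\text{few-shot tuning}} = L_{box} + 2L_{cls} + L_{obj} + \lambda \cdot L_b \tag{3}$$
where $\lambda$ controls how strongly the base-class distillation loss influences the model's gradient updates.
In the inference stage, the two parallel branches jointly detect objects. However, analyzing the outputs of both branches would considerably lengthen inference. We therefore add a discriminator after the two branches to select the most likely result among their outputs. Specifically, the discriminator selects the maximum of the large-sample category detection branch result and the few-sample category detection branch result as the final output, according to the following metric criterion:
where $O_d(i,j)$ denotes the discriminator output at a specific spatial grid cell.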
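The discriminator step can be sketched as a per-cell maximum. The sketch assumes that each branch yields one comparable score per spatial grid cell (for example a confidence value); the text does not say which quantity is compared, and the function name and the `"base"`/`"few"` labels are hypothetical.

```python
def discriminate(base_grid, few_grid):
    """Keep, for every grid cell (i, j), the larger of the two branch scores.

    Returns the selected scores and, per cell, which branch supplied them.
    Ties go to the large-sample ("base") branch, an arbitrary choice here.
    """
    output, source = [], []
    for row_b, row_f in zip(base_grid, few_grid):
        out_row, src_row = [], []
        for score_b, score_f in zip(row_b, row_f):
            if score_b >= score_f:
                out_row.append(score_b)
                src_row.append("base")
            else:
                out_row.append(score_f)
                src_row.append("few")
        output.append(out_row)
        source.append(src_row)
    return output, source
```

Because the selection is a single element-wise comparison, it adds very little to inference time compared with post-processing both branch outputs separately.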
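Optionally, in S4, to further improve the model's generalization to few-sample categories, the invention proposes the Attentive DropBlock algorithm, which is influenced not only by the parameters keep_prob and block_size but also by the model's response to semantic features. Specifically, the DropBlock algorithm sets a constant coefficient for all positions in the feature map, as follows: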
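For reference, the DropBlock paper by Ghiasi et al., listed under the non-patent citations, defines its constant coefficient in the form below. The symbol $\mathit{feat\_size}$ (the spatial side length of the feature map) comes from that paper and is an assumption here, since this text itself names only keep_prob and block_size.

```latex
\gamma = \frac{1 - \mathit{keep\_prob}}{\mathit{block\_size}^{2}}
         \cdot
         \frac{\mathit{feat\_size}^{2}}{\left(\mathit{feat\_size} - \mathit{block\_size} + 1\right)^{2}}
```

The first factor spreads the target drop rate over the area of one block, and the second corrects for the fact that block centers can only be sampled where a full block fits inside the feature map.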
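where the parameters keep_prob and block_size affect how often and over what extent features are zeroed. Unlike the original DropBlock, $\gamma$ in the Attentive DropBlock algorithm is a dynamic coefficient that depends on the extracted feature map responses. Specifically, given a feature map $F \in \mathbb{R}^{B \times C \times H \times W}$, global max pooling over each channel feature yields the response $f_C \in \mathbb{R}^{B \times C \times 1 \times 1}$, and global average pooling at each spatial coordinate yields the response $f_S \in \mathbb{R}^{B \times 1 \times H \times W}$. The coefficient $\gamma$ in the Attentive DropBlock algorithm is therefore computed as follows:
where $\sigma$ denotes the sigmoid function, which bounds the response range, and $\alpha$ denotes the response amplification factor.
The Attentive DropBlock algorithm first checks whether the model is currently in the fine-tuning stage. If it is, the channel response $f_C$ and the spatial response $f_S$ of the few-sample category detection branch are obtained. Then, after $\gamma$ is computed from the two responses, keep_prob, block_size, and $\alpha$, each spatial position of each channel feature is zeroed with a probability that follows a Bernoulli distribution with parameter $\gamma$. Finally, a mask block whose height and width equal block_size is built around each zeroed position, which regularizes the model.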
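The procedure above can be sketched for a single sample stored as a nested `C x H x W` list. The exact formula for the dynamic coefficient is not recoverable from this text, so the line marked ASSUMPTION uses a hypothetical stand-in: the constant DropBlock coefficient scaled by $\alpha \cdot \sigma(f_C \cdot f_S)$ and clipped to 1. The function names are likewise hypothetical, and the rescaling of surviving activations used by the original DropBlock is omitted for brevity.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attentive_dropblock(feat, keep_prob, block_size, alpha, rng, fine_tuning=True):
    """Mask one C x H x W feature map (nested lists) in an Attentive-DropBlock style."""
    if not fine_tuning:
        return feat  # the regularizer is only active while fine-tuning
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    # Channel response f_C: global max pooling over each channel.
    f_c = [max(max(row) for row in feat[c]) for c in range(channels)]
    # Spatial response f_S: average over channels at each spatial coordinate.
    f_s = [[sum(feat[c][i][j] for c in range(channels)) / channels
            for j in range(width)] for i in range(height)]
    # Constant DropBlock coefficient, written for possibly non-square maps.
    valid = (height - block_size + 1) * (width - block_size + 1)
    base = (1.0 - keep_prob) / block_size ** 2 * (height * width) / valid
    half = block_size // 2
    out = [[row[:] for row in feat[c]] for c in range(channels)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                # ASSUMPTION: hypothetical dynamic coefficient, not the patent's formula.
                gamma = min(1.0, base * alpha * sigmoid(f_c[c] * f_s[i][j]))
                if rng.random() < gamma:  # Bernoulli(gamma) draw for this position
                    # Zero a block_size x block_size block around the sampled position.
                    for ii in range(i - half, i - half + block_size):
                        for jj in range(j - half, j - half + block_size):
                            if 0 <= ii < height and 0 <= jj < width:
                                out[c][ii][jj] = 0.0
    return out
```

Under this stand-in, positions with a stronger combined response get a larger $\gamma$ and are dropped more often, which matches the stated intent of pushing the model away from a few dominant local features.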
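Figure 2 shows the difference between DropBlock and Attentive DropBlock. It can be observed that the $\gamma$ value in Attentive DropBlock is related to the target response. Feature maps that contain more target response have higher $\gamma$ values, which means the detection model is better able to avoid domination by locally obvious features and attends more to less obvious features during training, yielding better few-sample object detection accuracy.
Optionally, in S5, for the PASCAL VOC data set, three different data splits are built, each with 15 categories as large-sample categories and the remaining 5 as few-sample categories (the first few-sample set contains bird, bus, cow, motorbike, and sofa; the second contains aeroplane, bottle, cow, horse, and sofa; the third contains boat, cat, motorbike, sheep, and sofa). For the MS COCO data set, the 20 categories shared with PASCAL VOC are the few-sample categories and the remaining 60 are the large-sample categories. During training, the invention uses stochastic gradient descent as the optimizer of the network model, the initial learning rate is $1 \times 10^{-3}$, and the mini-batch size is 16 on every data set. For both data sets, from-scratch training and fine-tuning each run for 300 rounds, and the CosineLR learning rate schedule (from 0.001 to 0.00001) is used during training. At prediction time, the height and width of the input image are fixed at 448×448.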
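The three PASCAL VOC splits can be written down as a small lookup. The class names follow the spelling used in the PASCAL VOC annotations; the dictionary and function names are hypothetical.

```python
# The 20 PASCAL VOC object classes, spelled as in the VOC annotations.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Few-sample categories of the three splits described in the text.
FEW_SAMPLE_SPLITS = {
    1: ["bird", "bus", "cow", "motorbike", "sofa"],
    2: ["aeroplane", "bottle", "cow", "horse", "sofa"],
    3: ["boat", "cat", "motorbike", "sheep", "sofa"],
}


def split_classes(split_id):
    """Return (large_sample_classes, few_sample_classes) for one split."""
    few = FEW_SAMPLE_SPLITS[split_id]
    large = [name for name in VOC_CLASSES if name not in few]
    return large, few
```

Calling `split_classes(1)` yields 15 large-sample classes and the 5 few-sample classes of the first split, and mFPS on PASCAL VOC is then averaged over runs on the three splits.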
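Experimental Results
In this example, the invention compares the detection accuracy and speed of several few-sample object detection models proposed in recent years on the PASCAL VOC 2007 and MS COCO 2014 data sets. Specifically, the detection model of the invention is evaluated on the challenging PASCAL VOC 2007 and MS COCO 2014 data sets according to the evaluation criteria specified for PASCAL VOC and MS COCO. Both benchmarks contain training, validation, and test sets; the PASCAL VOC 2007 data set contains 20 object categories and the MS COCO 2014 data set contains 80. For the former, the invention first merges the PASCAL VOC 2007 and PASCAL VOC 2012 training and validation sets into a single set used to train the detection model, and uses the PASCAL VOC 2007 test set for testing. The evaluation metrics are the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 (i.e. mAP@50), which represents detection accuracy, and the mean Frames Per Second (mFPS) over several different few-sample sets, which represents detection speed. For the latter, the invention trains only on the MS COCO 2014 training set and, in the test stage, validates on its validation set. The mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (i.e. AP) and the Frames Per Second (FPS) represent the detection accuracy and speed of the detection model.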
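The role of the IoU thresholds in the two protocols can be sketched as follows. This is not a full mAP implementation: it only shows the IoU test behind mAP@50 and the ten-threshold sweep behind the COCO-style AP. The function names and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# IoU thresholds 0.50, 0.55, ..., 0.95 used for the COCO-style AP.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


def thresholds_passed(pred_box, gt_box):
    """Number of COCO thresholds at which the prediction still counts as correct."""
    return sum(is_true_positive(pred_box, gt_box, t) for t in COCO_IOU_THRESHOLDS)


def average_over_thresholds(ap_per_threshold):
    """COCO-style AP: mean of the AP values computed at each IoU threshold."""
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

A box with IoU 0.72 against its ground truth is a hit under mAP@50 but passes only five of the ten COCO thresholds, which is why the COCO-style AP rewards tighter localization.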
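Table 1
Finally, the above embodiments only illustrate, and do not limit, the technical solution of the invention. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or replaced with equivalents without departing from its purpose and scope, and all such modifications shall fall within the scope of the claims of the invention.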
Claims (10)
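- 1. A real-time few-sample object detection method based on a transfer learning strategy, characterized by comprising the following steps: S1: construct the detection network model; S2: preprocess the input data; S3: train the object detection model from scratch on large-sample category data; S4: fine-tune the few-sample category detection branch on few-sample category data, using a new regularization method during fine-tuning to guide the model to attend to the overall features of an object; S5: train the detection model on the training set, then evaluate it on the test set.
- 2. The method according to claim 1, characterized in that the detection network model comprises: a backbone network, Darknet-53 combined with a spatial pyramid pooling layer, which extracts image features; a detection neck network, formed by a feature pyramid network, which provides semantic features at different scales to the detection head network; and a detection head network, a dual-path detection branch structure with a discriminator, in which the large-sample category detection branch detects only targets of the large-sample categories, the few-sample category detection branch detects targets of all categories, and the discriminator scans the results of the two branches in turn and obtains the final output according to a metric criterion.
- 3. The method according to claim 1, characterized in that the preprocessing in step S2 is specifically: processing the limited data with random affine transformations, a multi-scale image training strategy, the MixUp data fusion strategy, and the Label Smoothing label processing strategy.
- 4. The method according to claim 2, characterized in that, in step S3, the backbone network is initialized with weights trained on the ImageNet data set, and the network model, except for the few-sample detection branch, is trained from scratch on large-sample category data; the loss function at this stage covers the predicted box coordinates, the object confidence, and the classification result, and is:
$$L_{\text{base training}} = L_{box} + L_{cls} + L_{obj} \tag{1}$$
where $L_{box}$ is the sum of the GIoU loss and the smooth L1 loss for coordinate regression, and $L_{cls}$ and $L_{obj}$ are the Focal Loss and the binary cross-entropy loss, respectively.
- 5. The method according to claim 2, characterized in that, in step S4, the model parameters of the backbone, the detection neck, and the large-sample category detection branch are frozen, and only the few-sample category detection branch is fine-tuned; the loss function at this stage covers the predicted box coordinates, the object confidence, the classification result, and the discrepancy from the large-sample category detection branch.
- 6. The method according to claim 5, characterized in that step S4 specifically comprises the following steps: S41: establish a base-class distillation loss $L_b$ between the large-sample category detection branch and the few-sample detection branch, computed as follows:
where $N$ denotes the batch size, $l$ denotes the absolute error function, and the two branch-output terms denote the outputs for the $i$-th image of the large-sample detection branch and the few-sample category detection branch, respectively; S42: the loss function for fine-tuning on few samples is:
$$L_{\text{few-shot tuning}} = L_{box} + 2L_{cls} + L_{obj} + \lambda \cdot L_b \tag{3}$$
where $\lambda$ controls how strongly the base-class distillation loss influences the model's gradient updates; S43: add a discriminator after the large-sample category detection branch and the few-sample detection branch, the discriminator selecting the maximum of the large-sample category detection branch result and the few-sample category detection branch result as the final output, according to the following metric criterion:
where $O_d(i,j)$ denotes the discriminator output at a specific spatial grid cell.
- 7. The method according to claim 1, characterized in that the new regularization method is the Attentive DropBlock algorithm, which has a dynamic coefficient $\gamma$, as follows:
where the parameters keep_prob and block_size affect how often and over what extent the feature map is zeroed, $\sigma$ denotes the sigmoid function, which bounds the response range, and $\alpha$ denotes the response amplification factor.
- 8. The method according to claim 7, characterized in that the Attentive DropBlock algorithm first checks whether the model is currently in the fine-tuning stage; if it is, the channel response $f_C$ and the spatial response $f_S$ of the few-sample category detection branch are obtained; then, after $\gamma$ is computed from keep_prob, block_size, and $\alpha$, each spatial position of each channel feature is zeroed with a probability that follows a Bernoulli distribution with parameter $\gamma$; finally, a mask block whose height and width equal block_size is built around each zeroed position, which regularizes the model.
- 9. The method according to claim 1, characterized in that, in step S5, training and testing are performed on the PASCAL VOC and MS COCO data sets; for the PASCAL VOC data set, the training set and the validation set are first merged into a single set used to train the detection model, the test set is then used for testing, and the mean average precision at an intersection-over-union threshold of 0.5 and the mean frames per second over several different few-sample sets represent the detection accuracy and speed of the detection model; for the MS COCO data set, only its training set is used for training and its validation set is used for validation, and the mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05 and the frames per second represent the detection accuracy and speed of the detection model.
- 10. The method according to claim 9, characterized in that, during training in step S5, stochastic gradient descent is used as the optimizer of the network model, the initial learning rate is $1 \times 10^{-3}$, and the mini-batch size is 16 on every data set; for the PASCAL VOC and MS COCO data sets, from-scratch training and fine-tuning of the detection model each run for 300 rounds, and the CosineLR learning rate schedule is used during training, with the learning rate going from 0.001 to 0.00001; at prediction time, the height and width of the input image are fixed at 448×448; FPS is computed from the sum of the waiting time to obtain each result and the time spent post-processing that result, and mFPS is the mean FPS over the different few-sample sets.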
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210962295.5A CN115393634B (zh) | 2022-08-11 | 2022-08-11 | A real-time few-sample object detection method based on a transfer learning strategy |
| CN202210962295.5 | 2022-08-11 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024032010A1 (zh) | 2024-02-15 |
Family
ID=84118843
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/086781 (Ceased), WO2024032010A1 (zh) | A real-time few-sample object detection method based on a transfer learning strategy | 2022-08-11 | 2023-04-07 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN115393634B (zh) |
| WO (1) | WO2024032010A1 (zh) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
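| CN117876823A (zh) * | 2024-03-11 | 2024-04-12 | 浙江甲骨文超级码科技股份有限公司 | Tea garden image detection method, and model training method and system therefor |
| CN118097373A (zh) * | 2024-04-17 | 2024-05-28 | 智洋创新科技股份有限公司 | Unsupervised method, system and storage medium for detecting hidden hazards in power transmission channels |
| CN118965232A (zh) * | 2024-08-20 | 2024-11-15 | 山东圣喆环境科技有限公司 | Neural-network-based LDAR data management method, system and device |
| CN120014323A (zh) * | 2025-01-06 | 2025-05-16 | 中煤科工集团沈阳研究院有限公司 | Collaborative system and method for precise underground mine fire prevention and extinguishing based on rock mass fracture identification |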
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115393634B (zh) * | 2022-08-11 | 2023-12-26 | 重庆邮电大学 | A real-time few-sample object detection method based on a transfer learning strategy |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109615016A (zh) * | 2018-12-20 | 2019-04-12 | 北京理工大学 | Object detection method using a convolutional neural network based on pyramid input gain |
| CN110674866A (zh) * | 2019-09-23 | 2020-01-10 | 兰州理工大学 | Method for detecting X-ray breast lesion images with a transfer learning feature pyramid network |
| CN111223553A (zh) * | 2020-01-03 | 2020-06-02 | 大连理工大学 | Two-stage deep transfer learning model for tongue diagnosis in traditional Chinese medicine |
| AU2020100705A4 (en) * | 2020-05-05 | 2020-06-18 | Chang, Jiaying Miss | A helmet detection method with lightweight backbone based on yolov3 network |
| US20220067335A1 (en) * | 2020-08-26 | 2022-03-03 | Beijing University Of Civil Engineering And Architecture | Method for dim and small object detection based on discriminant feature of video satellite data |
| CN114663729A (zh) * | 2022-03-29 | 2022-06-24 | 南京工程学院 | Meta-learning-based few-sample defect detection method for cylinder liners |
| CN115393634A (zh) * | 2022-08-11 | 2022-11-25 | 重庆邮电大学 | A real-time few-sample object detection method based on a transfer learning strategy |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
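| CN110008842A (zh) * | 2019-03-09 | 2019-07-12 | 同济大学 | Pedestrian re-identification method based on a deep multi-loss fusion model |
| CN109977812B (zh) * | 2019-03-12 | 2023-02-24 | 南京邮电大学 | Deep-learning-based object detection method for vehicle-mounted video |
| CN113971815B (zh) * | 2021-10-28 | 2024-07-02 | 西安电子科技大学 | Few-sample object detection method based on singular value decomposition feature enhancement |
| CN114841257B (zh) * | 2022-04-21 | 2023-09-22 | 北京交通大学 | Few-sample object detection method under self-supervised contrastive constraints |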
- 2022-08-11: CN application CN202210962295.5A, patent CN115393634B (zh), status: Active
- 2023-04-07: WO application PCT/CN2023/086781, patent WO2024032010A1 (zh), status: Ceased
Non-Patent Citations (2)
| Title |
|---|
| Ghiasi, Golnaz; Lin, Tsung-Yi; Le, Quoc V.: "DropBlock: A regularization method for convolutional networks", arXiv.org, Cornell University Library, Ithaca, 30 October 2018 (2018-10-30), XP093137589, [retrieved on 2024-03-05], DOI: 10.48550/arXiv.1810.12890 * |
| Xia, Ruiyang; Li, Guoquan; Huang, Zhengwen; Meng, Hongying; Pang, Yu: "Bi-path Combination YOLO for Real-time Few-shot Object Detection", Pattern Recognition Letters, Elsevier, Amsterdam, NL, vol. 165, 1 December 2022 (2022-12-01), pages 91-97, XP087247996, ISSN: 0167-8655, DOI: 10.1016/j.patrec.2022.11.025 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115393634A (zh) | 2022-11-25 |
| CN115393634B (zh) | 2023-12-26 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23851240; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23851240; Country of ref document: EP; Kind code of ref document: A1 |