
CN115100459A - Image classification algorithm based on full attention network structure search - Google Patents

Image classification algorithm based on full attention network structure search

Info

Publication number
CN115100459A
CN115100459A
Authority
CN
China
Prior art keywords
search
network
self
parameters
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210660061.5A
Other languages
Chinese (zh)
Other versions
CN115100459B (en)
Inventor
周圆
王海洋
霍树伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202210660061.5A priority Critical patent/CN115100459B/en
Publication of CN115100459A publication Critical patent/CN115100459A/en
Application granted granted Critical
Publication of CN115100459B publication Critical patent/CN115100459B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image classification algorithm based on full attention network structure search. First, a staged search space is designed in which a different self-attention operation is selected at each stage of the network. Then a self-supervised search method is used: the weight parameters and architecture parameters inside the network are updated, and when the self-supervised search stage is completed, the architecture parameters are retained and used as initial values for the supervised search stage. Finally, a supervised search method is used to update the network's internal weight parameters and architecture parameters, and the optimal full attention network is obtained according to the architecture parameters. The search method proposed by the invention can discover a high-performance full attention network structure while ensuring the required search efficiency. Experimental results on an image classification task show that the searched full attention network model outperforms state-of-the-art architectures while greatly reducing the number of parameters.

Description

Image classification algorithm based on full attention network structure search

Technical Field

The invention relates to the field of image classification in computer vision, and more particularly to a full attention network structure search algorithm for efficiently designing full attention network structures for image classification tasks.

Background

In recent years, significant progress has been made in the design of attention network structures. The self-attention operation has become an important building block of neural networks thanks to its ability to capture long-range feature dependencies and its content-based parameter learning mechanism. Neural networks composed entirely of self-attention operations have achieved good results in various computer vision tasks. However, manually designing an excellent full attention neural network structure is a challenging and complex task: it requires extensive prior knowledge and experience, and consumes a great deal of time and resources for experimental verification, which greatly slows the development of attention networks.

Neural architecture search (NAS) provides a way to solve the above problems. NAS automates the design of neural network structures; its goal is, given certain prior knowledge, to automatically search a given search space for a network structure that outperforms manually designed models. NAS methods not only improve model performance but also free human experts from the tedious task of designing network structures. In recent years, NAS research has made many important advances. Current architecture search algorithms fall mainly into three types: methods based on reinforcement learning, methods based on evolutionary algorithms, and gradient-based differentiable architecture search algorithms. These lines of work mainly study how different search strategies can improve both search efficiency and the performance of the resulting networks. Compared with NAS methods based on reinforcement learning and evolutionary algorithms, gradient-based differentiable architecture search converges faster and its optimization objective is more flexible.

Existing NAS methods are not suitable for directly searching full attention networks. First, existing NAS methods usually adopt a cell-based search space, so in the searched network the cell structures of the shallow and deep layers are identical. This does not suit self-attention network search, because self-attention operations behave differently at different stages of the network. Moreover, existing NAS methods usually use a classification task to supervise the structure search; classification requires the model to focus on local regions related to the label information and does not need to consider content correlations between distant pixels. Self-attention models, however, focus on capturing long-range content dependencies between pixels to learn rich image representations. It is therefore inappropriate to search full attention network structures directly with existing NAS methods.

Summary of the Invention

To address the problems in the prior art, the present invention provides an image classification algorithm based on full attention network structure search, solving the inaccuracy that arises when existing NAS methods are used directly to search full attention networks.

The present invention is achieved through the following technical solutions:

An image classification algorithm based on full attention network structure search, comprising the following steps:

Design a staged search space in which a different self-attention operation is selected at each stage of the network;

Search with a self-supervised search method: input images into the network model of the staged search space and update the network's internal weight parameters and architecture parameters; when the self-supervised search stage is completed, retain the architecture parameters and use them as initial values for the supervised search stage;

Search with a supervised search method: input images into the network model of the staged search space, update the network's internal weight parameters and architecture parameters, and obtain the optimal full attention network according to the architecture parameters.

In the network structure of the staged search space, the first layer is a fixed local self-attention operation and the last two layers are an average pooling layer and a classification layer. The remaining middle part consists of five stages; a fixed pooling operation in the second and fourth stages halves the spatial size of the feature map and doubles the number of channels. Each stage has three searchable layers, and for each searchable layer the best-performing operation must be selected from a set of candidate operations. The candidates consist of 7 self-attention operations: one non-local self-attention operation and 6 local self-attention operations with different hyperparameters. The hyperparameters of a local self-attention operation are the spatial extent (3, 5, or 7) and the number of heads (4 or 8). In summary, the staged search space contains 15 searchable layers, each choosing from 7 candidate operations, for a total of 7^15 possible structures.

In the self-supervised search stage, a self-supervised search algorithm based on a context autoregression task is designed. In this task, multiple regions of the input image are randomly masked and the network is trained to predict the content of the missing parts. An encoder-decoder structure is used to extract features from the input image and reconstruct the missing image content, and this task is then used to search the full attention network. The full attention network serves as the feature encoder for the input image. The network contains two kinds of learnable parameters: the weight parameters w of the self-attention operations and the architecture parameters α corresponding to each candidate operation. The image dataset is divided into two independent sets, denoted DatasetA and DatasetB; DatasetA is used to optimize the weight parameters and DatasetB to optimize the architecture parameters. The L1 loss is used as the loss function, defined as:

$$\mathcal{L}_{1} = \frac{1}{M}\sum_{i=1}^{M}\lvert p_i - y_i\rvert$$

where M is the number of pixels, p_i is the input pixel, and y_i is the ground-truth value. A differentiable architecture search method then alternately optimizes the weight parameters w and the architecture parameters α: iterating between a gradient step on DatasetA, descending $\nabla_{w}\,\mathcal{L}_{A}(w,\alpha)$ to update the weight parameters, and a gradient step on DatasetB, descending $\nabla_{\alpha}\,\mathcal{L}_{B}(w,\alpha)$ to update the architecture parameters. When the self-supervised search stage is complete, the architecture parameters are stored and used as initial values for the supervised search stage.
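As an illustration, one alternating update of this bi-level optimization might be sketched in PyTorch as follows. The supernet, decoder, optimizers, batches, and masking routine are assumed components and are not part of the original disclosure:

```python
import torch

def l1_reconstruction_loss(pred, target):
    # L1 = (1/M) * sum_i |p_i - y_i|, averaged over all M pixels
    return (pred - target).abs().mean()

def self_supervised_search_step(supernet, decoder, w_opt, a_opt,
                                batch_a, batch_b, mask_fn):
    # 1) Update the weight parameters w on a DatasetA batch
    masked, target = mask_fn(batch_a)     # randomly mask image regions
    w_opt.zero_grad()
    recon = decoder(supernet(masked))     # encoder-decoder reconstruction
    l1_reconstruction_loss(recon, target).backward()
    w_opt.step()

    # 2) Update the architecture parameters alpha on a DatasetB batch
    masked, target = mask_fn(batch_b)
    a_opt.zero_grad()
    recon = decoder(supernet(masked))
    l1_reconstruction_loss(recon, target).backward()
    a_opt.step()
```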

In the supervised search stage, the differentiable architecture search method is used to search on an image classification dataset. The architecture parameters obtained in the self-supervised stage serve as initial values, and gradient descent alternately optimizes the architecture parameters α and the weight parameters w. The cross-entropy loss is used as the loss function, defined as:

$$\mathcal{L}_{CE} = -\sum_{k=1}^{K} y_k \log p_k$$

where K is the number of classes, p_k is the predicted probability of class k, and y_k is the class label. When the supervised search process ends, the architecture parameters are sorted and, for each searchable layer, the operation corresponding to the largest architecture parameter is selected, yielding the final architecture.
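A minimal sketch of this final selection step, assuming the architecture parameters are stored as a (layers, candidates) tensor; the candidate names are hypothetical labels, not from the original disclosure:

```python
import torch

# Hypothetical labels for the 7 candidates described above: one non-local
# operation plus local variants with extent k in {3, 5, 7}, heads h in {4, 8}.
CANDIDATES = ["nonlocal"] + [f"local_k{k}_h{h}"
                             for k in (3, 5, 7) for h in (4, 8)]

def derive_architecture(alphas: torch.Tensor) -> list:
    # alphas: (num_searchable_layers, num_candidates) architecture parameters.
    # For each layer, keep the candidate with the largest alpha.
    best = alphas.argmax(dim=1)
    return [CANDIDATES[i] for i in best.tolist()]

# Example with random parameters for 15 layers x 7 candidates:
print(derive_architecture(torch.randn(15, 7)))
```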

The beneficial effects of the present invention are: the proposed search method can discover a high-performance full attention network structure while ensuring the required search efficiency. Experimental results on image classification tasks show that the searched full attention network model outperforms state-of-the-art architectures while greatly reducing the number of parameters.

Description of Drawings

Fig. 1: Flow chart of the network structure search;

Fig. 2: Macroscopic network structure of the search space;

Fig. 3: Specific configurations of the candidate operations, where k denotes the spatial extent and h denotes the number of heads in the local self-attention operations;

Fig. 4: The best full attention network structure obtained by the search;

Fig. 5: Performance comparison between the proposed algorithm and existing advanced networks on CIFAR-10 and ImageNet: (a) experimental results on the CIFAR dataset; (b) experimental results on the ImageNet dataset.

Detailed Description

To make the technical solution of the present invention clearer, the invention is further described below with reference to the accompanying drawings.

The present invention proposes a full attention network structure search algorithm to efficiently design full attention network structures for image classification tasks. First, a staged search space is designed in which each stage of the network can select a different self-attention operation. Then, to efficiently find the optimal network structure in this search space, a new search strategy is proposed that jointly uses self-supervised search and supervised search to find an efficient full attention network structure for image classification. The algorithm flow chart is shown in Fig. 1; the specific details of the search algorithm are described below.

1. Staged Search Space

The present invention proposes a staged search space for searching full attention networks. Fig. 2 shows the macroscopic network structure of the search space, including the number of layers and the input dimensions at each stage. In this structure, the first layer is a fixed local self-attention operation, and the last two layers are an average pooling layer and a classification layer. The remaining middle part consists of five stages; a fixed pooling operation in the second and fourth stages halves the spatial size of the feature map and doubles the number of channels. Each stage has three searchable layers, and for each searchable layer the best-performing operation must be selected from the candidate operations. The candidates consist of 7 self-attention operations: one non-local self-attention operation and 6 local self-attention operations with different hyperparameters; their configurations are listed in Fig. 3. The hyperparameters of the local self-attention operations are the spatial extent (3, 5, or 7) and the number of heads (4 or 8). In summary, our search space contains 15 searchable layers, each choosing from 7 candidate operations, giving 7^15 possible structures.
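For illustration, the search space can be written down as plain configuration data. The following self-contained Python sketch uses our own names and layout, which are not part of the original disclosure:

```python
# 7 candidate operations: one non-local self-attention operation plus six
# local self-attention operations with spatial extent k in {3, 5, 7} and
# head count h in {4, 8} (cf. Fig. 3).
CANDIDATES = [{"op": "nonlocal"}] + [
    {"op": "local", "extent": k, "heads": h}
    for k in (3, 5, 7) for h in (4, 8)
]

STAGES = 5                    # five stages of three searchable layers each
LAYERS_PER_STAGE = 3
SEARCHABLE_LAYERS = STAGES * LAYERS_PER_STAGE   # 15 searchable layers
POOLED_STAGES = (2, 4)        # fixed pooling: halve spatial size, double channels

assert len(CANDIDATES) == 7
print(len(CANDIDATES) ** SEARCHABLE_LAYERS)     # 7**15 possible structures
```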

2. Search Strategy Combining Self-Supervised Search and Supervised Search

Step 1: Prepare the datasets.

For the image classification task, the present invention uses the CIFAR-10 and ImageNet classification datasets to test and compare algorithm performance. CIFAR-10 contains images of 10 classes with 6,000 images per class, 60,000 images in total, split into 50,000 training images and 10,000 test images; each image has a spatial resolution of 32×32. The ImageNet 2012 dataset contains 1,000 classes, with 1.28 million training images and 50,000 validation images. ImageNet images are cropped to 224×224.
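For reference, a standard torchvision-based preparation of the two datasets might look like the sketch below; only the image sizes come from the text, and the rest of the pipeline is an assumption:

```python
import torchvision
import torchvision.transforms as T

# CIFAR-10: 60,000 32x32 images in 10 classes (50,000 train / 10,000 test).
cifar_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
cifar_test = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=T.ToTensor())

# ImageNet images are cropped to 224x224; the resize step is an assumption.
imagenet_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
])
```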

Step 2: Search with the self-supervised search method.

100 classes are randomly selected from the 1,000 original ImageNet classes to build a training set, and the images are resized to 32×32. The training set is split into two equal subsets: one for optimizing the network weight parameters and the other for optimizing the architecture parameters. During the self-supervised search, images are fed into the network model of the staged search space. The weight parameters are optimized with a stochastic gradient descent (SGD) optimizer, with weight decay 0.0003, momentum 0.9, and an initial learning rate of 0.025. The architecture parameters are optimized with the Adam optimizer, with a learning rate of 3×10^-5 and weight decay 0.001. After 20 epochs, the search ends and the architecture parameters are saved.
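Under the stated hyperparameters, the optimizer setup for this stage might be sketched as follows; supernet.weight_parameters() and supernet.arch_parameters() are assumed accessors for the two parameter groups and are not from the original disclosure:

```python
import torch

# SGD for the weight parameters: lr 0.025, momentum 0.9, weight decay 0.0003.
w_optimizer = torch.optim.SGD(
    supernet.weight_parameters(),   # assumed accessor for the weights w
    lr=0.025, momentum=0.9, weight_decay=3e-4)

# Adam for the architecture parameters: lr 3e-5, weight decay 0.001.
a_optimizer = torch.optim.Adam(
    supernet.arch_parameters(),     # assumed accessor for the alphas
    lr=3e-5, weight_decay=1e-3)

# Run the alternating updates (see the search-step sketch above) for 20 epochs.
```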

Step 3: Search with the supervised search method.

Using the architecture parameters obtained in the first stage as initialization, a supervised search is performed on the CIFAR-10 dataset. The CIFAR-10 training images are randomly split into two halves of 25,000 images each: one for optimizing the network weight parameters and one for the architecture parameters. During the supervised search, images from the classification dataset are fed into the network model of the staged search space. The weight parameters are optimized with SGD, with an initial learning rate of 0.025, momentum 0.9, and weight decay 0.0003; the architecture parameters are optimized with Adam, with a learning rate of 1×10^-4 and weight decay 0.001. After 50 epochs, the search ends and the optimal full attention network is obtained according to the architecture parameters. The searched network is shown in Fig. 4.
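One alternating update of this supervised stage might be sketched as follows, using the cross-entropy loss defined earlier; the supernet, optimizers, and batches are assumed to be set up as in the earlier sketches:

```python
import torch.nn.functional as F

def supervised_search_step(supernet, w_opt, a_opt, batch_w, batch_a):
    # 1) Weight update on one half of the CIFAR-10 training split
    images, labels = batch_w
    w_opt.zero_grad()
    F.cross_entropy(supernet(images), labels).backward()   # L_CE
    w_opt.step()

    # 2) Architecture update on the other half
    images, labels = batch_a
    a_opt.zero_grad()
    F.cross_entropy(supernet(images), labels).backward()
    a_opt.step()
```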

Step 4: Train and test the searched network on the CIFAR-10 and ImageNet datasets.

The network is first trained on CIFAR-10 with an SGD optimizer, with weight decay 4×10^-4, momentum 0.9, and an initial learning rate of 0.04 that decays gradually to 0 according to a cosine decay rule. The network's parameter weights are updated once per forward and backward pass; after 500 epochs, the trained network is obtained. At test time, the CIFAR-10 test set images are fed into the network model to obtain the test results, shown in Fig. 5(a).
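The cosine decay rule mentioned above can be sketched as follows; only the base learning rate (0.04) and the epoch count come from the text:

```python
import math

def cosine_lr(base_lr: float, epoch: int, total_epochs: int) -> float:
    # Decays from base_lr at epoch 0 toward 0 at the final epoch.
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

for epoch in range(500):            # 500 epochs on CIFAR-10
    lr = cosine_lr(0.04, epoch, 500)
    # ... set the optimizer's learning rate, then run one training epoch
```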

The network is then trained on ImageNet with an SGD optimizer, with weight decay 3×10^-5, momentum 0.9, and an initial learning rate of 0.04 that decays gradually to 0 according to the cosine decay rule. The parameter weights are updated once per forward and backward pass; after 300 epochs, the trained network is obtained. At test time, the ImageNet test set images are fed into the network model to obtain the test results, shown in Fig. 5(b).

The classification results of the searched model are compared with existing advanced models. The experimental results show that, while maintaining search efficiency, the proposed algorithm greatly improves the classification accuracy of the searched full attention network structure.

Claims (4)

1. An image classification algorithm based on full attention network structure search, characterized in that it comprises the following steps:

designing a staged search space in which a different self-attention operation is selected at each stage of the network;

searching with a self-supervised search method: inputting images into the network model of the staged search space and updating the network's internal weight parameters and architecture parameters; when the self-supervised search stage is completed, retaining the architecture parameters and using them as initial values for the supervised search stage;

searching with a supervised search method: inputting images into the network model of the staged search space, updating the network's internal weight parameters and architecture parameters, and obtaining the optimal full attention network according to the architecture parameters.

2. The image classification algorithm based on full attention network structure search according to claim 1, characterized in that, in the network structure of the staged search space, the first layer is a fixed local self-attention operation and the last two layers are an average pooling layer and a classification layer; the remaining middle part consists of five stages, with a fixed pooling operation in the second and fourth stages that halves the spatial size of the feature map and doubles the number of channels; each stage has three searchable layers, and for each searchable layer the best-performing operation is selected from the candidate operations; the candidate operations consist of 7 self-attention operations, including one non-local self-attention operation and 6 local self-attention operations with different hyperparameters, the hyperparameters of a local self-attention operation being the spatial extent (3, 5, or 7) and the number of heads (4 or 8); in summary, the staged search space contains 15 searchable layers, each choosing from 7 candidate operations, and the search space contains 7^15 possible structures.

3. The image classification algorithm based on full attention network structure search according to claim 1, characterized in that, in the self-supervised search stage, a self-supervised search algorithm based on a context autoregression task is designed; in the context autoregression task, multiple regions of the input image are randomly masked and the network is trained to predict the content of the missing parts; an encoder-decoder structure is used to extract features of the input image and reconstruct the missing image content; this task is then used to search the full attention network.
The full attention network is used as the feature encoder to extract features from the input image; the network contains two kinds of learnable parameters: the weight parameters w of the self-attention operations and the architecture parameters α corresponding to each candidate operation; the image dataset is divided into two independent sets, denoted DatasetA and DatasetB; DatasetA is used to optimize the weight parameters and DatasetB to optimize the architecture parameters; the L1 loss is used as the loss function, defined as:

$$\mathcal{L}_{1} = \frac{1}{M}\sum_{i=1}^{M}\lvert p_i - y_i\rvert$$

where M is the number of pixels, p_i is the input pixel, and y_i is the ground-truth value; a differentiable architecture search method is then used to alternately optimize the weight parameters w and the architecture parameters α, iteratively updating the weight parameters by gradient descent $\nabla_{w}\,\mathcal{L}_{A}(w,\alpha)$ on DatasetA and the architecture parameters by gradient descent $\nabla_{\alpha}\,\mathcal{L}_{B}(w,\alpha)$ on DatasetB; when the self-supervised search stage is complete, the architecture parameters are stored and used as initial values for the supervised search stage.
4. The image classification algorithm based on full attention network structure search according to claim 1, characterized in that, in the supervised search stage, a differentiable architecture search method is used to search on an image classification dataset; the architecture parameters obtained in the self-supervised search stage are used as initial values, and gradient descent alternately optimizes the architecture parameters α and the weight parameters w; the cross-entropy loss is used as the loss function, defined as:

$$\mathcal{L}_{CE} = -\sum_{k=1}^{K} y_k \log p_k$$

where K is the number of classes, p_k is the predicted probability of class k, and y_k is the class label; when the supervised search process ends, the architecture parameters are sorted and the operation corresponding to the largest architecture parameter is selected, yielding the final architecture.
CN202210660061.5A 2022-06-13 2022-06-13 Image classification method based on full attention network structure search Active CN115100459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210660061.5A CN115100459B (en) 2022-06-13 2022-06-13 Image classification method based on full attention network structure search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210660061.5A CN115100459B (en) 2022-06-13 2022-06-13 Image classification method based on full attention network structure search

Publications (2)

Publication Number Publication Date
CN115100459A (en) 2022-09-23
CN115100459B CN115100459B (en) 2025-04-25

Family

ID=83290327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210660061.5A Active CN115100459B (en) 2022-06-13 2022-06-13 Image classification method based on full attention network structure search

Country Status (1)

Country Link
CN (1) CN115100459B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116433980A (en) * 2023-04-19 2023-07-14 中科南京智能技术研究院 Image classification method, device, equipment and medium of spiking neural network structure

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633494A (en) * 2020-12-17 2021-04-09 电子科技大学 Automatic neural network structure searching method based on automatic machine learning
CN113469263A (en) * 2021-07-13 2021-10-01 润联软件系统(深圳)有限公司 Prediction model training method and device suitable for small samples and related equipment
CN114299344A (en) * 2021-12-31 2022-04-08 江南大学 A low-cost automatic search method of neural network structure for image classification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633494A (en) * 2020-12-17 2021-04-09 电子科技大学 Automatic neural network structure searching method based on automatic machine learning
CN113469263A (en) * 2021-07-13 2021-10-01 润联软件系统(深圳)有限公司 Prediction model training method and device suitable for small samples and related equipment
CN114299344A (en) * 2021-12-31 2022-04-08 江南大学 A low-cost automatic search method of neural network structure for image classification

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116433980A (en) * 2023-04-19 2023-07-14 中科南京智能技术研究院 Image classification method, device, equipment and medium of spiking neural network structure

Also Published As

Publication number Publication date
CN115100459B (en) 2025-04-25

Similar Documents

Publication Publication Date Title
Liu et al. Progressive neural architecture search
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
CN112508104B (en) A cross-task image classification method based on fast network architecture search
CN116129219B (en) SAR target class increment recognition method based on knowledge robust-rebalancing network
CN114492581B (en) A method based on transfer learning and attention mechanism meta-learning applied to small sample image classification
CN114626506B (en) A neural network unit structure search method and system based on attention mechanism
CN113642574A (en) Small sample target detection method based on feature weighting and network fine tuning
CN116089883A (en) A training method for improving the discrimination between old and new categories in incremental learning of existing categories
CN111783688B (en) A classification method of remote sensing image scene based on convolutional neural network
CN114742199B (en) A neural network macro-architecture search method and system based on attention mechanism
CN114067155A (en) Image classification method, device, product and storage medium based on meta learning
CN112101364A (en) A Semantic Segmentation Method Based on Incremental Learning of Parameter Importance
CN116310466A (en) A Few-Sample Image Classification Method Based on Locally Irrelevant Region Screening Graph Neural Networks
CN114781611B (en) Natural language processing method, language model training method and related equipment
CN113989655A (en) Radar or sonar image target detection and classification method based on automatic deep learning
CN115100459A (en) Image classification algorithm based on full attention network structure search
CN112733724B (en) Kinship verification method and device based on discriminative sample element miner
CN117195951B (en) A learning genetic inheritance method based on architecture search and self-knowledge distillation
CN117218409B (en) Image classification network architecture design method, device, equipment and medium
CN116776934B (en) Method for automatically searching neural network structure
JP6993250B2 (en) Content feature extractor, method, and program
CN115100694B (en) A fast fingerprint retrieval method based on self-supervised neural network
CN118587494A (en) An image classification method based on CNN neural network
CN118365952A (en) Crop pest image identification method based on causal intervention
CN116403051A (en) A Crater Age Classification Method Based on Multi-source Data Fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant