CN118196600A - Neural architecture searching method and system based on differential evolution algorithm - Google Patents
Neural architecture searching method and system based on differential evolution algorithm
- Publication number
- CN118196600A (Application CN202410615399.8A)
- Authority
- CN
- China
- Prior art keywords
- subnet
- network
- code
- population
- individual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a neural architecture search method and system based on a differential evolution algorithm. The method first defines a search space, combines all candidate operations into a supernet, and encodes them with continuous numerical values. Next, the network weights of the supernet are trained with training data and their labels. A unique encoding is then designed for each subnet in the supernet, and the subnet's network weights are taken directly from the supernet. The subnet encodings are subsequently optimized with a differential evolution algorithm, and weight optimization of the supernet alternates with encoding optimization of the subnets. Unlike approaches that optimize subnet encodings with gradient descent or a genetic algorithm, the differential evolution algorithm of the present invention makes fuller use of the vectorized information in the continuous encoding. Compared with traditional optimization methods, it is more efficient, has better search performance, and finds the optimal neural network architecture for a task more quickly.
Description
Technical Field
The present invention relates to the field of deep neural network technology, and in particular to a neural architecture search method and system based on a differential evolution algorithm.
Background
Deep learning is a branch of machine learning that models and learns complex nonlinear relationships by building deep neural networks. Because deep learning models have achieved outstanding results in tasks such as image recognition and natural language processing, they have attracted widespread attention in scientific research and industrial applications. Traditional neural network architectures are designed by hand by experts and then adjusted repeatedly according to how the network performs, which demands considerable manpower and expert experience. To keep pace with the development of deep neural network technology and to obtain automatically a network architecture suited to the target data distribution, neural architecture search technology came into being. Neural architecture search is a method for automatically designing neural network architectures: it uses search algorithms to discover high-performance neural network architectures within a predefined search space, which allows even people who are not neural network specialists to apply neural network technology effectively in their own fields. The emergence of this method not only makes the design and optimization of neural networks more efficient, but also offers researchers and engineers from all walks of life broader application prospects and opportunities.
Neural architecture search is an important research field that aims to discover automatically the optimal neural network structure for the needs of a specific task. Doing so requires finding, within an enormous search space, an effective combination that lets the network learn fully and perform well. Early neural architecture search algorithms relied mainly on methods such as reinforcement learning and evolutionary computation. These methods must fully train each newly sampled architecture and use its true accuracy as the basis for optimization. Although such algorithms achieved remarkable results in improving network performance, each training run demands a great deal of time and computing resources, so their efficiency is far from ideal.
To address this problem, differentiable architecture search frameworks have emerged in recent years. They make the entire search space continuous by assigning an architecture weight to each candidate operation, so that optimization methods such as gradient descent can adjust the network weights and the architecture weights simultaneously. When the network converges, the final discretized network is derived from the architecture weight values. The emergence of this approach has significantly improved the efficiency of neural architecture search and made network design more flexible and efficient for large-scale tasks. However, it still has problems. First, once gradient descent has reached a local optimum it struggles to escape from the current solution; even introducing stochastic gradient descent and the concept of momentum cannot resolve this completely. Second, differentiable architecture search runs on the entire supernet, optimizing the architecture weights of all candidate operations at the same time, so the operations interfere with one another during this process and become coupled to a degree. It is therefore necessary to propose a global search algorithm based on evolutionary computation methods that escapes local optima and evaluates each subnet separately.
Summary of the Invention
Purpose of the invention: the technical problem to be solved by the present invention is to remedy the deficiencies of the prior art by providing a neural architecture search method and system based on a differential evolution algorithm.
The method comprises:
Step 1: construct a supernet containing M connection edges and N operations; each subnet uses continuous values in the range 0 to 1 to assign a weight to every operation on every connection edge, and these weights serve as a code that uniquely represents an individual within a population;
Step 2: train the network weights of the supernet with the training data and its labels;
Step 3: optimize the subnet codes with a differential evolution algorithm;
Step 4: execute steps 2 and 3 alternately until the error rate of the network architectures within the population converges.
In step 1, each connection edge in the supernet carries N candidate operations, N being a natural number. M corresponds to the internal nodes configured in the supernet: with H internal nodes, $M = H(H+3)/2$;
The supernet contains X subnets, where $X = N^{M}$. The supernet is needed during the phase that trains the network weights by gradient descent, so that the parameters of every candidate operation in the search space are trained. A subnet is a network that keeps exactly one candidate operation on each connection edge; it is a part of the supernet. The present invention optimizes the subnet codes with a differential evolution algorithm to ensure that the best subnet architecture is found during the search. The main function of the supernet is to train the network weights of all candidate operations so that the network weights needed to evaluate subnet performance can be supplied directly.
A supernet is the umbrella term for the hybrid network that encompasses every candidate operation and topological connection in the search space. Ordinarily, input information passes through a single operation, such as one 3*3 convolution, to produce a new feature map; in the supernet, however, the input is processed by several operations at once, such as a 3*3 depthwise separable convolution, a 3*3 dilated convolution, and a skip connection, and the feature maps they produce are added elementwise to form new, mixed feature-map information.
In the present invention, a continuous value in the range 0 to 1 assigns a weight to every operation on every connection edge. This encoding lets a subnet be discretized into a conventional neural network according to the magnitudes of the weights, i.e., by removing the candidate operations and connection edges whose weights are smaller.
In the present invention, the supernet contains M connection edges, and each connection edge carries N candidate operations: 3*3 depthwise separable convolution, 5*5 depthwise separable convolution, 3*3 dilated convolution, 5*5 dilated convolution, 3*3 max pooling, 3*3 average pooling, skip connection and the none operation, where the none operation indicates that the connection edge does not exist;
From the description above, the supernet of the present invention is a network structure that mixes multiple operations, whereas the aim of the present invention is to search for a subnet in which the input undergoes a single operation at a time; with M connection edges, exactly one operation must be determined on each edge;
To represent the desired subnet, a number in the range 0 to 1 expresses the importance of each operation on each connection edge, for example in the form of an M*N matrix. The supernet therefore serves to store the trained network weights of the various operations, while the code describes a subnet. In a population-based optimization method such as evolutionary computation, each subnet is also treated as an individual within the population; as the codes take different values, different neural networks are obtained according to the magnitudes of those values. Note also that N is not bounded: in principle any convolution or pooling operation can be added to this scheme. M corresponds to the configured internal nodes; with H internal nodes, $M = H(H+3)/2$.
In step 1, concretely, the code takes the form of a 14*8 matrix representing the 14 candidate connection edges in the supernet and the 8 candidate operations on each edge. As the search space later grows, the dimensions of the matrix can grow with it, covering more connection edges and candidate operations.
In step 2, the supernet must first be trained before the weights of the operations inside it can be used;
When the supernet is initialized, every element of the matrix is set to $1/N$, so that the values of all candidate operations on a connection edge sum to 1 and every operation is equally important; this ensures that the parameters of every operation in the supernet are trained. In the subsequent optimization, once subnet codes have been generated, the codes of the better individuals are assigned to the supernet to guide the optimization of its network weights; the values inside the codes then develop preferences, and operations with larger architecture weights tend to be trained more thoroughly. Concretely, when the supernet is optimized, the feature maps on different connection edges are concatenated along the channel dimension as they converge on the same node, expressed as $x_{\text{node}} = \mathrm{concat}(F_1, F_2)$, where concat is the concatenation function that joins the input feature maps $F_1$ and $F_2$ along the channel dimension. The feature map on each connection edge is itself a blend of the feature maps produced by the candidate operations applied to the input data, expressed as $\bar{F} = \sum_{i=1}^{N} \alpha_i \, o_i(\mathrm{input})$, where $\alpha_i$ is the architecture weight of the i-th operation, input is the input data, and $o_i(\cdot)$ denotes the i-th operation extracting features from the input. With this treatment, the backpropagation update covers every candidate operation in the entire network architecture when it is optimized.
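As an illustration of the mixed computation just described, the following sketch (assuming PyTorch; the three-operation candidate set and the channel count are illustrative stand-ins for the patent's eight-operation space) blends candidate-operation outputs by architecture weights initialized to 1/N and concatenates edge outputs along the channel dimension:

```python
import torch
import torch.nn as nn

class MixedEdge(nn.Module):
    """One supernet edge: blends N candidate operations by architecture weights."""
    def __init__(self, channels: int, n_ops: int = 3):
        super().__init__()
        # Illustrative candidate set (the patent's space uses 8 ops per edge).
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 convolution
            nn.Conv2d(channels, channels, 5, padding=2),  # 5x5 convolution
            nn.Identity(),                                # skip connection
        ])
        # Equal initial importance: every architecture weight starts at 1/N.
        self.alpha = torch.full((n_ops,), 1.0 / n_ops)

    def forward(self, x):
        # Mixed feature map: sum_i alpha_i * o_i(input)
        return sum(a * op(x) for a, op in zip(self.alpha, self.ops))

# Feature maps arriving at one node over two edges are joined on the channel axis:
edge1, edge2 = MixedEdge(16), MixedEdge(16)
x = torch.randn(1, 16, 32, 32)
x_node = torch.cat([edge1(x), edge2(x)], dim=1)  # x_node = concat(F1, F2)
```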
The input data is the image information produced by the stem layer (an initial layer consisting of one two-dimensional convolution and batch normalization, whose main function is to preprocess general image inputs, extract features, and convert them into a feature representation suited to subsequent processing). For ordinary images, feeding the raw pixels in directly makes it hard for a neural network to extract shared, abstract information, so the present invention uses a stem layer built from convolution operations to obtain more representative image information.
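A possible stem layer consistent with this description, again as a hedged sketch (the 48 output channels and 3x3 kernel are assumptions, not values fixed by the patent):

```python
import torch
import torch.nn as nn

# Stem layer as described: one 2-D convolution followed by batch normalization,
# turning raw RGB images into a feature representation for the supernet.
stem = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(48),
)

images = torch.randn(8, 3, 224, 224)  # e.g. 224*224 center crops, as in the embodiment
features = stem(images)               # shape: (8, 48, 224, 224)
```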
The training data comes from real-world application scenarios. Taking the medical field as an example: to classify skin-disease types from skin images, the disease types already determined for existing images must be assigned as labels. The neural network found with the existing images can, after training, provide auxiliary medical processing of newly produced medical images, and each new image is added to the image library once its diagnosis is complete, updating the network architecture and parameters. With this treatment, the backpropagation update covers all candidate operations in the entire network architecture, thereby optimizing the weights of every operation in the supernet.
Step 3 comprises:
Step 3.1: initialize the population individuals, including the subnet codes and scaling factors;
Step 3.2: update the scaling factors with the evolution operator;
Step 3.3: update the subnet codes with the evolution operator containing the adaptive factor;
Step 3.4: perform environmental selection to determine the subnet codes and scaling factors of the next generation.
Step 3.1 comprises:
Step 3.1.1: following the definition of N candidate operations per connection edge, the code of the j-th connection edge of the n-th individual in the g-th generation population is $a^{g}_{n,j} = (a^{g}_{n,j,1}, a^{g}_{n,j,2}, \ldots, a^{g}_{n,j,N})$, where $a^{g}_{n,j}$ is the vector-form code of one connection edge and $a^{g}_{n,j,N}$ is the value at the N-th position of that vector; $a^{g}_{n,j}$ corresponds to the weights $\alpha_i$ on one connection edge in step 2;
Step 3.1.2: since a supernet has M connection edges, the code of the n-th individual in the g-th generation population is $A^{g}_{n} = \big[(a^{g}_{n,1})^{\mathsf{T}}, (a^{g}_{n,2})^{\mathsf{T}}, \ldots, (a^{g}_{n,M})^{\mathsf{T}}\big]$, where $(a^{g}_{n,M})^{\mathsf{T}}$ is the result of transposing the code of the M-th connection edge from step 3.1.1 and $A^{g}_{n}$ is the matrix-form code of the n-th individual in the g-th generation population;
Step 3.1.3: every part of a subnet individual is randomly initialized within the upper and lower limits of the continuous values, $a^{g}_{n,j,N} = a^{\min}_{n,j,N} + \mathrm{rand}(0,1)\,(a^{\max}_{n,j,N} - a^{\min}_{n,j,N})$, where $\mathrm{rand}(0,1)$ draws a random number between 0 and 1, $a^{\min}_{n,j,N}$ is the minimum attainable at the N-th position on a connection edge of the n-th individual in the g-th generation (0 in the present invention), and $a^{\max}_{n,j,N}$ is the corresponding maximum (1 in the present invention);
Step 3.1.4: the scaling factor $F^{g}_{n}$ of the n-th individual in the g-th generation population, a parameter that extracts information from vector-code differences, is initially set to a random number between 0 and 1: $F^{g}_{n} = \mathrm{rand}(0,1)$.
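Step 3.1 can be sketched in a few lines of NumPy (population size 50 follows the embodiment; M = 14 and N = 8 follow the search space above; the variable names are illustrative):

```python
import numpy as np

M, N = 14, 8      # connection edges, candidate operations per edge
POP_SIZE = 50     # population size used in the embodiment

rng = np.random.default_rng(0)
a_min, a_max = 0.0, 1.0

# Step 3.1.3: each individual is an M x N matrix of continuous codes in [0, 1]
codes = a_min + rng.random((POP_SIZE, M, N)) * (a_max - a_min)

# Step 3.1.4: one scaling factor per individual, uniform in (0, 1)
F = rng.random(POP_SIZE)
```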
Step 3.2 comprises:
Step 3.2.1: traverse the individuals in the population; on each visit, randomly select 3 distinct individuals $P^{g}_{r_1}$, $P^{g}_{r_2}$ and $P^{g}_{r_3}$ from the g-th generation population, whose scaling-factor codes are $F^{g}_{r_1}$, $F^{g}_{r_2}$ and $F^{g}_{r_3}$ and whose subnet codes are $A^{g}_{r_1}$, $A^{g}_{r_2}$ and $A^{g}_{r_3}$; in the evolution operator, one individual's code serves as the base vector while the other two supply the difference between their codes;
Step 3.2.2: from the 3 randomly selected individuals, pick 2 and compute the difference of their scaling-factor codes, $F^{g}_{r_1} - F^{g}_{r_2}$;
Step 3.2.3: multiply the computed difference by the scaling factor of the n-th individual of the g-th generation and add it to the base, expressed as $F'^{g}_{n} = F^{g}_{r_3} + F^{g}_{n}\,(F^{g}_{r_1} - F^{g}_{r_2})$, where $F'^{g}_{n}$ is the scaling factor after mutation during the traversal of the g-th generation population. This mutation yields the adaptive factor used in step 3.3.
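A sketch of step 3.2, reusing the codes/F arrays from the previous sketch (clipping the result back into (0, 1) is our assumption; the patent leaves the bounds open):

```python
import numpy as np

def mutate_scaling_factor(F, n, rng):
    """Self-adaptive mutation of the n-th scaling factor (step 3.2)."""
    # Step 3.2.1: three distinct individuals, all different from n
    r1, r2, r3 = rng.choice([i for i in range(len(F)) if i != n],
                            size=3, replace=False)
    # Steps 3.2.2-3.2.3: F'_n = F_r3 + F_n * (F_r1 - F_r2)
    f_new = F[r3] + F[n] * (F[r1] - F[r2])
    return float(np.clip(f_new, 0.0, 1.0)), (r1, r2, r3)
```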
Step 3.3 comprises:
Step 3.3.1: from the 3 individuals randomly selected in step 3.2.1, take the corresponding subnet codes and compute the vector information difference $A^{g}_{r_1} - A^{g}_{r_2}$;
Step 3.3.2: to the remaining random subnet code, the one that did not take part in the difference, add the difference scaled by the mutated factor, obtaining the mutated subnet code, expressed as $V^{g}_{n} = A^{g}_{r_3} + F'^{g}_{n}\,(A^{g}_{r_1} - A^{g}_{r_2})$, where $V^{g}_{n}$ is the new subnet code produced by the mutation operator during the traversal of the g-th generation population;
Step 3.3.3: cross the currently visited individual with the mutant obtained in step 3.3.2 to determine the code of the j-th part: $u^{g}_{n,j} = v^{g}_{n,j}$ with probability 0.5, and $u^{g}_{n,j} = a^{g}_{n,j}$ otherwise. That is, the j-th part of the n-th individual is taken either from the original subnet code $A^{g}_{n}$ or from the mutated subnet code $V^{g}_{n}$, each with a probability of one half.
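Steps 3.3.1 to 3.3.3 in the same style, reusing the indices and mutated factor from step 3.2 (the row-wise crossover granularity, one connection edge per draw, follows step 3.3.3; clipping is again our assumption):

```python
import numpy as np

def mutate_and_cross(codes, n, f_new, r1, r2, r3, rng):
    """DE mutation and binomial crossover of one subnet code (step 3.3)."""
    # Steps 3.3.1-3.3.2: V = A_r3 + F' * (A_r1 - A_r2)
    v = np.clip(codes[r3] + f_new * (codes[r1] - codes[r2]), 0.0, 1.0)
    # Step 3.3.3: each edge (matrix row) comes from parent or mutant, p = 0.5
    take_mutant = rng.random(codes.shape[1]) < 0.5
    return np.where(take_mutant[:, None], v, codes[n])
```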
Step 3.4 comprises:
Step 3.4.1: discretize the subnet code according to its numerical values: on each connection edge, keep only the operation whose optimized code value is the largest. The discretized code becomes a 0-1 code, i.e., the operation retained on each connection edge is coded 1 and all the others 0, expressed as $b^{g}_{n,j,i} = 1$ if $i = \arg\max_{k} a^{g}_{n,j,k}$, and $b^{g}_{n,j,i} = 0$ otherwise;
Step 3.4.2: compute the fitness $\mathrm{Fitness}(A^{g}_{n})$ of the n-th individual visited in the g-th generation population and the fitness $\mathrm{Fitness}(U^{g}_{n})$ of the n-th individual obtained after processing by the evolution operator. A confusion matrix C measures the relationship between the predictions of the classification model and the true labels: for an image classification task with L classes, the rows of the confusion matrix are the model's predicted classes and the columns are the images' true classes. With the total number of samples $S_{\text{total}}$ and the number of correctly classified samples $S_{\text{correct}}$, the formulas are:
$S_{\text{correct}} = \sum_{k=1}^{L} C_{k,k}$,
where $C_{k,k}$ is the element in row k, column k of the confusion matrix C;
$S_{\text{total}} = \sum_{k=1}^{L} \sum_{l=1}^{L} C_{k,l}$,
where $C_{k,l}$ is the element in row k, column l of the confusion matrix, k and l being index numbers; the accuracy, i.e., the fitness, is computed as:
$\mathrm{Fitness} = S_{\text{correct}} / S_{\text{total}}$;
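The fitness computation, assuming the confusion matrix C has already been accumulated from predictions on the evaluation split:

```python
import numpy as np

def fitness_from_confusion(C):
    """Accuracy from an L x L confusion matrix (rows: predicted, cols: true)."""
    s_correct = np.trace(C)   # sum of diagonal entries C[k, k]
    s_total = C.sum()         # sum over all entries C[k, l]
    return s_correct / s_total

# Tiny worked example with L = 3 classes:
C = np.array([[50,  2,  3],
              [ 4, 45,  1],
              [ 6,  3, 46]])
print(fitness_from_confusion(C))  # (50 + 45 + 46) / 160 = 0.88125
```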
Step 3.4.3: compare the fitness values of the n-th individual $P^{g}_{n}$ of the g-th generation population and the corresponding mutated individual $U^{g}_{n}$, and save the higher-scoring one into the next generation:
$P^{g+1}_{n} = U^{g}_{n}$ if $\mathrm{Fitness}(U^{g}_{n}) \ge \mathrm{Fitness}(A^{g}_{n})$, and $P^{g+1}_{n} = P^{g}_{n}$ otherwise,
where $P^{g+1}_{n}$ is the individual saved into the next generation, comprising a subnet code and a scaling factor. If the individual obtained through the evolution operator has the higher fitness, the subnet code and scaling factor of the evolved individual $U^{g}_{n}$ are both kept; otherwise the subnet code and scaling factor of the n-th individual $P^{g}_{n}$ of the original g-th generation population are retained. $\mathrm{Fitness}(U^{g}_{n})$ is the fitness of the subnet code inside the evolved individual $U^{g}_{n}$, and $\mathrm{Fitness}(A^{g}_{n})$ is the fitness of the subnet code inside the original individual $P^{g}_{n}$.
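Step 3.4.3 then reduces to a greedy one-to-one comparison. In the sketch below, evaluate() stands in for the full pipeline (discretize a code, copy the matching supernet weights, measure accuracy on the evaluation split); that interface is our assumption, not the patent's API:

```python
def environmental_selection(codes, F, trial_codes, trial_F, evaluate):
    """Per slot, keep whichever of parent and trial has the higher fitness."""
    for n in range(len(codes)):
        if evaluate(trial_codes[n]) >= evaluate(codes[n]):
            codes[n], F[n] = trial_codes[n], trial_F[n]  # trial survives
        # otherwise the parent's code and scaling factor are retained
    return codes, F
```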
In step 4, to reduce the computational resource overhead, the network weights and the subnet codes are optimized alternately until the error rate of the network architectures within the population converges. Training the supernet completely in one pass is difficult to achieve, because how to apportion the supernet's training weights is unknown. The present invention therefore provides a strategy in which the excellent individuals in the population guide the training of the supernet: on one hand the supernet must be trained to provide the basis for optimizing the codes, and on the other hand the supernet must copy the code information of the excellent individuals in the population to guide the optimization of the supernet's network weights, expressed as:
$A^{*} = \arg\max_{A}\ \mathrm{Fitness}(A, W^{*}; D_2)$, subject to $W^{*} = \arg\min_{W}\ \mathrm{Loss}(W, A_{\text{best}}; D_1)$,
where A denotes a subnet code, $A_{\text{best}}$ is the subnet code of the individual with the highest fitness, Fitness is the fitness-calculation function, W denotes the weights inside the supernet, $W^{*}$ is the value of the weights W at which the loss is lowest, $\arg\max$ returns the A for which the bracketed value is largest, and $\arg\min$ returns the W for which it is smallest. The training dataset is split evenly into two parts, $D_1$ and $D_2$: the data in $D_1$ is used to train the supernet's network weights, the data in $D_2$ is used to evaluate subnet performance while the subnet codes are optimized, and Loss is the cross-entropy loss function commonly used in classification tasks.
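The alternating scheme of step 4 as a top-level skeleton; train_supernet, de_generation, and best_error are illustrative stand-ins, since the patent fixes only the ordering (weight training guided by the current best individual, then one round of differential evolution over the codes, until the population's error rate converges):

```python
def search(supernet, population, d1_loader, d2_loader,
           train_supernet, de_generation, best_error,
           tol=1e-3, max_rounds=100):
    """Alternate supernet weight training and DE code optimization (step 4)."""
    prev_err = float("inf")
    for _ in range(max_rounds):
        best = max(population, key=lambda ind: ind.fitness)
        # W* = argmin_W Loss(W, A_best; D1): train weights under the best code
        train_supernet(supernet, best.code, d1_loader)
        # A* = argmax_A Fitness(A, W*; D2): one DE generation over the codes
        population = de_generation(population, supernet, d2_loader)
        err = best_error(population)
        if abs(prev_err - err) < tol:  # population error rate has converged
            return population
        prev_err = err
    return population
```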
The present invention also provides a system based on the described neural architecture search method based on a differential evolution algorithm, the system comprising:
an encoding module for constructing a supernet containing M connection edges and N operations, in which each subnet uses continuous values in the range 0 to 1 to assign a weight to every operation on every connection edge, the weights serving as a code that uniquely represents an individual within a population;
a training module for training the network weights of the supernet with the training data and its labels; and
an optimization module for optimizing the subnet codes with a differential evolution algorithm.
By introducing an evolutionary computation method, the present invention can explore the entire search space more thoroughly and thus avoid becoming trapped in local optima. At the same time, evaluating each subnet separately reduces the interference between operations and improves the efficiency and precision of the search. Such a global search algorithm promises to resolve the limitations of current differentiable architecture search algorithms and to bring new breakthroughs and progress to the field of neural architecture search, and it achieves better results than existing methods.
The beneficial effects of the present invention are:
For differentiable neural architecture search, the method of the present invention proposes a neural architecture search algorithm based on a differential evolution algorithm. The algorithm optimizes the codes of the individuals in the population by differential evolution and optimizes the supernet's network weights by gradient descent. The method adopts a continuous encoding scheme that makes the entire search space continuous, so that all operation weight parameters inside the supernet can be trained. Whenever the differential evolution algorithm evaluates the fitness of the individuals in the population, the weights are copied directly from the supernet and the fitness is computed without completely retraining the network architecture, which greatly reduces the computational overhead. Because differential evolution targets continuous codes, it can fully exploit the vectorized information inside the codes and thereby search for better architectures more effectively. In addition, an adaptive factor is designed for the evolution of the subnet codes, letting subnet performance guide the evolution of the scaling factor and thereby assisting the differential evolution algorithm in optimizing the subnet codes. This neural architecture search algorithm based on a differential evolution algorithm brings new methods and ideas to the field of neural architecture search and has high practical value and broad application prospects.
Brief Description of the Drawings
The present invention is described in further detail below in conjunction with the drawings and specific embodiments; the advantages of the above and/or other aspects of the present invention will become clearer.
FIG. 1 is the overall framework diagram of the present invention.
FIG. 2 illustrates the mutation in the present invention.
FIG. 3 is a schematic diagram of the supernet accuracy obtained through training.
FIG. 4 is a schematic comparison of the method of the present invention with other methods.
Detailed Description
In a specific embodiment of the present invention, FIG. 1 takes a search space containing 6 connection edges and 3 candidate operations as an example to describe in detail the implementation flow and details of the neural architecture search method and system based on a differential evolution algorithm proposed by the present invention.
As shown in FIG. 1, the neural architecture search method based on a differential evolution algorithm proposed by the present invention includes the following steps:
Step 1: first construct the supernet shown in the left-hand part of FIG. 1. The supernet contains 4 nodes representing feature maps; the nodes are ordered, and a feature map can only be passed, after processing, from an earlier node to a later one. Each connection edge carries 3 candidate operations, where Op1 is a 3*3 convolution, Op2 a 5*5 convolution, and Op3 a skip connection. Taking the processing of the input data at node 0 as an example, the feature maps produced by the three candidate operations on the input are combined by a weighted sum using the coded weights, and the resulting feature-map information is stored at node 1. Subsequent nodes such as node 2 and node 3 obtain feature-map information from several predecessor nodes, and this information must then be concatenated along the channel dimension. Note that, in the setting FIG. 1 describes, the supernet only assists the training of the network weights and carries no specific code of its own, whereas the individuals in the optimized population carry codes covering all candidate operations in the supernet.
Step 2: the codes in FIG. 1 correspond one-to-one to the subnet individuals; each entry denotes an architecture weight, and the number in parentheses is the node index. When the supernet is trained, the code supplies the weight information for the weighted sum of the feature maps, so that every candidate operation in the supernet takes part in training the network. To bring out the advantages of the excellent individuals in the population, the code weights of the individuals in the population are assigned to the supernet in turn, optimizing the supernet's network weights. Next, the complete supernet training flow is described with the input data of a real scenario. First, image data of the target scene must be collected; then a 224*224-pixel image is cropped starting from the center of each image; the cropped image is then fed into the stem layer to extract features; finally, the extracted abstract features serve as the input of the supernet.
Step 3: as shown in FIG. 1, the alternating optimization uses the evaluation index of the individuals in the population as its basis; optimizing the subnet codes with the differential evolution algorithm includes the following steps:
Step 3.1: initialize the population individuals, shown as the population in FIG. 1, through the following steps:
步骤3.1.1:本示例每条连接边包含3个候选操作,规定第g代种群内第n个个体的第j条连接边的编码方式为;Step 3.1.1: In this example, each link contains three candidate operations. The encoding method of the jth link of the nth individual in the gth generation population is specified as ;
Step 3.1.2: Since the supernet has 6 connecting edges in total, the encoding of the n-th individual is defined as $x_{n}^{g} = (x_{n,1}^{g},\, x_{n,2}^{g},\, \ldots,\, x_{n,6}^{g})$;
Step 3.1.3: Each encoding entry is drawn between the lower and upper bounds of the continuous values. Taking the first candidate operation on one connecting edge as an example, $x_{n,j,1}^{0} = x_{\min} + \mathrm{rand}(0,1)\cdot(x_{\max} - x_{\min})$, and the population is initialized with 50 individuals.
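The initialization of step 3.1 can be sketched as below; the encoding bounds (0, 1) and the scaling-factor range are assumptions, since the embodiment only specifies upper and lower limits in general.

```python
import numpy as np

NUM_EDGES, NUM_OPS, POP_SIZE = 6, 3, 50
X_MIN, X_MAX = 0.0, 1.0              # assumed encoding bounds
rng = np.random.default_rng(seed=0)

# One entry per individual; each individual stores one continuous weight per
# candidate operation on each of the 6 connecting edges.
population = X_MIN + rng.random((POP_SIZE, NUM_EDGES, NUM_OPS)) * (X_MAX - X_MIN)

# Each individual additionally carries its own scaling-factor encoding F,
# initialized here inside an assumed range (0.1, 1.0).
scale_factors = 0.1 + rng.random(POP_SIZE) * 0.9
```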
Step 3.2: Update the scaling factor with the evolution operator:
Step 3.2.1: Traverse the individuals in the population and, for each traversed individual, randomly select 3 other individuals from the current population. Taking a traversed individual $x_{i}^{g}$ as an example, three distinct individuals $x_{r1}^{g}$, $x_{r2}^{g}$ and $x_{r3}^{g}$ are drawn, where the subscript is the individual's index in the population. Each of the three contains a subnet encoding ($x_{r1}^{g}$, $x_{r2}^{g}$ and $x_{r3}^{g}$) as well as a scaling-factor encoding ($F_{r1}^{g}$, $F_{r2}^{g}$ and $F_{r3}^{g}$);
Step 3.2.2: Take the difference of the scaling factors of two of the three selected individuals, multiply it by the scaling factor $F_{i}^{g}$ of the currently traversed individual, and add the result to the scaling factor of the third. In this example, the mutated scaling factor $F_{v}$ is computed as $F_{v} = F_{r1}^{g} + F_{i}^{g}\cdot(F_{r2}^{g} - F_{r3}^{g})$, as sketched below;
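A sketch of this scaling-factor update, assuming the scale_factors array from the initialization sketch in step 3.1; the function name is illustrative.

```python
import numpy as np

def mutate_scale_factor(F: np.ndarray, i: int, rng: np.random.Generator) -> float:
    """F_v = F_r1 + F_i * (F_r2 - F_r3): the difference of two randomly
    chosen individuals' scaling factors, scaled by the traversed
    individual's own factor, is added to a third individual's factor."""
    r1, r2, r3 = rng.choice(
        [k for k in range(len(F)) if k != i], size=3, replace=False)
    return F[r1] + F[i] * (F[r2] - F[r3])

# Example usage with the arrays initialized in step 3.1:
# F_v = mutate_scale_factor(scale_factors, i=0, rng=rng)
```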
Step 3.3: Update the subnet encoding with the evolution operator containing the adaptive scaling factor:
Step 3.3.1: Take the 3 individuals $x_{r1}^{g}$, $x_{r2}^{g}$ and $x_{r3}^{g}$ randomly selected in step 3.2, together with their corresponding subnet encodings;
Step 3.3.2: Multiply the scaling factor obtained in step 3.2 by the vector difference of two subnet encodings, then add the third subnet encoding to obtain the mutated subnet encoding $v_{i}^{g}$, computed as $v_{i}^{g} = x_{r1}^{g} + F_{v}\cdot(x_{r2}^{g} - x_{r3}^{g})$; Figure 2 visualizes this calculation for a special case of the formula;
Step 3.3.3: Perform crossover between the currently traversed individual and the mutant obtained in step 3.3.2: $u_{i,j}^{g} = v_{i,j}^{g}$ if $\mathrm{rand}_{j} \le CR$, and $u_{i,j}^{g} = x_{i,j}^{g}$ otherwise, where j indexes the j-th part of the subnet encoding. In other words, the traversed subnet encoding and its mutated version are crossed, each part having an equal (one-half) probability of entering the new subnet encoding, as in the sketch below.
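Steps 3.3.1-3.3.3 can be sketched as one routine; the function name and NumPy formulation are illustrative, with cr = 0.5 reflecting the equal-probability crossover described above.

```python
import numpy as np

def mutate_and_cross(pop: np.ndarray, i: int, F_v: float,
                     rng: np.random.Generator, cr: float = 0.5) -> np.ndarray:
    """Mutation v = x_r1 + F_v * (x_r2 - x_r3), followed by uniform
    crossover: each part of the trial encoding is taken from the mutant
    or from the traversed parent with probability cr."""
    r1, r2, r3 = rng.choice(
        [k for k in range(len(pop)) if k != i], size=3, replace=False)
    v = pop[r1] + F_v * (pop[r2] - pop[r3])      # mutated subnet encoding
    mask = rng.random(pop[i].shape) < cr         # crossover mask
    return np.where(mask, v, pop[i])             # trial subnet encoding
```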
Step 3.4: Perform environment selection to determine the next generation's subnet encodings and scaling factors (see the sketch after step 3.4.3). This process includes:
Step 3.4.1: Discretize the continuous subnet encoding for performance evaluation. Taking Figure 2 as an example, the newly obtained subnet encoding contains concrete values; there are 3 candidate operations between node 0 and node 1, with architecture weights 0.66, 0.84 and 0.64 for Op1, Op2 and Op3 respectively. The candidate operation with the largest weight is kept, so the operation between node 0 and node 1 in the discretized architecture is Op2;
Step 3.4.2: Compute the fitness $f(x_{i}^{g})$ of the traversed individual in the current population and the fitness $f(u_{i}^{g})$ of the individual produced by the evolution operators. Concretely, a confusion matrix measures the relationship between the classification model's predictions and the true labels. For an image classification task with L categories, the rows of the confusion matrix C are the model's predicted categories and the columns are the images' true categories. With $N_{t}$ total samples and $N_{c}$ correctly classified samples, the accuracy, i.e. the fitness, is $f = N_{c}/N_{t}$, where the number of correctly classified samples is the sum of the diagonal elements, $N_{c} = \sum_{i=1}^{L} C_{ii}$, and the total is $N_{t} = \sum_{i=1}^{L}\sum_{j=1}^{L} C_{ij}$;
Step 3.4.3: Compare the two fitness values and keep the higher one in the next-generation population: $x_{i}^{g+1} = u_{i}^{g}$ if $f(u_{i}^{g}) \ge f(x_{i}^{g})$, and $x_{i}^{g+1} = x_{i}^{g}$ otherwise, where $x_{i}^{g+1}$ is the individual saved into the next generation, consisting of a subnet encoding and a scaling factor. That is, if the individual produced by the evolution operators has the higher fitness, the evolved individual's subnet encoding and scaling factor are kept; otherwise the original individual's subnet encoding and scaling factor are retained.
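The discretization, fitness computation and environment selection of step 3.4 can be sketched as follows; the helper names are illustrative, and the confusion matrix is assumed to be computed elsewhere by evaluating the discretized subnet on validation data.

```python
import numpy as np

OPS = ("Op1", "Op2", "Op3")   # e.g. 3*3 conv, 5*5 conv, skip connection

def discretize(encoding: np.ndarray) -> list:
    """Keep only the highest-weighted candidate op on every edge, e.g.
    weights (0.66, 0.84, 0.64) between node 0 and node 1 select Op2."""
    return [OPS[j] for j in encoding.argmax(axis=1)]

def accuracy_from_confusion(conf: np.ndarray) -> float:
    """Fitness = N_c / N_t: correctly classified samples lie on the
    diagonal of the L*L confusion matrix; N_t is the matrix sum."""
    return float(np.trace(conf) / conf.sum())

def environment_select(parent, F_parent, fit_parent, trial, F_trial, fit_trial):
    """The fitter of parent and trial survives into the next generation,
    keeping both its subnet encoding and its scaling factor."""
    return (trial, F_trial) if fit_trial >= fit_parent else (parent, F_parent)
```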
Step 4: To reduce the computational overhead, the present invention alternates between optimizing the network weights and optimizing the subnet encodings until the error rate of the network architectures in the population converges. Training the supernet completely in one pass is difficult, because how to allocate the supernet's training weights is unknown. The present invention therefore provides a supernet training strategy guided by the strong individuals in the population: on one hand, the supernet must be trained to provide a basis for optimizing the encodings; on the other hand, the supernet must copy the encoding information of strong individuals in the population to guide the optimization of its network weights. The process is described as:
$$\max_{\alpha}\; f\bigl(W^{*}(\alpha), \alpha;\, D_{val}\bigr) \quad \text{subject to} \quad W^{*}(\alpha) = \arg\min_{W}\; Loss\bigl(W, \alpha;\, D_{train}\bigr),$$
where $\alpha$ is the subnet encoding, $f$ is the fitness calculation function, W denotes the weights inside the supernet, $D_{train}$ is the part of the data set used to train the supernet, $D_{val}$ is the part used to optimize the subnet encodings, and Loss is the cross-entropy loss commonly used in classification tasks. Iterative optimization finally yields the optimal subnet $\alpha^{*}$, expressed as
$$\alpha^{*} = \bigl(\text{DySepConv5}{\times}5,\ \text{DySepConv3}{\times}3,\ \text{DilConv3}{\times}3,\ \text{MaxPool3}{\times}3,\ \text{SkipConnect},\ \text{AvgPool3}{\times}3\bigr),$$
where DySepConv5×5 is a dynamic separable convolution with a 5*5 kernel, DySepConv3×3 is a dynamic separable convolution with a 3*3 kernel, DilConv3×3 is a dilated convolution with a 3*3 kernel, MaxPool3×3 is max pooling of size 3*3, SkipConnect is a skip connection, and AvgPool3×3 is average pooling of size 3*3.
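The alternating scheme of step 4 can be sketched as a single loop; train_step and evaluate are hypothetical stand-ins for the supernet weight update on $D_{train}$ and the fitness evaluation on $D_{val}$, respectively.

```python
import numpy as np

def alternate_optimize(population: np.ndarray, scale_factors: np.ndarray,
                       train_step, evaluate, generations: int,
                       rng: np.random.Generator):
    """Alternate (a) supernet weight training guided by the population's
    encodings with (b) one differential-evolution generation over the
    subnet encodings, for a fixed budget of generations."""
    pop_size = len(population)
    for _ in range(generations):
        # (a) copy each individual's encoding into the supernet in turn
        #     and update the shared weights W on D_train
        for alpha in population:
            train_step(alpha)
        # (b) evolve the subnet encodings, with fitness measured on D_val
        for i in range(pop_size):
            r1, r2, r3 = rng.choice(
                [k for k in range(pop_size) if k != i], size=3, replace=False)
            # self-adaptive scaling factor, then DE mutation and crossover
            F_v = scale_factors[r1] + scale_factors[i] * (
                scale_factors[r2] - scale_factors[r3])
            v = population[r1] + F_v * (population[r2] - population[r3])
            mask = rng.random(population[i].shape) < 0.5
            trial = np.where(mask, v, population[i])
            # environment selection: the fitter individual survives
            if evaluate(trial) >= evaluate(population[i]):
                population[i], scale_factors[i] = trial, F_v
    return population, scale_factors
```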
Compared with the prior art, the present invention alleviates the tendency of existing methods to fall into local optima during the search, thanks to the strong global search ability of the evolution operators. In addition, the evolution operators of the present invention emphasize population diversity, which avoids the premature convergence caused by the small-model trap during optimization. To ease comparison with other algorithms, the performance of the method was tested on the public CIFAR-10 data set, which contains 10 categories and 60,000 images, 50,000 for training and 10,000 for testing; all images are 32*32 pixels and the categories are balanced. On a single 2080 Ti GPU, the present invention needs only 4.8 hours to find a neural network architecture with an accuracy of 97.45%, which is 31.2 hours faster and 0.45% more accurate than the existing differentiable architecture search algorithm. Concretely, the first step builds the supernet and samples one operation on every connecting edge of the supernet to obtain a subnet, repeating the sampling 50 times to obtain the initial population; the accuracy of the supernet trained in step 2 is shown in Figure 3.
During the alternating optimization of supernet weights and population encodings, the average validation accuracy of all individuals in each generation was compared with other methods, as shown in Figure 4. Unlike other methods, which suffer premature convergence caused by the small-model trap (the orange curve converges too early and its accuracy degrades later in the optimization), the method of the present invention converges normally to a lower validation error rate.
The present invention provides a neural architecture search method and system based on a differential evolution algorithm. There are many ways to implement this technical solution, and the above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these should also be regarded as falling within the scope of protection of the present invention. Any component not specified in this embodiment can be implemented with existing technology.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410615399.8A CN118196600B (en) | 2024-05-17 | 2024-05-17 | Neural architecture searching method and system based on differential evolution algorithm |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118196600A true CN118196600A (en) | 2024-06-14 |
| CN118196600B CN118196600B (en) | 2024-07-12 |
Family
ID=91406905
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410615399.8A Active CN118196600B (en) | 2024-05-17 | 2024-05-17 | Neural architecture searching method and system based on differential evolution algorithm |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118196600B (en) |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113743591A (en) * | 2021-09-14 | 2021-12-03 | 北京邮电大学 | Method and system for automatically pruning convolutional neural network |
| US20220391668A1 (en) * | 2022-06-21 | 2022-12-08 | Intel Corporation | Methods and apparatus to iteratively search for an artificial intelligence-based architecture |
| CN116579371A (en) * | 2023-05-18 | 2023-08-11 | 南京信息工程大学 | Double-layer optimization heterogeneous proxy model assisted multi-objective evolutionary optimization computing method |
Non-Patent Citations (2)
| Title |
|---|
| 王晓峰; 吴志健; 周新宇; 郭肇禄: "GPU上基于改进精英策略差分演化的神经网络学习算法" (A neural network learning algorithm based on improved elitist-strategy differential evolution on GPU), 小型微型计算机系统 (Journal of Chinese Computer Systems), no. 01, 15 January 2016, pages 131-135 *
| 蒋艳凰; 周海芳; 杨学军: "基于纠错编码的CSNN及其在遥感图像分类中的应用" (Error-correcting-code-based CSNN and its application in remote sensing image classification), 计算机研究与发展 (Journal of Computer Research and Development), no. 07, 30 July 2003, pages 20-26 *
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118821905A (en) * | 2024-09-18 | 2024-10-22 | 南京信息工程大学 | Agent model-assisted evolutionary generative adversarial network architecture search method and system |
| CN118917389A (en) * | 2024-10-11 | 2024-11-08 | 南京信息工程大学 | Diffusion model architecture searching method and system based on attention mechanism |
| CN119150925A (en) * | 2024-11-21 | 2024-12-17 | 南京信息工程大学 | Method and system for searching generated countermeasure network architecture based on mixed convolution operation |
| CN119150925B (en) * | 2024-11-21 | 2025-03-14 | 南京信息工程大学 | Generative adversarial network architecture search method and system based on hybrid convolution operation |
| CN119886226A (en) * | 2025-03-27 | 2025-04-25 | 南京信息工程大学 | Neural architecture searching method based on diffusion evolution algorithm |
| CN119886226B (en) * | 2025-03-27 | 2025-05-30 | 南京信息工程大学 | Neural architecture searching method based on diffusion evolution algorithm |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118196600B (en) | 2024-07-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN118196600B (en) | Neural architecture searching method and system based on differential evolution algorithm | |
| CN109492822B (en) | Temporal and spatial correlation prediction method of air pollutant concentration | |
| WO2022083624A1 (en) | Model acquisition method, and device | |
| CN112465120A (en) | Fast attention neural network architecture searching method based on evolution method | |
| CN109214599B (en) | Method for predicting link of complex network | |
| US20200167659A1 (en) | Device and method for training neural network | |
| CN112070277A (en) | Hypergraph neural network-based drug-target interaction prediction method | |
| CN110490320A (en) | Structure Optimization Method of Deep Neural Network Based on Fusion of Prediction Mechanism and Genetic Algorithm | |
| CN118095445B (en) | Knowledge-graph-based few-sample multi-hop reasoning optimization method | |
| CN112508104A (en) | Cross-task image classification method based on rapid network architecture search | |
| CN117237559B (en) | Digital twin city-oriented three-dimensional model data intelligent analysis method and system | |
| CN115101145A (en) | A virtual drug screening method based on adaptive meta-learning | |
| CN118821905B (en) | Agent model-assisted evolutionary generative adversarial network architecture search method and system | |
| CN119623515B (en) | Evolutionary neural architecture searching method and system based on similarity agent assistance | |
| CN114912357A (en) | Multi-task reinforcement learning user operation method and system based on user model learning | |
| CN115393632A (en) | An image classification method based on evolutionary multi-objective neural network architecture construction | |
| CN116108384A (en) | A neural network architecture search method, device, electronic equipment and storage medium | |
| CN115688588B (en) | Sea surface temperature daily variation amplitude prediction method based on improved XGB method | |
| CN119886226B (en) | Neural architecture searching method based on diffusion evolution algorithm | |
| CN115081516A (en) | Internet of things flow prediction method based on biological connection group time-varying convolution network | |
| CN109978013A (en) | A kind of depth clustering method for figure action identification | |
| CN116342938B (en) | Domain generalization image classification method based on mixture of multiple potential domains | |
| CN112418386A (en) | Network embedding method based on network structure information entropy | |
| CN117976047A (en) | Key protein prediction method based on deep learning | |
| CN117591720A (en) | Knowledge graph link prediction method based on network architecture search |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| GR01 | Patent grant | | |