CN116977819A - Facial expression recognition neural network architecture searching method, device and application - Google Patents
- Publication number: CN116977819A (application number CN202310750769.4A)
- Authority: CN (China)
- Prior art keywords: facial expression, expression recognition, zero-shot, network
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/87—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using selection of the recognition techniques, e.g. of a classifier in a multiple classifier system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Description
Technical field
The present invention relates to facial expression recognition technology, and in particular to a facial expression recognition neural network architecture search method, device and application.
Background
Facial expression recognition (FER) is an important research direction in computer vision, with broad application prospects and great commercial value. For example, in human-computer interaction settings such as distance education, safe driving and healthcare, recognizing the user's emotional state enables more intelligent and personalized services. Convolutional neural networks (CNNs) have achieved great success in FER tasks thanks to their powerful feature extraction capability and flexible hierarchical representation. However, traditional CNN-based FER networks often require substantial computing resources to run, which severely limits their deployment on resource-constrained smart devices such as mobile phones and laptops.
For example, online learning is now widely praised and gives students a flexible way to study, but teachers cannot observe students' learning state in real time to receive classroom feedback and make adjustments. A facial expression recognition model can collect students' learning state in real time and thereby improve the quality of online teaching. However, online learning usually takes place on mobile devices (phones, tablets, etc.), so the FER model must be deployed on such devices; yet existing high-accuracy FER models consume large amounts of resources and cannot be deployed directly under mobile-device resource constraints. Designing lightweight FER models for these resource-constrained devices, and thereby enhancing their intelligence, has therefore become an urgent problem.
To enable the deployment of FER models on mobile terminals, construction methods for lightweight FER models have appeared on the market, among them sampling-based NAS methods: networks are sampled from the search space under certain rules, and the evaluated model performance is used as a feedback signal to guide the algorithm toward better FER models. Such methods require training many FER models, and FER model training is a very time-consuming process, which makes it difficult in practice to quickly design FER networks with small resource footprints for different mobile terminals; this further limits the adoption of FER across mobile terminals.
Summary of the invention
In view of the above deficiencies in the prior art, the facial expression recognition neural network architecture search method, device and application provided by the present invention solve the problem that existing architecture search methods struggle to quickly design, for different mobile terminals, FER networks with small resource footprints.
To achieve the above object of the invention, the present invention adopts the following technical solution:
In a first aspect, a facial expression recognition neural network architecture search method and a corresponding device are provided, the method comprising the steps of:
S1. Construct a search space for facial expression recognition;
S2. Generate, by random search, several candidate facial expression recognition networks that satisfy a preset parameter budget;
S3. Select from NAS datasets a preset number of zero-shot proxies that satisfy preset conditions;
S4. Use the selected zero-shot proxies to pick the optimal candidate from the candidate facial expression recognition networks as the final facial expression recognition network.
Further, step S3 comprises:
S31. Select several zero-shot proxies, randomly sample a number of neural networks from multiple NAS benchmark datasets, and use each zero-shot proxy to compute an accuracy-related proxy value for each network;
S32. Select a preset number of zero-shot proxies according to the correlation between proxy values and the networks' validation accuracy, and the mutual information between proxy values and validation accuracy.
Further, step S4 comprises:
S41. Use each selected zero-shot proxy to compute a proxy value for every candidate FER network, and sort all values of the same proxy in ascending order;
S42. Sum the ranks of each candidate FER network across proxies, and select the candidate with the largest cumulative rank as the final facial expression recognition network.
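The rank-aggregation vote in S41-S42 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the interface (a dict mapping each proxy to a list of scalar scores, larger meaning better predicted accuracy) and all names are assumptions.

```python
def select_best_network(proxy_scores):
    """proxy_scores: dict proxy_name -> list of scores, one per candidate.
    For each proxy, candidates are ranked by ascending score (rank 0 =
    lowest score); ranks are summed per candidate and the candidate with
    the largest cumulative rank is returned."""
    n = len(next(iter(proxy_scores.values())))
    totals = [0] * n
    for scores in proxy_scores.values():
        order = sorted(range(n), key=lambda i: scores[i])  # ascending sort
        for rank, idx in enumerate(order):
            totals[idx] += rank
    return max(range(n), key=lambda i: totals[i])

# Toy scores for 3 candidate networks under 2 proxies (values invented):
scores = {
    "proxy_a": [0.2, 0.9, 0.5],
    "proxy_b": [1.1, 3.0, 2.2],
}
best = select_best_network(scores)
```

Because each proxy contributes only a rank, not a raw value, proxies with very different scales can be combined without normalization.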
Further, step S32 comprises:
S321. For each zero-shot proxy, compute the Spearman correlation coefficient between all its proxy values on the same NAS benchmark dataset and the corresponding networks' validation accuracy;
S322. Compare the number of zero-shot proxies whose Spearman correlation coefficients keep the same sign across all benchmarks with the preset number: if greater, go to S323; if smaller, go to S325; if equal, go to S326;
S323. Form all possible combinations of the sign-consistent zero-shot proxies such that each group contains the preset number of proxies;
S324. Compute the mutual information of each combination on every NAS benchmark dataset, average all mutual information values of each combination, and select the zero-shot proxy combination with the largest average;
S325. Delete the zero-shot proxies for which the counts of positive and negative Spearman coefficients differ the least, until the number of selected zero-shot proxies equals the preset number;
S326. Select the zero-shot proxies whose Spearman correlation coefficients keep the same sign across all benchmarks.
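The screening criterion of S321 rests on the Spearman rank correlation between a proxy's scores and the networks' validation accuracies on one benchmark. A self-contained sketch (the helper names and toy data are ours, not from the patent):

```python
def rankdata(xs):
    """Average ranks for ties, 1-based, as in the usual Spearman definition."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend the tie group
        avg = (i + j) / 2 + 1           # average rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Toy proxy values that are perfectly rank-aligned with validation accuracy:
proxy_vals = [0.1, 0.4, 0.35, 0.8]
val_acc = [60.0, 71.0, 70.0, 83.0]
rho = spearman(proxy_vals, val_acc)
```

A proxy whose rho keeps the same sign on every benchmark is the kind that passes the S322 check.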
The beneficial effects of the above technical solution are: using zero-shot proxies to evaluate neural network performance greatly reduces computational complexity and saves computing cost. Across multiple NAS benchmark datasets, the Spearman correlation coefficient measures a proxy's ability to predict network accuracy, ensuring that the selected proxies generalize and can be transferred directly to other tasks, so that they remain effective at predicting network accuracy on unseen NAS tasks.
Further, the mutual information is computed as:

I(A; S_1, ..., S_m) = H(A) + H(S_1, ..., S_m) - H(A, S_1, ..., S_m)

with

H(A) = -Σ_a p(a) log p(a),
H(S_1, ..., S_m) = -Σ p(s_1, ..., s_m) log p(s_1, ..., s_m),
H(A, S_1, ..., S_m) = -Σ p(a, s_1, ..., s_m) log p(a, s_1, ..., s_m),

where A is the validation accuracy of the neural network; S_1, ..., S_m are the proxy values computed by the corresponding 1st to m-th zero-shot proxies; H(A) is the information entropy of A; H(S_1, ..., S_m) is the joint entropy of the proxy values; H(A, S_1, ..., S_m) is the joint entropy of A and the proxy values; p(a) is the marginal probability distribution of A taking value a; m is the dimension of the random variable; p(s_1, ..., s_m) and p(a, s_1, ..., s_m) are the corresponding joint probability distribution functions.
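The quantity I(A; S_1, ..., S_m) defined above can be estimated in a few lines by discretizing accuracies and proxy values into bins and using plug-in entropy estimates. The binning scheme and all names below are our own assumptions, not taken from the patent:

```python
from collections import Counter
from math import log2

def entropy(samples):
    """samples: list of hashable tuples; plug-in entropy estimate in bits."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_information(acc_bins, proxy_bins):
    """acc_bins: list of discretized accuracies, one per network.
    proxy_bins: list of tuples, the discretized proxy vector per network.
    Implements I(A; S) = H(A) + H(S) - H(A, S)."""
    h_a = entropy([(a,) for a in acc_bins])
    h_s = entropy(proxy_bins)
    h_as = entropy([(a,) + s for a, s in zip(acc_bins, proxy_bins)])
    return h_a + h_s - h_as

# Toy example: the two-proxy vector determines accuracy exactly,
# so I(A; S) should equal H(A) (here 1 bit).
acc = [0, 0, 1, 1]
proxies = [(0, 0), (0, 0), (1, 1), (1, 1)]
mi = mutual_information(acc, proxies)
```

When the proxy vector carries no information about accuracy, the same function returns 0, which is what makes it usable as the selection score in S324.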
The beneficial effects of the above technical solution are: mutual information intuitively measures a zero-shot proxy's ability to capture accuracy-related features of a neural network, making it possible to analyze theoretically whether different zero-shot proxies are suitable for combination. Moreover, comparing the feature-capturing ability of multiple proxies against a single proxy shows that some zero-shot proxies are indeed complementary, and that combining several proxies does improve the ability to predict network accuracy.
Further, the method for selecting the preset number comprises:
A1. Perform q rounds of combination over all selected zero-shot proxies, in each round generating every possible combination of d proxies; after each round, let d = d + 1, with 1 ≤ d ≤ q, where q is the total number of zero-shot proxies;
A2. For each value of d, compute the mutual information of all corresponding combinations on every NAS benchmark dataset, and take the maximum among all mutual information values on the same benchmark;
A3. Plot, for each NAS benchmark dataset, a curve of the maximum mutual information against d, and select as the preset number the value of d at which all curves level off.
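Steps A1-A3 can be sketched as below. The mutual-information computation is stubbed out by a toy `mi_of_combo` function (invented for illustration); the plateau detection replaces the visual "curves level off" judgment of A3 with a tolerance check:

```python
from itertools import combinations

def best_mi_per_d(proxies, mi_of_combo, q):
    """A1-A2: for each combination size d, the best MI over all d-subsets."""
    out = []
    for d in range(1, q + 1):
        out.append(max(mi_of_combo(c) for c in combinations(proxies, d)))
    return out

def first_stable_d(curve, tol=1e-6):
    """A3: smallest d after which the curve no longer grows beyond tol."""
    for d in range(1, len(curve) + 1):
        if all(curve[j] - curve[d - 1] <= tol for j in range(d, len(curve))):
            return d
    return len(curve)

# Toy MI: combinations gain information up to two proxies, then saturate.
single_mi = {"p1": 0.4, "p2": 0.5, "p3": 0.3}
def mi_of_combo(c):
    return min(0.8, sum(single_mi[p] for p in c))

curve = best_mi_per_d(["p1", "p2", "p3"], mi_of_combo, 3)
d_star = first_stable_d(curve)
```

In the real method the curve would be computed once per benchmark and the chosen d is the point where all benchmark curves have stopped improving.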
The beneficial effects of the above technical solution are: by tracking how mutual information grows as the number of combined zero-shot proxies increases, the optimal combination size can be found. Once adding proxies no longer changes the mutual information, further combination only wastes computing resources. Choosing the most appropriate combination size preserves the effectiveness of accuracy prediction while keeping computational complexity low.
Further, step S2 comprises:
S201. Each candidate facial expression recognition network consists of N sequentially connected stages, each stage consisting of l sequentially connected layers;
S202. Select the channel expansion rate of each stage of the current candidate network;
S203. Randomly select one of standard convolution, mobile inverted bottleneck convolution, mixed depthwise convolution, or skip connection as the layer's operation block;
S204. If standard convolution is selected, randomly choose a kernel size from {3×3, 5×5, 7×7}, then go to step S208;
S205. If mobile inverted bottleneck convolution is selected, randomly choose a kernel size from {3×3, 5×5, 7×7}, then go to step S207;
If mixed depthwise convolution is selected, randomly choose the number of different-scale kernels to mix from {2, 3, 4}, then go to step S207;
S206. If skip connection is selected, go to step S208;
S207. Generate a random number in [0, 1]; if it is greater than or equal to 0.5, add an attention module to the operation block, otherwise do not; then go to step S208;
S208. Check whether every layer of every stage in the current candidate network has been assigned an operation block; if so, generation of the current candidate is complete and step S209 follows, otherwise return to step S203;
S209. Check whether the parameter count of the current candidate network satisfies the preset parameter budget; if so, go to step S210, otherwise delete the current candidate and return to step S202;
S210. Check whether an identical network already exists among the generated candidates; if so, delete the current candidate and return to step S202, otherwise go to step S211;
S211. Check whether the number of generated candidate networks has reached the preset number; if so, candidate generation is complete, otherwise return to step S202.
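The sampling loop S201-S211 can be sketched as below. This is a hedged illustration only: the operation names, the default stage/layer counts, and the stubbed parameter counter are assumptions, and the real space also fixes strides and channel widths per stage.

```python
import random

OPS = ["conv", "mbconv", "mixconv", "skip"]

def sample_network(n_stages=3, layers_per_stage=2):
    """One random candidate (S201-S207); structure is illustrative."""
    net = []
    for _ in range(n_stages):
        stage = {"expansion": random.choice([1, 4, 6]), "layers": []}  # S202
        for _ in range(layers_per_stage):
            op = random.choice(OPS)                                    # S203
            layer = {"op": op}
            if op in ("conv", "mbconv"):
                layer["kernel"] = random.choice([3, 5, 7])             # S204/S205
            elif op == "mixconv":
                layer["groups"] = random.choice([2, 3, 4])             # S205
            if op in ("mbconv", "mixconv"):
                layer["attention"] = random.random() >= 0.5            # S207
            stage["layers"].append(layer)
        net.append(stage)
    return net

def generate_candidates(num, max_params, param_count):
    """Reject over-budget (S209) and duplicate (S210) samples until
    `num` distinct candidates exist (S211)."""
    seen, pool = set(), []
    while len(pool) < num:
        net = sample_network()
        if param_count(net) > max_params:
            continue
        key = repr(net)
        if key in seen:
            continue
        seen.add(key)
        pool.append(net)
    return pool

random.seed(0)
# param_count is stubbed to a constant here; a real one would sum the
# parameters implied by each layer configuration.
pool = generate_candidates(5, max_params=1e6, param_count=lambda n: 1e5)
```

The budget check before deduplication mirrors the S209 → S210 order in the text, so no effort is spent comparing networks that could never be deployed.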
The beneficial effects of the above technical solution are: a hierarchical search space is designed for the facial expression recognition task, allowing FER networks with excellent performance to be generated more flexibly. The mixed depthwise convolution introduced into the search space learns the features of facial expression images more comprehensively, and the attention mechanism focuses on the key features of expression images, effectively improving the recognition accuracy of the FER network. Meanwhile, a preset constraint added during candidate generation guarantees that the generated FER networks satisfy the resource limits of the target device.
In a second aspect, a facial expression recognition neural network architecture search device is provided, comprising:
a search space construction module, for constructing a search space for facial expression recognition;
a candidate network generation module, for generating by random search several candidate facial expression recognition networks that satisfy a preset parameter budget;
a zero-shot proxy selection module, for selecting from NAS datasets a preset number of zero-shot proxies that satisfy preset conditions;
a network performance evaluation module, for using the selected zero-shot proxies to pick the optimal candidate among the candidate networks as the final facial expression recognition network;
a facial expression recognition module, for training the facial expression recognition neural network on a selected facial expression image dataset and, after training, recognizing facial expression images to be recognized.
In a third aspect, an application method of a facial expression recognition network is provided, the network being obtained by the above facial expression recognition neural network architecture search method, characterized by comprising:
selecting a facial expression image dataset and training the facial expression recognition network;
acquiring a facial expression image to be recognized and preprocessing it;
feeding the preprocessed facial expression image into the trained facial expression recognition network to obtain the recognition result.
Further, the expression images in the facial expression image dataset cover seven emotions: anger, disgust, fear, happiness, sadness, surprise and neutral.
Further, a stochastic gradient descent optimizer is used to train the facial expression recognition neural network, with an initial learning rate of 0.01 for 300 epochs.
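The training setup above (plain SGD, initial learning rate 0.01) can be illustrated on a toy problem. The model here is a single linear neuron on synthetic 1-D data, not the searched FER network; it only shows the per-sample SGD update with the stated hyperparameters:

```python
def sgd_train(data, epochs=300, lr=0.01):
    """Plain per-sample SGD on a 1-D linear model with squared loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * 2 * err * x   # gradient of (pred - y)^2 w.r.t. w
            b -= lr * 2 * err       # gradient of (pred - y)^2 w.r.t. b
    return w, b

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # exactly y = 2x + 1
w, b = sgd_train(data)
```

In the patent's setting the same loop shape would iterate over mini-batches of expression images, with a cross-entropy loss over the seven emotion classes.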
The beneficial effects of the present invention are: 1. The hybrid zero-shot proxy adopted in this solution can predict a network's accuracy at the initialization stage, greatly reducing the computation time of the design process. FER networks with small resource footprints can thus be designed quickly for the resources available on different mobile terminals, making the adoption of FER networks across mobile terminals possible.
2. This solution designs a new search space for the field of facial expression recognition; by increasing the degrees of freedom of model construction it enhances the diversity of FER models, producing models with better performance. A search space with more degrees of freedom achieves a better balance between model complexity and accuracy.
3. This solution adopts an attention module in the search space, so the FER model can use global information to increase the weight of important features and focus on the expression features that matter for recognition accuracy, further improving recognition accuracy while adding very few parameters. This makes it possible to deploy FER networks with higher accuracy and smaller memory footprints on mobile terminals with different computing resources.
Brief description of the drawings
Figure 1 is a flow chart of the facial expression recognition neural network architecture search method.
Figure 2 is a schematic diagram of the search space framework.
Figure 3 is a schematic diagram of the curves plotted for each NAS benchmark dataset at different values of d.
Detailed description
Specific embodiments of the present invention are described below so that those skilled in the art can understand the invention; it should be clear, however, that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, all variations that fall within the spirit and scope of the invention as defined and determined by the appended claims are obvious, and every invention or creation that makes use of the inventive concept is protected.
Referring to Figure 1, which shows a flow chart of the facial expression recognition neural network architecture search method: as shown in Figure 1, the method comprises steps S1 to S4.
In step S1, a search space for facial expression recognition is constructed.
In step S2, several candidate facial expression recognition networks satisfying a preset parameter budget are generated by random search.
In an embodiment of the present invention, step S2 further comprises:
S201. Each candidate facial expression recognition network consists of N sequentially connected stages, each stage consisting of l sequentially connected layers;
S202. Select the channel expansion rate of each stage of the current candidate network;
The channel expansion rate in this solution takes values in {1, 4, 6}, chosen randomly each time; introducing the channel expansion rate increases nonlinear expressive capacity by enlarging the number of output channels of the separable convolution.
S203. Randomly select one of standard convolution, mobile inverted bottleneck convolution, mixed depthwise convolution, or skip connection as the layer's operation block;
S204. If standard convolution is selected, randomly choose a kernel size from {3×3, 5×5, 7×7}, then go to step S208;
S205. If mobile inverted bottleneck convolution is selected, randomly choose a kernel size from {3×3, 5×5, 7×7}, then go to step S207;
If mixed depthwise convolution is selected, randomly choose the number of different-scale kernels to mix from {2, 3, 4}, then go to step S207;
S206. If skip connection is selected, go to step S208;
S207. Generate a random number in [0, 1]; if it is greater than or equal to 0.5, add an attention module to the operation block, otherwise do not; then go to step S208;
S208. Check whether every layer of every stage in the current candidate network has been assigned an operation block; if so, generation of the current candidate is complete and step S209 follows, otherwise return to step S203;
S209. Check whether the parameter count of the current candidate network satisfies the preset parameter budget; if so, go to step S210, otherwise delete the current candidate and return to step S202;
S210. Check whether an identical network already exists among the generated candidates; if so, delete the current candidate and return to step S202, otherwise go to step S211;
S211. Check whether the number of generated candidate networks has reached the preset number; if so, candidate generation is complete, otherwise return to step S202.
Since facial expression images are usually captured under varying lighting, occlusion and pose, which greatly affects a FER model's ability to extract expression features, the mixed depthwise convolution introduced in this solution extracts features at different scales so as to learn the subtle differences between facial expressions and thereby improve the model's expression recognition accuracy.
The skip connection introduced in this solution outputs the input feature map directly without any computation, allowing the depth of the FER network architecture to adjust automatically and thus yielding more diverse FER models. The attention module (SE block) uses global information to selectively increase the weight of important facial expression features, so the model concentrates on learning features useful for expression recognition, increasing recognition accuracy and reducing the impact of lighting, pose and occlusion.
In step S3, a preset number of zero-shot proxies satisfying preset conditions are selected on the NAS datasets:
S31: Select several zero-shot proxies, randomly sample a number of neural networks from multiple NAS benchmark datasets, and use each zero-shot proxy to compute an accuracy-related proxy value for every sampled network;
In this solution, 1,000 neural networks are randomly sampled from each NAS benchmark dataset for the computation. The benchmarks used are NAS-Bench-101, NAS-Bench-201, and Network Design Spaces (NDS). Except for NAS-Bench-201, whose networks are trained on three datasets (CIFAR-10, CIFAR-100, and ImageNet16-120), all networks are trained only on CIFAR-10. In total, nine NAS benchmark datasets are used, and 1,000 neural networks are randomly sampled from each for zero-shot proxy selection.
S32: Select a preset number of zero-shot proxies based on the correlation between the zero-shot proxy values and the networks' validation accuracies, and on the mutual information between the proxy values and the validation accuracies.
The zero-shot proxy method is a training-free performance estimation method. Based on a network characteristic correlated with accuracy, it predicts the relative accuracy of a neural network from a single forward or backward pass over one batch of data. It can evaluate a single architecture within seconds, thereby accelerating the search for facial expression recognition networks.
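The text does not fix a particular proxy at this point; as one hedged illustration, a gradient-norm proxy can be obtained from a single forward and backward pass over one batch. The toy two-layer NumPy network and random data below are assumptions for demonstration, not the actual FER search space:

```python
import numpy as np

def grad_norm_proxy(weights, x, y):
    """Score a tiny two-layer ReLU network by the gradient norm of a squared
    loss on one batch -- one of many possible training-free proxies."""
    w1, w2 = weights
    h = np.maximum(x @ w1, 0.0)          # forward: hidden ReLU layer
    pred = h @ w2                        # forward: linear output
    err = pred - y                       # backward pass written by hand
    g2 = h.T @ err                       # dL/dw2
    g1 = x.T @ ((err @ w2.T) * (h > 0))  # dL/dw1 through the ReLU mask
    return float(np.sqrt((g1 ** 2).sum() + (g2 ** 2).sum()))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))              # a single batch of 8 samples
y = rng.normal(size=(8, 1))
weights = (rng.normal(size=(4, 16)), rng.normal(size=(16, 1)))
score = grad_norm_proxy(weights, x, y)   # one forward/backward pass, no training
```

Larger networks would be scored the same way, each in a fraction of a second, which is what makes such proxies suitable as a NAS performance estimation strategy.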
In implementation, the preferred method for choosing the preset number is as follows:
A1: Combine all selected zero-shot proxies over q rounds, in each round generating all possible proxy combinations of size d. After a round is completed, let d = d + 1, where 1 ≤ d ≤ q and q is the total number of zero-shot proxies;
In step A1, assuming 12 different zero-shot proxies are selected, 12 rounds of combinations are required. The numbers of combinations obtained in the successive rounds are C(12, 1), C(12, 2), …, C(12, 12), and each of the C(12, d) combinations in round d contains exactly d zero-shot proxies.
A2: For each value of d, compute the mutual information of every corresponding combination on each NAS benchmark dataset, and take the maximum of all mutual-information values on the same benchmark;
A3: For each NAS benchmark dataset, plot a curve of the maximum mutual information against d, and take the value of d at which all curves level off as the preset number. Referring to the curves in Figure 3, when d exceeds 4 the curves become stable, so the preset number is taken as 4.
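Steps A1–A3 can be sketched as follows. `toy_mi` is a hypothetical stand-in for the real per-benchmark mutual-information computation, constructed so that the curves level off at d = 4, mirroring the behavior described for Figure 3:

```python
from itertools import combinations

def preset_number(proxies, mi_of_combo, benchmarks, tol=1e-3):
    """A1-A3: for each combination size d, take the maximum mutual information
    per benchmark, then pick the smallest d at which every curve has flattened."""
    q = len(proxies)
    curves = {b: [] for b in benchmarks}
    for d in range(1, q + 1):                      # A1: rounds d = 1..q
        for b in benchmarks:
            best = max(mi_of_combo(c, b) for c in combinations(proxies, d))
            curves[b].append(best)                 # A2: per-benchmark maximum
    for d in range(1, q):                          # A3: first d where all curves flatten
        if all(abs(curves[b][d] - curves[b][d - 1]) < tol for b in benchmarks):
            return d
    return q

# Toy MI surrogate: saturates once a combination contains 4 or more proxies.
def toy_mi(combo, benchmark):
    return min(len(combo), 4) / 4.0

proxies = list(range(12))
d_star = preset_number(proxies, toy_mi, benchmarks=["b1", "b2"])  # levels off at 4
```

With 12 proxies the enumeration covers 2^12 − 1 = 4,095 combinations per benchmark, which is cheap because each mutual-information value is computed from already-collected proxy values and accuracies.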
In one embodiment of the present invention, step S32 further includes:
S321: For each zero-shot proxy, compute the Spearman correlation coefficient between all of its proxy values on a given NAS benchmark dataset and the validation accuracies of the corresponding neural networks;
Assume 12 zero-shot proxies are selected (see the 12 proxies listed in the first row of Table 1). Combined with the nine NAS benchmark datasets mentioned above and the 1,000 randomly sampled neural networks per benchmark, the resulting Spearman correlation coefficients are shown in Table 1.
Table 1: Correlation coefficients computed on the different NAS benchmark datasets
In Table 1, the higher the absolute value of the Spearman correlation coefficient, the more reliable the predicted accuracy.
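Step S321 can be sketched as follows; the Spearman coefficient is the Pearson correlation of the rank vectors (tie handling is omitted for brevity), and the accuracies and proxy values below are synthetic stand-ins for a benchmark's sampled networks:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx = np.argsort(np.argsort(x)).astype(float)  # ranks (no tie correction)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Proxy values for 1,000 sampled networks vs. their validation accuracies.
rng = np.random.default_rng(1)
acc = rng.uniform(60, 90, size=1000)          # hypothetical validation accuracies
proxy = acc + rng.normal(0, 5, size=1000)     # a proxy loosely tracking accuracy
rho = spearman(proxy, acc)                    # high |rho| -> reliable proxy
```

A proxy whose ranking agrees with the accuracy ranking yields rho near +1, one that inverts it yields rho near −1, and an uninformative proxy yields rho near 0.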
S322: Compare the number of zero-shot proxies whose Spearman correlation coefficients always have the same sign with the preset number; if it is greater, proceed to S323; if it is less, proceed to S325; if it is equal, proceed to S326;
S323: Form all possible combinations of the zero-shot proxies whose Spearman correlation coefficients always have the same sign, with the number of proxies in each group equal to the preset number;
S324: Compute the mutual information of each combination on every NAS benchmark dataset, average all mutual-information values for each combination, and select the zero-shot proxy combination with the largest average;
S325: Delete the zero-shot proxies for which the counts of positive and negative Spearman correlation coefficients differ the least, until the number of selected proxies equals the preset number;
S326: Select the zero-shot proxies whose Spearman correlation coefficients always have the same sign.
In step S4, the selected zero-shot proxies are used to choose the best of the facial expression recognition candidate networks as the final facial expression recognition network:
S41: Use each selected zero-shot proxy to compute a proxy value for every candidate FER network, and sort all proxy values of the same proxy in ascending order;
S42: Sum the rankings of each candidate FER network across the proxies to obtain a cumulative value, and select the candidate facial expression recognition network with the largest cumulative value as the final facial expression recognition network.
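Steps S41–S42 amount to rank-sum voting across the selected proxies, sketched below under the assumption that a larger proxy value predicts higher accuracy for every proxy; the scores are toy values:

```python
import numpy as np

def select_best(proxy_values):
    """S41-S42: rank candidates per proxy in ascending order of proxy value,
    sum each candidate's ranks over all proxies, pick the largest total."""
    scores = np.asarray(proxy_values, dtype=float)   # shape (num_proxies, num_candidates)
    ranks = scores.argsort(axis=1).argsort(axis=1)   # ascending rank per proxy, 0 = worst
    totals = ranks.sum(axis=0)                       # cumulative rank per candidate
    return int(totals.argmax()), totals

# Three proxies scoring four candidate FER networks.
values = [[0.2, 0.9, 0.5, 0.7],
          [1.1, 3.0, 2.0, 2.5],
          [0.3, 0.8, 0.9, 0.6]]
best, totals = select_best(values)   # candidate 1 wins with cumulative rank 8
```

Rank aggregation sidesteps the fact that different proxies produce values on incomparable scales: only each proxy's ordering of the candidates matters.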
In implementation, the mutual information is computed as

I(A; Z_1, …, Z_m) = H(A) + H(Z_1, …, Z_m) − H(A, Z_1, …, Z_m)

with

H(A) = −Σ_a p(a) log p(a),
H(Z_1, …, Z_m) = −Σ p(z_1, …, z_m) log p(z_1, …, z_m),
H(A, Z_1, …, Z_m) = −Σ p(a, z_1, …, z_m) log p(a, z_1, …, z_m),

where A is the validation accuracy of the neural network; Z_1, …, Z_m are the proxy values computed by the corresponding 1st, …, m-th zero-shot proxies; H(A) is the information entropy of A; H(Z_1, …, Z_m) is the joint information entropy of Z_1, …, Z_m; H(A, Z_1, …, Z_m) is the joint information entropy of A and Z_1, …, Z_m; p(a) is the marginal probability distribution function of A; m is the dimension of the random variable; p(z_1, …, z_m) is the joint probability distribution function of Z_1, …, Z_m; and p(a, z_1, …, z_m) is the joint probability distribution function of A and Z_1, …, Z_m.
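For intuition, a histogram (plug-in) estimate of the mutual information for a single proxy (m = 1) can be written as below; the m-dimensional case would replace the one-dimensional histogram of Z with a joint histogram. The bin count and synthetic data are assumptions:

```python
import numpy as np

def mutual_information(a, z, bins=10):
    """Histogram estimate of I(A; Z) = H(A) + H(Z) - H(A, Z) for one proxy."""
    def entropy(p):
        p = p[p > 0]                       # 0 * log 0 is treated as 0
        return float(-(p * np.log(p)).sum())
    pa, _ = np.histogram(a, bins=bins)
    pz, _ = np.histogram(z, bins=bins)
    paz, _, _ = np.histogram2d(a, z, bins=bins)
    pa = pa / pa.sum()
    pz = pz / pz.sum()
    paz = paz / paz.sum()
    return entropy(pa) + entropy(pz) - entropy(paz)

rng = np.random.default_rng(2)
acc = rng.normal(size=5000)
mi_dep = mutual_information(acc, acc + 0.1 * rng.normal(size=5000))  # strongly dependent
mi_ind = mutual_information(acc, rng.normal(size=5000))              # independent
```

A proxy whose values carry more information about the validation accuracy yields a larger estimate, which is what the selection in steps A2 and S324 rewards.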
In a second aspect, a facial expression recognition neural network architecture search device is provided, comprising:
a search space construction module, configured to construct a search space oriented to facial expression recognition;
a candidate network generation module, configured to generate, by random search, a number of facial expression recognition candidate networks satisfying a preset parameter budget;
a zero-shot proxy selection module, configured to select, on the NAS datasets, a preset number of zero-shot proxies satisfying preset conditions;
a facial expression recognition network performance evaluation module, configured to use the selected zero-shot proxies to choose the best of the candidate networks as the final facial expression recognition network;
a facial expression recognition module, configured to select a facial expression image dataset to train the facial expression recognition neural network and, after training, to recognize the facial expression images to be recognized.
In a third aspect, this solution further provides an application method for a facial expression recognition network obtained by the above facial expression recognition neural network architecture search method, characterized in that it comprises:
selecting a facial expression image dataset to train the facial expression recognition network;
acquiring a facial expression image to be recognized and preprocessing the facial expression image;
inputting the preprocessed facial expression image into the trained facial expression recognition network for recognition to obtain a recognition result.
The expression images in the facial expression image dataset cover at least seven emotions: anger, disgust, fear, happiness, sadness, surprise, and neutral. A stochastic gradient descent optimizer is used to train the facial expression recognition neural network, with an initial learning rate of 0.01 for 300 epochs.
The capability of the lightweight facial expression recognition network designed by this solution is verified below.
To verify the performance of the neural architecture search algorithm that integrates zero-shot proxies as its performance estimation strategy in automatically designing lightweight FER models, this solution is evaluated on FER2013, a dataset widely used in facial expression recognition. The FER model predicted to perform best by the zero-shot proxies is compared with the latest carefully hand-designed lightweight FER models and with automatically designed lightweight FER models.
A. Dataset construction
This solution uses FER2013, a challenging dataset in facial expression recognition. It consists of expression images captured in real-world conditions and covers seven emotions: anger, disgust, fear, happiness, sadness, surprise, and neutral.
B. Task evaluation metrics
For designing a lightweight facial expression recognition network, model complexity and recognition accuracy are used as evaluation metrics. For ease of calculation and comparison, the number of parameters, in millions (M), is used as the measure of model complexity.
C. Algorithm parameter settings
The Kaiming initialization method, commonly used with zero-shot proxies, is applied to initialize model parameters before computing each FER model's proxy values. A sufficiently large sampling number of 10,000 is set so that the randomly searched FER models approach the optimal solution. To show that the effectiveness of this method is not accidental, the algorithm is run 10 times with different random seeds. In addition, since FER models designed by existing NAS methods have around 3M parameters, the constraint is set to fewer than 3M parameters.
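The constrained random search described above can be sketched as follows; the toy architecture encoding and parameter counter are hypothetical stand-ins for the real FER search space and its parameter count:

```python
import random

def random_search(n_samples, sample_arch, param_count, budget=3_000_000, seed=0):
    """Randomly sample architectures and keep only those under the parameter
    budget, mirroring the <3M-parameter constraint with 10,000 samples."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        arch = sample_arch(rng)
        if param_count(arch) < budget:     # discard over-budget architectures
            kept.append(arch)
    return kept

# Toy architecture: a list of layer widths; parameters ~ sum of adjacent products.
def toy_arch(rng):
    return [rng.choice([64, 256, 1024, 2048]) for _ in range(4)]

def toy_params(widths):
    return sum(a * b for a, b in zip(widths, widths[1:]))

pool = random_search(10_000, toy_arch, toy_params)
```

Every architecture in `pool` respects the budget, and the surviving pool is what the zero-shot proxies subsequently rank.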
To obtain the validation accuracy of the predicted optimal model, it is trained with a stochastic gradient descent optimizer, an initial learning rate of 0.01, and 300 epochs.
D. Comparison and analysis of results
Result comparison: On the FER2013 dataset, this solution (NAS based on hybrid zero-shot proxy, HZS-NAS) is compared with the hand-designed lightweight models BKVGG12, SHCNN, Light-CNN, eXnet, and MBCC-CNN, and with the automatically designed lightweight models MNAS-based and Auto-FERNet. The comparison results are shown in Table 2.
Table 2: Experimental results on the FER2013 dataset
Result analysis: On FER2013, in terms of the two evaluation metrics, recognition accuracy is significantly improved while the number of parameters is reduced. Compared with Light-CNN and Auto-FERNet, although the parameter counts increase by 1.32M and 0.33M respectively, the recognition accuracies improve by 6.02% and 0.91%.
In addition, this solution consumes only 10 GPU hours, a great reduction in computational cost compared with the two automatic design methods: Auto-FERNet requires 46 GPU hours, and although the MNAS-based method does not report its computation time on FER2013, the MNAS algorithm consumed a total of 40,000 GPU hours on ImageNet, so substantial computational cost can be expected on FER2013 as well.
The above comparative analysis shows that the search method of this solution greatly shortens the search time while keeping the parameter count relatively small. Accordingly, FER networks that occupy few resources can be quickly designed to match the resource conditions of different mobile terminals, so that FER networks can be expected to become widespread on various smart terminals.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310750769.4A CN116977819B (en) | 2023-06-22 | 2023-06-22 | Facial expression recognition neural network architecture search method, device and application |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116977819A true CN116977819A (en) | 2023-10-31 |
| CN116977819B CN116977819B (en) | 2025-09-26 |