WO2025139322A1

WO2025139322A1 - Chip defect detection method and system

Info

Publication number: WO2025139322A1
Application number: PCT/CN2024/128016
Authority: WO
Inventors: 明雪飞; 李可; 宿磊; 于成昊; 孙钰; 顾杰斐; 赵新维
Original assignee: Jiangnan University; CETC 58 Research Institute
Current assignee: Jiangnan University; CETC 58 Research Institute
Priority date: 2023-12-28
Filing date: 2024-10-29
Publication date: 2025-07-03
Anticipated expiration: 2026-06-28
Also published as: CN117788427A

Abstract

The present invention relates to a chip defect detection method and system. The method comprises: acquiring a one-dimensional vibration signal of a chip and converting same into two-dimensional image data; training an algorithm NAS by means of the two-dimensional image data, and optimizing the algorithm NAS to obtain a neural architecture search algorithm model ASNDARTS; constructing a knowledge distillation algorithm model DPSKD on the basis of the neural architecture search algorithm model ASNDARTS; and acquiring a one-dimensional vibration signal of a chip under test and converting sane into two-dimensional image data, and detecting the two-dimensional image data of the chip under test by means of the knowledge distillation algorithm model DPSKD, so as to determine whether the chip under test has a defect. In the present invention, the algorithm NAS is optimized, the knowledge distillation transfer method is optimized, and the obtained model DPSKD effectively improves the accuracy of chip defect detection while having reduced model redundancy.

Description

Chip defect detection method and system

Technical Field

本发明涉及芯片质量检测技术领域，尤其是指一种芯片缺陷检测方法和系统。The present invention relates to the technical field of chip quality detection, and in particular to a chip defect detection method and system.

Background Art

电子产品的迅速发展促使电子器件向小型化和高集成度方向发展，传统的引线键合技术难以满足要求。倒装芯片封装技术以其较好的可制造性与高可靠性在微电子封装行业被广泛应用。然而，由于芯片和基板的热膨胀系数不匹配，很容易导致芯片的内部缺陷，如裂纹、缺焊和虚焊等。因此，为保证产品的高性能和高可靠性，芯片缺陷检测技术在电子封装领域越来越受到关注。The rapid development of electronic products has prompted electronic devices to develop in the direction of miniaturization and high integration, and traditional wire bonding technology is difficult to meet the requirements. Flip chip packaging technology is widely used in the microelectronic packaging industry due to its good manufacturability and high reliability. However, due to the mismatch of thermal expansion coefficients between the chip and the substrate, it is easy to cause internal defects of the chip, such as cracks, missing solder joints and cold solder joints. Therefore, in order to ensure the high performance and high reliability of the product, chip defect detection technology has received more and more attention in the field of electronic packaging.

近期，深度学习技术已逐步走入工程人视野，许多原因促使它成功的应用于故障检测领域。第一，深度学习可以从大量数据中挖掘有用的信息，并具有较强的大数据处理能力。第二，工业数据采集与存储技术不断发展，工业数据规模不断扩大。这些海量数据为深度学习应用奠定了基础。第三，深度学习具有强大的特征提取和拟合能力，可以实现端到端的故障诊断，节省大量人工数据处理和特征提取步骤。第四，深度学习可以解决分类、预测、决策等多种多样问题。Recently, deep learning technology has gradually entered the field of engineering. There are many reasons for its successful application in the field of fault detection. First, deep learning can extract useful information from a large amount of data and has strong big data processing capabilities. Second, industrial data collection and storage technology continues to develop, and the scale of industrial data continues to expand. These massive data have laid the foundation for the application of deep learning. Third, deep learning has powerful feature extraction and fitting capabilities, which can realize end-to-end fault diagnosis and save a lot of manual data processing and feature extraction steps. Fourth, deep learning can solve a variety of problems such as classification, prediction, and decision-making.

现有的故障诊断深度模型在应对不同的应用场景时，需要专业人员进行大量的试错调试，导致开发成本高、周期长。再者，为了实现最终预测精度的提升，许多现有深度应用表现欠佳的模型将会被添加其他的模块，进而变得越来越臃肿。这无疑会减慢模型在实际应用中的预测速度。深度模型的设计调整需要工作人员具有一定水平的专业知识。因为在不同的工程应用场景、不同的研究对象、不同的信号分布下，针对特定任务需搭建不同的模型结构，来针对性的进行准确的故障模式识别。设计适合各种故障诊断任务并具有出色性能的算法模型是一个很大的挑战。因此用于故障诊断的自动轻量化深度模型研究亟待发展。Existing fault diagnosis deep models require a lot of trial and error debugging by professionals when dealing with different application scenarios, resulting in high development costs and long cycles. Furthermore, in order to achieve the final improvement in prediction accuracy, many existing deep application models with poor performance will be added with other modules, thus becoming more and more bloated. This will undoubtedly slow down the prediction speed of the model in practical applications. The design and adjustment of deep models requires staff to have a certain level of professional knowledge. Because in different engineering application scenarios, Different scenarios, different research objects, and different signal distributions require different model structures for specific tasks to accurately identify fault patterns. Designing algorithm models that are suitable for various fault diagnosis tasks and have excellent performance is a great challenge. Therefore, research on automatic lightweight deep models for fault diagnosis is in urgent need of development.

发明内容Summary of the invention

为此，本发明所要解决的技术问题在于克服现有技术中深度技术中人工干预度大、工程应用指向性模糊、模型冗余臃肿、不便于边缘设备部署的问题。To this end, the technical problem to be solved by the present invention is to overcome the problems in the prior art of deep technology, such as high degree of manual intervention, vague engineering application directionality, redundant and bloated models, and inconvenience in edge device deployment.

为解决上述技术问题，本发明提供了一种芯片缺陷检测方法，包括：In order to solve the above technical problems, the present invention provides a chip defect detection method, comprising:

获取芯片的一维振动信号并转化为二维图像数据；Acquire the one-dimensional vibration signal of the chip and convert it into two-dimensional image data;

构建神经架构搜索算法模型ASNDARTS，基于所述神经架构搜索算法模型ASNDARTS构建知识蒸馏算法模型DPSKD；Construct a neural architecture search algorithm model ASNDARTS, and construct a knowledge distillation algorithm model DPSKD based on the neural architecture search algorithm model ASNDARTS;

所述神经架构搜索算法模型ASNDARTS的构建方法为：将芯片的二维图像数据输入神经架构搜索算法NAS进行训练，并在训练过程中对所述神经架构搜索算法NAS进行优化，当所述神经架构搜索算法NAS达到纳什平衡时，得到神经架构搜索算法模型ASNDARTS；The method for constructing the neural architecture search algorithm model ASNDARTS is as follows: inputting the two-dimensional image data of the chip into the neural architecture search algorithm NAS for training, and optimizing the neural architecture search algorithm NAS during the training process, and obtaining the neural architecture search algorithm model ASNDARTS when the neural architecture search algorithm NAS reaches Nash equilibrium;

所述知识蒸馏算法模型DPSKD的构建方法为：将所述神经架构搜索算法模型ASNDARTS作为知识蒸馏的老师网络，并取神经架构搜索算法模型ASNDARTS中单个cell作为知识蒸馏的学生网络，其中，所述神经架构搜索算法模型ASNDARTS包括若干串联的cell，每个所述cell均为神经架构搜索算法模型ASNDARTS在搜索空间搜索得到的结果；The construction method of the knowledge distillation algorithm model DPSKD is: using the neural architecture search algorithm model ASNDARTS as a teacher network for knowledge distillation, and taking a single cell in the neural architecture search algorithm model ASNDARTS as a student network for knowledge distillation, wherein the neural architecture search algorithm model ASNDARTS includes a plurality of cells connected in series, and each of the cells is a result obtained by searching the neural architecture search algorithm model ASNDARTS in the search space;

基于概率空间解耦老师网络和学生网络之间的知识传递方式，并对学生网络进行超参数寻优，当学生网络收敛时，得到知识蒸馏算法模型DPSKD；Based on the probability space, the knowledge transfer method between the teacher network and the student network is decoupled, and the hyperparameters of the student network are optimized. When the student network converges, the knowledge distillation algorithm model DPSKD is obtained;

获取待检测芯片的一维振动信号并转化为二维图像数据，通过所述知识蒸馏算法模型DPSKD对待检测芯片的二维图像数据进行检测，以判断待检测芯片是否存在缺陷。The one-dimensional vibration signal of the chip to be detected is obtained and converted into two-dimensional image data. The two-dimensional image data of the chip to be detected is detected by the knowledge distillation algorithm model DPSKD to determine the Is the chip defective?

在本发明的一个实施例中，所述在训练过程中对所述神经架构搜索算法NAS进行优化，包括第一优化方法，具体为：In one embodiment of the present invention, the optimization of the neural architecture search algorithm NAS during the training process includes a first optimization method, specifically:

在训练过程中对所述神经架构搜索算法NAS原始的skip_connect操作被选概率进行带限修正。During the training process, the probability of the original skip_connect operation of the neural architecture search algorithm NAS being selected is band-limited.

在本发明的一个实施例中，所述在训练过程中对所述神经架构搜索算法NAS进行优化，包括第二优化方法，具体为：In one embodiment of the present invention, the optimization of the neural architecture search algorithm NAS during the training process includes a second optimization method, specifically:

训练过程中，从算法NAS的搜索空间中随机选择若干操作方式OP组成初始的搜索空间Q1，再将算法NAS的搜索空间中的剩余操作方式OP组成备用的搜索空间Q2，根据预设判据确定初始的搜索空间Q1的操作方式OP与备用的搜索空间Q2中的操作方式OP之间是否需要自适应调整，其中，若操作方式OP需要自适应调整，则将初始的搜索空间Q1中被选概率最低的操作方式OP被备用的搜索空间Q2中的第一个操作方式OP替换掉，并将该被选概率最低的操作方式OP的作为备用的搜索空间Q2的最后一个操作方式OP。During the training process, several operation modes OP are randomly selected from the search space of the algorithm NAS to form an initial search space Q1, and the remaining operation modes OP in the search space of the algorithm NAS are then combined to form a spare search space Q2. According to a preset criterion, it is determined whether adaptive adjustment is required between the operation modes OP in the initial search space Q1 and the operation modes OP in the spare search space Q2. If the operation mode OP requires adaptive adjustment, the operation mode OP with the lowest probability of being selected in the initial search space Q1 is replaced by the first operation mode OP in the spare search space Q2, and the operation mode OP with the lowest probability of being selected is used as the last operation mode OP in the spare search space Q2.

在本发明的一个实施例中，所述在训练过程中对所述神经架构搜索算法NAS进行优化，包括第三优化方法，具体为：In one embodiment of the present invention, the optimization of the neural architecture search algorithm NAS during the training process includes a third optimization method, specifically:

训练过程中，根据各操作方式OP的模型贡献率进行节点的操作方式OP数量的自适应扩展。During the training process, the number of operation modes OP of the node is adaptively expanded according to the model contribution rate of each operation mode OP.

在本发明的一个实施例中，所述在训练过程中对所述神经架构搜索算法NAS原始的skip_connect操作被选概率进行带限修正，方法包括：In one embodiment of the present invention, the method of performing band-limited correction on the probability of selection of the original skip_connect operation of the neural architecture search algorithm NAS during the training process includes:

实时记录神经架构搜索算法NAS训练过程中skip_connect操作的被选概率值α_skip，α_skip的被选概率分布情况为P_skip，定义P_skip出现梯度上升现象时的点为算法NAS崩溃起始点epoch_break；引入参数k_skip对skip_connect操作的α_skip进行修正得到修正值α_{correct_skip}＝α_skip*k_skip，将算法NAS崩溃起始点的α_skip与α_{correct_skip}的差值定义为△，根据α_skip、α_{correct_skip}和△得到带限修正概率公式如下：

The selected probability value α _skip of the skip_connect operation during the training of the neural architecture search algorithm NAS is recorded in real time. The selected probability distribution of α _skip is P _skip . The point when P _skip shows a gradient rise phenomenon is defined as the algorithm NAS crash starting point epoch_break. The parameter k _skip is introduced to correct the α _skip of the skip_connect operation to obtain the corrected value α _{correct_skip} = α _skip *k _skip . The difference between α _skip and α _{correct_skip} at the algorithm NAS crash starting point is defined as △. The band-limited corrected probability is obtained according to α _skip , α _{correct_skip} and △. The formula is as follows:

其中，*表示乘号，k_skip表示对于算法NAS搜索过程中对skip_connect操作出现概率的制约；k_skip前半部分k_skip(tanh)为基于tanh激活函数实现epoch层面上的前期skip_connect操作数量限制，k_skip后半部分k_{skip(sigmoid)}为基于sigmoid激活函数实现edge层面上的skip_connect操作数量限制；表示崩溃起始点epoch_break前的α_skip取值概率；表示崩溃起始点epoch_break后的α_skip取值概率。Among them, * represents the multiplication sign, k _skip represents the restriction on the probability of skip_connect operation in the algorithm NAS search process; the first half of k _skip , k _skip(tanh) , is the limit on the number of early skip_connect operations at the epoch level based on the tanh activation function, and the second half of k _skip, k _{skip(sigmoid),} is the limit on the number of skip_connect operations at the edge level based on the sigmoid activation function; Indicates the probability of α _skip taking value before the crash starting point epoch_break; Indicates the probability of α _skip taking the value after the crash starting point epoch_break.

在本发明的一个实施例中，所述预设判据包括：In one embodiment of the present invention, the preset criterion includes:

所述预设判据按照判别顺序依次包括等训练epoch间隔下算法准确率趋势、等训练epoch间隔下是否存在准确率值异常情况、等训练epoch间隔下各操作方式OP贡献值趋势和等训练epoch间隔下各node所选操作方式OP频次，其中，The preset criteria include, in order of judgment, the algorithm accuracy trend under equal training epoch intervals, whether there is an abnormal accuracy value under equal training epoch intervals, the OP contribution value trend of each operation mode under equal training epoch intervals, and the OP frequency of each node selected operation mode under equal training epoch intervals, wherein:

所述等训练epoch间隔下算法准确率趋势为：获取预设个数的等训练epoch间隔，统计每个等训练epoch间隔的算法准确率值，并将所有等训练epoch间隔的算法准确率值进行线性拟合，得到第一线性拟合值，若第一线性拟合值大于0，则表示算法准确率为上升趋势；若第一线性拟合值小于0，则表示算法准确率为下降趋势；The algorithm accuracy trend under the equal training epoch interval is as follows: a preset number of equal training epoch intervals are obtained, the algorithm accuracy value of each equal training epoch interval is counted, and the algorithm accuracy values of all equal training epoch intervals are linearly fitted to obtain a first linear fitting value. If the first linear fitting value is greater than 0, it indicates that the algorithm accuracy is on an upward trend; if the first linear fitting value is less than 0, This indicates that the accuracy of the algorithm is on a downward trend;

所述等训练epoch间隔下是否存在准确率值异常情况为：获取预设个数的等训练epoch间隔，统计每个等训练epoch间隔的算法准确率值，若算法准确率值未随等训练epoch间隔的增加而增加，则表明该等训练epoch间隔的算法准确率值异常，否则为正常；Whether there is an abnormal accuracy value under the equal training epoch interval is as follows: a preset number of equal training epoch intervals are obtained, and the algorithm accuracy value of each equal training epoch interval is counted. If the algorithm accuracy value does not increase with the increase of the equal training epoch interval, it indicates that the algorithm accuracy value of the equal training epoch interval is abnormal, otherwise it is normal;

所述等训练epoch间隔下各操作方式OP贡献值趋势为：获取预设个数的等训练epoch间隔，同时获取每个等训练epoch间隔中各个操作方式OP的被选概率，并将所有等训练epoch间隔的OP的被选概率进行线性拟合，得到第二线性拟合值，若第二线性拟合值大于0，则表示操作方式OP的贡献值为上升趋势；若第二线性拟合值小于0，则表示操作方式OP的贡献值为下降趋势；The contribution value trend of each operation mode OP under the equal training epoch interval is as follows: a preset number of equal training epoch intervals are obtained, and at the same time, the selection probability of each operation mode OP in each equal training epoch interval is obtained, and the selection probability of OP in all equal training epoch intervals is linearly fitted to obtain a second linear fitting value, and if the second linear fitting value is greater than 0, it indicates that the contribution value of the operation mode OP is in an upward trend; if the second linear fitting value is less than 0, it indicates that the contribution value of the operation mode OP is in a downward trend;

所述等训练epoch间隔下各node所选操作方式OP频次为：获取预设个数的等训练epoch间隔，同时获取所有等训练epoch间隔中各个操作方式OP的被选次数，根据预设被选次数a，判断各个操作方式OP的被选次数与预设被选次数a的关系。The frequency of the operation mode OP selected by each node under the equal training epoch interval is: obtain a preset number of equal training epoch intervals, and simultaneously obtain the number of times each operation mode OP is selected in all equal training epoch intervals, and according to the preset number of selections a, determine the relationship between the number of times each operation mode OP is selected and the preset number of selections a.

在本发明的一个实施例中，所述根据各操作方式OP的模型贡献率进行节点的操作方式OP数量的自适应扩展，方法包括：In one embodiment of the present invention, the method for adaptively expanding the number of operation modes OP of a node according to the model contribution rate of each operation mode OP includes:

在算法NAS中各节点前驱连接边引入操作方式OP贡献判别机制，获取预设个数的等训练epoch间隔，同时获取每个等训练epoch间隔中各个操作方式OP的被选概率，并将所有等训练epoch间隔的OP的被选概率进行线性拟合，得到第二线性拟合值，若第二线性拟合值大于0，则表示操作方式OP的贡献值为上升趋势；若第二线性拟合值小于0，则表示操作方式OP的贡献值为下降趋势；In the algorithm NAS, the operation mode OP contribution discrimination mechanism is introduced into the predecessor connection edge of each node, and a preset number of equal training epoch intervals are obtained. At the same time, the selection probability of each operation mode OP in each equal training epoch interval is obtained, and the selection probability of OP in all equal training epoch intervals is linearly fitted to obtain the second linear fitting value. If the second linear fitting value is greater than 0, it means that the contribution value of the operation mode OP is on an upward trend; if the second linear fitting value is less than 0, it means that the contribution value of the operation mode OP is on a downward trend;

在训练过程中，当前节点选择贡献值为上升趋势的操作方式OP，忽略贡献值为下降趋势的操作方式OP，完成当前节点操作方式OP数量的自适应扩展。 During the training process, the current node selects the operation mode OP whose contribution value is an upward trend, ignores the operation mode OP whose contribution value is a downward trend, and completes the adaptive expansion of the number of operation mode OPs of the current node.

在本发明的一个实施例中，所述算法NAS的搜索空间包括若干可选的操作方式OP，所述算法NAS的搜索空间包括若干可选的操作方式OP，所述操作方式OP包括若干max_pool_nxn、若干avg_pool_nxn、skip_connect、若干sep_conv_nxn、若干dil_conv_nxn、none，其中，max_pool表示最大池化，avg_pool表示平均池化，skip_connect表示跳跃连接，sep_conv表示深度可分离卷积，dil_conv表示空洞卷积，none表示无操作，n取值范围∈[1,9]的奇数。In one embodiment of the present invention, the search space of the algorithm NAS includes several optional operation modes OP, and the search space of the algorithm NAS includes several optional operation modes OP, including several max_pool_nxn, several avg_pool_nxn, skip_connect, several sep_conv_nxn, several dil_conv_nxn, and none, wherein max_pool represents maximum pooling, avg_pool represents average pooling, skip_connect represents skip connection, sep_conv represents depthwise separable convolution, dil_conv represents hole convolution, none represents no operation, and n takes an odd number in the value range ∈ [1,9].

在本发明的一个实施例中，所述基于概率空间解耦老师网络和学生网络之间的知识传递方式，方法包括：In one embodiment of the present invention, the method for decoupling the knowledge transfer between the teacher network and the student network based on the probability space includes:

将模型ASNDARTS进行概率重配，划分成3个层次类别概率空间，分别为目标空间t、目标类内空间O及整体空间C；为分离3个层次类别概率空间的预测情况，定义符号P＝[p_t,p_O\t,p_\O]，p_t表示目标空间t概率，p_O\t表示在目标类内空间O内除目标t概率，p_\O表示在整体空间C中排除O概率，其中，
The model ASNDARTS is probability reassigned and divided into three hierarchical category probability spaces, namely target space t, target class space O and overall space C. To separate the predictions of the three hierarchical category probability spaces, the symbol P = [p _t , p _O\t , p _\O ] is defined, where p _t represents the probability of target space t, p _O\t represents the probability of excluding target t in the target class space O, and p _\O represents the probability of excluding O in the overall space C.

其中，z_t表示目标类软概率，z_j表示所有类的软概率，z_k表示非目标类软概况，k表示非目标类的类内索引，j表示非目标类类外索引；Among them, _zt represents the soft probability of the target class, _zj represents the soft probability of all classes, _zk represents the soft profile of the non-target class, k represents the in-class index of the non-target class, and j represents the out-class index of the non-target class;

将O\t空间及\O下各类概率分别定义为和其中，∧表示最终skip_connect的修正符号，表示在目标类内空间O内，排除目标空间t的其他独立同类概率；同理，表示在整体空间C内排除目标类内空间O后的其他独立同类概率，其中，
Define the various probabilities in O\t space and \O as and Among them, ∧ represents the correction symbol of the final skip_connect, It means that in the target class space O, other independent similar probabilities of the target space t are excluded; similarly, represents the probability of other independent similar objects in the overall space C after excluding the target class space O, where

将知识蒸馏KD进行3个层次类别概率空间解耦，公式为：
The knowledge distillation KD is decoupled into three-level category probability spaces, and the formula is:

令⊙表示点乘；
make ⊙ represents dot product;

其中，表示老师-学生模型在所定义的3个空间上的KL散度TKD；表示老师-学生模型在非目标的KL散度CKD；表示老师-学生模型在其他类间的KL散度OKD； in, Represents the KL divergence TKD of the teacher-student model in the three defined spaces; Represents the KL divergence CKD of the teacher-student model on the non-target; It represents the KL divergence OKD of the teacher-student model among other classes;

对3个层次类别概率空间的KL散度TKD、KL散度CKD和KL散度OKD进行重组，根据重组结果实现老师网络和学生网络之间的知识传递。The KL divergence TKD, KL divergence CKD and KL divergence OKD of the three-level category probability space are reorganized, and the knowledge transfer between the teacher network and the student network is realized based on the reorganization results.

为解决上述技术问题，本发明提供了一种芯片缺陷检测系统，包括：In order to solve the above technical problems, the present invention provides a chip defect detection system, comprising:

获取模块：用于获取芯片的一维振动信号并转化为二维图像数据；Acquisition module: used to acquire the one-dimensional vibration signal of the chip and convert it into two-dimensional image data;

构建模块：用于构建神经架构搜索算法模型ASNDARTS，基于所述神经架构搜索算法模型ASNDARTS构建知识蒸馏算法模型DPSKD；Construction module: used to construct a neural architecture search algorithm model ASNDARTS, and to construct a knowledge distillation algorithm model DPSKD based on the neural architecture search algorithm model ASNDARTS;

检测模块：用于获取待检测芯片的一维振动信号并转化为二维图像数据，通过所述知识蒸馏算法模型DPSKD对待检测芯片的二维图像数据进行检测，以判断待检测芯片是否存在缺陷。Detection module: used to obtain the one-dimensional vibration signal of the chip to be detected and convert it into two-dimensional image data, and detect the two-dimensional image data of the chip to be detected through the knowledge distillation algorithm model DPSKD to determine whether the chip to be detected has defects.

本发明的上述技术方案相比现有技术具有以下优点：The above technical solution of the present invention has the following advantages compared with the prior art:

本发明选择神经架构搜索算法NAS作为基础网络模型，有效解决人工设计模型耗时问题，设计出能够有效针对特有工程应用背景的性能优异轻量级网络模型；The present invention selects the neural architecture search algorithm NAS as the basic network model, effectively solves the time-consuming problem of manually designing models, and designs a lightweight network model with excellent performance that can effectively target specific engineering application backgrounds;

本发明对算法NAS中的skip_connect操作进行带限修正，解决NAS算法崩溃现象；本发明提出了备选搜索空间的概念，在扩大搜索空间多样性基础上解决NAS算法的GPU占用量过大问题；本发明在NAS算法中各节点(node)前驱连接边引入贡献判别机制，获得可变数量的节点结构，构成多样化cell；本发明极大提高神经架构搜索算法NAS的训练稳定性及检测准确性，降低硬件资源消耗；The present invention performs band-limited correction on the skip_connect operation in the NAS algorithm to solve the NAS algorithm The present invention proposes the concept of alternative search space, which solves the problem of excessive GPU usage of NAS algorithm on the basis of expanding the diversity of search space; the present invention introduces a contribution discrimination mechanism to the predecessor connection edge of each node in the NAS algorithm, obtains a variable number of node structures, and constitutes diversified cells; the present invention greatly improves the training stability and detection accuracy of the neural architecture search algorithm NAS, and reduces the consumption of hardware resources;

本发明以神经架构搜索得出的网络模型ASNDARTS为基础老师模型，通过知识蒸馏技术进一步轻量化网络结构得到学生模型，并解耦老师-学生模型之间的知识传递逻辑，有效提高从老师模型获得信息的能力，在不降低预测精度的前提下减少模型的计算资源损耗；The present invention uses the network model ASNDARTS obtained by neural architecture search as the basic teacher model, further lightweights the network structure through knowledge distillation technology to obtain the student model, and decouples the knowledge transfer logic between the teacher and student models, effectively improving the ability to obtain information from the teacher model, and reducing the computing resource loss of the model without reducing the prediction accuracy;

本发明能够自适应设计不同工业背景下的深度神经网络，提高模型数据预测精度的同时轻量化模型结构，便于部署于边缘设备，以实现芯片缺陷的检测。The present invention can adaptively design deep neural networks under different industrial backgrounds, improve the prediction accuracy of model data while lightweighting the model structure, making it easy to deploy on edge devices to realize chip defect detection.

BRIEF DESCRIPTION OF THE DRAWINGS

为了使本发明的内容更容易被清楚的理解，下面根据本发明的具体实施例并结合附图，对本发明作进一步详细的说明。In order to make the contents of the present invention more clearly understood, the present invention is further described in detail below based on specific embodiments of the present invention in conjunction with the accompanying drawings.

图1是本发明芯片缺陷检测方法流程图；FIG1 is a flow chart of a chip defect detection method according to the present invention;

图2是本发明实施例中skip_connect操作带限修正的流程图；FIG2 is a flow chart of a skip_connect operation band limit correction in an embodiment of the present invention;

图3是本发明实施例中skip_connect操作带限修正的逻辑图；FIG3 is a logic diagram of a skip_connect operation band limit correction in an embodiment of the present invention;

图4是本发明实施例中模型DPSKD中老师-学生知识传递逻辑解耦逻辑图；4 is a logic diagram of the decoupling of the teacher-student knowledge transfer logic in the DPSKD model according to an embodiment of the present invention;

图5是本发明实施例中模型ASNDARTS搜索出的结果结构(即一个cell)示意图。FIG. 5 is a schematic diagram of a result structure (ie, a cell) searched by the model ASNDARTS in an embodiment of the present invention.

DETAILED DESCRIPTION

下面结合附图和具体实施例对本发明作进一步说明，以使本领域的技术人员可以更好地理解本发明并能予以实施，但所举实施例不作为对本发明的限定。 The present invention is further described below in conjunction with the accompanying drawings and specific embodiments so that those skilled in the art can better understand the present invention and implement it, but the embodiments are not intended to limit the present invention.

实施例一Embodiment 1

参照图1所示，本发明涉及一种芯片缺陷检测方法，包括：Referring to FIG. 1 , the present invention relates to a chip defect detection method, comprising:

所述知识蒸馏算法模型DPSKD的构建方法为：将所述神经架构搜索算法模型ASNDARTS作为知识蒸馏的老师网络，并取神经架构搜索算法模型ASNDARTS中单个cell作为知识蒸馏的学生网络，其中，所述神经架构搜索算法模型ASNDARTS包括若干串/并联的cell，每个所述cell均为神经架构搜索算法模型ASNDARTS在搜索空间搜索得到的结果，搜索空间用于存储操作方式OP；The construction method of the knowledge distillation algorithm model DPSKD is: using the neural architecture search algorithm model ASNDARTS as a teacher network for knowledge distillation, and taking a single cell in the neural architecture search algorithm model ASNDARTS as a student network for knowledge distillation, wherein the neural architecture search algorithm model ASNDARTS includes a plurality of cells connected in series/parallel, each of which is a result obtained by the neural architecture search algorithm model ASNDARTS in a search space, and the search space is used to store the operation mode OP;

获取待检测芯片的一维振动信号并转化为二维图像数据，通过所述知识蒸馏算法模型DPSKD对待检测芯片的二维图像数据进行检测，以判断待检测芯片是否存在缺陷。The one-dimensional vibration signal of the chip to be detected is obtained and converted into two-dimensional image data, and the two-dimensional image data of the chip to be detected is detected by the knowledge distillation algorithm model DPSKD to determine whether the chip to be detected has defects.

进一步地，所述在训练过程中对所述神经架构搜索算法NAS进行优化，包括第一优化方法，具体为：在训练过程中对所述神经架构搜索算法NAS原始的skip_connect操作被选概率进行带限修正。Furthermore, the optimization of the neural architecture search algorithm NAS during the training process includes a first optimization method, specifically: performing a band-limited correction on the original skip_connect operation selection probability of the neural architecture search algorithm NAS during the training process.

具体地，所述在训练过程中对所述神经架构搜索算法NAS原始的skip_connect操作被选概率进行带限修正，方法包括： Specifically, during the training process, the original skip_connect operation of the neural architecture search algorithm NAS is subjected to a band-limited correction, and the method includes:

其中，*表示乘号，k_skip表示对于算法NAS搜索过程中对skip_connect操作出现概率的制约；在算法搜索前期能够充分提取数据特征而不使用skip_connect操作进行跳过连接；k_skip前半部分k_skip(tanh)为基于tanh激活函数实现epoch层面上的前期skip_connect操作数量限制，k_skip后半部分k_{skip(sigmoid)}为基于sigmoid激活函数实现edge层面(edge是NAS算法自带的，如图5中0和2之间的连线就是一个edge)上的skip_connect操作数量限制(前/后半部分：对epoch/edge在前/后半程影响较大的操作)；表示崩溃起始点epoch_break前的α_skip取值概率；表示崩溃起始点epoch_break后的α_skip取值概率。Among them, * represents the multiplication sign, k _skip represents the restriction on the probability of skip_connect operation in the algorithm NAS search process; in the early stage of the algorithm search, data features can be fully extracted without using skip_connect operation for skip connection; the first half of k _skip k _skip(tanh) is the limit on the number of early skip_connect operations at the epoch level based on the tanh activation function, and the second half of k _skip k _{skip(sigmoid)} is the limit on the number of skip_connect operations at the edge level (edge is built-in to the NAS algorithm, such as the line between 0 and 2 in Figure 5 is an edge) based on the sigmoid activation function (the first/second half: the operation that has a greater impact on the epoch/edge in the first/second half); Indicates the probability of α _skip taking value before the crash starting point epoch_break; Indicates the probability of α _skip taking the value after the crash starting point epoch_break.

本实施例对skip_connect操作被选概率进行修正，能够有效解决算法崩溃现象。 This embodiment corrects the probability of the skip_connect operation being selected, which can effectively solve the algorithm crash phenomenon.

进一步地，所述在训练过程中对所述神经架构搜索算法NAS进行优化，包括第二优化方法，具体为：训练过程中，从算法NAS的搜索空间中随机选择若干操作方式OP组成初始的搜索空间Q1，再将算法NAS的搜索空间中的剩余操作方式OP组成备用的搜索空间Q2，根据预设判据确定初始的搜索空间Q1的操作方式OP与备用的搜索空间Q2中的操作方式OP之间是否需要自适应调整，其中，若操作方式OP需要自适应调整，则将初始的搜索空间Q1中被选概率最低的操作方式OP被备用的搜索空间Q2中的第一个操作方式OP替换掉，并将该被选概率最低的操作方式OP的作为备用的搜索空间Q2的最后一个操作方式OP。需要注意的是，备用的搜索空间Q2中的操作方式OP初始排序是随机的。Furthermore, the optimization of the neural architecture search algorithm NAS during the training process includes a second optimization method, specifically: during the training process, a number of operation modes OP are randomly selected from the search space of the algorithm NAS to form an initial search space Q1, and then the remaining operation modes OP in the search space of the algorithm NAS form a spare search space Q2, and whether the operation modes OP in the initial search space Q1 and the operation modes OP in the spare search space Q2 need to be adaptively adjusted according to a preset criterion, wherein, if the operation mode OP needs to be adaptively adjusted, the operation mode OP with the lowest probability of being selected in the initial search space Q1 is replaced by the first operation mode OP in the spare search space Q2, and the operation mode OP with the lowest probability of being selected is used as the last operation mode OP in the spare search space Q2. It should be noted that the initial order of the operation modes OP in the spare search space Q2 is random.

本实施例中，算法NAS的搜索空间包括若干可选的操作方式OP，所述操作方式OP包括若干max_pool_nxn、若干avg_pool_nxn、skip_connect、若干sep_conv_nxn、若干dil_conv_nxn、none，其中，max_pool表示最大池化，avg_pool表示平均池化，skip_connect表示跳跃连接，sep_conv表示深度可分离卷积，dil_conv表示空洞卷积，none表示无操作，n取值范围∈[1,9]的奇数。本实施例中算法NAS的搜索策略采用梯度下降法。In this embodiment, the search space of the algorithm NAS includes several optional operation modes OP, and the operation modes OP include several max_pool_nxn, several avg_pool_nxn, skip_connect, several sep_conv_nxn, several dil_conv_nxn, and none, where max_pool represents maximum pooling, avg_pool represents average pooling, skip_connect represents skip connection, sep_conv represents depthwise separable convolution, dil_conv represents dilated convolution, none represents no operation, and n takes an odd number in the range ∈ [1,9]. In this embodiment, the search strategy of the algorithm NAS adopts the gradient descent method.

具体地，所述预设判据按照判别顺序依次包括(1)等训练epoch间隔下算法准确率趋势、(2)等训练epoch间隔下是否存在准确率值异常情况、(3)等训练epoch间隔下各操作方式OP贡献值趋势、(4)等训练epoch间隔下各node所选操作方式OP频次，其中，Specifically, the preset criteria include, in order of judgment, (1) the algorithm accuracy trend under equal training epoch intervals, (2) whether there is an abnormal accuracy value under equal training epoch intervals, (3) the OP contribution value trend of each operation mode under equal training epoch intervals, and (4) the OP frequency of each node selected under equal training epoch intervals, wherein:

(1)所述等训练epoch间隔下算法准确率趋势为：获取预设个数的等训练epoch间隔，统计每个等训练epoch间隔的算法准确率值，并将所有等训练epoch间隔的算法准确率值进行线性拟合，得到第一线性拟合值，若第一线性拟合值大于0，则表示算法准确率为上升趋势；若第一线性拟合值小于0，则表示算法准确率为下降趋势。(1) The algorithm accuracy trend under the equal training epoch interval is as follows: a preset number of equal training epoch intervals are obtained, the algorithm accuracy value of each equal training epoch interval is counted, and the algorithm accuracy values of all equal training epoch intervals are linearly fitted to obtain a first linear fitting value. If the first linear fitting value is greater than 0, it indicates that the algorithm accuracy is on an upward trend; if the first linear fitting value is less than 0, it indicates that the algorithm accuracy is on a downward trend.

(2)所述等训练epoch间隔下是否存在准确率值异常情况为：获取预设个数的等训练epoch间隔，统计每个等训练epoch间隔的算法准确率值，若算法准确率值未随等训练epoch间隔的增加而增加，则表明该等训练epoch间隔的算法准确率值异常，否则为正常。例如，获取四个等训练epoch间隔的算法准确率值：0.3，0.7，0.4，0.8，那么0.4就是准确率值异常，因为一般模型训练准确率会越来越高，而0.4突然下降则判别为准确率值异常。(2) Whether there is an abnormal accuracy value under the equal training epoch interval is: Get the preset The algorithm accuracy value of each equal training epoch interval is counted. If the algorithm accuracy value does not increase with the increase of the equal training epoch interval, it indicates that the algorithm accuracy value of the equal training epoch interval is abnormal, otherwise it is normal. For example, the algorithm accuracy values of four equal training epoch intervals are obtained: 0.3, 0.7, 0.4, and 0.8. Then 0.4 is an abnormal accuracy value, because the accuracy of general model training will increase, and a sudden drop of 0.4 is judged as an abnormal accuracy value.

(3)所述等训练epoch间隔下各操作方式OP贡献值趋势为：获取预设个数的等训练epoch间隔，同时获取每个等训练epoch间隔中各个操作方式OP的被选概率，并将所有等训练epoch间隔的OP的被选概率进行线性拟合，得到第二线性拟合值，若第二线性拟合值大于0，则表示操作方式OP的贡献值为上升趋势；若第二线性拟合值小于0，则表示操作方式OP的贡献值为下降趋势。(3) The contribution value trend of each operation mode OP under the equal training epoch interval is as follows: a preset number of equal training epoch intervals are obtained, and at the same time, the selection probability of each operation mode OP in each equal training epoch interval is obtained, and the selection probability of OP in all equal training epoch intervals is linearly fitted to obtain a second linear fitting value. If the second linear fitting value is greater than 0, it indicates that the contribution value of the operation mode OP is on an upward trend; if the second linear fitting value is less than 0, it indicates that the contribution value of the operation mode OP is on a downward trend.

(4)所述等训练epoch间隔下各node所选操作方式OP频次为：获取预设个数的等训练epoch间隔，同时获取所有等训练epoch间隔中各个操作方式OP的被选次数，根据预设被选次数a，判断各个操作方式OP的被选次数与预设被选次数a的关系。(4) The frequency of the operation mode OP selected by each node under the equal training epoch interval is as follows: a preset number of equal training epoch intervals is obtained, and the number of times each operation mode OP is selected in all equal training epoch intervals is obtained, and according to the preset number of times a, the relationship between the number of times each operation mode OP is selected and the preset number of times a is determined.

进一步地，所述在训练过程中对所述神经架构搜索算法NAS进行优化，包括第三优化方法，具体为：训练过程中，根据各操作方式OP的模型贡献率进行节点的操作方式OP数量的自适应扩展。Furthermore, the neural architecture search algorithm NAS is optimized during the training process, including a third optimization method, specifically: during the training process, the number of operation modes OP of the node is adaptively expanded according to the model contribution rate of each operation mode OP.

具体地，所述根据各操作方式OP的模型贡献率进行节点的操作方式OP数量的自适应扩展，方法包括：Specifically, the method for adaptively expanding the number of operation modes OP of a node according to the model contribution rate of each operation mode OP includes:

在算法NAS中各节点前驱连接边引入操作方式OP贡献判别机制，获取预设个数的等训练epoch间隔，同时获取每个等训练epoch间隔中各个操作方式OP的被选概率，并将所有等训练epoch间隔的OP的被选概率进行线性拟合，得到第二线性拟合值，若第二线性拟合值大于0，则表示操作方式OP的贡献值为上升趋势(相当于有积极意义)；若第二线性拟合值小于0，则表示操作方式OP的贡献值为下降趋势； In the algorithm NAS, the operation mode OP contribution discrimination mechanism is introduced into the predecessor connection edge of each node, and a preset number of equal training epoch intervals are obtained. At the same time, the selection probability of each operation mode OP in each equal training epoch interval is obtained, and the selection probability of OP in all equal training epoch intervals is linearly fitted to obtain the second linear fitting value. If the second linear fitting value is greater than 0, it means that the contribution value of the operation mode OP is on an upward trend (equivalent to having a positive meaning); if the second linear fitting value is less than 0, it means that the contribution value of the operation mode OP is on a downward trend;

在训练过程中，当前节点选择贡献值为上升趋势的操作方式OP，忽略贡献值为下降趋势的操作方式OP，完成当前节点操作方式OP数量的自适应扩展。During the training process, the current node selects the operation mode OP whose contribution value is an upward trend, ignores the operation mode OP whose contribution value is a downward trend, and completes the adaptive expansion of the number of operation mode OPs of the current node.

所述基于概率空间解耦老师网络和学生网络之间的知识传递方式，方法包括：The method of knowledge transfer between the teacher network and the student network based on probability space decoupling includes:

将深度学习的软概率模型ASNDARTS进行概率重配，划分成3个层次类别概率空间，分别为目标空间t、目标类内空间O及整体空间C；为分离3个层次类别概率空间的预测情况，定义符号P＝[p_t,p_O\t,p_\O]，p_t表示目标空间t概率，p_O\t表示在目标类内空间O内除目标t概率，p_\O表示在整体空间C中排除O概率，其中，
The deep learning soft probability model ASNDARTS is probability reassigned and divided into three hierarchical category probability spaces, namely target space t, target class space O and overall space C. To separate the predictions of the three hierarchical category probability spaces, the symbol P = [p _t , p _O\t , p _\O ] is defined, where p _t represents the probability of target space t, p _O\t represents the probability of excluding target t in the target class space O, and p _\O represents the probability of excluding O in the overall space C.

将O\t空间及\O下各类概率分别定义为和其中，∧表示最终skip_connect的修正符号，表示在目标类内空间O内，排除目标空间t的其他独立同类概率；同理，表示在整体空间C内排除目标类内空间O后的其他独立同类概率(其他独立同类表示同一类缺陷的不同缺陷程度)，其中，
Define the various probabilities in O\t space and \O as and Among them, ∧ represents the correction symbol of the final skip_connect, It means that in the target class space O, other independent similar probabilities of the target space t are excluded; similarly, represents the probability of other independent similar defects in the overall space C after excluding the target class space O (other independent similar defects represent different defect degrees of the same type of defects), where

令⊙表示点乘；
make ⊙ represents dot product;

其中，表示老师-学生模型在所定义的3个空间上的KL散度TKD；表示老师-学生模型在非目标的KL散度CKD；表示老师-学生模型在其他类间的KL散度OKD；in, Represents the KL divergence TKD of the teacher-student model in the three defined spaces; Represents the KL divergence CKD of the teacher-student model on the non-target; represents the KL divergence between the teacher-student model and other classes Degree OKD;

对3个层次类别概率空间的KL散度TKD、KL散度CKD和KL散度OKD进行重组(DPSKD＝α*TKD+β*CKD+γ*OKD)，根据重组结果实现老师网络和学生网络之间的知识传递。The KL divergence TKD, KL divergence CKD and KL divergence OKD of the three-level category probability space are reorganized (DPSKD=α*TKD+β*CKD+γ*OKD), and the knowledge transfer between the teacher network and the student network is realized based on the reorganization results.

图4展示了利用3个层次类别概率空间实现知识传递逻辑解耦逻辑，图4中的A1、A2、A3、A4、A5分别如下：Figure 4 shows the decoupling logic of knowledge transfer logic using three-level category probability space. A1, A2, A3, A4, and A5 in Figure 4 are as follows:

A1: A1:

A2: A2:

A3: A3:

A4: A4:

A5: A5:

以下结合具体实施例对本发明的技术方案做进一步说明：The technical solution of the present invention is further described below in conjunction with specific embodiments:

步骤S1：从故障芯片中获取一维振动信号，以n*n长度为窗宽对数据进行等步长取样；将取样得到的信号经过傅里叶变换后重构为n*n的矩阵，拼接3层n*n的矩阵形成3通道图像数据。每个样本为n*n统一尺寸，并为每个样本制作相应类别的one-hot类型标签(由于是现有技术，本实施例不再赘述)。将所有样本按照比例分为训练集与测试集。Step S1: Obtain a one-dimensional vibration signal from the faulty chip, sample the data with equal steps using a window width of n*n length; reconstruct the sampled signal into an n*n matrix after Fourier transformation, and concatenate three layers of n*n matrices to form a three-channel image data. Each sample is of uniform size n*n and is A one-hot type label of the corresponding category is produced for each sample (since it is a prior art, it will not be described in detail in this embodiment). All samples are divided into a training set and a test set according to a certain ratio.

步骤S2：构建神经架构搜索算法模型ASNDARTS，基于所述神经架构搜索算法模型ASNDARTS构建知识蒸馏算法模型DPSKD。Step S2: construct a neural architecture search algorithm model ASNDARTS, and construct a knowledge distillation algorithm model DPSKD based on the neural architecture search algorithm model ASNDARTS.

搭建改进神经架构搜索算法ASNDARTS，获得轻量化模型model_V1.0。Build an improved neural architecture search algorithm ASNDARTS and obtain the lightweight model model_V1.0.

步骤S3：在原始模型NAS基础上中引入skip_connect操作修正值，与原始skip_connect操作被选概率值相结合，基于算法训练过程中准确率收敛趋势及模型结构收敛趋势，对skip_connect操作被选概率进行修正，解决算法崩溃，具体工作流程及使用逻辑如图2所示。Step S3: Based on the original model NAS, a skip_connect operation correction value is introduced, which is combined with the original skip_connect operation selection probability value. Based on the accuracy convergence trend and model structure convergence trend during the algorithm training process, the skip_connect operation selection probability is corrected to solve the algorithm crash. The specific workflow and usage logic are shown in Figure 2.

实时记录其搜索空间中skip_connect操作的被选概率值α_skip及修正值α_{correct_skip}，α_skip的分布情况为P_skip；定义P_skip出现梯度上升现象时的点为算法崩溃起始点epoch_break。引入参数k_skip对skip操作的a值进行修正得到α_{correct_skip}＝α_skip*k_skip，修正后的系数α_{correct_skip}会趋于平稳。将算法崩溃点与的差值定义为△，对于崩溃点后的α_skip取值概率均取得到最终带限修正概率

The selected probability value α _skip and the correction value α _{correct_skip} of the skip_connect operation in the search space are recorded in real time. The distribution of α _skip is P _skip ; the point where the gradient rise phenomenon occurs in P _skip is defined as the algorithm crash starting point epoch_break. The parameter k _skip is introduced to correct the a value of the skip operation to obtain α _{correct_skip} = α _skip *k _skip . The corrected coefficient α _{correct_skip} will tend to be stable. The difference between the algorithm crash point and is defined as △. The probability of α _skip taking values after the crash point is taken as Get the final band-limited corrected probability

其中，k_skip表示对于算法搜索过程中对skip_connect操作出现概率的制约。在算法搜索前期能够充分提取数据特征而不使用skip_connect进行跳过连接，k_skip前半部分基于tanh激活函数实现epoch层面上的前期skip_connect操作数量限制。后半部分基于sigmoid激活函数实现edge层面上的skip_connect操作数量限制。Among them, k _skip represents the control of the probability of skip_connect operation during the algorithm search process. In the early stage of algorithm search, data features can be fully extracted without using skip_connect for skip connection. The first half of k _skip is based on the tanh activation function to limit the number of early skip_connect operations at the epoch level. The second half is based on the sigmoid activation function to limit the number of skip_connect operations at the edge level.

步骤S4：例如定义初始化神经架构搜索算法搜索空间：Q1＝{max_pool_3x3，sep_conv_3x3，sep_conv_7x7，dil_conv_3x3，dil_conv_7x7，skip_connect,none}，备用搜索空间：Q2＝{avg_pool_3x3，avg_pool_5x5，sep_conv_5x5，dil_conv_5x5，max_pool_5x5}。根据所设判据，进行搜索空间内部可选操作的替换调整。判据依次包括：(1)等训练epoch间隔下模型准确率趋势、(2)等训练epoch间隔下模型准确率值异常情况、(3)等训练epoch间隔下各操作方式贡献值趋势、(4)等训练epoch间隔下各节点所选操作方式频次。具体判断逻辑如下表1所示。Step S4: For example, define the initialization neural architecture search algorithm search space: Q1 = {max_pool_3x3, sep_conv_3x3, sep_conv_7x7, dil_conv_3x3, dil_conv_7x7, skip_connect, none}, spare search space: Q2 = {avg_pool_3x3, avg_pool_5x5, sep_conv_5x5, dil_conv_5x5, max_pool_5x5}. According to the set criteria, the optional operations in the search space are replaced and adjusted. The criteria include: (1) the trend of model accuracy under equal training epoch intervals, (2) the abnormal situation of model accuracy value under equal training epoch intervals, (3) the trend of contribution value of each operation mode under equal training epoch intervals, and (4) the frequency of operation mode selected by each node under equal training epoch intervals. The specific judgment logic is shown in Table 1 below.

表1搜索空间自适应判据
Table 1 Search space adaptation criteria

步骤S5：针对各node前驱连接边引入操作方式op贡献判别机制，根据op的模型贡献率进行节点操作数量的自适应扩展，获得可变数量节点结构；Step S5: Introduce the operation mode op contribution discrimination mechanism for each node predecessor connection edge, and adaptively expand the number of node operations according to the op model contribution rate to obtain a variable number of node nodes. structure;

步骤S6:以等epoch为步长循环步骤S3至S5，本实施例大约迭代至150次时，模型准确率收敛到最佳、神经架构搜索出的模型结构稳定收敛，获得神经架构搜索算法模型ASNDARTS。输入测试集，则芯片缺陷检测轻量化模型model_V1.0搭建完成。Step S6: loop steps S3 to S5 with equal epoch as the step length. When the model accuracy converges to the best and the model structure searched by the neural architecture converges stably after about 150 iterations in this embodiment, the neural architecture search algorithm model ASNDARTS is obtained. Input the test set, and the chip defect detection lightweight model model_V1.0 is built.

步骤S7:以model_V1.0(即模型ASNDARTS)作为知识蒸馏模型中老师网络，取单个cell作为学生网络。Step S7: Use model_V1.0 (i.e., model ASNDARTS) as the teacher network in the knowledge distillation model, and take a single cell as the student network.

图5为模型ASNDARTS搜索出的结果结构，即一个cell，图5中的方块1-方块4表示四个节点(node)，每个节点的输入前驱边都对应一个操作方式OP。model_V1.0由多个cell串/并联形成，同时也是model_V2.0(即模型DPSKD)结构。Figure 5 shows the result structure of the search by the model ASNDARTS, i.e., a cell. Blocks 1 to 4 in Figure 5 represent four nodes, and the input predecessor edge of each node corresponds to an operation mode OP. model_V1.0 is formed by multiple cells in series/parallel, and is also the structure of model_V2.0 (i.e., model DPSKD).

步骤S8：构建知识蒸馏算法中老师-学生网络的信息传递新逻辑，结合芯片故障类别及故障程度情况实现目标类、目标族类及所有类为3概率空间解耦，得到不同概率空间下的知识传递逻辑式，具体工作流程及使用逻辑如图3所示。Step S8: Construct a new information transfer logic of the teacher-student network in the knowledge distillation algorithm, combine the chip fault category and fault degree to achieve the decoupling of the target class, target family class and all classes into three probability spaces, and obtain the knowledge transfer logic formula under different probability spaces. The specific workflow and usage logic are shown in Figure 3.

将模型概率重配，划分3层次类概率空间，分别为目标空间t，目标类内空间O及整体空间C。为了分离3个层次的预测情况，定义以下符号P＝[p_t,p_O\t,p_\O]。p_t表示目标t概率，p_O\t表示在O空间内除目标t的概率，p_\O表示在整体空间C中排除O的概率。
The model probability is reassigned and divided into three levels of class probability space, namely target space t, target class space O and overall space C. In order to separate the prediction situations at three levels, the following symbol P = [p _t , p _O\t , p _\O ] is defined. _{p t} represents the probability of target t, p _O\t represents the probability of excluding target t in O space, and p _\O represents the probability of excluding O in the overall space C.

将O\t空间及\O下的各类概率分别定义为和其中，其中，∧表示最终skip_connect的修正符号，表示在类O空间内，排除t的其他独立同类概率。同理，表示在C空间内排除O空间后的其他独立概率。
Define the various probabilities in O\t space and \O as and Among them, ∧ represents Finally, the corrected symbol of skip_connect, It means that in the class O space, other independent similar probabilities of t are excluded. Similarly, Represents other independent probabilities in C space after excluding O space.

将传统KD进行3概率空间解耦：
Decouple the traditional KD into three probability spaces:

根据及⊙表示点乘：
according to and ⊙ represents dot product:

其中，①表示老师-学生模型在所定义的3个空间上的KL散度TKD；②表示老师-学生模型在非目标的KL散度CKD；③表示老师-学生模型在其他类间的KL散度OKD。Among them, ① represents the KL divergence TKD of the teacher-student model in the three defined spaces; ② represents the KL divergence CKD of the teacher-student model in non-target; ③ represents the KL divergence OKD of the teacher-student model between other classes.

对3个层次类别概率空间的KL散度TKD、KL散度CKD和KL散度OKD进行重组，根据重组结果实现老师网络和学生网络之间的知识传递。The KL divergence TKD, KL divergence CKD and KL divergence OKD of the three-level category probability space are analyzed. The network is reorganized and the knowledge transfer between the teacher network and the student network is realized based on the reorganization results.

步骤S9：利用超参寻优算法实现学生模型超参数优化；Step S9: Optimize the hyperparameters of the student model using a hyperparameter optimization algorithm;

步骤S10：循环步骤S8至S9，大约迭代至50次时，学生模型准确率收敛到最佳，获得知识蒸馏算法模型DPSKD。输入测试集，完成极简自搜索模型model_V2.0搭建完成，模型model_V2.0可以部署到边缘机上实现边缘设备的芯片缺陷检测应用。Step S10: loop steps S8 to S9, and after about 50 iterations, the student model accuracy converges to the best, and the knowledge distillation algorithm model DPSKD is obtained. Input the test set, and the minimalist self-search model model_V2.0 is built. Model model_V2.0 can be deployed on the edge machine to implement the chip defect detection application of the edge device.

实施例二Embodiment 2

本实施例提供一种芯片缺陷检测系统，包括：This embodiment provides a chip defect detection system, including:

检测模块：用于获取待检测芯片的一维振动信号并转化为二维图像数据，通过所述知识蒸馏算法模型DPSKD对待检测芯片的二维图像数据进行检测，以判断待检测芯片是否存在缺陷。 Detection module: used to obtain the one-dimensional vibration signal of the chip to be detected and convert it into two-dimensional image data, and detect the two-dimensional image data of the chip to be detected through the knowledge distillation algorithm model DPSKD to determine whether the chip to be detected has defects.

实施例三Embodiment 3

本实施例提供一种电子设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述计算机程序时实现实施例一所述芯片缺陷检测方法的步骤。This embodiment provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the chip defect detection method described in the first embodiment are implemented.

实施例四Embodiment 4

本实施例提供一种计算机可读存储介质，其上存储有计算机程序，所述计算机程序被处理器执行时，实现实施例一所述芯片缺陷检测方法的步骤。This embodiment provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the chip defect detection method described in the first embodiment are implemented.

本领域内的技术人员应明白，本申请的实施例可提供为方法、系统、或计算机程序产品。因此，本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。本申请实施例中的方案可以采用各种计算机语言实现，例如，面向对象的程序设计语言Java和直译式脚本语言JavaScript等。Those skilled in the art will appreciate that the embodiments of the present application can be provided as methods, systems, or computer program products. Therefore, the present application can adopt the form of complete hardware embodiments, complete software embodiments, or embodiments in combination with software and hardware. Moreover, the present application can adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code. The scheme in the embodiments of the present application can be implemented in various computer languages, for example, object-oriented programming language Java and literal scripting language JavaScript, etc.

本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to the flowchart and/or block diagram of the method, device (system) and computer program product according to the embodiment of the present application. It should be understood that each process and/or box in the flowchart and/or block diagram, and the combination of the process and/or box in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the function specified in one process or multiple processes in the flowchart and/or one box or multiple boxes in the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded into a computer or other programmable data processing device. A series of operational steps are performed on a computer or other programmable device to produce a computer-implemented process, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows in the flowchart and/or one or more blocks in the block diagram.

尽管已描述了本申请的优选实施例，但本领域内的技术人员一旦得知了基本创造性概念，则可对这些实施例作出另外的变更和修改。所以，所附权利要求意欲解释为包括优选实施例以及落入本申请范围的所有变更和修改。Although the preferred embodiments of the present application have been described, those skilled in the art may make other changes and modifications to these embodiments once they have learned the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application.

显然，上述实施例仅仅是为清楚地说明所作的举例，并非对实施方式的限定。对于所属领域的普通技术人员来说，在上述说明的基础上还可以做出其它不同形式变化或变动。这里无需也无法对所有的实施方式予以穷举。而由此所引申出的显而易见的变化或变动仍处于本发明创造的保护范围之中。 Obviously, the above embodiments are merely examples for clear explanation and are not intended to limit the implementation methods. For those skilled in the art, other different forms of changes or modifications can be made based on the above description. It is not necessary and impossible to list all the implementation methods here. The obvious changes or modifications derived from these are still within the protection scope of the invention.

Claims

A chip defect detection method, characterized in that it includes:

Acquire the one-dimensional vibration signal of the chip and convert it into two-dimensional image data;

Construct a neural architecture search algorithm model ASNDARTS, and construct a knowledge distillation algorithm model DPSKD based on the neural architecture search algorithm model ASNDARTS;

The method for constructing the neural architecture search algorithm model ASNDARTS is as follows: inputting the two-dimensional image data of the chip into the neural architecture search algorithm NAS for training, and optimizing the neural architecture search algorithm NAS during the training process, and obtaining the neural architecture search algorithm model ASNDARTS when the neural architecture search algorithm NAS reaches Nash equilibrium;

The construction method of the knowledge distillation algorithm model DPSKD is: using the neural architecture search algorithm model ASNDARTS as a teacher network for knowledge distillation, and taking a single cell in the neural architecture search algorithm model ASNDARTS as a student network for knowledge distillation, wherein the neural architecture search algorithm model ASNDARTS includes a plurality of cells connected in series, and each of the cells is a result obtained by searching the neural architecture search algorithm model ASNDARTS in the search space;

Based on the probability space, the knowledge transfer method between the teacher network and the student network is decoupled, and the hyperparameters of the student network are optimized. When the student network converges, the knowledge distillation algorithm model DPSKD is obtained;

The one-dimensional vibration signal of the chip to be detected is obtained and converted into two-dimensional image data, and the two-dimensional image data of the chip to be detected is detected by the knowledge distillation algorithm model DPSKD to determine whether the chip to be detected has defects.

The chip defect detection method according to claim 1 is characterized in that: the optimization of the neural architecture search algorithm NAS during the training process includes a first optimization method, specifically:

During the training process, the probability of the original skip_connect operation of the neural architecture search algorithm NAS being selected is band-limited.

The chip defect detection method according to claim 1 is characterized in that: the optimization of the neural architecture search algorithm NAS during the training process includes a second optimization method, specifically:

During the training process, several operation modes OP are randomly selected from the search space of the algorithm NAS to form an initial search space Q1, and the remaining operation modes OP in the search space of the algorithm NAS are then combined to form a spare search space Q2. According to a preset criterion, it is determined whether adaptive adjustment is required between the operation modes OP in the initial search space Q1 and the operation modes OP in the spare search space Q2. If the operation mode OP requires adaptive adjustment, the operation mode OP with the lowest probability of being selected in the initial search space Q1 is replaced by the first operation mode OP in the spare search space Q2, and the operation mode OP with the lowest probability of being selected is used as the last operation mode OP in the spare search space Q2.

The chip defect detection method according to claim 1 is characterized in that: the optimization of the neural architecture search algorithm NAS during the training process includes a third optimization method, specifically:

During the training process, the number of operation modes OP of the node is adaptively expanded according to the model contribution rate of each operation mode OP.

The chip defect detection method according to claim 2 is characterized in that: during the training process, the original skip_connect operation selection probability of the neural architecture search algorithm NAS is subjected to band-limited correction, and the method comprises:

The selected probability value α _skip of the skip_connect operation during the training of the neural architecture search algorithm NAS is recorded in real time. The selected probability distribution of α _skip is P _skip . The point when P _skip shows a gradient rise phenomenon is defined as the algorithm NAS crash starting point epoch_break. The parameter k _skip is introduced to correct the α _skip of the skip_connect operation to obtain the corrected value α _{correct_skip} = α _skip *k _skip . The difference between α _skip and α _{correct_skip} at the algorithm NAS crash starting point is defined as △. The band-limited corrected probability is obtained according to α _skip , α _{correct_skip} and △. The formula is as follows:

Among them, * represents the multiplication sign, k _skip represents the restriction on the probability of skip_connect operation in the algorithm NAS search process; the first half of k _skip , k _skip(tanh) , is the limit on the number of early skip_connect operations at the epoch level based on the tanh activation function, and the second half of k _skip, k _{skip(sigmoid),} is the limit on the number of skip_connect operations at the edge level based on the sigmoid activation function; Indicates the probability of α _skip taking value before the crash starting point epoch_break; Indicates the probability of α _skip taking the value after the crash starting point epoch_break.

The chip defect detection method according to claim 3 is characterized in that: the preset criterion includes:

The preset criteria include, in order of judgment, the algorithm accuracy trend under equal training epoch intervals, whether there is an abnormal accuracy value under equal training epoch intervals, the OP contribution value trend of each operation mode under equal training epoch intervals, and the OP frequency of each node selected operation mode under equal training epoch intervals, wherein:

The algorithm accuracy trend under the equal training epoch interval is as follows: obtain a preset number of equal training epoch intervals, count the algorithm accuracy value of each equal training epoch interval, and perform linear fitting on the algorithm accuracy values of all equal training epoch intervals to obtain a first linear fitting value. If the first linear fitting value is greater than 0, it indicates that the algorithm accuracy is on an upward trend; if the first linear fitting value is less than 0, then Indicates that the accuracy of the algorithm is on a downward trend;

Whether there is an abnormal accuracy value under the equal training epoch interval is as follows: a preset number of equal training epoch intervals are obtained, and the algorithm accuracy value of each equal training epoch interval is counted. If the algorithm accuracy value does not increase with the increase of the equal training epoch interval, it indicates that the algorithm accuracy value of the equal training epoch interval is abnormal, otherwise it is normal;

The contribution value trend of each operation mode OP under the equal training epoch interval is as follows: a preset number of equal training epoch intervals are obtained, and at the same time, the selection probability of each operation mode OP in each equal training epoch interval is obtained, and the selection probability of OP in all equal training epoch intervals is linearly fitted to obtain a second linear fitting value, and if the second linear fitting value is greater than 0, it indicates that the contribution value of the operation mode OP is in an upward trend; if the second linear fitting value is less than 0, it indicates that the contribution value of the operation mode OP is in a downward trend;

The frequency of the operation mode OP selected by each node under the equal training epoch interval is: obtain a preset number of equal training epoch intervals, and simultaneously obtain the number of times each operation mode OP is selected in all equal training epoch intervals, and according to the preset number of selections a, determine the relationship between the number of times each operation mode OP is selected and the preset number of selections a.

The chip defect detection method according to claim 4 is characterized in that: the adaptive expansion of the number of operation modes OP of the node according to the model contribution rate of each operation mode OP comprises:

In the algorithm NAS, the operation mode OP contribution discrimination mechanism is introduced into the predecessor connection edge of each node, and a preset number of equal training epoch intervals are obtained. At the same time, the selection probability of each operation mode OP in each equal training epoch interval is obtained, and the selection probability of OP in all equal training epoch intervals is linearly fitted to obtain the second linear fitting value. If the second linear fitting value is greater than 0, it means that the contribution value of the operation mode OP is on an upward trend; if the second linear fitting value is less than 0, it means that the contribution value of the operation mode OP is on a downward trend;

During the training process, the current node selects the operation mode OP whose contribution value is an upward trend, ignores the operation mode OP whose contribution value is a downward trend, and completes the adaptive expansion of the number of operation mode OPs of the current node.

The chip defect detection method according to claim 3 is characterized in that: the search space of the algorithm NAS includes several optional operation modes OP, and the operation modes OP include several max_pool_nxn, several avg_pool_nxn, skip_connect, several sep_conv_nxn, several dil_conv_nxn, and none, wherein max_pool represents maximum pooling, avg_pool represents average pooling, skip_connect represents skip connection, sep_conv represents depthwise separable convolution, dil_conv represents void convolution, none represents no operation, and n takes an odd number in the value range ∈ [1,9].

The chip defect detection method according to claim 1 is characterized in that: the knowledge transfer method between the teacher network and the student network based on the probability space decoupling method includes:

The model ASNDARTS is probability reassigned and divided into three hierarchical category probability spaces, namely target space t, target class space O and overall space C. To separate the predictions of the three hierarchical category probability spaces, the symbol P = [p _t , p _O\t , p _\O ] is defined, where p _t represents the probability of target space t, p _O\t represents the probability of excluding target t in the target class space O, and p _\O represents the probability of excluding O in the overall space C.

Among them, _zt represents the soft probability of the target class, _zj represents the soft probability of all classes, _zk represents the soft profile of the non-target class, k represents the in-class index of the non-target class, and j represents the out-class index of the non-target class;

Define the various probabilities in O\t space and \O as and Among them, ∧ represents the correction symbol of the final skip_connect, It means that in the target class space O, other independent similar probabilities of the target space t are excluded; similarly, represents the probability of other independent similar objects in the overall space C after excluding the target class space O, where

The knowledge distillation KD is decoupled into three-level category probability spaces, and the formula is:

make ⊙ represents dot product;

in, Represents the KL divergence TKD of the teacher-student model in the three defined spaces; Represents the KL divergence CKD of the teacher-student model on the non-target; It represents the KL divergence OKD of the teacher-student model among other classes;

The KL divergence TKD, KL divergence CKD and KL divergence OKD of the three-level category probability space are reorganized, and the knowledge transfer between the teacher network and the student network is realized based on the reorganization results.

A chip defect detection system, characterized in that it includes:

Acquisition module: used to acquire the one-dimensional vibration signal of the chip and convert it into two-dimensional image data;

Construction module: used to construct a neural architecture search algorithm model ASNDARTS, and to construct a knowledge distillation algorithm model DPSKD based on the neural architecture search algorithm model ASNDARTS;

Detection module: used to obtain the one-dimensional vibration signal of the chip to be detected and convert it into two-dimensional image data, and detect the two-dimensional image data of the chip to be detected through the knowledge distillation algorithm model DPSKD to determine whether the chip to be detected has defects.