
CN118505600A - Unified anomaly detection method based on multi-source uncertainty mining - Google Patents

Unified anomaly detection method based on multi-source uncertainty mining

Info

Publication number
CN118505600A
CN118505600A (Application No. CN202410417739.6A)
Authority
CN
China
Prior art keywords
image
mining
uncertainty
abnormal
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410417739.6A
Other languages
Chinese (zh)
Inventor
钟羽中
康玻瑞
王茂宁
邓霖
张建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202410417739.6A priority Critical patent/CN118505600A/en
Publication of CN118505600A publication Critical patent/CN118505600A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/54Extraction of image or video features relating to texture
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30164Workpiece; Machine component

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of visual anomaly detection and discloses a unified anomaly detection method based on multi-source uncertainty mining. During training of the anomaly segmentation network, a multi-source uncertainty mining network is introduced, and the two networks are trained interactively, layer by layer, through a cross attention mechanism. This helps the anomaly segmentation network attend both to the global features of the image and to the local features of anomalies, yielding more accurate localization.

Description

Unified anomaly detection method based on multi-source uncertainty mining

Technical Field

The present invention belongs to the technical field of visual anomaly detection and relates to a unified anomaly detection method based on multi-source uncertainty mining.

Background Art

Anomaly detection plays a key role in computer vision and industrial applications. The main goal of visual anomaly detection is to accurately identify abnormal images and precisely locate abnormal regions. Existing visual anomaly detection methods fall into three paradigms: unsupervised, semi-supervised, and fully supervised. Unsupervised methods can effectively model normal samples, do not rely on prior anomaly information, and can detect unknown forms of anomalies in products; however, their detection performance is not satisfactory. Semi-supervised and fully supervised methods focus on modeling abnormal samples, which requires specifying the anomaly forms and labeling abnormal instances. These methods detect known anomalies well, but because they are restricted to predefined anomaly forms, their performance degrades sharply when faced with unknown ones.

In industry, an unsupervised method is usually used first to quickly build a detection model; some abnormal samples are then introduced to transition to a semi-supervised or fully supervised method and further improve detection performance. This strategy has a challenge: the two stages are disconnected, that is, the model obtained in the unsupervised stage cannot be carried over into the subsequent supervised learning. In addition, a unified model for multi-category product anomaly detection can inspect multiple products with a single model, which better suits real production environments. Existing unified models usually rely on feature reconstruction, but they unify only the reconstruction part and lack a unified decision boundary in the detection process. Specifically, when all product categories are detected with a single unified decision boundary, performance is lower than when each category is detected separately.

Summary of the Invention

The purpose of the present invention is to address the above problems in the prior art by providing a unified anomaly detection method based on multi-source uncertainty mining, which can identify various anomaly forms without predefining them and which unifies the decision boundary.

To achieve the above object, the present invention adopts the following technical solution.

The present invention provides a unified anomaly detection method based on multi-source uncertainty mining, comprising the following steps:

S1. Train the anomaly segmentation network with a semi-supervised learning method based on multi-source uncertainty mining, comprising the following sub-steps:

S11. Construct a training set from data containing normal samples and abnormal samples, and obtain multi-source pseudo labels through several pre-trained base models.

S12. Obtain a reconstructed image of each training sample with any one of the base models, and feed the Euclidean distance between a sample and its reconstruction, as the biased difference, into the anomaly segmentation network to obtain an anomaly segmentation image. The anomaly segmentation network comprises several feature extraction stages arranged in sequence, with the output of each stage serving as the input of the next; the image features extracted in at least some of the stages are output to the multi-source uncertainty mining network.

S13. Input the multi-source pseudo labels into the multi-source uncertainty mining network, combine them with the image features output by the anomaly segmentation network, obtain the global attention distribution through a cross attention mechanism, and generate an uncertainty weight map.

S14. Construct a loss function based on the multi-source pseudo labels, the anomaly segmentation image, and the uncertainty weight map, and compute the loss value.

S15. Update the parameters of the anomaly segmentation network and the multi-source uncertainty mining network with the loss value.

Repeat steps S12-S15 until the loss function converges, yielding the trained anomaly segmentation network.

S2. Perform unified anomaly detection on the image to be detected, comprising the following sub-steps:

S21. Obtain a reconstructed image of the image to be detected with the base model, and take the Euclidean distance between the image and its reconstruction as the biased difference.

S22. Feed the biased difference into the anomaly segmentation network to obtain the anomaly segmentation image.

In step S12 above, the base model is selected from EdgRec, DRAEM, FastFlow, MSTAD, or the like. Each feature extraction stage in the anomaly segmentation network comprises several convolution modules; each convolution module consists of a convolution layer, a batch normalization layer, and a ReLU activation function. Every stage except the last passes through a downsampling layer into the next stage; the last stage produces the anomaly segmentation image through an upsampling layer and a convolution module. Except for the first stage, the image features extracted by the remaining stages are output to the multi-source uncertainty mining network.

In step S13 above, the multi-source uncertainty mining network comprises an encoder, several cross attention modules, and a decoder arranged in sequence. The encoder encodes the input image features. Each cross attention module uses a cross attention mechanism to obtain the global attention distribution of the image features output by the corresponding feature extraction stage of the anomaly segmentation network over the multi-source pseudo labels, superimposes it on the encoder output or the output of the previous cross attention module, and then performs patch fusion. The decoder decodes the input features into the uncertainty weight map.

In a preferred implementation, the encoder comprises three convolution layers and two downsampling layers, each downsampling layer lying between two adjacent convolution layers; its main function is to extract features and change the size of the input feature map. The decoder comprises one convolution layer and one upsampling layer and outputs an uncertainty weight map that accurately represents the pixel-level uncertainty mining result.

In a preferred implementation, the number of cross attention modules equals the number of feature extraction stages connected to the multi-source uncertainty mining network. Each cross attention module comprises a cross attention layer, a feed-forward neural network, and a patch fusion layer.

In step S14 above, the constructed loss function is:

L(θ_S, θ_U) = (1/M)·Σ_{m=1}^{M} (1/(H·W))·Σ_{i=1}^{H×W} [w_i^m·L_CE(ŷ_i^m, S_i) + s_i^m]

where θ_S and θ_U are the trainable parameters of the anomaly segmentation network and the multi-source uncertainty mining network, respectively; L_CE(ŷ_i^m, S_i) is the cross-entropy loss of the i-th pixel computed from the m-th pseudo label ŷ_i^m and the unnormalized score S_i; s_i^m is the predicted log variance; w_i^m = exp(−s_i^m) is the uncertainty weight of the i-th pixel obtained with pseudo label ŷ_i^m; M is the number of pseudo labels, H the image height, and W the image width.

In step S2 above, to complete the anomaly detection of the image to be detected, the step further comprises:

S23. Apply global average pooling to the anomaly segmentation image and take the maximum value as the anomaly score of the image to be detected.

S24. Perform weighted fusion of the anomaly segmentation image and the image to be detected to obtain an anomaly heat map.

Compared with the prior art, the unified anomaly detection method based on multi-source uncertainty mining provided by the present invention has the following beneficial effects:

1) During training of the anomaly segmentation network, the present invention introduces a multi-source uncertainty mining network that interacts with the anomaly segmentation network layer by layer through a cross attention mechanism, helping the anomaly segmentation network attend both to the global features of the image and to the local features of anomalies, thereby achieving more accurate localization.

2) The present invention uses a base model obtained by unsupervised learning to compute the biased difference of anomalies, and further improves the base model's anomaly detection performance by exploiting unlabeled abnormal samples.

3) The present invention employs an uncertainty-weighted loss function and treats the pseudo labels as samples of a Gibbs distribution within a Bayesian framework. With this loss, the multi-source uncertainty mining network and the anomaly segmentation network can be trained simultaneously, enabling the anomaly segmentation network to perform anomaly detection under a unified decision boundary.

4) Extensive experiments on the MVTec AD dataset demonstrate the effectiveness of the method. Moreover, even with a self-motivated setup that does not rely on pseudo labels from other models, the anomaly detection performance still improves.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flow chart of the unified anomaly detection method based on multi-source uncertainty mining;

FIG. 2 shows an original image and the corresponding labels, where (a) is the original image, (b) is the ground-truth label, (c) is the localization result of the present method, (d) is the pseudo label from the EdgRec model, (e) from the DRAEM model, (f) from the FastFlow model, and (g) from the MSTAD model;

FIG. 3 shows original images and the anomaly localization maps obtained by different methods.

DETAILED DESCRIPTION

The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Example 1

The present invention proposes a novel semi-supervised anomaly detection strategy that takes a pixel reconstruction model or a feature reconstruction model as the base model. A multi-source uncertainty mining network (MUMNet) and an anomaly segmentation network (ASNet) are introduced; during training, a cross attention mechanism lets MUMNet and ASNet interact layer by layer, helping ASNet attend both to the global features of the image and to the local features of anomalies, thereby achieving more accurate localization.

The data used in this example is the MVTec AD dataset, which contains 3629 normal training images covering 5 texture categories and 10 object categories. The test set comprises 467 normal images and 1258 abnormal images. For each abnormal sample in the test set, the dataset provides the image label and segmentation ground truth.

1. Base model

The base model used in this example is selected from EdgRec, DRAEM, FastFlow, or MSTAD.

These base models are all unsupervised reconstruction models; the Euclidean distance between a base model's input and output is used to represent the biased difference of anomalies of various forms, which removes the dependence on manually labeled anomalies. The unified anomaly detection method based on multi-source uncertainty mining (MUM-UAD) of this example builds on this and adopts a progressive learning strategy. However, because an unsupervised model learns only from normal samples and cannot exploit abnormal samples, it cannot be carried over directly into the subsequent semi-supervised stage. To solve this, MUM-UAD adopts an uncertainty-mining learning strategy that uses the base model as the bridge between the unsupervised and semi-supervised stages, improving detection performance and better matching real production environments.

2. Anomaly Segmentation Network (ASNet)

The main goal of the anomaly segmentation network (ASNet) is to generate a high-precision anomaly segmentation map for the final decision. To this end, this example makes appropriate modifications to the ResNet34 architecture.

The ASNet of this example, shown in FIG. 1(a), comprises four sequential feature extraction stages (Stage0-Stage3), the output of each stage serving as the input of the next. Each stage comprises several convolution modules, with counts N0=3, N1=4, N2=6, and N3=3; the network structure can be adjusted according to the type of base model adopted. Each convolution module consists of a convolution layer, a batch normalization layer, and a ReLU activation function. Every stage except the last passes through a downsampling layer into the next stage; the last stage produces the anomaly segmentation image through an upsampling layer and a convolution module. Except for the first stage (Stage0), the features extracted by the remaining stages (Stage1-Stage3) are output to the multi-source uncertainty mining network.

For feature reconstruction, ASNet adopts an inverted ResNet34 structure: all downsampling operations are replaced by upsampling and the order of the stages is reversed. For a pixel reconstruction model, or when no base model is used, ASNet follows the standard ResNet34 structure but replaces the final linear output layer with an upsampling layer to adjust the output size. ASNet thus extracts multi-scale features from the biased difference produced by the base model and yields an unbiased anomaly segmentation map.

In this example, ASNet is initialized with pre-trained ResNet34 weights, and the size of the deepest feature map is limited to 14×14.
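
For illustration only, the convolution module and stage layout described above can be sketched in PyTorch roughly as follows; the names ConvModule and make_stage and the channel widths are assumptions, not part of the disclosure:
```python
import torch.nn as nn

class ConvModule(nn.Module):
    """Convolution module: convolution layer + batch normalization + ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

def make_stage(in_ch, out_ch, n_modules):
    """One feature extraction stage made of n_modules convolution modules."""
    layers = [ConvModule(in_ch, out_ch)]
    layers += [ConvModule(out_ch, out_ch) for _ in range(n_modules - 1)]
    return nn.Sequential(*layers)

# Four stages with N0=3, N1=4, N2=6, N3=3 modules (channel widths assumed).
stages = nn.ModuleList([
    make_stage(3, 64, 3),
    make_stage(64, 128, 4),
    make_stage(128, 256, 6),
    make_stage(256, 512, 3),
])
downsample = nn.MaxPool2d(2)  # applied between stages 0-1, 1-2 and 2-3
```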

3. Multi-source Uncertainty Mining Network (MUMNet)

MUMNet aims to accurately identify the reliability of each pixel by analyzing the pseudo labels and the input image. By jointly mining the commonalities and differences among multiple pseudo labels generated by several anomaly detection models, MUMNet can effectively capture reliable label samples.

The multi-source uncertainty mining network comprises an encoder, several cross attention modules, and a decoder arranged in sequence.

The encoder encodes the input image features. It comprises three convolution layers and two downsampling layers, each downsampling layer lying between two adjacent convolution layers; its main function is to extract features and change the size of the input feature map.

The decoder decodes the input features into the uncertainty weight map. It comprises one convolution layer and one upsampling layer and outputs an uncertainty weight map that accurately represents the pixel-level uncertainty mining result.
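
A minimal sketch of the encoder and decoder described above, assuming M = 4 pseudo-label input channels and illustrative channel widths:
```python
import torch.nn as nn

# Encoder: three convolution layers with a downsampling layer between each
# adjacent pair (channel widths are assumptions).
encoder = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=3, padding=1),   # 4 = M pseudo-label channels (assumed)
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 96, kernel_size=3, padding=1),
)

# Decoder: one convolution layer followed by one upsampling layer, producing
# a single-channel uncertainty weight map at the input resolution.
decoder = nn.Sequential(
    nn.Conv2d(96, 1, kernel_size=3, padding=1),
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
)
```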

The cross attention modules use a cross attention mechanism to obtain the global attention distribution of the image features output by the corresponding feature extraction stage of ASNet over the multi-source pseudo labels, superimpose it on the encoder output or the output of the previous cross attention module, and perform patch fusion. The number of cross attention modules equals the number of feature extraction stages connected to the multi-source uncertainty mining network; in this example there are three, the first to third cross attention modules. As shown in FIG. 1(a), each cross attention module comprises a cross attention layer, a feed-forward neural network, and a patch fusion layer. The feature maps extracted by Stage1-Stage3 of ASNet are input to the cross attention layers of the first to third cross attention modules, respectively.

The cross attention layer uses linear projections to transform the image features f_i^A ∈ R^{n×c} extracted from ASNet into the query Q_i, and the features f_i ∈ R^{n×c} derived from the pseudo labels into the key K_i and the value V_i, with linear projections W_i^Q, W_i^K, W_i^V ∈ R^{c×c}. Here, i denotes the stage index of the cross attention module, n the number of elements in the feature map, and c the feature dimension. The details of the i-th stage are as follows:

f′ = Softmax(Q_i·K_i^T / √c)·V_i + f_i  (1);

Through the cross attention mechanism, the global attention distribution of the image features over the multi-source pseudo-label features at this stage is obtained. The information is then integrated by a feed-forward network (FFN) and downsampled by patch fusion so that it can interact anew with the feature information of the next stage:

f_{i+1} = PatchMerging(FFN(f′))  (2);

This module enhances the sensitivity of ASNet's multi-scale features to anomaly information and helps MUMNet generate more robust uncertainty maps.
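
The cross attention module of formulas (1) and (2) might be sketched as follows; PatchMerging is approximated here by a strided convolution, and all module names and dimensions are assumptions:
```python
import torch
import torch.nn as nn

class CrossAttentionModule(nn.Module):
    """Cross attention layer + feed-forward network + patch fusion (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # projects ASNet image features -> Q
        self.k = nn.Linear(dim, dim)   # projects pseudo-label features -> K
        self.v = nn.Linear(dim, dim)   # projects pseudo-label features -> V
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        # Patch fusion approximated by a strided conv that halves resolution
        # and doubles channels (an assumption mirroring Swin's PatchMerging).
        self.merge = nn.Conv2d(dim, 2 * dim, kernel_size=2, stride=2)

    def forward(self, f_img, f_lab, hw):
        # f_img: (B, n, c) features from the ASNet stage;
        # f_lab: (B, n, c) features derived from the multi-source pseudo labels.
        q, k, v = self.q(f_img), self.k(f_lab), self.v(f_lab)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        f = attn @ v + f_lab                     # Eq. (1): attention + residual
        f = self.ffn(f)                          # integrate with the FFN
        h, w = hw
        f = f.transpose(1, 2).reshape(f.shape[0], -1, h, w)
        return self.merge(f)                     # Eq. (2): patch fusion
```
The returned map would be flattened again before entering the next module, so that it can interact with the next ASNet stage at half the resolution.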

In this example, MUMNet is initialized with a uniform distribution, and the cross attention modules interact with feature maps at three scales: 56×56, 28×28, and 14×14.

Based on the above, the unified anomaly detection method based on multi-source uncertainty mining of this example comprises the following steps:

S1. Train the anomaly segmentation network with a semi-supervised learning method based on multi-source uncertainty mining, comprising the following sub-steps:

S11. Construct a training set from data containing normal samples and abnormal samples, and obtain multi-source pseudo labels through several pre-trained base models.

Since only the test set of MVTec AD contains abnormal samples, this example redefines the dataset. The original MVTec AD training set, which contains only normal samples, is called the "base training set", and the test set, which contains abnormal samples and a small number of normal samples, is called the "base test set". Several abnormal and normal samples are then selected from each product category of the "base test set" to form the "new training set"; the remaining samples form the "new test set". All input images are resized to 224×224.
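
A sketch of this re-splitting, assuming the base test set is held as a mapping from category to sample records with a binary "label" field; the per-category counts are placeholders:
```python
import random

def resplit(base_test, n_abnormal=10, n_normal=10, seed=0):
    """Move a few abnormal and normal samples per category from the base test
    set into a new training set; the remainder forms the new test set."""
    rng = random.Random(seed)
    new_train, new_test = [], []
    for category, samples in base_test.items():
        abnormal = [s for s in samples if s["label"] == 1]
        normal = [s for s in samples if s["label"] == 0]
        rng.shuffle(abnormal)
        rng.shuffle(normal)
        new_train += abnormal[:n_abnormal] + normal[:n_normal]
        new_test += abnormal[n_abnormal:] + normal[n_normal:]
    return new_train, new_test
```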

The base models are pre-trained on the "base training set" and their parameter weights are retained. In addition, for each of the four models trained on the "base training set" (EdgRec, DRAEM, FastFlow, MSTAD), the value covering 98% of the scores in its anomaly score maps is selected as the threshold.

The pre-trained base models are run on the "new training set" to generate anomaly score maps, which are then binarized with the thresholds to serve as the multi-source pseudo labels.
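
The 98%-coverage thresholding and binarization can be sketched as follows (function names are hypothetical):
```python
import numpy as np

def fit_threshold(score_maps, coverage=0.98):
    """Pick the value covering 98% of the scores produced on the base
    training set as the binarization threshold."""
    return np.quantile(np.concatenate([m.ravel() for m in score_maps]), coverage)

def make_pseudo_label(score_map, threshold):
    """Binarize an anomaly score map into a pseudo label (1 = abnormal)."""
    return (score_map > threshold).astype(np.uint8)
```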

S12. Obtain a reconstructed image of each training sample with any one of the base models, and feed the Euclidean distance between the sample and its reconstruction, as the biased difference, into the anomaly segmentation network to obtain the anomaly segmentation image.

Meanwhile, the feature maps extracted by stages Stage1-Stage3 of the anomaly segmentation network are output to the corresponding cross attention modules of the multi-source uncertainty mining network.
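
A sketch of the biased difference of step S12, i.e., the per-pixel Euclidean distance between a sample and its reconstruction:
```python
import torch

def biased_difference(x, x_rec):
    """Per-pixel Euclidean distance between an image and its reconstruction
    from the base model, used as the biased difference fed to ASNet.
    x, x_rec: (B, C, H, W)."""
    return torch.sqrt(((x - x_rec) ** 2).sum(dim=1, keepdim=True) + 1e-12)
```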

S13. Input the multi-source pseudo labels into the multi-source uncertainty mining network, combine them with the image features output by the anomaly segmentation network, obtain the global attention distribution through the cross attention mechanism, and generate the uncertainty weight map.

This step generates the uncertainty weight map with the multi-source uncertainty mining network described above.

The output of the last cross attention module is decoded by the decoder into an uncertainty weight map that accurately represents the pixel-level uncertainty mining result.

S14. Construct the loss function based on the multi-source pseudo labels, the anomaly segmentation image, and the uncertainty weight map, and compute the loss value.

Among the M anomaly detection methods (corresponding to M pseudo labels), the pseudo label of pixel i generated by the m-th method is denoted ŷ_i^m = c, where c = 1 indicates an abnormal region and c = 0 a normal region; the corresponding uncertainty weight map is denoted w^m. The pseudo label is modeled as a random variable y that follows a Gibbs distribution under Bayesian theory. When the Softmax function is used to normalize the anomaly scores, the probability distribution of y can be computed as:

p(y | S_i, σ_i) = Softmax((1/σ_i²)·S_i)  (3);

Given the observed pseudo label ŷ_i^m, the negative log-likelihood can be further derived as:

−log p(y = ŷ_i^m | S_i, σ_i) ≈ (1/σ_i²)·L_CE(ŷ_i^m, S_i) + log σ_i²  (4);

where L_CE(ŷ_i^m, S_i) denotes the cross-entropy loss of the i-th pixel computed from the m-th pseudo label ŷ_i^m and the unnormalized score S_i (given by the anomaly segmentation image). In practice, predicting the log variance s_i^m = log σ_i² improves numerical stability during training; w_i^m = exp(−s_i^m) denotes the uncertainty weight of the i-th pixel obtained with pseudo label ŷ_i^m.

The loss can therefore be restated as:

L_i^m(θ_S, θ_U) = w_i^m·L_CE(ŷ_i^m, S_i) + s_i^m  (5);

where θ_S and θ_U are the trainable parameters of the anomaly segmentation network and the multi-source uncertainty mining network, respectively. Formula (5) is called the uncertainty-weighted loss and supports joint learning of the whole network. Combining formulas (4) and (5) and extending to all M pseudo labels gives the final loss function:

L(θ_S, θ_U) = (1/M)·Σ_{m=1}^{M} (1/(H·W))·Σ_{i=1}^{H×W} [w_i^m·L_CE(ŷ_i^m, S_i) + s_i^m]  (6);

Under the supervision of formula (6), MUMNet and ASNet are trained jointly, enhancing MUMNet's ability to capture detailed anomaly feature information and thereby improving ASNet's anomaly detection performance.
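
Formula (6) might be implemented roughly as below, assuming MUMNet predicts one log-variance map per pseudo-label source; the tensor layouts are assumptions:
```python
import torch
import torch.nn.functional as F

def uncertainty_weighted_loss(logits, pseudo_labels, log_var):
    """Sketch of Eq. (6): logits (B, 2, H, W) are the unnormalized scores S
    from ASNet; pseudo_labels (B, M, H, W) holds the M binary pseudo labels;
    log_var (B, M, H, W) are the predicted log variances from MUMNet."""
    B, M, H, W = pseudo_labels.shape
    total = 0.0
    for m in range(M):
        ce = F.cross_entropy(logits, pseudo_labels[:, m].long(),
                             reduction="none")           # per-pixel CE, (B, H, W)
        s = log_var[:, m]
        total = total + (torch.exp(-s) * ce + s).mean()  # weight + regularizer
    return total / M
```
Whether MUMNet emits one uncertainty map per source or a single shared map is an implementation choice; one map per source is assumed here.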

S15. Update the parameters of the anomaly segmentation network and the multi-source uncertainty mining network with the loss value.

Repeat steps S12-S15 until the loss function converges, yielding the trained anomaly segmentation network.

Convergence can be judged from the change of the loss value or by setting an upper limit on the number of iterations: when the loss value stabilizes or the iteration limit is reached, the loss function is deemed converged and training ends.

In this example, the learning rate of both MUMNet and ASNet is set to 1e-4, and the Adam optimizer is used for end-to-end joint training. The "new training set" is divided into 6 batches for batch-wise training, and evaluation uses 5 different random seeds. The whole training takes about 200 epochs on a GeForce 4080 GPU.
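
The joint training could look roughly like this, reusing the biased_difference and uncertainty_weighted_loss sketches above; asnet, mumnet, base_model, and loader are assumed objects, not names from the disclosure:
```python
import itertools
import torch

opt = torch.optim.Adam(
    itertools.chain(asnet.parameters(), mumnet.parameters()), lr=1e-4)

for epoch in range(200):                    # ~200 epochs per the text
    for x, pseudo in loader:                # loader yields the new training set
        with torch.no_grad():
            x_rec = base_model(x)           # frozen unsupervised base model
        diff = biased_difference(x, x_rec)
        logits, feats = asnet(diff)         # multi-scale features fed to MUMNet
        log_var = mumnet(pseudo, feats)
        loss = uncertainty_weighted_loss(logits, pseudo, log_var)
        opt.zero_grad()
        loss.backward()
        opt.step()
```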

S2. Perform unified anomaly detection on the image to be detected.

In this example, the samples of the "new test set" serve as the images to be detected, and unified anomaly detection proceeds in the following sub-steps:

S21. Obtain a reconstructed image of the image to be detected with the base model, and take the Euclidean distance between the image and its reconstruction as the biased difference.

S22. Feed the biased difference into the anomaly segmentation network to obtain the anomaly segmentation image.

S23. Apply global average pooling to the anomaly segmentation image and take the maximum value as the anomaly score of the image to be detected; this score is used for anomaly detection.

In this example, the pooling window used to compute the anomaly score is set to 80.
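
A sketch of step S23 with the 80×80 pooling window:
```python
import torch.nn.functional as F

def image_anomaly_score(seg_map, pool=80):
    """Average pooling with an 80x80 window over the anomaly segmentation
    map; the maximum pooled response is the image-level anomaly score.
    seg_map: (B, 1, H, W)."""
    pooled = F.avg_pool2d(seg_map, kernel_size=pool, stride=1)
    return pooled.flatten(1).max(dim=1).values
```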

S24. Perform weighted fusion of the anomaly segmentation image and the image to be detected to obtain an anomaly heat map, which is used for anomaly localization.

In this example, the anomaly heat map is computed as α·I + (1−α)·S′, where I is the input image, S′ is the score map obtained by converting the output anomaly segmentation image S with a colormap, and α is a weight coefficient; see FIG. 1 for reference.
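
The fusion α·I + (1−α)·S′ can be sketched with OpenCV as follows, assuming I is an 8-bit BGR image of the same size as the segmentation map and α = 0.5 as a placeholder:
```python
import cv2
import numpy as np

def anomaly_heatmap(image, seg_map, alpha=0.5):
    """Weighted fusion alpha*I + (1-alpha)*S' of the input image I and the
    colormap-converted segmentation map S'."""
    s = (255 * (seg_map - seg_map.min()) / (seg_map.ptp() + 1e-8)).astype(np.uint8)
    s_color = cv2.applyColorMap(s, cv2.COLORMAP_JET)  # S' after colormap
    return cv2.addWeighted(image, alpha, s_color, 1 - alpha, 0)
```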

In this example, experiments are conducted on four unsupervised anomaly detection reconstruction models, comprising models designed for single-class detection (DRAEM and EdgRec) and for multi-class detection (UniAD and MSTAD), plus two semi-supervised anomaly detection models (DRA and BGAD). The unsupervised models are trained on the "base training set" of MVTec AD and their weights are retained. In the subsequent semi-supervised stage, one unsupervised model is chosen as the base model and, following the MUM-UAD strategy, MUMNet and ASNet are further trained on the "new training set". Before and after applying MUM-UAD, the unsupervised models are tested on the "new test set" under a unified decision boundary.

Table 1 compares the results of the unsupervised base models before and after introducing the MUM-UAD strategy under the unified decision boundary condition.

"Det." denotes image-level AUROC, "Loc." denotes pixel-level AUROC, the MUM-UAD strategy is denoted "ours", and the best result in each before/after comparison is highlighted in bold.

AUROC, the area under the receiver operating characteristic curve, is used to evaluate the performance of the proposed anomaly detection and localization method.

AUPRO, the area under the per-region overlap curve, is used for anomaly localization evaluation and ensures that anomalies of all sizes are treated equally during evaluation.

Table 2. Comparison of semi-supervised methods under the unified decision boundary condition

"B.M." denotes the base model (e.g., MSTAD), while "ours" denotes the MUM-UAD strategy.

"ours w/o B.M." denotes this example without a base model: MUMNet and ASNet are still trained with pseudo labels under the MUM-UAD strategy, and the trained ASNet is then tested.

最佳结果以粗体突出显示。The best results are highlighted in bold.

As shown in Table 1, combining an unsupervised model with the MUM-UAD strategy allows it to be applied to semi-supervised tasks and yields improvements in detection performance of varying degrees. Specifically, when the base model targets single-class anomaly detection, the strategy brings an improvement of more than 10% in image-level or pixel-level AUROC, highlighting the effectiveness of MUM-UAD in enabling single-class models to solve multi-class anomaly problems. Conversely, when the base model targets multi-class anomaly detection, the method performs even better, underlining its suitability for multi-class scenarios. The UniAD and MSTAD models were studied further: as shown in FIG. 2 and FIG. 3, with the proposed strategy these models establish more precise segmentation boundaries for anomalies and pay less attention to non-abnormal regions, significantly improving detection performance. This further confirms that integrating MUM-UAD into ASNet markedly enhances its local perception of anomalies and helps improve its overall understanding of the image.

The semi-supervised models (DRA and BGAD) are trained directly on the "new training set" and tested on the "new test set". As shown in Table 2, the present method outperforms traditional semi-supervised anomaly detection methods, especially when no unsupervised base model is available; when the knowledge of a base model is exploited, it achieves even better results, further verifying its effectiveness.

Those of ordinary skill in the art will appreciate that the embodiments herein are intended to help the reader understand the principles of the present invention, and the scope of protection is not limited to these specific statements and embodiments. Based on the technical teachings disclosed herein, those skilled in the art may make various other specific variations and combinations without departing from the essence of the present invention, and such variations and combinations remain within the scope of protection of the present invention.

Claims (10)

1. A unified anomaly detection method based on multi-source uncertainty mining, characterized by comprising the following steps:
S1, training an anomaly segmentation network with a semi-supervised learning method based on multi-source uncertainty mining, comprising the following sub-steps:
S11, constructing a training set from data comprising normal samples and abnormal samples, and acquiring multi-source pseudo tags through a plurality of pre-trained base models;
S12, acquiring a reconstructed image of a sample in the training set using any one of the base models, and inputting the Euclidean distance between the sample and its reconstructed image, as a biased difference, into the anomaly segmentation network to obtain an anomaly segmentation image; the anomaly segmentation network comprises a plurality of feature extraction stages arranged in sequence, wherein the output of a former stage serves as the input of a latter stage; image features extracted in at least part of the feature extraction stages are output to a multi-source uncertainty mining network;
S13, inputting the multi-source pseudo tags into the multi-source uncertainty mining network, combining them with image features output from the anomaly segmentation network, acquiring a global attention distribution based on a cross-attention mechanism, and generating an uncertainty weight map;
S14, constructing a loss function and acquiring a loss value based on the multi-source pseudo tags, the anomaly segmentation image, and the uncertainty weight map;
S15, updating parameters of the anomaly segmentation network and the multi-source uncertainty mining network using the loss value;
repeating steps S12-S15 until the loss function converges, to obtain the trained anomaly segmentation network;
S2, performing unified anomaly detection on an image to be detected, comprising the following sub-steps:
S21, acquiring a reconstructed image of the image to be detected using the base model, and taking the Euclidean distance between the image to be detected and its reconstructed image as the biased difference;
S22, inputting the biased difference into the anomaly segmentation network to obtain the anomaly segmentation image.
2. The unified anomaly detection method based on multi-source uncertainty mining of claim 1, wherein in step S12, the base model is selected from EdgRec, DRAEM, FastFlow, or MSTAD.
3. The unified anomaly detection method based on multi-source uncertainty mining of claim 1, wherein each feature extraction stage in the anomaly segmentation network comprises a number of convolution modules; each convolution module consists of a convolution layer, a batch normalization layer, and a ReLU activation function.
4. The unified anomaly detection method based on multi-source uncertainty mining according to claim 3, wherein each feature extraction stage other than the last passes through a downsampling layer into the next feature extraction stage, and the last feature extraction stage obtains the anomaly segmentation image via an upsampling layer and a convolution module.
5. The unified anomaly detection method based on multi-source uncertainty mining of any one of claims 1 to 4, wherein in step S13, the multi-source uncertainty mining network comprises an encoder, several cross-attention modules, and a decoder arranged in sequence; the encoder is used for encoding the input image features; the cross-attention module is used for acquiring, based on a cross-attention mechanism, the global attention distribution of the image features output by the corresponding feature extraction stage of the anomaly segmentation network over the multi-source pseudo tags, superimposing it on the output of the encoder or of the previous cross-attention module, and carrying out patch fusion; the decoder is used for decoding the input image features to obtain the uncertainty weight map.
6. The unified anomaly detection method based on multi-source uncertainty mining of claim 5, wherein the encoder comprises three convolutional layers and two downsampling layers, the two downsampling layers being located between two adjacent convolutional layers, respectively; the decoder includes a convolutional layer and an upsampling layer.
7. The unified anomaly detection method based on multi-source uncertainty mining of claim 5, wherein the cross-attention module comprises a cross-attention layer, a feed-forward neural network, and a patch fusion layer.
8. The unified anomaly detection method based on multi-source uncertainty mining of claim 5, wherein in step S14, the constructed loss function is:

L(θ_S, θ_U) = (1/M)·Σ_{m=1}^{M} (1/(H·W))·Σ_{i=1}^{H×W} [w_i^m·L_CE(ŷ_i^m, S_i) + s_i^m]

wherein θ_S and θ_U are trainable parameters of the anomaly segmentation network and the multi-source uncertainty mining network, respectively; L_CE(ŷ_i^m, S_i) denotes the cross-entropy loss of the i-th pixel computed from the m-th pseudo tag ŷ_i^m and the unnormalized score S_i; s_i^m denotes the predicted log variance; w_i^m = exp(−s_i^m) denotes the uncertainty weight of the i-th pixel obtained with pseudo tag ŷ_i^m; M denotes the number of pseudo tags, H denotes the height of the image, and W denotes the width of the image.
9. The unified anomaly detection method based on multi-source uncertainty mining of claim 1, wherein step S2 further comprises:
S23, carrying out global average pooling treatment on the abnormal segmentation image, and taking the maximum value as the abnormal score of the image to be detected;
S24, carrying out weighted fusion on the abnormal segmentation image and the image to be detected to obtain an abnormal heat map.
10. The unified anomaly detection method based on multi-source uncertainty mining of claim 8, wherein step S2 further comprises:
S23, carrying out global average pooling treatment on the abnormal segmentation image, and taking the maximum value as the abnormal score of the image to be detected;
S24, carrying out weighted fusion on the abnormal segmentation image and the image to be detected to obtain an abnormal heat map.
CN202410417739.6A 2024-04-09 2024-04-09 Unified anomaly detection method based on multi-source uncertainty mining Pending CN118505600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410417739.6A CN118505600A (en) 2024-04-09 2024-04-09 Unified anomaly detection method based on multi-source uncertainty mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410417739.6A CN118505600A (en) 2024-04-09 2024-04-09 Unified anomaly detection method based on multi-source uncertainty mining

Publications (1)

Publication Number Publication Date
CN118505600A true CN118505600A (en) 2024-08-16

Family

ID=92243724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410417739.6A Pending CN118505600A (en) 2024-04-09 2024-04-09 Unified anomaly detection method based on multi-source uncertainty mining

Country Status (1)

Country Link
CN (1) CN118505600A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119831972A (en) * 2024-12-30 2025-04-15 中兵智能创新研究院有限公司 A semi-supervised anomaly detection method based on memory enhancement and pseudo-labeling
CN119831972B (en) * 2024-12-30 2025-07-04 中兵智能创新研究院有限公司 Semi-supervised anomaly detection method based on memory enhancement and pseudo-tagging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination