WO2025107271A1

WO2025107271A1 - Unsupervised super-pixel segmentation method and system assisted by collaboration between atrous pyramid and attention mechanism

Info

Publication number: WO2025107271A1
Application number: PCT/CN2023/133843
Authority: WO
Inventors: 李世华; 罗富贵; 郭雨阳; 行敏锋
Original assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Current assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Priority date: 2023-11-23
Filing date: 2023-11-24
Publication date: 2025-05-30
Anticipated expiration: 2026-05-23
Also published as: CN117557579A

Abstract

The present invention pertains to the technical field of digital image processing. Disclosed are an unsupervised super-pixel segmentation method and system assisted by collaboration between an atrous pyramid and an attention mechanism. The method comprises: combining image RGB channel information with position information of a pixel point; constructing a channel attention module by using an attention mechanism; processing the result of the attention mechanism by using atrous spatial pyramid pooling; constructing a loss function, constructing a clustering loss term, and, by using a spatial smoothing loss term, constructing a reconstruction loss term; updating model parameters by using an Adam optimizer, and using for super-pixel generation an effective depth feature obtained from the last separation; and obtaining the maximum value of a channel dimension by using an argmax function, converting a processing result of the argmax function into a two-dimensional array, and, on the basis of a defining condition, completing adaptive super-pixel segmentation in a CPU. According to the present invention, the complexity is low, the adaptive and generalization capabilities are high, and effective support is provided for improving image processing efficiency and accuracy.

Description

Unsupervised superpixel segmentation method and system assisted by an atrous pyramid collaborating with an attention mechanism

Technical Field

The present invention belongs to the technical field of digital image processing, and in particular relates to an unsupervised superpixel segmentation method and system assisted by an atrous pyramid collaborating with an attention mechanism.

Background Art

Superpixel segmentation is an effective means of improving the efficiency and accuracy of image processing. Supervised deep-learning superpixel segmentation methods rely on large amounts of labeled data, so data bias can lead to inaccurate segmentation results and poor model generalization. Traditional superpixel segmentation methods based on energy optimization, watershed, graphs, and clustering depend on suitable parameter choices, cannot adaptively determine the number of superpixels from the characteristics of the image itself, are sensitive to noise, and incur high computational complexity on large images. Unsupervised superpixel segmentation methods are not limited by labeled data, do not require manual adjustment of model parameters and structure, generalize better, avoid noise interference, and do not incur high computational complexity as image size changes. They are therefore an important alternative to supervised and traditional superpixel segmentation methods. However, unsupervised superpixel segmentation requires solving two key problems: effective deep feature extraction and superpixel generation.

Commonly used feature extraction falls into two categories: hand-crafted features and deep features extracted by convolutional neural networks. The former usually converts the RGB channels of the image into a LAB representation and combines it with the position of each pixel to form five-dimensional features for subsequent superpixel generation. The latter learns deep features automatically through a constructed convolutional model. Both can extract useful features from images: hand-crafted features rely on the researcher's domain knowledge and experience, while deep features depend on how the convolutional model is constructed and on the chosen feature dimensions. However, hand-crafted features are insufficient for superpixel segmentation of complex images, and deep features depend on the convolutional model: simple models cannot extract useful features, and complex models are difficult to train.

Superpixel generation mainly refers to using the information obtained by feature extraction to divide the original image into small regions that are contiguous, compact, and similar in their features. Common approaches are watershed, graph-based, and clustering methods. Watershed techniques usually treat dark regions as valleys and brighter regions as ridges, with the height of a ridge defined by the gray value or gradient magnitude of a given pixel; this generates superpixels but is not trainable. Graph-based techniques treat pixels as nodes and define edge weights by the similarity of adjacent pixels, but they struggle with large images, and their segmentation performance depends heavily on the choice of parameters such as merging rules, similarity measures, and the number of superpixels. Clustering techniques can generate superpixels quickly without additional labels, and trainable clustering algorithms have been developed (typically differentiable K-means), but these need many iterations to perform well, which increases the computational cost and running time of the overall method. Other clustering techniques generally suffer from high computational complexity.

Technical Problem

From the above analysis, the problems and defects of the prior art are: existing superpixel segmentation methods generalize poorly, are highly complex, have difficulty obtaining effective feature information, and lack adaptability.

Technical Solutions

In view of the problems in the prior art, the present invention provides an unsupervised superpixel segmentation method and system assisted by an atrous pyramid collaborating with an attention mechanism.

The present invention is implemented as follows. The image is first preprocessed to introduce the spatial relationship between pixels, and an attention mechanism is applied to the superpixel segmentation task for the first time, so that the model attends more strongly to important feature channels and suppresses its response to irrelevant ones. Atrous spatial pyramid pooling enlarges the receptive field while reducing the number of parameters; combined with the constructed loss function, an optimizer updates the parameters and the final effective deep features are extracted. Applying the argmax function to the extracted effective deep features converts the superpixel segmentation task into a classification problem, and adding size-limiting conditions yields the final adaptive superpixel generation. This avoids clustering algorithms and extracts effective deep features with few parameters, which greatly reduces the complexity of the superpixel segmentation algorithm. The proposed method is unsupervised and therefore highly transferable.

Further, the unsupervised superpixel segmentation method in which the attention mechanism cooperates with atrous spatial pyramid pooling comprises the following steps:

Step 1: Combine the RGB channel information of the image with the position information of each pixel, converting three-dimensional features into five-dimensional features. First, the image to be segmented is assigned to a variable in the image preprocessing, and that variable is converted from an array to a tensor. The permute function rearranges its dimensions, the data type is changed to floating point, and a batch dimension is added at the front through None indexing to obtain the variable image. The torch.arange function generates the height and width sequences, the torch.meshgrid function converts the two sequences into two coordinate grids, and the torch.stack function stacks them. The result is concatenated with the image tensor along the channel dimension and standardized to give the final preprocessing result. (The target dimension order and the tensor shapes are not reproduced in this text.)

Step 2: Construct a channel attention module using the attention mechanism. The preprocessing result is passed through a pointwise convolution layer to obtain an eight-channel tensor, which is aggregated by channel-wise global average pooling. The aggregated features are processed by a fast one-dimensional convolution, whose kernel size is computed automatically, followed by a sigmoid function, and the result is multiplied element-wise with the tensor produced by the pointwise convolution to give the output of the attention mechanism. The weights of the pointwise convolution and the fast one-dimensional convolution are initialized with the Kaiming method and their biases with the constant 0; for the instance normalization layers, the normalization weights are initialized with the constant 1, so that training starts from suitable parameter values.

Step 3: Process the output of the attention mechanism with atrous spatial pyramid pooling to extract deep features suitable for superpixel segmentation. The tensor produced by the channel attention module is passed through five parallel branches: a convolutional layer with padding 0 and dilation rate 1; a convolutional layer with padding 2 and dilation rate 2; a convolutional layer with padding 4 and dilation rate 4; a convolutional layer with padding 6 and dilation rate 6; and a branch that resizes the input tensor, applies a convolution with stride 1, and then applies the ReLU activation function. (The kernel sizes and the resized dimensions are not reproduced in this text.) Every branch has 16 output channels, and instance normalization is applied to each output channel. The branch outputs are concatenated along the channel dimension, and the concatenated tensor is processed by a convolutional layer, an instance normalization layer, and the ReLU function to obtain the final deep feature tensor. The weights and biases of the convolutional layers and the instance normalization layers in the atrous spatial pyramid pooling are initialized in the same way as in the attention mechanism.

Step 4: Construct the loss function, which consists of a clustering loss term, a spatial smoothing loss term that quantifies the difference between adjacent pixels, and a reconstruction loss term. The deep features produced by the atrous spatial pyramid pooling are split into two tensors: one is used to compute the clustering loss term and the spatial smoothing loss term, and the other is used to compute the reconstruction loss term. For the clustering loss term, the softmax function is applied along the channel dimension of the first tensor to convert it into class probabilities; the negative log-likelihood of each pixel is computed and averaged over all pixels, and this value is combined with the average probability of each class over all pixels. For the spatial smoothing loss term, the softmax function is again applied along the channel dimension to obtain class probabilities, which are treated as a probability map. For both the probability map and the variable image, the difference between each element and its right-hand neighbor along the W dimension and the difference between each element and its lower neighbor along the H dimension are computed, giving horizontal and vertical gradients; the four gradients are used to build the spatial smoothing loss term. For the reconstruction loss term, the mean squared error function in PyTorch measures the difference between the second tensor and the variable image.

Step 5: Update the model parameters by setting the learning rate and number of iterations of the Adam optimizer, searching for the model parameters that minimize the loss function. Iteration is carried out through a defined optimize function, with the Adam optimizer updating the model parameters; the number of iterations is set to 500 (the learning-rate value is not reproduced in this text). The coefficient defined in the clustering loss term is set to the constant 2, and the weights of the spatial smoothing loss term and the reconstruction loss term are set to the constants 2 and 10 respectively, which completes the overall loss function. The model can then automatically find parameters that minimize the overall loss function within the set number of iterations.

Step 6: Use the argmax function to find the index of the maximum value along the channel dimension, converting the final effective deep features into the best superpixel label index for each pixel; convert the argmax result into a two-dimensional array and complete the adaptive superpixel segmentation on the CPU according to the limiting conditions.

Further, the image preprocessing of step 1 uses the following formula:

[formula not reproduced in this text]

Here the three color terms denote the color channel values of the k-th pixel in the image after conversion from integer to floating point, the two position terms denote the row and column of the k-th pixel, and the result is the five-dimensional feature of the k-th pixel, which is used in subsequent processing.
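The preprocessing of step 1 can be sketched as follows. This is a minimal plain-Python stand-in for the tensor operations named in step 1 (permute, torch.arange, torch.meshgrid, torch.stack), written so that it runs without PyTorch. The function names are hypothetical, and standardizing each channel to zero mean and unit variance is an assumption, since the exact normalization is not given in this text.

```python
from statistics import fmean, pstdev


def standardize(plane):
    """Shift and scale one H x W plane to zero mean and unit variance (assumed scheme)."""
    values = [v for row in plane for v in row]
    mean = fmean(values)
    std = pstdev(values) or 1.0  # guard against a constant plane
    return [[(v - mean) / std for v in row] for row in plane]


def five_dim_features(rgb):
    """rgb: H x W nested list of (r, g, b) integers -> five standardized H x W planes."""
    height, width = len(rgb), len(rgb[0])
    # Channel-first layout: one float plane per colour channel.
    planes = [
        [[float(rgb[y][x][c]) for x in range(width)] for y in range(height)]
        for c in range(3)
    ]
    # Coordinate grids: row index plane, then column index plane.
    planes.append([[float(y) for _ in range(width)] for y in range(height)])
    planes.append([[float(x) for x in range(width)] for _ in range(height)])
    return [standardize(plane) for plane in planes]
```

Each pixel thus carries three color values plus its row and column, which is what lets the later convolutions take spatial position into account.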

Further, the construction method of step 2 specifically includes:

Step 21: Apply the preprocessed five-dimensional features to a pointwise convolution layer, which linearly combines and transforms them into an eight-dimensional feature output while keeping the height and width unchanged.

Step 22: Apply channel-wise global average pooling to the eight-dimensional feature output to obtain the aggregated features.

Step 23: Pass the aggregated features through a fast one-dimensional convolution with kernel size L, followed by the sigmoid function, to obtain the learned channel weights. The kernel size L is computed automatically from the number of channels as the odd number closest to an intermediate value t. (The formulas for the channel weights and for t are not reproduced in this text.)

Step 24: Multiply the learned channel weights element-wise with the eight-dimensional feature output to obtain the result of the attention mechanism, whose shape matches that of the eight-dimensional features fed into the attention mechanism.

Further, the eight-dimensional feature output has height H, width W, and C channels, where H is the image height, W is the image width, and C = 8, and it is fed directly into the channel-wise global average pooling.
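Steps 22 to 24 can be illustrated with a short plain-Python sketch. The kernel-size rule below follows the form commonly used for efficient channel attention; the constants gamma = 2 and b = 1 are assumptions, because the values used in this document are not reproduced in the text. The function names and the list-based tensor layout are hypothetical, and the kernel weights passed in stand for the learned weights of the fast one-dimensional convolution.

```python
import math
from statistics import fmean


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Odd 1-D kernel size derived from the channel count (gamma and b are assumed)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1  # even values are moved up to the next odd number


def channel_attention(feature, kernel):
    """feature: C x H x W nested lists; kernel: odd-length list of 1-D conv weights."""
    # Global average pooling: one aggregated value per channel.
    pooled = [fmean([v for row in plane for v in row]) for plane in feature]
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    # 1-D convolution across neighbouring channels, then a sigmoid gate per channel.
    gates = []
    for c in range(len(pooled)):
        s = sum(w * padded[c + i] for i, w in enumerate(kernel))
        gates.append(1.0 / (1.0 + math.exp(-s)))
    # Element-wise product: the output keeps the shape of the input.
    return [[[v * g for v in row] for row in plane] for plane, g in zip(feature, gates)]
```

Under these assumed constants, the eight-channel tensor of step 21 would get a kernel of size 3, so each channel weight depends only on the channel itself and its two neighbors. This is why the module adds very few parameters.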

Further, the atrous spatial pyramid pooling of step 3 specifically includes:

Step 31: The deep features obtained from the atrous spatial pyramid pooling keep H and W unchanged, still corresponding to the height and width of the image. The intermediate convolutional layers comprise five branches, each producing one intermediate tensor: a convolutional layer with padding 0 and dilation rate 1; a convolutional layer with padding 2 and dilation rate 2; a convolutional layer with padding 4 and dilation rate 4; a convolutional layer with padding 6 and dilation rate 6; and an adaptive average pooling layer that resizes the input tensor, followed by a convolution with stride 1, 8 input channels, and 16 output channels. Instance normalization is applied to each output channel, and finally the ReLU activation function introduces nonlinearity. (The branch formulas, kernel sizes, and pooled size are not reproduced in this text.)

Step 32: The deep features are computed by concatenating the five intermediate tensors along the channel dimension to form a larger tensor. Each intermediate tensor has 16 output channels, so the concatenated tensor has 80 channels. The concatenated tensor is then convolved with the number of output channels set to 128, as required for superpixel segmentation; instance normalization is applied to each output channel, and finally the ReLU activation function introduces nonlinearity.
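The shape bookkeeping of steps 31 and 32 can be checked with a small sketch. The output-size expression is the standard one for a dilated convolution along one axis. The kernel sizes (1 for the branch with padding 0, 3 for the dilated branches) are assumptions inferred from the stated paddings and from the requirement that H and W stay unchanged; they are not given in this text. The names BRANCHES and aspp_channel_plan are hypothetical.

```python
def conv_output_size(size, kernel, padding, dilation, stride=1):
    """Output length of a (dilated) convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# (kernel, padding, dilation) for the four convolutional branches; kernel sizes are assumed.
BRANCHES = [(1, 0, 1), (3, 2, 2), (3, 4, 4), (3, 6, 6)]


def aspp_channel_plan(branch_channels=16, pooled_branches=1,
                      fused_channels=128, recon_channels=3):
    """Channel counts after concatenation, after fusion, and after the split of step 4."""
    concatenated = (len(BRANCHES) + pooled_branches) * branch_channels
    return {
        "concatenated": concatenated,
        "fused": fused_channels,
        "reconstruction": recon_channels,
        "clustering": fused_channels - recon_channels,
    }
```

With padding equal to the dilation rate, a 3×3 kernel leaves the spatial size untouched, so all branch outputs can be concatenated directly. The plan reproduces the 80 concatenated channels and the 128 fused channels stated above, and the 3 + 125 split used later for the loss terms.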

Further, constructing the loss function in step 4 specifically includes:

Step 41: The overall loss function is the sum of three parts: the clustering loss term, the spatial smoothing loss term, and the reconstruction loss term. The spatial smoothing and reconstruction terms are each multiplied by a constant coefficient; step 5 sets these coefficients to 2 and 10 respectively.

Step 42: The clustering loss term is computed from the class probability vector of the pixel at row i and column j and from the average of the class probability vectors over all pixels.

Step 43: The first three channels along the channel dimension of the deep features are separated for the later reconstruction loss calculation. The features of the remaining 125 channels are converted by the softmax function, applied along the channel dimension, into the class probabilities of the corresponding pixel.

Step 44: The spatial smoothing loss term has two parts: the smoothness losses in the horizontal and vertical directions. The original input image is rearranged in its dimensions to obtain the image used in this term. The term is defined by the absolute value of the probability differences and an exponential function of the squared image gradient, and it is averaged over all pixels. The number of channels is the same as for the clustering loss, and the quantities involved are the pixel probability differences and the pixel intensity differences in each of the two directions.

Step 45: The reconstruction loss term is computed from the first three channels separated from the deep features, which are used for image reconstruction, with the 2-norm measuring the reconstruction error.

Further, in step 44 the original input image and the rearranged image contain the same data in different dimension orders.

(The formulas of steps 41 to 45, the rounded quantity defined in step 43, and the two image shapes are not reproduced in this text.)
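Because the formulas of steps 41 to 45 are not reproduced here, the following plain-Python sketch shows only one plausible reading of the prose, not the definitive loss. It assumes the clustering term is the mean per-pixel entropy plus a coefficient times the negative entropy of the average class probabilities, in the spirit of regularized information maximization. It further assumes that the smoothing term weights probability differences by exp(-squared intensity difference / sigma), and that the reconstruction term is a mean squared error. The function names and sigma = 8 are hypothetical; the coefficient 2 and the weights 2 and 10 are the values stated in step 5.

```python
import math
from statistics import fmean


def clustering_loss(probs, lam=2.0):
    """probs: list of per-pixel class probability vectors."""
    pixel_entropy = fmean([-sum(p * math.log(p) for p in pv if p > 0) for pv in probs])
    mean_p = [fmean([pv[k] for pv in probs]) for k in range(len(probs[0]))]
    neg_marginal_entropy = sum(p * math.log(p) for p in mean_p if p > 0)
    return pixel_entropy + lam * neg_marginal_entropy


def smoothness_loss(prob_map, image, sigma=8.0):
    """prob_map, image: H x W grids whose cells are per-pixel vectors."""
    h, w = len(image), len(image[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right-hand neighbour, lower neighbour
                ny, nx = y + dy, x + dx
                if ny >= h or nx >= w:
                    continue
                dp = sum(abs(a - b) for a, b in zip(prob_map[ny][nx], prob_map[y][x]))
                di = sum((a - b) ** 2 for a, b in zip(image[ny][nx], image[y][x]))
                total += dp * math.exp(-di / sigma)  # strong image edges are penalized less
    return total / (h * w)


def reconstruction_loss(recon, image):
    """Mean squared error over all pixels and channels."""
    return fmean([(a - b) ** 2
                  for rr, ir in zip(recon, image)
                  for rp, ip in zip(rr, ir)
                  for a, b in zip(rp, ip)])


def total_loss(prob_map, recon, image, lam=2.0, alpha=2.0, beta=10.0):
    flat = [pv for row in prob_map for pv in row]
    return (clustering_loss(flat, lam)
            + alpha * smoothness_loss(prob_map, image)
            + beta * reconstruction_loss(recon, image))
```

In this reading, the clustering term rewards confident per-pixel assignments that are spread over many classes, the smoothing term discourages label changes except across image edges, and the reconstruction term ties the features to the input image.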

Further, in step 5, deep features are extracted under the model parameters that minimize the loss; these are the effective deep features, and the effective deep features obtained from the last separation are used for superpixel generation.

Further, the size-limiting conditions for superpixel generation in step 6 are computed as follows:

[formula not reproduced in this text]

Here the quantities are the average size of an ideal superpixel, the total number of pixels in the image, and the minimum and maximum thresholds on superpixel size, which filter out superpixels that are too small or too large so that the final superpixels are as uniform in size as possible. The formula also involves hyperparameters whose values are not reproduced in this text.
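Step 6 and the size-limiting conditions can be sketched in plain Python as follows. The threshold rule is an assumption: the ideal size is taken as the pixel count divided by a hypothetical target_count, and the minimum and maximum thresholds are taken as hypothetical multiples of it, because the actual formula and hyperparameter values are not reproduced in this text. The sketch also counts pixels per label index rather than per connected region, and only flags out-of-range labels; how they are then handled is not specified here.

```python
from collections import Counter


def argmax_labels(features):
    """features: C x H x W nested lists -> H x W grid of winning channel indices."""
    channels, height, width = len(features), len(features[0]), len(features[0][0])
    return [
        [max(range(channels), key=lambda k: features[k][y][x]) for x in range(width)]
        for y in range(height)
    ]


def size_outliers(labels, target_count, min_ratio, max_ratio):
    """Return {label: pixel_count} for labels outside the assumed size thresholds."""
    n_pixels = sum(len(row) for row in labels)
    ideal_size = n_pixels / target_count  # assumed definition of the ideal size
    t_min, t_max = min_ratio * ideal_size, max_ratio * ideal_size
    counts = Counter(label for row in labels for label in row)
    return {label: n for label, n in counts.items() if n < t_min or n > t_max}
```

Since every pixel simply takes the index of its strongest channel, no iterative clustering is needed; the thresholds act only as a post-processing filter.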

Another object of the present invention is to provide an unsupervised superpixel segmentation system, in which the attention mechanism cooperates with atrous spatial pyramid pooling, for implementing the above method. The system comprises:

an image preprocessing module, configured to combine the RGB channel information of the image with the position information of each pixel, converting three-dimensional features into five-dimensional features;

an attention mechanism module, configured to construct the channel attention module using the attention mechanism;

an atrous spatial pyramid pooling module, configured to process the output of the attention mechanism with atrous spatial pyramid pooling and extract deep features suitable for superpixel segmentation;

a loss function construction module, configured to construct the loss function from a clustering loss term, a spatial smoothing loss term that quantifies the difference between adjacent pixels, and a reconstruction loss term;

a parameter update module, configured to update the model parameters by setting the learning rate and number of iterations of the Adam optimizer;

a superpixel segmentation module, configured to convert the argmax result into a two-dimensional array and complete the adaptive superpixel segmentation on the CPU according to the limiting conditions.

Beneficial Effects

In view of the above technical solutions and the technical problems solved, the technical solutions claimed by the present invention have the following advantages and positive effects:

First, in light of the technical problems in the prior art described above and the difficulty of solving them, and drawing on the claimed technical solution and the results and data obtained during development, the following analyzes in detail how the technical solution solves these problems and the inventive technical effects that result:

The present invention provides an unsupervised superpixel segmentation method in which an attention mechanism cooperates with atrous spatial pyramid pooling. It extracts effective deep features through image preprocessing, the attention mechanism combined with atrous spatial pyramid pooling, the loss function, and parameter updates by the Adam optimizer, and it converts superpixel segmentation into a classification problem to achieve adaptive superpixel generation. This solves the problems of existing superpixel segmentation methods: poor generalization, high complexity, difficulty in obtaining effective feature information, and lack of adaptability.

Second, viewed as a whole or from the perspective of a product, the claimed technical solution has the following technical effects and advantages:

Image preprocessing: Using the coordinates of pixel positions as additional features helps improve the performance of the superpixel segmentation method, allowing the model to better capture the spatial structure of the image and the relationships between pixels. The row and column of each pixel provide its coordinate information, and attaching this information to each pixel value of the original image creates a new image with five channels. Feeding this five-channel image into the deep feature extraction network lets the network interpret the image better and extract deep features better suited to superpixel segmentation.

Attention mechanism: In this superpixel segmentation method, the coverage of local cross-channel interaction is determined by adaptively selecting the size of the one-dimensional convolution kernel, so that efficient channel attention is achieved with only a few parameters, improving segmentation performance while keeping the added complexity in check.

Atrous spatial pyramid pooling: In the superpixel segmentation task, every pixel must be classified to determine which superpixel class it belongs to. A conventional convolutional layer captures information only within a limited range, so different scales and contextual information need to be modeled. Atrous spatial pyramid pooling enlarges the receptive field by introducing a dilation rate, capturing information over a wider range without increasing the number of parameters. The constructed atrous spatial pyramid pooling can extract deep features suitable for superpixel segmentation from images of varying complexity.

Loss function: To ensure that the extracted deep features derive from the image itself, a reconstruction loss term is introduced into the loss function, forcing the deep feature extraction to produce an output that matches the original image, so that the original input image can be restored. The clustering loss term is an entropy-based clustering cost, similar to the mutual information term of regularized information maximization. Clustering is achieved by maximizing the mutual information of the pixels under the deep feature representation, while a regularization term controls the complexity of the superpixel segmentation method and avoids overfitting. The spatial smoothing loss term is a principal prior for image processing tasks; it quantifies the difference between adjacent pixels, ensuring that the method produces smooth output while also improving its generalization ability and preventing overfitting.

参数更新及超像素生成：为实现模型参数更新选用Adam优化器进行梯度下降优化，通过计算梯度的一阶矩估计和二阶矩估计为不同的参数设计独立的自适应性学习率。并且参数的更新不受梯度伸缩变换的影响，可以克服梯度存在很大的噪声问题，使得模型参数更新更加简单高效。通过argmax函数将最终的有效深度特征转化为每个像素点对应一个最有的超像素标签索引，这样处理可以将超像素分割问题转化为分类问题，结合限定条件就可快速完成自适应超像素分割，避免使用复杂的聚类算法。Parameter update and superpixel generation: To achieve model parameter update, the Adam optimizer is used for gradient descent optimization. By calculating the first-order moment estimate and second-order moment estimate of the gradient, independent adaptive learning rates are designed for different parameters. In addition, the parameter update is not affected by the gradient scaling transformation, which can overcome the large noise problem of the gradient, making the model parameter update simpler and more efficient. The final effective depth feature is converted into a superpixel label index corresponding to each pixel through the argmax function. This process can convert the superpixel segmentation problem into a classification problem. Combined with the limited conditions, the adaptive superpixel segmentation can be completed quickly, avoiding the use of complex clustering algorithms.
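For reference, the standard Adam update can be written as below. This is the textbook form of the optimizer, not a formula reproduced from this document; the symbols $\alpha$ (learning rate), $\beta_1$, $\beta_2$ and $\epsilon$ are the usual Adam hyperparameters and are assumptions as far as this text is concerned.

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}
\\
\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

Multiplying every gradient $g_t$ by a constant $c > 0$ multiplies both $\hat{m}_t$ and $\sqrt{\hat{v}_t}$ by $c$, so the step is unchanged up to the small term $\epsilon$. This is the sense in which the update is insensitive to gradient rescaling.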

Third, the expected benefit and commercial value of the technical solution after commercialization are as follows: it can be applied to remote sensing image processing and greatly accelerates that processing.

The technical solution of the present invention overcomes a technical prejudice: traditional superpixel segmentation methods mostly rely on clustering algorithms, whereas the present invention converts superpixel segmentation into a classification task and thereby avoids clustering. This makes the method highly adaptable to changes in image size.

Fourth, the significant technical progress brought by the unsupervised superpixel segmentation method of the present invention is mainly reflected in the following aspects:

1) Enhanced feature extraction capability:

By combining the attention mechanism with atrous spatial pyramid pooling, the method extracts deep image features more effectively. The attention mechanism lets the model focus on important feature channels, which improves the accuracy of feature extraction.

2) Improved superpixel segmentation quality:

Traditional superpixel segmentation methods ignore some detail information. By combining spatial relationships with deep features, this method better preserves image detail and structure and produces more accurate, finer superpixel segmentation results.

3) Improved flexibility and adaptability:

Because superpixels are generated adaptively, the method adapts flexibly to images of different types and quality, which is especially important for diverse image datasets.

4) Parameter optimization and computational efficiency:

Combining the Adam optimizer with the constructed loss function optimizes the model parameters more effectively and reduces computational cost and time. In addition, atrous spatial pyramid pooling enlarges the receptive field with fewer parameters than an ordinary convolution covering the same range, which further improves computational efficiency.

5) Wide application potential:

The method is applicable not only to standard image processing tasks but also to other fields such as medical image analysis, machine vision and image recognition.

In summary, through its combination of techniques and its optimization strategy, the method provided by the present invention achieves significant technical progress in superpixel segmentation: it improves segmentation quality, broadens the range of applications and improves processing efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of the unsupervised superpixel segmentation method assisted by an attention mechanism in collaboration with atrous spatial pyramid pooling, provided by an embodiment of the present invention;

FIG. 2 is a structural diagram of the unsupervised superpixel segmentation system assisted by an attention mechanism in collaboration with atrous spatial pyramid pooling, provided by an embodiment of the present invention;

FIG. 3 is a diagram of the attention mechanism constructed in an embodiment of the present invention;

FIG. 4 is a diagram of the atrous spatial pyramid pooling constructed in an embodiment of the present invention;

FIG. 5 shows the superpixel segmentation result provided by an embodiment of the present invention;

FIG. 6 compares the results of the present invention with other methods: A, ground-truth labels; B, SLIC; C, Algorithm 2; D, superpixel segmentation result of the present invention.

Embodiments of the present invention

To make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

The present invention provides an unsupervised superpixel segmentation method based on an attention mechanism and atrous spatial pyramid pooling. The method mainly comprises six steps:

1) Image preprocessing and feature conversion:

The RGB channel information of the image is combined with the position information of each pixel, converting the three-dimensional feature into a five-dimensional feature.

2) Constructing the channel attention module:

The attention mechanism strengthens the model's focus on important feature channels and suppresses its response to irrelevant channels.

3) Applying atrous spatial pyramid pooling:

Atrous spatial pyramid pooling is applied to the output of the attention mechanism to extract deep features suitable for superpixel segmentation.

4) Constructing and applying the loss function:

A clustering loss term, a spatial smoothness loss term (quantifying the differences between adjacent pixels) and a reconstruction loss term are constructed.

5) Model parameter update:

The model parameters are updated by setting the learning rate and number of iterations of the Adam optimizer, to find the parameters that minimize the loss function.

6) Superpixel label generation and adaptive superpixel segmentation:

The argmax function takes the maximum over the channel dimension, converting the effective deep features into a superpixel label index for each pixel. The argmax output is converted into a two-dimensional array, and adaptive superpixel segmentation is completed on the CPU according to the size constraint.

Two application embodiments of the present invention and their implementation are described below. Such embodiments typically apply the method to a particular type of image dataset (such as natural scene images or medical images) or to a particular application scenario (such as image segmentation or target tracking).

Application Example 1: Superpixel segmentation of natural scene images

- In this embodiment, the method is applied to natural scene images. By analyzing features such as color and texture, the model segments the image into groups of superpixels with similar features. This is useful in image editing and enhancement applications.

Application Example 2: Medical image analysis

- In medical images such as MRI or CT scans, superpixel segmentation can be used to identify and distinguish different tissue types or lesions. This can help doctors diagnose and plan treatment more accurately.

A specific implementation involves selecting an appropriate image dataset, tuning model parameters (such as the learning rate, the number of iterations and the configuration of the pooling layers), and post-processing steps such as merging or splitting superpixel groups to improve segmentation accuracy and efficiency.

As shown in FIG. 1, an embodiment of the present invention provides an unsupervised superpixel segmentation method assisted by an attention mechanism in collaboration with atrous spatial pyramid pooling, comprising the following steps:

Step 1: Image preprocessing

To extract effective deep features, the image is first preprocessed: the RGB channel information of the image is combined directly with the position information of each pixel, converting the three-dimensional feature into a five-dimensional feature and keeping the overall superpixel segmentation method efficient.

$$f_k = (r_k,\ g_k,\ b_k,\ x_k,\ y_k)$$

In the above formula, $r_k$, $g_k$ and $b_k$ are the color channel values of the $k$-th pixel of the image, converted from integer to floating point; the type conversion keeps later data types consistent and preserves the precision of the overall method. $x_k$ and $y_k$ are the row and column indices of the $k$-th pixel. $f_k$ is the five-dimensional feature of the $k$-th pixel and is used in all subsequent processing.

Step 2: Attention mechanism

So that the overall method can dynamically adjust its attention according to the input image, an efficient channel attention module is constructed that performs local cross-channel interaction without dimensionality reduction and adaptively determines the size of its one-dimensional convolution kernel.

The preprocessed five-dimensional features are first passed through a pointwise convolution layer which, keeping the height and width unchanged, linearly combines and transforms them into an eight-dimensional feature output. This output is denoted $X \in \mathbb{R}^{H \times W \times C}$ and is fed directly to channel-wise global average pooling, where $H$ is the image height, $W$ is the image width and $C$ is the number of channels; here $C = 8$.

Channel-wise global average pooling of $X$ gives the aggregated feature $g(X)$:

$$g(X) = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{ij}$$

The aggregated feature is passed through a fast one-dimensional convolution with kernel size $L$, which learns channel attention through local cross-channel interaction:

$$\omega = \sigma\left(\mathrm{C1D}_L\left(g(X)\right)\right)$$

where $\sigma$ is the sigmoid function, $\mathrm{C1D}_L$ is the fast one-dimensional convolution with kernel size $L$, and $\omega$ is the learned channel weight.

The kernel size $L$ is computed automatically from the number of channels $C$ by a mapping with two constants, in which $|t|_{odd}$ denotes the odd number nearest to $t$.

The output $\omega$ of the fast one-dimensional convolution and the sigmoid function is multiplied element-wise with $X$ to give the result of the attention mechanism, denoted $\tilde{X} = \omega \odot X$. Its shape is the same as that of the eight-dimensional feature fed into the attention mechanism.
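The attention step described above can be sketched in PyTorch as follows. This is a minimal sketch, not the inventors' code. The kernel-size formula is not legible in this text, so `eca_kernel_size` assumes the mapping used by the ECA-Net channel attention design, $L = |(\log_2 C + b)/\gamma|_{odd}$ with $\gamma = 2$ and $b = 1$, including that design's convention of rounding an even value up to the next odd one. The names `eca_kernel_size` and `ChannelAttention` are hypothetical.

```python
import math

import torch
import torch.nn as nn


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Odd 1-D kernel size derived from the channel count (assumed ECA-Net mapping)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


class ChannelAttention(nn.Module):
    """Pointwise conv (5 -> 8 channels) followed by channel attention without dimensionality reduction."""

    def __init__(self, in_channels: int = 5, channels: int = 8):
        super().__init__()
        self.pointwise = nn.Conv2d(in_channels, channels, kernel_size=1)
        k = eca_kernel_size(channels)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # channel-wise global average pooling
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pointwise(x)                                   # (N, 8, H, W)
        g = self.avg_pool(x)                                    # (N, 8, 1, 1)
        w = self.conv1d(g.squeeze(-1).transpose(-1, -2))        # (N, 1, 8): 1-D conv across channels
        w = torch.sigmoid(w).transpose(-1, -2).unsqueeze(-1)    # (N, 8, 1, 1) channel weights
        return x * w                                            # element-wise product, same shape as x
```

With eight channels the assumed mapping gives a kernel of size 3, so each channel weight depends only on its immediate neighbours in the channel dimension.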

Step 3: Atrous spatial pyramid pooling

The result of the attention mechanism is processed with atrous spatial pyramid pooling to extract deep features suitable for superpixel segmentation.

The deep features obtained after atrous spatial pyramid pooling are denoted $F \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ are unchanged and still correspond to the height and width of the image, and here $C = 128$. The intermediate layers are computed as follows:

$$Y_1 = \mathrm{conv}_1(\tilde{X}),\quad Y_2 = \mathrm{conv}_2(\tilde{X}),\quad Y_3 = \mathrm{conv}_3(\tilde{X}),\quad Y_4 = \mathrm{conv}_4(\tilde{X}),\quad Y_5 = \mathrm{pool}(\tilde{X})$$

Here $\mathrm{conv}_1$ is a convolutional layer with kernel size $1 \times 1$, padding 0 and sampling rate 1; $\mathrm{conv}_2$ is a convolutional layer with kernel size $3 \times 3$, padding 2 and sampling rate 2; $\mathrm{conv}_3$ is a convolutional layer with kernel size $3 \times 3$, padding 4 and sampling rate 4; and $\mathrm{conv}_4$ is a convolutional layer with kernel size $3 \times 3$, padding 6 and sampling rate 6. $\mathrm{pool}$ uses an adaptive average pooling layer to resize the input tensor, then applies a stride-1 convolution with 8 input channels and 16 output channels, normalizes each output channel by instance normalization, and finally applies a ReLU activation to introduce nonlinearity. $\mathrm{conv}_1$, $\mathrm{conv}_2$, $\mathrm{conv}_3$ and $\mathrm{conv}_4$ all have 8 input channels and 16 output channels, and each is followed by instance normalization of every output channel. $Y_1$, $Y_2$, $Y_3$, $Y_4$ and $Y_5$ are intermediate tensors.

The deep features $F$ are computed as follows:

$$F = \mathrm{ReLU}\left(\mathrm{IN}\left(\mathrm{conv}\left(\mathrm{cat}(Y_1, Y_2, Y_3, Y_4, Y_5)\right)\right)\right)$$

where $\mathrm{cat}$ concatenates $Y_1$, $Y_2$, $Y_3$, $Y_4$ and $Y_5$ along the channel dimension to form a larger tensor. Each of the five tensors has 16 output channels, so the concatenated tensor has 80 channels. $\mathrm{conv}$ convolves the concatenated tensor, with the number of output channels set to 128 to suit superpixel segmentation. As in the branches, instance normalization ($\mathrm{IN}$) normalizes each output channel, and a ReLU activation is finally applied to introduce nonlinearity.
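A minimal PyTorch sketch of this five-branch structure is given below. Several details are not legible in this text and are filled with assumptions: the target size of the adaptive average pooling (`pool_size=4` here), the bilinear upsampling that brings the pooled branch back to $H \times W$ before concatenation, and the $1 \times 1$ kernels of the pooled-branch and fusion convolutions. The class name `ASPP` and helper `conv_in` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_in(in_ch: int, out_ch: int, k: int, dilation: int) -> nn.Sequential:
    """Dilated conv + instance norm; padding chosen so H and W are preserved."""
    pad = dilation * (k - 1) // 2  # k=1 -> 0; k=3 with rates 2, 4, 6 -> 2, 4, 6
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=dilation),
        nn.InstanceNorm2d(out_ch, affine=True),
    )


class ASPP(nn.Module):
    def __init__(self, in_ch: int = 8, branch_ch: int = 16, out_ch: int = 128, pool_size: int = 4):
        super().__init__()
        self.branches = nn.ModuleList([
            conv_in(in_ch, branch_ch, 1, 1),
            conv_in(in_ch, branch_ch, 3, 2),
            conv_in(in_ch, branch_ch, 3, 4),
            conv_in(in_ch, branch_ch, 3, 6),
        ])
        # Pooled branch: the pooling target size is an assumption (elided in the source).
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(pool_size),
            nn.Conv2d(in_ch, branch_ch, 1, stride=1),
            nn.InstanceNorm2d(branch_ch, affine=True),
            nn.ReLU(),
        )
        # Fusion: 5 * 16 = 80 channels -> 128 channels (1x1 kernel assumed).
        self.project = nn.Sequential(
            nn.Conv2d(5 * branch_ch, out_ch, 1),
            nn.InstanceNorm2d(out_ch, affine=True),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        ys = [branch(x) for branch in self.branches]
        # Assumed: upsample the pooled branch to H x W so it can be concatenated.
        y5 = F.interpolate(self.pool(x), size=(h, w), mode="bilinear", align_corners=False)
        return self.project(torch.cat(ys + [y5], dim=1))
```

The padding rule in `conv_in` reproduces the paddings stated for the four convolution branches, which is why height and width stay unchanged through the block.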

Step 4: Constructing the loss function

A clustering loss term is constructed to encourage deterministic superpixel assignment and to make superpixel sizes as uniform as possible; a spatial smoothness loss term is introduced to quantify the differences between adjacent pixels; and a reconstruction loss term ensures that the extracted deep features originate from the image itself.

The overall loss function consists of three parts: the clustering loss term, the spatial smoothness loss term and the reconstruction loss term.

The clustering loss term is calculated from the following quantities: $\hat{P}$, the mean of the class probability vectors over all pixels, and $P_{ij}$, the class probability vector of the pixel in row $i$ and column $j$, which is obtained as follows.

First, the first three channels of $F$ along the channel dimension are split off for the later calculation of the reconstruction loss term, and the features of the remaining 125 channels are denoted $F'$; this fixes the integer number of channels $C$ used in the loss. $P = \mathrm{softmax}(F')$ uses the softmax function to convert the channel dimension of $F'$ into the class probabilities of each pixel.

The spatial smoothness loss term has two parts: the smoothness loss in the $x$ direction and the smoothness loss in the $y$ direction. The original input image is rearranged along its dimensions to obtain the image $I$. The term is defined through the absolute value of the probability differences and an exponential function of the squared differences of the gradient of $I$, and is averaged over all pixels.

In this term, $C$ denotes the number of channels and is the same as in the clustering loss; $\partial_x P$ is the pixel probability difference in the $x$ direction; $\partial_x I$ is the pixel intensity difference in the $x$ direction; $\partial_y P$ is the pixel probability difference in the $y$ direction; and $\partial_y I$ is the pixel intensity difference in the $y$ direction.

The reconstruction loss term uses the first three channels that were split off from the deep features when calculating the clustering loss term and the spatial smoothness loss term. These channels are used for image reconstruction and are denoted $\hat{I}$; $\|\cdot\|_2$ denotes the 2-norm.
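The three loss terms can be sketched as below. The exact formulas are not legible in this text, so the sketch assumes forms that match the verbal description and the regularized-information-maximization literature it refers to. The clustering term is the per-pixel entropy minus `lam` times the entropy of the mean probability vector. The smoothness term is the absolute probability difference weighted by `exp(-squared intensity difference / sigma)`. The reconstruction term is a mean squared error. The constants `lam` and `sigma`, the function names, and the choice of width as the $x$ direction are all assumptions.

```python
import torch
import torch.nn.functional as F


def split_features(feat: torch.Tensor):
    """feat: (N, 128, H, W) -> reconstruction (N, 3, H, W) and class probabilities (N, 125, H, W)."""
    recon, logits = feat[:, :3], feat[:, 3:]
    return recon, F.softmax(logits, dim=1)


def clustering_loss(prob: torch.Tensor, lam: float = 2.0, eps: float = 1e-8) -> torch.Tensor:
    # Low per-pixel entropy -> deterministic assignment.
    pixel_entropy = -(prob * torch.log(prob + eps)).sum(dim=1).mean()
    # High entropy of the mean probability vector -> evenly used labels.
    mean_prob = prob.mean(dim=(0, 2, 3))
    mean_entropy = -(mean_prob * torch.log(mean_prob + eps)).sum()
    return pixel_entropy - lam * mean_entropy


def smoothness_loss(prob: torch.Tensor, image: torch.Tensor, sigma: float = 8.0) -> torch.Tensor:
    # Differences between horizontally (x) and vertically (y) adjacent pixels.
    dp_x = (prob[..., :, 1:] - prob[..., :, :-1]).abs().sum(dim=1)
    dp_y = (prob[..., 1:, :] - prob[..., :-1, :]).abs().sum(dim=1)
    di_x = (image[..., :, 1:] - image[..., :, :-1]).pow(2).sum(dim=1)
    di_y = (image[..., 1:, :] - image[..., :-1, :]).pow(2).sum(dim=1)
    # Probability changes are penalised less where the image itself changes strongly.
    return (dp_x * torch.exp(-di_x / sigma)).mean() + (dp_y * torch.exp(-di_y / sigma)).mean()


def reconstruction_loss(recon: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(recon, image)
```

The overall loss would then be a weighted sum of the three terms; the weights are not given in this text and would have to be chosen by the implementer.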

Step 5: Parameter update and superpixel generation

The model parameters are updated by setting the learning rate and number of iterations of the Adam optimizer, that is, by searching for the parameters that minimize the loss function; the deep features extracted under these parameters are regarded as the effective deep features. Because different deep features are obtained at each iteration, the effective deep features from the last split are used for superpixel generation. The argmax function takes the maximum over the channel dimension, so that each pixel is assigned one optimal superpixel label index. The argmax output is converted into a two-dimensional array, and adaptive superpixel segmentation is completed on the CPU according to a constraint on superpixel size. The size constraint for superpixel generation is calculated as follows:

As shown in FIG. 2, an embodiment of the present invention provides a system that implements the above unsupervised superpixel segmentation method assisted by an attention mechanism in collaboration with atrous spatial pyramid pooling, the system comprising:

Example:

The deep learning framework used in this embodiment is PyTorch, and the programming language is Python.

Step 1: The acquired 0.8 m resolution Gaofen-2 (GF-2) color image is assigned to the variable img. The code checks whether CUDA is available; if so, the subsequent code runs on the GPU, otherwise on the CPU. The image is passed to the image preprocessing function, which rearranges the dimensions, converts the data type to floating point and uses a None index to add an outer batch dimension, giving a tensor of shape $1 \times 3 \times H \times W$;

The last two dimensions of the tensor are read to obtain the height and width of the image. The torch.arange function generates the height and width sequences, the torch.meshgrid function converts the two sequences into two coordinate grids, and the torch.stack function stacks them. The image and the coordinate grids are concatenated and normalized to give a tensor of shape $1 \times 5 \times H \times W$.
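The preprocessing described in this step can be sketched as follows. The function name `preprocess` is hypothetical, and the normalization shown (zero mean and unit variance per channel) is an assumption, since the text only says that a normalization is applied.

```python
import numpy as np
import torch


def preprocess(img: np.ndarray, device: torch.device) -> torch.Tensor:
    """img: (H, W, 3) uint8 array -> (1, 5, H, W) float tensor of colour plus position."""
    # Rearrange to channels-first, convert to float, add a batch dimension with None.
    x = torch.from_numpy(img).permute(2, 0, 1).float()[None].to(device)   # (1, 3, H, W)
    h, w = x.shape[-2:]
    rows = torch.arange(h, device=device, dtype=torch.float32)
    cols = torch.arange(w, device=device, dtype=torch.float32)
    grid = torch.stack(torch.meshgrid(rows, cols, indexing="ij"), dim=0)[None]  # (1, 2, H, W)
    feat = torch.cat([x, grid], dim=1)                                     # (1, 5, H, W)
    # Assumed normalization: standardize each of the five channels.
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.std(dim=(2, 3), keepdim=True)
    return (feat - mean) / (std + 1e-8)


# Run on the GPU when CUDA is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

Putting colour and position on a comparable scale matters here, because otherwise the row and column indices of a large image would dominate the five-dimensional feature.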

Step 2: The preprocessing result is passed through the pointwise convolution layer to give a tensor of shape $1 \times 8 \times H \times W$. Channel-wise global average pooling then produces the aggregated feature, which is processed by the fast one-dimensional convolution, whose kernel size is computed automatically, and by the sigmoid function. The result is multiplied element-wise with the tensor from the pointwise convolution to give the output of the attention mechanism (shown in FIG. 3), which also has shape $1 \times 8 \times H \times W$;

Kaiming initialization is used for the weights of the pointwise convolution and the fast one-dimensional convolution, and the biases are initialized to the constant 0. For the instance normalization layers, the normalization weights are initialized to the constant 1, so that training starts from suitable parameter values.
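A minimal sketch of this initialization is shown below. The text does not say whether the normal or the uniform Kaiming variant is used, so `kaiming_normal_` is an assumption, and the function name `init_weights` is hypothetical.

```python
import torch.nn as nn


def init_weights(module: nn.Module) -> None:
    """Kaiming init for conv weights, zero biases, unit weights for instance norm."""
    if isinstance(module, (nn.Conv1d, nn.Conv2d)):
        nn.init.kaiming_normal_(module.weight)      # assumed variant
        if module.bias is not None:
            nn.init.constant_(module.bias, 0.0)
    elif isinstance(module, nn.InstanceNorm2d) and module.weight is not None:
        # Only layers created with affine=True have a learnable weight.
        nn.init.constant_(module.weight, 1.0)
```

It would typically be applied to a whole network with `model.apply(init_weights)`, which visits every submodule once.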

Step 3: The tensor from the attention mechanism is passed to atrous spatial pyramid pooling (shown in FIG. 4), where it goes through each of the following: $\mathrm{conv}_1$, a convolutional layer with kernel size $1 \times 1$, padding 0 and sampling rate 1; $\mathrm{conv}_2$, a convolutional layer with kernel size $3 \times 3$, padding 2 and sampling rate 2; $\mathrm{conv}_3$, a convolutional layer with kernel size $3 \times 3$, padding 4 and sampling rate 4; $\mathrm{conv}_4$, a convolutional layer with kernel size $3 \times 3$, padding 6 and sampling rate 6; and the $\mathrm{pool}$ operation, which resizes the input tensor, applies a stride-1 convolution and then a ReLU activation to introduce nonlinearity;

The output channels of all these operations are set to 16, so concatenation gives a tensor of shape $1 \times 80 \times H \times W$. The concatenated tensor is processed by a convolutional layer, an instance normalization layer and ReLU to give the final deep feature tensor of shape $1 \times 128 \times H \times W$; instance normalization normalizes each output channel, and the ReLU activation introduces nonlinearity. The weights and biases of the convolutional layers and the instance normalization layers in atrous spatial pyramid pooling are initialized in the same way as in the attention mechanism.

Step 4: The extracted deep features are split into a tensor of shape $1 \times 3 \times H \times W$, used to calculate the reconstruction loss term, and a tensor of shape $1 \times 125 \times H \times W$, used to calculate the clustering loss term and the spatial smoothness loss term. The defined optimize function iterates, using the Adam optimizer to update the model parameters and minimize the loss function; the number of iterations is set to 500 and the learning rate is fixed in advance. Once the iterations are complete, the argmax function takes the maximum over the channel dimension, so that each pixel is assigned one optimal superpixel label index;

After the label indices are obtained, the squeeze function removes the singleton dimension, gradient computation is disabled, and the result is moved from the GPU to the CPU. The PyTorch tensor is converted into a NumPy array for the subsequent superpixel segmentation, and finally the _enforce_label_connectivity_cython function is applied to perform adaptive superpixel segmentation according to the superpixel size constraint.
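The optimization loop and label extraction described in Step 4 can be sketched as follows. This is a hedged outline, not the patent's `optimize` function. The learning rate value is not legible in this text, so `lr=1e-2` is a placeholder. The objective inside the loop is a simplified stand-in (reconstruction error plus per-pixel entropy) rather than the full three-term loss, and the final connectivity enforcement is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def optimize(model: nn.Module, features: torch.Tensor, image: torch.Tensor,
             n_iter: int = 500, lr: float = 1e-2):
    """Fit `model` to a single image and return an (H, W) integer label array."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # lr is a placeholder value
    for _ in range(n_iter):
        out = model(features)                                  # (1, 128, H, W)
        recon, logits = out[:, :3], out[:, 3:]                 # 3 + 125 channels
        prob = F.softmax(logits, dim=1)
        # Stand-in objective; the full method adds the clustering and smoothness terms.
        entropy = -(prob * torch.log(prob + 1e-8)).sum(dim=1).mean()
        loss = F.mse_loss(recon, image) + entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Use the features from the last split; no gradients are needed from here on.
    with torch.no_grad():
        labels = logits.detach().argmax(dim=1).squeeze(0)      # (H, W) label indices
    return labels.cpu().numpy()
```

Any module that maps a `(1, 5, H, W)` input to a `(1, 128, H, W)` output can be passed as `model`. For a quick check, a single `nn.Conv2d(5, 128, 3, padding=1)` layer is enough to exercise the loop.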

This embodiment processes a face image obtained from the scipy library; FIG. 5 shows the resulting superpixel segmentation.

The above embodiment shows that the present invention achieves unsupervised superpixel segmentation, with a measured segmentation accuracy of 94.7%. Without being given ground-truth labels or the number of superpixels, the method flexibly and efficiently extracts effective deep features and generates superpixels adaptively according to the characteristics of the image itself. It also performs unsupervised superpixel segmentation quickly and accurately across different image sizes and image complexities. It has low complexity, is adaptive and generalizes well, and thus provides effective support for improving the efficiency and accuracy of image processing.

When applied to classification tasks: because ordinary convolution cannot effectively extract features of irregularly distributed objects, most researchers use graph convolution for such objects, but obtaining a suitable graph structure for graph convolution is difficult. The present superpixel segmentation method can be applied to intermediate features during network training to generate superpixels, adaptively producing homogeneous regions from which a graph structure is obtained. Spatial descriptors are then generated as graph nodes, and the adjacency matrix is obtained from the relationships between descriptors, which satisfies the prerequisites of graph convolution.
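As an illustration of how a superpixel label map can yield graph nodes and an adjacency matrix, the NumPy sketch below averages a feature map inside each superpixel and links superpixels that share a boundary. Both choices are assumptions made for illustration: the text does not specify how the spatial descriptors are built, and it derives adjacency from relationships between descriptors rather than from shared boundaries. The function name `superpixel_graph` is hypothetical.

```python
import numpy as np


def superpixel_graph(labels: np.ndarray, features: np.ndarray):
    """labels: (H, W) ints in [0, K); features: (H, W, D).

    Returns (K, D) mean descriptors and a (K, K) boolean adjacency matrix.
    """
    k = int(labels.max()) + 1
    flat = labels.ravel()
    d = features.shape[-1]

    # Node descriptors: mean feature vector of the pixels in each superpixel.
    counts = np.bincount(flat, minlength=k).astype(np.float64)
    sums = np.zeros((k, d))
    np.add.at(sums, flat, features.reshape(-1, d))
    descriptors = sums / np.maximum(counts, 1)[:, None]

    # Edges: two superpixels are adjacent if any pair of neighbouring pixels straddles them.
    adj = np.zeros((k, k), dtype=bool)
    pairs = ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :]))
    for a, b in pairs:
        diff = a != b
        adj[a[diff], b[diff]] = True
        adj[b[diff], a[diff]] = True
    return descriptors, adj
```

The resulting node-feature matrix and adjacency matrix are the two inputs a graph convolution layer normally expects.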

When applied to segmentation tasks: if the image to be processed is too large, computational overhead grows and the segmentation algorithm becomes sensitive to noise, and traditional pixel-level segmentation is also prone to over-segmentation. The superpixel algorithm provided here addresses these problems. Superpixels reduce the number of elements to process: compared with segmenting every pixel, segmenting superpixels preserves image structure while significantly reducing computational cost. Because superpixels tend to merge similar pixels, they provide spatial consistency over a larger range, making segmentation results more continuous and reducing over-segmentation. Aggregating local similarity also reduces the influence of any single pixel, which suppresses image noise to some extent.

In the experiments, the embodiment of the present invention was compared with the published SLIC algorithm and with a superpixel segmentation algorithm based on a convolutional neural network with regularized information maximization (hereinafter Algorithm 2). The parameters of both algorithms were kept the same as those used by their developers. FIG. 6 shows the ground-truth labels and the superpixel segmentation results generated by each algorithm: A, ground-truth labels; B, SLIC; C, Algorithm 2; D, superpixel segmentation result of the present invention.

These results show that SLIC and Algorithm 2 adapt poorly to superpixel segmentation of large images. SLIC exhibits some under-segmentation and poor boundary adherence, although it runs more efficiently than the method of the present invention. Algorithm 2 exhibits over-segmentation and fails to generate good superpixels at image boundaries; it is unsupervised like the present method, but its time complexity is about 13 times that of the present method. Both comparison algorithms achieve lower segmentation accuracy than the superpixel segmentation method of the present invention.

The above is only a specific embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any modification, equivalent substitution or improvement made by a person skilled in the art within the technical scope disclosed by the present invention, and within its spirit and principles, shall fall within the scope of protection of the present invention.

Claims

A method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling, characterized in that: the image is first preprocessed to introduce the spatial relationship between pixels; the attention mechanism is applied to the superpixel segmentation task for the first time, so that the model strengthens its focus on important feature channels and suppresses its response to irrelevant channels; atrous spatial pyramid pooling is used to reduce parameters while expanding the receptive field; an optimizer is used in combination with a constructed loss function to update the parameters and extract the final effective deep features; the argmax function is applied to the extracted effective deep features to transform the superpixel segmentation task into a classification problem; and the final adaptive superpixel generation is achieved by adding size restriction conditions.

The method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling as claimed in claim 1, characterized in that the method comprises the following steps:

Step 1: Combine the image RGB channel information with the pixel position information to convert the three-dimensional features into five-dimensional features;

Step 2: Use the attention mechanism to build a channel attention module;

Step 3: Use atrous spatial pyramid pooling to process the results of the attention mechanism and extract deep features suitable for superpixel segmentation;

Step 4: Construct a loss function: first construct a clustering loss term; at the same time, use a spatial smoothing loss term to quantify the difference between adjacent pixels; then construct a reconstruction loss term;

Step 5: Update the model parameters by setting the learning rate and the number of iterations of the Adam optimizer, to find the model parameters that minimize the loss function;

Step 6: Use the argmax function to obtain the index of the maximum value along the channel dimension, converting the final effective deep feature into the optimal superpixel label index for each pixel; convert the output of the argmax function into a two-dimensional array and complete the adaptive superpixel segmentation on the CPU according to the limiting conditions.
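As a non-limiting illustration, the argmax operation of step 6 may be sketched as follows. The nested-list layout `scores[c][i][j]`, the function name `argmax_labels` and the sample values are assumptions made for this sketch and do not appear in the original text; the size-limiting post-processing is not included.

```python
from typing import List


def argmax_labels(scores: List[List[List[float]]]) -> List[List[int]]:
    """Map scores[c][i][j] (channel c, row i, column j) to a 2-D label array."""
    channels = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Index of the channel with the largest score at pixel (i, j).
            labels[i][j] = max(range(channels), key=lambda c: scores[c][i][j])
    return labels


# Hypothetical 3-channel, 2x2 score tensor.
scores = [
    [[0.1, 0.9], [0.5, 0.2]],
    [[0.8, 0.05], [0.3, 0.7]],
    [[0.1, 0.05], [0.2, 0.1]],
]
labels = argmax_labels(scores)
```

Each entry of `labels` is the superpixel label index of one pixel, so the segmentation is treated as a per-pixel classification.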

The method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling as claimed in claim 2, characterized in that in step 1 the image is preprocessed using the following formula:

$$F_k = \left(r_k,\ g_k,\ b_k,\ x_k,\ y_k\right)$$

In the formula, $r_k$, $g_k$, $b_k$ denote the color channel values of the k-th pixel in the image, converted from integer to floating point; $x_k$, $y_k$ denote the row and column numbers of the k-th pixel in the image; and $F_k$ denotes the five-dimensional feature of the k-th pixel in the image, which is used in subsequent processing.
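As a non-limiting illustration, the preprocessing of step 1 may be sketched as follows. The sketch assumes that the three color values are simply cast to floating point and concatenated with the row and column indices; whether the original formula additionally scales or normalizes any of these values cannot be determined from the text. The function name `to_five_dim` is hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def to_five_dim(image: Sequence[Sequence[Pixel]]) -> List[List[List[float]]]:
    """Turn image[row][col] = (r, g, b) into [r, g, b, row, col] floats per pixel.

    Assumption: no normalization is applied to color or position values.
    """
    return [
        [
            [float(r), float(g), float(b), float(row), float(col)]
            for col, (r, g, b) in enumerate(line)
        ]
        for row, line in enumerate(image)
    ]


# Hypothetical 1x2 RGB image.
features = to_five_dim([[(255, 0, 0), (0, 255, 0)]])
```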

The method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling as claimed in claim 2, characterized in that the construction in step 2 specifically comprises:

Step 21, applying the preprocessed five-dimensional features to a pointwise convolutional layer, which linearly combines and transforms the five-dimensional features into an eight-dimensional feature output $X$ while keeping the height and width unchanged;

Step 22, applying channel-wise global average pooling to $X$ to obtain the aggregated feature $y$, where $H$ and $W$ are the height and width of $X$:

$$y_c = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} X_{i,j,c}$$

Step 23, processing the aggregated feature through a fast one-dimensional convolution with a kernel size of L:

$$\omega = \sigma\left(\mathrm{C1D}_L(y)\right)$$

In the formula, $\sigma$ is the sigmoid function, $\mathrm{C1D}_L$ is a fast one-dimensional convolution with kernel size L, and $\omega$ is the learned channel weight; the kernel size L is calculated automatically from the number of channels $C$:

$$L = \left|\frac{\log_2 C}{\gamma} + \frac{b}{\gamma}\right|_{odd}$$

In the formula, $\gamma$ and $b$ are constants, and $|t|_{odd}$ is defined as the odd integer closest to $t$.

Step 24, the result of processing the aggregated feature with the fast one-dimensional convolution of kernel size L and the sigmoid function, denoted $\omega$, is multiplied element-wise with $X$ to obtain the attention mechanism output $\tilde{X}$, whose shape is consistent with the eight-dimensional feature input to the attention mechanism: $\tilde{X} = \omega \otimes X$.
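As a non-limiting illustration, steps 22 to 24 may be sketched in plain Python as follows. The constants `gamma=2` and `b=1` are the defaults of the published ECA-Net design and are an assumption here, since the values used in the claim are not legible; the tie-breaking rule (an even result is rounded up to the next odd integer) likewise follows the ECA-Net reference implementation. The convolution weights are passed in explicitly rather than learned, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # layout: x[channel][row][col]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Odd kernel size derived from the channel count (assumed ECA-Net defaults)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def channel_attention(x: Tensor3, weights: List[float]) -> Tuple[Tensor3, List[float]]:
    """Global average pooling -> 1-D convolution across channels -> sigmoid -> rescale."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Step 22: one aggregated value per channel.
    y = [sum(sum(row) for row in x[k]) / (h * w) for k in range(c)]
    # Step 23: zero-padded 1-D convolution over the channel axis, then sigmoid.
    pad = len(weights) // 2
    padded = [0.0] * pad + y + [0.0] * pad
    omega = []
    for k in range(c):
        s = sum(weights[m] * padded[k + m] for m in range(len(weights)))
        omega.append(1.0 / (1.0 + math.exp(-s)))
    # Step 24: element-wise multiplication of each channel by its weight.
    out = [[[omega[k] * v for v in row] for row in x[k]] for k in range(c)]
    return out, omega


kernel = eca_kernel_size(8)  # eight feature channels, as in step 21
out, omega = channel_attention([[[0.0]], [[2.0]]], [0.0, 1.0, 0.0])
```

Because the one-dimensional kernel is shared across channels, the module adds only `kernel` learnable weights regardless of the number of channels.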

The method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling as claimed in claim 4, characterized in that the eight-dimensional feature output in step 21 is denoted $X \in \mathbb{R}^{H \times W \times C}$ and channel-wise global average pooling is applied to it directly, where H represents the image height, W represents the image width, and C represents the number of channels; in this case, C = 8.

The method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling as claimed in claim 2, characterized in that the atrous spatial pyramid pooling processing in step 3 specifically comprises:

Step 31, the deep feature obtained after the atrous spatial pyramid pooling process is denoted $F \in \mathbb{R}^{H \times W \times C'}$, where H and W remain unchanged and still correspond to the height and width of the image, and $C' = 128$; the intermediate convolutional layers are calculated as follows:

In the formula, $f_1$ denotes a convolutional layer with a $1 \times 1$ kernel, padding size 0 and sampling rate 1; $f_2$ denotes a convolutional layer with a $3 \times 3$ kernel, padding size 2 and sampling rate 2; $f_3$ denotes a convolutional layer with a $3 \times 3$ kernel, padding size 4 and sampling rate 4; $f_4$ denotes a convolutional layer with a $3 \times 3$ kernel, padding size 6 and sampling rate 6; $\mathrm{Resize}$ indicates that the size of the input tensor is adjusted to $H \times W$; a convolution with a stride of 1, 8 input channels and 16 output channels is performed, each output channel is normalized using instance normalization, and the ReLU activation function is applied to introduce nonlinearity; $T_1$, $T_2$, $T_3$, $T_4$ and $T_5$ are the intermediate tensors;

Step 32, the deep feature $F$ is calculated as follows:

$$F = \mathrm{ReLU}\left(\mathrm{IN}\left(\mathrm{Conv}\left(\mathrm{Concat}(T_1, T_2, T_3, T_4, T_5)\right)\right)\right)$$

In the formula, $\mathrm{Concat}$ indicates that $T_1$, $T_2$, $T_3$, $T_4$ and $T_5$ are concatenated along the channel dimension to form a larger tensor; $T_1$, $T_2$, $T_3$, $T_4$ and $T_5$ each have 16 output channels, so the concatenated tensor has 80 channels; $\mathrm{Conv}$ indicates that the concatenated tensor is convolved, with the number of output channels set to 128 to suit the requirements of superpixel segmentation; $\mathrm{IN}$ indicates that instance normalization is used to normalize each output channel; finally, the ReLU activation function is applied to introduce nonlinearity.
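As a non-limiting illustration, the shape bookkeeping of step 3 may be checked with the standard output-size formula for a dilated convolution. The kernel sizes used below (1 for the branch with padding 0 and sampling rate 1, and 3 for the branches with padding 2, 4 and 6) are an inference from the requirement that H and W remain unchanged at stride 1, not values legible in the text; the function names are hypothetical.

```python
from typing import List, Tuple


def conv_out_size(size: int, kernel: int, padding: int, dilation: int, stride: int = 1) -> int:
    """Standard output length of a dilated convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# (kernel, padding, dilation) for the four convolutional branches (kernel sizes assumed).
branches = [(1, 0, 1), (3, 2, 2), (3, 4, 4), (3, 6, 6)]


def aspp_shapes(height: int, width: int, branch_channels: int = 16) -> Tuple[List[Tuple[int, int, int]], int]:
    """Return the (H, W, C) shape of each intermediate tensor and the concatenated channel count."""
    shapes = [
        (conv_out_size(height, k, p, d), conv_out_size(width, k, p, d), branch_channels)
        for k, p, d in branches
    ]
    # Fifth branch: resized back to the input height and width.
    shapes.append((height, width, branch_channels))
    return shapes, sum(s[2] for s in shapes)


shapes, concat_channels = aspp_shapes(32, 48)
```

With padding equal to the sampling rate, a 3x3 kernel covers a window of 2 x rate + 1 pixels, which is exactly what the padding compensates for, so every branch keeps the input resolution and the five 16-channel tensors can be concatenated into 80 channels.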

The method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling as claimed in claim 2, characterized in that constructing the loss function in step 4 specifically comprises:

Step 41, the overall loss function consists of three parts, namely a clustering loss term, a spatial smoothing loss term and a reconstruction loss term:

$$L = L_{clu} + \alpha L_{smo} + \beta L_{rec}$$

In the formula, $L$ represents the overall loss function; $L_{clu}$ represents the clustering loss term; $L_{smo}$ represents the spatial smoothing loss term; $L_{rec}$ represents the reconstruction loss term; and $\alpha$ and $\beta$ are constant coefficients.

Step 42, the clustering loss term is calculated as follows:

In the formula, $\bar{P}$ represents the average of the class probability vectors over all pixels; $P_{i,j}$ represents the class probability vector of the pixel located at row i and column j, which is calculated as follows:

$$P_{i,j} = \mathrm{softmax}\left(F'_{i,j}\right)$$

Step 43: the first three channels along the channel dimension of $F$ are separated for the subsequent reconstruction loss calculation, and the features of the remaining 125 channels are denoted $F'$ (with rounding applied); $\mathrm{softmax}$ indicates that the softmax function is used to convert the channel dimension of $F'$ into the class probabilities of the corresponding pixel;

Step 44, the spatial smoothing loss term is calculated in two parts, namely the smoothness loss in the $x$ direction and the smoothness loss in the $y$ direction; the original input image is rearranged to obtain the image $I'$; the loss is defined from the absolute value of the probability difference and the exponential function of the squared gradient difference of the image $I'$, and the average spatial smoothness loss over all pixels is calculated as follows:

In the formula, $P$ is the same as in the clustering loss; $\partial_x P$ denotes the pixel probability difference in the $x$ direction; $\partial_x I'$ denotes the pixel intensity difference in the $x$ direction; $\partial_y P$ denotes the pixel probability difference in the $y$ direction; $\partial_y I'$ denotes the pixel intensity difference in the $y$ direction; the specific calculation method is as follows:

Step 45, the reconstruction loss term is calculated as follows:

The first three channels, which were separated from the deep features when calculating the clustering loss and the spatial smoothness loss, are used for image reconstruction and are denoted $\hat{I}$; $\|\cdot\|_2$ denotes the 2-norm;

Step 44: The original input image dimension is ,image The dimension is .
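As a non-limiting illustration, an edge-aware spatial smoothing term of the kind described in step 44 may be sketched as follows. The exact formula of the claim is not legible, so the sketch assumes the common form in which the absolute probability difference between neighboring pixels is weighted by the exponential of the negative squared intensity difference divided by a scale `sigma`; the scale, its value of 8.0, the channel-first layout and the function name are assumptions for this sketch.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # layout: t[channel][row][col]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def smoothness_loss(probs: Tensor3, image: Tensor3, sigma: float = 8.0) -> float:
    """Average edge-aware smoothness over horizontal and vertical neighbor pairs."""
    h, w = len(probs[0]), len(probs[0][0])

    def term(i0: int, j0: int, i1: int, j1: int) -> float:
        # Absolute class-probability difference, summed over classes.
        dp = sum(abs(p[i1][j1] - p[i0][j0]) for p in probs)
        # Squared intensity difference, summed over image channels.
        di2 = sum((ch[i1][j1] - ch[i0][j0]) ** 2 for ch in image)
        return dp * math.exp(-di2 / sigma)

    horizontal = [term(i, j, i, j + 1) for i in range(h) for j in range(w - 1)]
    vertical = [term(i, j, i + 1, j) for i in range(h - 1) for j in range(w)]
    return _mean(horizontal) + _mean(vertical)


# Two pixels with opposite class assignments (hypothetical values).
probs = [[[1.0, 0.0]], [[0.0, 1.0]]]
loss_flat = smoothness_loss(probs, [[[5.0, 5.0]]])   # same intensity: full penalty
loss_edge = smoothness_loss(probs, [[[0.0, 10.0]]])  # strong edge: penalty suppressed
```

Under this assumed form, a label change across a uniform region is penalized, while a label change that coincides with a strong intensity edge costs almost nothing, which encourages superpixel boundaries to follow image edges.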

The method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling as claimed in claim 2, characterized in that in step 5 the deep features are extracted under the model parameters that minimize the loss function, these deep features are the effective deep features, and the effective deep features obtained by the last separation are used for superpixel generation.

The method for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling as claimed in claim 2, characterized in that the size limiting conditions for superpixel generation in step 6 are calculated as follows:

In the formula, $S$ represents the average size of an ideal superpixel; $N$ represents the total number of pixels in the image; $S_{min}$ and $S_{max}$ are the minimum and maximum thresholds for limiting superpixels, respectively, and are used to filter out superpixels that are too small or too large so that the sizes of the final generated superpixels are as uniform as possible; the other quantities in the formula are hyperparameters.
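As a non-limiting illustration, a size check of the kind described for step 6 may be sketched as follows. The formula of the claim is not legible, so the sketch assumes that the ideal size is the pixel count divided by a target number of superpixels and that the two thresholds are fixed multiples of that ideal size; the multipliers `low` and `high`, the target count and the function names are hypothetical. The sketch only identifies offending superpixels and does not merge or split them.

```python
from collections import Counter
from typing import List, Tuple


def size_limits(num_pixels: int, num_superpixels: int, low: float, high: float) -> Tuple[float, float, float]:
    """Ideal superpixel size and assumed minimum/maximum thresholds."""
    ideal = num_pixels / num_superpixels
    return ideal, low * ideal, high * ideal


def flag_by_size(labels: List[List[int]], num_superpixels: int,
                 low: float = 0.5, high: float = 2.0) -> Tuple[List[int], List[int]]:
    """Return the label indices whose pixel counts fall below or above the thresholds."""
    counts = Counter(v for row in labels for v in row)
    total = sum(counts.values())
    _, s_min, s_max = size_limits(total, num_superpixels, low, high)
    too_small = sorted(k for k, n in counts.items() if n < s_min)
    too_large = sorted(k for k, n in counts.items() if n > s_max)
    return too_small, too_large


# Hypothetical 4x4 label map: label 0 covers 13 pixels, label 1 one pixel, label 2 two pixels.
labels = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 2, 2],
]
too_small, too_large = flag_by_size(labels, num_superpixels=4)
```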

A system for promoting unsupervised superpixel segmentation by using an attention mechanism in collaboration with atrous spatial pyramid pooling, implementing the method according to any one of claims 1 to 9, wherein the system comprises:

Image preprocessing module, used to combine the image RGB channel information with the pixel position information to convert the three-dimensional features into five-dimensional features;

Attention mechanism module, used to build channel attention module using attention mechanism;

Atrous spatial pyramid pooling module, used to process the results of the attention mechanism using atrous spatial pyramid pooling to extract deep features suitable for superpixel segmentation;

The loss function construction module is used to construct the loss function. First, the clustering loss term is constructed; at the same time, the spatial smoothing loss term is used to quantify the difference between adjacent pixels; and then the reconstruction loss term is constructed;

The parameter update module is used to update the model parameters by setting the learning rate and number of iterations of the Adam optimizer;

Superpixel segmentation module, used to convert the output of the argmax function into a two-dimensional array and complete the adaptive superpixel segmentation on the CPU according to the limiting conditions.