CN118552563A - A breast ultrasound image segmentation method based on window attention semantic stream alignment - Google Patents
A breast ultrasound image segmentation method based on window attention semantic stream alignment
- Publication number
- CN118552563A (application CN202410478201.6A)
- Authority
- CN
- China
- Prior art keywords
- model
- window
- data set
- attention
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Ultra Sonic Diagnosis Equipment (AREA)
Abstract
The present invention discloses a breast ultrasound image segmentation method based on window attention semantic stream alignment, comprising: collecting breast ultrasound images, preprocessing them to construct a data set, and splitting the data set into a training data set and a test data set; constructing a model comprising an encoding-decoding framework, a window self-attention module and a flow alignment module; selecting a custom hybrid loss function to optimize the segmentation performance of the model, training the model on the training data set, optimizing the model parameters by back propagation and gradient descent, and adjusting the hyperparameters and model structure; and evaluating the model on the test data set, computing multiple metrics to measure its performance. By adopting the window self-attention module and the flow alignment module, the present invention effectively improves the accuracy and robustness with which the model identifies and segments cancer regions in ultrasound images; training and optimizing with the custom hybrid loss function further improve the performance and generalization ability of the model.
Description
Technical Field
The present invention relates to the field of medical ultrasound technology, and in particular to a breast ultrasound image segmentation method based on window attention semantic stream alignment.
Background Art
Breast cancer is among the most threatening diseases affecting women's health, ranking first worldwide in both incidence and mortality; in 2021 it directly caused the deaths of 43,600 patients in the United States alone. Early diagnosis and treatment markedly improve outcomes: the 5-year survival rate is 99% for patients treated at early stage I, but only 37% at late stage IV, so early cancer detection is crucial to reducing mortality. In the diagnostic process, breast ultrasound is the medical imaging modality radiologists use most, being cost-effective, radiation-free, painless and dynamic; ultrasound images are therefore widely applied in cancer screening and in monitoring treatment effect. Breast cancer is, however, difficult to delineate in ultrasound images, especially at the boundary between cancer cells and the muscle layer, so only well-trained and skilled physicians can locate the cancer accurately. Because of this heavy dependence on radiologists, many deep learning engineers have proposed artificial-intelligence tools to assist doctors in detecting abnormal tissue and generating quantitative indicators. In particular, segmentation algorithms can automatically generate the distribution of cancer cells, intuitively guide doctors in delineating abnormal tissue, and improve the efficiency and objectivity with which radiologists reach accurate diagnoses. Ultrasound image segmentation thus remains both challenging and promising in medical imaging.
At present, the field of medical image segmentation mainly uses convolutional neural networks (CNNs) for automated segmentation. The network design typically comprises: an encoder-decoder architecture, in which the encoder extracts low-level and high-level image features and the decoder maps these features back to the original image space to produce pixel-level segmentation results; convolutional layers in both encoder and decoder, which extract image features using local receptive fields and parameter sharing; pooling layers in the encoder, which reduce the feature-map size and, to some extent, the amount of computation; activation layers such as ReLU, which introduce nonlinearity and improve the expressiveness of the network; and fully connected layers in the decoder, which generate the final segmentation result.
However, this technology still has defects: 1) as the number of network layers increases, the computational complexity of a CNN increases accordingly, requiring more computing resources; 2) CNNs are prone to overfitting, especially when trained on small data sets, so regularization, data augmentation and similar techniques are needed; 3) the generalization ability of a CNN depends on the quality and diversity of its training data, and in practical applications it may handle poorly new situations that did not appear in that data; 4) CNNs are sensitive to noise and image distortion, which degrades the precision and accuracy of segmentation.
Summary of the Invention
In view of the above defects or improvement needs of the prior art, the purpose of the present invention is to provide a breast ultrasound image cancer segmentation method based on window attention semantic stream alignment. The method constructs a breast ultrasound image segmentation model based on window attention semantic stream alignment, uses a window self-attention module to extract local image features, and combines multi-scale and semantic flow features to capture the global information dependencies of the image, so that the model retains more of the effective global information, improving its robustness and accuracy.
To achieve this object, the present invention adopts the following technical solutions.
In a first aspect, the present invention provides a breast ultrasound image cancer segmentation method based on window attention semantic stream alignment, comprising:
S100: collecting breast ultrasound images, preprocessing them to construct a data set, and splitting the data set into a training data set and a test data set;
S200: constructing a breast ultrasound image segmentation model based on window attention semantic stream alignment, the model comprising an encoding-decoding framework, a window self-attention module and a flow alignment module, wherein:
the encoding-decoding framework separates the breast cancer feature-extraction and feature-recognition processes, facilitating the mining and integration of feature information;
the window self-attention module splits the image into a series of tokens and, under the action of the attention mechanism, mines more effective global features, avoiding interference from shadows and noise in the ultrasound image and improving the robustness and segmentation accuracy of the model;
the flow alignment module refines features through recursive or skip connections and aligns feature sets within the neural network to promote cross-layer communication, allowing relevant information to propagate between layers and enhancing the model's cross-layer communication, contextual reasoning and prediction abilities;
S300: adopting a custom BCE-Dice hybrid loss function that combines pixel-level accuracy with the accuracy of the predicted overlap region, optimizing the segmentation performance of the model and improving segmentation precision;
S400: training the model on the training data set, optimizing the model parameters by back propagation and gradient descent, and adjusting the hyperparameters and model structure to improve the performance and generalization ability of the model;
S500: evaluating the model on the test data set, computing precision, recall, overall accuracy, prediction error, Kappa coefficient, F1 score and intersection over union to measure the performance of the model.
Furthermore, the ultrasound image preprocessing includes removing blank image regions and removing sensitive image information.
Furthermore, collecting breast ultrasound images in step S100 also includes:
flipping, rotating, changing brightness, adjusting contrast and adjusting resolution to simulate ultrasound images from clinical scenarios and enrich the diversity of the data.
Furthermore, the encoder module of the encoding-decoding framework obtains feature information mainly through four consecutive stages that apply window self-attention for inference.
Furthermore, the decoder module of the encoding-decoding framework also introduces multi-scale feature learning to compensate for the loss of shallow features, and uses atrous spatial pyramid pooling (ASPP) as the connection between the encoder and the decoder to capture more contextual information.
Furthermore, the model in S200 also includes an up-block module that applies a 3×3 convolution and batch normalization to generate results of the corresponding sizes.
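A minimal PyTorch sketch of such an up-block follows; the ReLU activation and the bilinear upsampling step are assumptions added for illustration so the module produces outputs of the stated sizes, not code fixed by the patent:

```python
import torch.nn as nn

class UpBlock(nn.Module):
    """Up-block: 3x3 convolution + batch normalization, followed by
    upsampling to the target resolution (a sketch)."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)
        self.up = nn.Upsample(scale_factor=scale, mode='bilinear', align_corners=False)

    def forward(self, x):
        return self.up(self.act(self.bn(self.conv(x))))
```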
Furthermore, the custom hybrid loss function comprises a focal loss and a Dice loss function, as shown in formulas (1)-(3):

L_BCE = −(1/N) · Σ_{i=1..N} [Y_i · log(Ŷ_i) + (1 − Y_i) · log(1 − Ŷ_i)]   (1)

L_Dice = 1 − 2|Y ∩ Ŷ| / (|Y| + |Ŷ|)   (2)

L_Hybrid = L_BCE + L_Dice   (3)

where L_BCE denotes the cross-entropy loss function, Y_i the true label value at pixel i, Ŷ_i the predicted value at pixel i, N the number of pixels in the whole ultrasound image, and i the pixel index; L_Dice denotes the Dice loss function and ∩ the intersection (overlap) between the predicted and ground-truth regions.
Furthermore, in step S500 the precision, recall, overall accuracy, prediction error, Kappa coefficient, F1 score and intersection over union are calculated as:

Precision = TP / (TP + FP)   (4)

Recall = TP / (TP + FN)   (5)

OA = (TP + TN) / (TP + TN + FP + FN)   (6)

Pe = [(TP + FP)(TP + FN) + (FN + TN)(FP + TN)] / (TP + TN + FP + FN)²   (7)

Kappa = (OA − Pe) / (1 − Pe)   (8)

F1 = 2 × Precision × Recall / (Precision + Recall)   (9)

IOU = TP / (TP + FP + FN)   (10)

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively; Precision denotes precision; Recall denotes recall; OA denotes overall accuracy; Pe denotes prediction error; Kappa denotes the Kappa coefficient; F1 denotes the F1 score; and IOU denotes the intersection over union.
In a second aspect, the present invention provides an electronic device, comprising:
at least one processor, at least one memory and a communication interface, wherein
the processor, the memory and the communication interface communicate with one another;
the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform any step of the above method.
In a third aspect, the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform any step of the above method.
Beneficial effects of the present invention:
1. By constructing a breast ultrasound image segmentation model based on window attention semantic stream alignment, the present invention uses a window self-attention module to extract local image features and combines multi-scale and semantic flow features to capture global information dependencies, so that the model retains more effective global information, improving its robustness and accuracy.
2. The present invention uses a window self-attention mechanism to extract ultrasound image information. Compared with a traditional convolutional neural network, the window self-attention mechanism is more computationally efficient: it splits the image into small patches and attends only to the interactions among these patches, reducing the amount of computation. This lets the model process ultrasound images faster.
3. The present invention optimizes the model with a custom hybrid loss function. The Dice loss improves segmentation precision by maximizing the overlap between the predicted result and the true label, while the focal loss ensures that the model attends to the cancer features. The custom hybrid loss combines the two, exploiting the advantages of both loss functions and improving the overall performance of the model.
Additional aspects and advantages of the present application will be set forth in part in the following description; they will become apparent from that description or be learned through practice of the present application.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and/or additional aspects and advantages of the present application will become apparent and easily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of breast ultrasound image data acquisition in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall structure of the method in an embodiment of the present invention;
FIG. 3 is a result-analysis diagram in an embodiment of the present invention;
FIG. 4 shows the model results in an embodiment of the present invention.
DETAILED DESCRIPTION
The present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
Those skilled in the art will appreciate that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be construed as having meanings consistent with their meaning in the context of the prior art and, unless specifically defined as in the embodiments of this application, are not to be interpreted in an idealized or overly formal sense.
In a first aspect, the present invention provides a breast ultrasound image cancer segmentation method based on window attention semantic stream alignment. The method constructs a breast ultrasound image segmentation model based on window attention semantic stream alignment, uses a window self-attention module to extract local image features, and combines multi-scale and semantic flow features to capture the global information dependencies of the image, so that the model retains more of the effective global information, improving its robustness and accuracy.
The method comprises:
S100: collecting breast ultrasound images, preprocessing them to construct a data set, and splitting the data set into a training data set and a test data set;
S200: constructing a breast ultrasound image segmentation model based on window attention semantic stream alignment, the model comprising an encoding-decoding framework, a window self-attention module and a flow alignment module, wherein:
the encoding-decoding framework separates the breast cancer feature-extraction and feature-recognition processes, facilitating the mining and integration of feature information;
the window self-attention module splits the image into a series of tokens and, under the action of the attention mechanism, mines more effective global features, avoiding interference from shadows and noise in the ultrasound image and improving the robustness and segmentation accuracy of the model;
the flow alignment module refines features through recursive or skip connections and aligns feature sets within the neural network to promote cross-layer communication, allowing relevant information to propagate between layers and enhancing the model's cross-layer communication, contextual reasoning and prediction abilities;
S300: adopting a custom BCE-Dice hybrid loss function that combines pixel-level accuracy with the accuracy of the predicted overlap region, optimizing the segmentation performance of the model and improving segmentation precision;
S400: training the model on the training data set, optimizing the model parameters by back propagation and gradient descent, and adjusting the hyperparameters and model structure to improve the performance and generalization ability of the model;
S500: evaluating the model on the test data set, computing precision, recall, overall accuracy, prediction error, Kappa coefficient, F1 score and intersection over union to measure the performance of the model.
The specific steps are as follows.
Step 1: Data set construction
First, we collected breast cancer ultrasound images from the Department of Radiology of Wuhan University People's Hospital; the patients, aged 17 to 79, had benign or malignant tumors, and the hospital granted ethical approval under No. WDRY2022-k217. As shown in FIG. 1, breast ultrasound image acquisition comprises four steps: first, patient preparation; second, an ultrasound radiologist scans the suspicious breast region to acquire the ultrasound information; third, a computer processes the ultrasound data to generate the ultrasound images; fourth, the ultrasound images are preprocessed, including removal of blank regions and sensitive information, to produce the final usable images. Because ultrasound images were also collected during the treatment of some patients, the data set contains cancer images at different stages. In total, 927 clinical ultrasound images were collected. For annotation, two radiologists and one expert were employed to generate the labels. First, the two radiologists independently marked the cancer regions over the whole data set. Second, the expert checked the two annotated regions, especially the boundary between cancer and shadow, to ensure the accuracy of the final labels. Wherever the tumor region was ambiguous or the two radiologists disagreed, the expert's result was adopted as the final annotation. We thus obtained 927 breast ultrasound images containing benign or malignant tumors, with corresponding cancer-mask annotations, to support the development of the segmentation algorithm. The method also applies data augmentation, using flipping, rotation and changes of brightness, contrast and resolution to simulate ultrasound images from clinical scenarios and enrich data diversity; this frees the subsequent model from interference by these variables, lets it extract cancer features more effectively, and improves its robustness and accuracy. Finally, because clinically acquired images vary in shape, the method rescales every ultrasound image to 256×256 pixels before feeding it to the model for further analysis.
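As an illustration of the augmentation and rescaling just described, a paired-transform sketch using torchvision follows; the probabilities and parameter ranges are assumptions, since the patent names the operations (flip, rotation, brightness, contrast, resolution) but not their settings:

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms.functional import InterpolationMode

def augment(image, mask):
    """Apply the same geometric transforms to the image and its cancer mask,
    and photometric/resolution changes to the image only (a sketch)."""
    if random.random() < 0.5:                       # random horizontal flip
        image, mask = TF.hflip(image), TF.hflip(mask)
    if random.random() < 0.5:                       # small random rotation
        angle = random.uniform(-15.0, 15.0)
        image, mask = TF.rotate(image, angle), TF.rotate(mask, angle)
    image = TF.adjust_brightness(image, random.uniform(0.8, 1.2))
    image = TF.adjust_contrast(image, random.uniform(0.8, 1.2))
    if random.random() < 0.3:                       # simulate a lower acquisition resolution
        image = TF.resize(image, 128)
    # every image is finally rescaled to the fixed 256x256 model input size
    image = TF.resize(image, [256, 256])
    mask = TF.resize(mask, [256, 256], interpolation=InterpolationMode.NEAREST)
    return image, mask
```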
Step 2: Model construction
As shown in FIG. 2, the present invention develops a novel breast cancer ultrasound image segmentation network designed to capture global feature dependencies while reducing interference from clutter noise and blurred cancer boundaries. Moreover, we infer over the whole ultrasound image rather than a cancer region of interest (ROI), which makes the method more useful to radiologists as a potential tool. As shown in FIG. 3, we adopt an encoder-decoder architecture to extract cancer features and generate the target result, so features can be taken from the encoder without concern for feature confusion. In the encoder module, features are obtained through four consecutive extraction stages (FIG. 2(a)), in which window self-attention (WSA) is applied to infer feature information (FIG. 2(b)). Specifically, WSA extracts sequence features from the ultrasound image, overcoming interference from local-region morphology; the features extracted by window self-attention are further processed by linear normalization and a multi-layer perceptron, as shown in FIG. 2(b). WSA relies mainly on the attention mechanism to extract image features and captures global information more effectively than a convolutional neural network. Precisely, the network divides the whole image into small patches and embeds them as tokens, which lets the image retain global dependencies. Inspired by the self-attention mechanism, the transformer extracts texture and neighboring context information through these sequences. In addition, WSA merges the hierarchical relationship between low-level and high-level images, achieving more comprehensive information extraction; it can therefore adaptively focus on the principal cancer features regardless of variations in tumor size and shape or clutter noise. We make the WSA module the backbone of our architecture to extract more features from breast cancer. In the decoder module we also introduce multi-scale feature learning, such as atrous spatial pyramid pooling (ASPP), to compensate for shallow features, because valuable local features are lost in deeper layers. In our algorithm, ASPP serves as the connection between the encoder and the decoder, effectively capturing more contextual information, which improves segmentation performance as the decoder restores spatial resolution. In the decoder module we then upsample the features from the different encoder stages. At the same time, we apply feature flow alignment (FAM, FIG. 2(d)) in the decoder to strengthen the network's ability to detect edges, because cancer boundaries tend to vanish under interference from neighboring shadows; FAM uses a series of convolutions to capture the difference between two input features, which ultimately enables the network to perform cancer screening and detection effectively. In addition, the up-block module in FIG. 2(c) applies a 3×3 convolution and batch normalization to generate results of the corresponding sizes. With the above modules in place, this comprehensive architecture can effectively perform cancer screening and detection on breast ultrasound images.
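For the ASPP bridge between encoder and decoder, a minimal sketch follows; the dilation rates (1, 6, 12, 18) are a conventional choice and an assumption here, as the patent does not state them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convolutions plus a
    global-pooling branch, concatenated and projected (a sketch)."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3 if r > 1 else 1,
                          padding=r if r > 1 else 0, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates])
        self.image_pool = nn.Sequential(             # global context branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.ReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```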
As shown in FIG. 2(b), the window self-attention (WSA) mechanism balances the capture of global context and local features while greatly reducing computational cost. A breast ultrasound input image contains similar-looking cancer tissue, shadows and clutter noise, so our network uses WSA to retain more global dependencies and avoid the influence of local variations. Specifically, the encoder module proceeds through multiple stages (stage 1 to stage 4) to extract as much valuable information as possible. With an original training image of size 256×256×3, we first apply a 7×7 convolution with stride 4, producing 96 channels. Each stage contains a 3×3 convolution with stride 2 to reduce the feature size and a patch-embedding module that doubles the number of channels. After embedding, WSA extracts useful information through the attention mechanism, preserving more information; we also adopt multi-head self-attention to enhance the feature representation. The generated feature maps have sizes 64×64, 32×32, 16×16 and 8×8, corresponding to 1/4, 1/8, 1/16 and 1/32 of the input resolution of 256, and because every stage performs patch embedding, the feature dimensions become 96, 192, 384 and 768 (FIG. 2(a)). For the WSA in FIG. 2(b) in particular, the network first applies linear normalization to the input features, rescaling all values to 0-1 and reducing the adverse influence of outliers; the features then pass through window self-attention to gather more information at the window scale. Concretely: first, the features are reshaped according to the window shape in preparation for subsequent operations; second, linear layers compute the query, key and value, from which the attention map is computed; third, multi-head self-attention computes the final attention map to extract the best features; fourth, the final features are linearly normalized and reshaped back to the original input shape. The network uses the WSA mechanism in every stage to expand attention and achieve global self-attention, and a window self-attention module is introduced in the self-attention branch to enhance position encoding.
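The four operations just listed can be sketched in PyTorch as follows; the window size of 8 and the head count of 4 are illustrative assumptions, and the sketch requires the feature height and width to be divisible by the window size:

```python
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Window self-attention (WSA): linear (layer) normalization, partition of
    the feature map into non-overlapping windows, multi-head self-attention
    inside each window, and restoration of the original shape (a sketch)."""
    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):               # x: (B, H, W, C); H, W divisible by window
        B, H, W, C = x.shape
        w = self.window
        x = self.norm(x)
        # first: reshape the features according to the window shape
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        tokens = x.reshape(-1, w * w, C)            # (B * num_windows, w*w, C)
        # second/third: linear QKV projections and multi-head attention maps
        out, _ = self.attn(tokens, tokens, tokens)
        # fourth: reshape back to the original input shape
        out = out.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)
```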
As shown in FIG. 2(d), because cancer boundaries in ultrasound images are frequently mis-segmented, and these images typically contain complex boundaries between abundant fat and muscle tissue, we adopt a semantic feature flow alignment module that aligns pooled features within the neural network to promote a meaningful flow of semantic information. It integrates features from different layers, as shown in FIG. 2(d), enabling the model to capture both local and global context. The technique refines features through recursive or skip connections, enhancing their discriminative power, and promotes cross-layer communication, allowing relevant information to propagate between layers. The attention mechanism highlights informative regions or features, further strengthening the network's discriminative ability. Semantic feature flow improves representation learning, contextual understanding and the robustness of prediction; by aligning feature aggregation it lets the network exploit meaningful semantic information, producing more accurate predictions in tasks such as object recognition and segmentation. In short, semantic feature flow enhances the network's understanding, contextual reasoning and prediction by promoting the integration and refinement of cross-layer features, so we introduce a semantic feature flow alignment module into our model.
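A minimal sketch of a flow alignment module in this spirit follows: a small convolution predicts a two-channel semantic flow field from the concatenated low-level and upsampled high-level features, and the flow warps the high-level features onto the low-level grid before fusion; the simple additive fusion at the end is an assumption, not the patent's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowAlignment(nn.Module):
    """Semantic flow alignment between a low-level and a high-level feature
    map: predict a flow field, then warp the high-level features (a sketch)."""
    def __init__(self, ch):
        super().__init__()
        self.flow = nn.Conv2d(ch * 2, 2, kernel_size=3, padding=1)

    def forward(self, low, high):        # low: (B, C, H, W); high is coarser
        B, C, H, W = low.shape
        high = F.interpolate(high, size=(H, W), mode='bilinear', align_corners=False)
        flow = self.flow(torch.cat([low, high], dim=1))       # (B, 2, H, W)
        # build a sampling grid offset by the predicted flow
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
        grid = torch.stack((xs, ys), dim=-1).float().to(low.device)  # (H, W, 2)
        grid = grid.unsqueeze(0) + flow.permute(0, 2, 3, 1)          # add offsets
        # normalize to [-1, 1] as required by grid_sample
        grid[..., 0] = 2.0 * grid[..., 0] / max(W - 1, 1) - 1.0
        grid[..., 1] = 2.0 * grid[..., 1] / max(H - 1, 1) - 1.0
        aligned = F.grid_sample(high, grid, mode='bilinear', align_corners=True)
        return low + aligned             # fuse the aligned features
```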
Step 3: Design of the segmentation-model loss function
In the training stage the loss function has a crucial influence on optimization, since it directly determines the best attainable performance of the model on the task. Using an appropriate loss function therefore plays a key role in the training strategy, and its selection must account for the characteristics of the data set and the task. For our breast cancer ultrasound images we customized a hybrid loss function composed of a focal loss and a Dice loss function, as shown in formulas (1)-(3). In these formulas, Y_i denotes the ground-truth label (GT) and Ŷ_i the model's prediction; N denotes the number of pixels in the whole ultrasound image and i the pixel index. The designed hybrid loss function has three advantages. First, the Dice loss improves segmentation accuracy by maximizing the overlap between the prediction and the true label, which matches our breast cancer ultrasound segmentation task. Second, in our data set the area of the cancer region is always smaller than that of the normal tissue, which biases the model toward the normal tissue rather than the cancer; although the Dice loss mitigates class imbalance to some extent, the focal loss addresses this problem by emphasizing the cancer features. Third, the hybrid loss combines the advantages of the Dice and focal losses and thus performs better than either loss alone. We therefore adopt the hybrid loss function to segment cancer in the ultrasound images while optimizing the performance of the model.
L_BCE = −(1/N) · Σ_{i=1..N} [Y_i · log(Ŷ_i) + (1 − Y_i) · log(1 − Ŷ_i)]   (1)

L_Dice = 1 − 2|Y ∩ Ŷ| / (|Y| + |Ŷ|)   (2)

L_Hybrid = L_BCE + L_Dice   (3)

where L_BCE denotes the cross-entropy loss function, Y_i the true label value at pixel i, Ŷ_i the predicted value at pixel i, N the number of pixels in the whole ultrasound image, and i the pixel index; L_Dice denotes the Dice loss function and ∩ the intersection (overlap) between the predicted and ground-truth regions.
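A minimal PyTorch sketch of this hybrid loss, corresponding to formulas (1)-(3), is given below; it assumes the model outputs raw logits for a single-channel binary cancer mask, which is an assumption about the surrounding training code rather than something the patent specifies:

```python
import torch
import torch.nn as nn

class HybridLoss(nn.Module):
    """L_Hybrid = L_BCE + L_Dice: binary cross-entropy on the logits plus a
    soft Dice term computed from the sigmoid probabilities (a sketch)."""
    def __init__(self, eps=1e-6):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.eps = eps  # guards against empty masks in the Dice denominator

    def forward(self, logits, target):   # both (B, 1, H, W); target in {0, 1}
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum(dim=(1, 2, 3))             # |Y ∩ Ŷ|
        denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        dice = 1.0 - (2.0 * inter + self.eps) / (denom + self.eps)
        return self.bce(logits, target.float()) + dice.mean()
```

In the training loop of step S400, `HybridLoss()(model(images), masks)` would then be minimized by back propagation with a gradient-descent optimizer.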
Step 4: Evaluation metrics for the detection model
To analyze our study quantitatively, we introduce a number of evaluation metrics to express the performance of our model in detecting cancer regions in breast cancer ultrasound images. The choice of evaluation metrics is essential for demonstrating detection efficiency in comparison with other segmentation architectures. In view of the characteristics of breast cancer ultrasound images, we use precision, recall, overall accuracy (OA), prediction error (Pe), Kappa, F1 and intersection over union (IOU) to express the segmentation performance of the model, as shown in formulas (4)-(10). Specifically, precision expresses the correct identification of cancer regions; recall, i.e. sensitivity, expresses the ability to identify cancer, while specificity emphasizes the ability on normal tissue; Pe and OA show the network's overall performance in segmenting cancer. The F1 score, the harmonic mean of precision and recall, balances the two and represents the overall performance of the method, while IOU expresses the overlap between the predicted and ground-truth cancer regions. All seven statistical metrics take values between 0 and 1, and a larger value of each metric indicates better performance in detecting abnormal regions. We use these seven metrics to demonstrate the overall segmentation capability with respect to the ground truth and breast cancer. In formulas (4)-(10), TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively. All metrics are calculated as follows:
Precision = TP / (TP + FP)   (4)

Recall = TP / (TP + FN)   (5)

OA = (TP + TN) / (TP + TN + FP + FN)   (6)

Pe = [(TP + FP)(TP + FN) + (FN + TN)(FP + TN)] / (TP + TN + FP + FN)²   (7)

Kappa = (OA − Pe) / (1 − Pe)   (8)

F1 = 2 × Precision × Recall / (Precision + Recall)   (9)

IOU = TP / (TP + FP + FN)   (10)

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively; Precision denotes precision; Recall denotes recall; OA denotes overall accuracy; Pe denotes prediction error; Kappa denotes the Kappa coefficient; F1 denotes the F1 score; and IOU denotes the intersection over union.
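The following sketch computes formulas (4)-(10) from binarized masks; it assumes NumPy arrays in which both classes occur, since several denominators would otherwise be zero:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Precision, Recall, OA, Pe, Kappa, F1 and IOU from 0/1 masks of equal
    shape (a sketch of formulas (4)-(10))."""
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    n = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    oa = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {'Precision': precision, 'Recall': recall, 'OA': oa, 'Pe': pe,
            'Kappa': kappa, 'F1': f1, 'IOU': iou}
```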
To verify the performance of the proposed method, the final experimental results are given in FIG. 3: several experiments visualize the cancer-detection performance of our model on ultrasound images. On the one hand, PR and ROC curves show the model's performance on the data set; on the other hand, t-SNE and correlation plots visualize its behavior across different image characteristics. Our model detects cancer regions robustly and accurately, even in the presence of speckle noise and markers. As shown in FIG. 3(a), the AUC (area under the curve) of the PR curve is 0.965 and that of the ROC curve is 0.996; both exceed 0.95, indicating high accuracy in segmenting breast ultrasound images. We also quantitatively computed the Euclidean distance and Hu-moment similarity of the contours to demonstrate segmentation capability: for most images the Euclidean distance is small and the Hu-moment similarity high. In FIG. 3(b) we present the areas of the predicted masks against the ground truth (GT) and the IOU values, showing that the areas are similar and most IOU values exceed 80%; judging from the distributions of image areas and area ratios, the generated segmentation results are very close to the ground truth. These results demonstrate the effectiveness of our model in improving cancer detection in breast ultrasound images. FIG. 3(c) shows the sizes, in square centimeters, of the predictions and the true labels, ordered by true-label area, and also the proportion of the cancer region within the prediction. In FIG. 3(d) we compute the shortest and longest contour diameters in the test set, together with the centroid coordinates. In FIG. 3(e) we apply t-SNE for dimensionality-reduction visualization, revealing a clear separation between cancer regions and normal tissue, which indicates that our model accurately captures the salient differences between the two and segments the cancer. Finally, we randomly sampled 50 images to examine their correlation; as FIG. 3(f) shows, the similarity between images varies, indicating a highly diverse test set. FIG. 4 shows the performance on the test set, where the Grad-CAM maps show the model detecting the cancer regions and give the resulting cancer-distribution plots; Table 1 lists the quantitative metrics of the different methods.
Table 1. Quantitative metrics of the present invention
The present invention also provides an electronic device, comprising:
at least one processor, at least one memory and a communication interface, wherein
the processor, the memory and the communication interface communicate with one another;
the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform any step of the above method.
The present invention also provides a computer-readable storage medium storing computer instructions that cause a computer to perform any step of the above method.
It should be understood that, although the steps in the flowcharts of the accompanying drawings are displayed in the order indicated by the arrows, they are not necessarily executed in that order. Unless expressly stated herein, their execution is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times; their execution order likewise need not be sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The above is only a partial embodiment of the present application. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also fall within the scope of protection of the present application.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410478201.6A CN118552563A (en) | 2024-04-19 | 2024-04-19 | A breast ultrasound image segmentation method based on window attention semantic stream alignment |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118552563A true CN118552563A (en) | 2024-08-27 |
Family
ID=92450972
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410478201.6A Pending CN118552563A (en) | 2024-04-19 | 2024-04-19 | A breast ultrasound image segmentation method based on window attention semantic stream alignment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118552563A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119205560A (en) * | 2024-11-27 | 2024-12-27 | 华侨大学 | Mini-LED wafer surface defect data enhancement method and device based on deep learning |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Qi et al. | Automated diagnosis of breast ultrasonography images using deep neural networks | |
| CN117152433B (en) | Medical image segmentation method based on multi-scale cross-layer attention fusion network | |
| Pi et al. | Automated diagnosis of bone metastasis based on multi-view bone scans using attention-augmented deep neural networks | |
| Deng et al. | Classification of breast density categories based on SE-Attention neural networks | |
| Mu et al. | Progressive global perception and local polishing network for lung infection segmentation of COVID-19 CT images | |
| CN112508884A (en) | Comprehensive detection device and method for cancerous region | |
| Zhang et al. | A semi-supervised learning approach for COVID-19 detection from chest CT scans | |
| Asif et al. | StoneNet: an efficient lightweight model based on depthwise separable convolutions for kidney stone detection from CT images | |
| CN114565572A (en) | Cerebral hemorrhage CT image classification method based on image sequence analysis | |
| Song et al. | Prostate lesion segmentation based on a 3D end-to-end convolution neural network with deep multi-scale attention | |
| Chen et al. | Detection of cervical lesions in colposcopic images based on the RetinaNet method | |
| Yi et al. | CAS: Breast cancer diagnosis framework based on lesion region recognition in ultrasound images | |
| Singh et al. | Advancements in Early Detection of Cervical Cancer using Machine Learning and Deep Learning Models for Cervicography Analysis | |
| Zhong et al. | MsGoF: Breast lesion classification on ultrasound images by multi-scale gradational-order fusion framework | |
| CN118823344A (en) | Medical image semantic segmentation method and system based on channel and spatial attention mechanism | |
| CN112071418A (en) | Prediction system and method for peritoneal metastasis of gastric cancer based on contrast-enhanced CT radiomics | |
| Ağralı et al. | U-TranSvision: Transformer-based deep supervision approach for COVID-19 lesion segmentation on Computed Tomography images | |
| Wang et al. | E-DU: Deep neural network for multimodal medical image segmentation based on semantic gap compensation | |
| CN118552563A (en) | A breast ultrasound image segmentation method based on window attention semantic stream alignment | |
| US20220287647A1 (en) | Disease classification by deep learning models | |
| Naqvi et al. | An Attention-Based Residual U-Net for Tumour Segmentation Using Multi-Modal MRI Brain Images | |
| Wang et al. | MSAMS-Net: accurate lung lesion segmentation from COVID-19 CT images | |
| Chincholkar et al. | Deep learning techniques in liver segmentation: evaluating U-Net, Attention U-Net, ResNet50, and ResUNet models | |
| Xiong et al. | A Three‐Step Automated Segmentation Method for Early Cervical Cancer MRI Images Based on Deep Learning | |
| Zou et al. | Skin Lesion Segmentation through Generative Adversarial Networks with Global and Local Semantic Feature Awareness. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| 2025-01-20 | TA01 | Transfer of patent application right | Effective date of registration: 20250120. Address after: Room 406-1, 4th Floor, Business Center Building, No. 666 Tangjiahu Avenue, Pingwang Town, Suzhou City, Jiangsu Province 215000. Applicant after: Jingyun Zhitu (Suzhou) Technology Co.,Ltd. (China); Xinguang Zhiying (Wuhan) Technology Co.,Ltd. Address before: Room 406-1, 4th Floor, Business Center Building, No. 666 Tangjiahu Avenue, Pingwang Town, Wujiang District, Suzhou City, Jiangsu Province 215000. Applicant before: Jingyun Zhitu (Suzhou) Technology Co.,Ltd. (China); Wuhan Kuangmu Intelligent Technology Co.,Ltd. |