CN117576405A - Tongue image semantic segmentation method, device, equipment and medium

Info

Publication number
CN117576405A
CN117576405A
Authority
CN
China
Prior art keywords
feature map
feature
tongue image
convolution
shallow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410063634.5A
Other languages
Chinese (zh)
Inventor
李会霞
韩爱庆
唐燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huiyi Bida Medical Technology Co ltd
Original Assignee
Shenzhen Huiyi Bida Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huiyi Bida Medical Technology Co ltd
Priority to CN202410063634.5A
Publication of CN117576405A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a tongue image semantic segmentation method, device, equipment and medium. The method comprises: obtaining a target tongue image and performing feature extraction preprocessing to obtain a shallow feature map; performing multi-scale feature extraction on the shallow feature map to obtain a multi-scale feature map, and performing channel dimensionality reduction on the multi-scale feature map to obtain an integrated feature map; performing channel dimensionality reduction on the shallow feature map to obtain a first feature map, upsampling the integrated feature map to obtain a second feature map, and fusing the first and second feature maps to obtain a deep feature map; and inputting the shallow and deep feature maps into an image segmentation model to obtain a segmentation prediction result. The method can better identify and classify objects, reduces the complexity of the model, improves running speed and accuracy, better understands and recognizes image content, improves the semantic segmentation of tongue images, provides a more accurate and objective basis for diagnosis, and facilitates traditional Chinese medicine diagnosis and health monitoring.

Description

Tongue image semantic segmentation method, device, equipment and medium

Technical field

The present invention relates to the technical field of image processing, and in particular to a tongue image semantic segmentation method, device, equipment and medium.

Background

Semantic segmentation is a computer vision technique whose main purpose is to divide an image into different regions and to label these regions with different semantic categories. Compared with traditional image segmentation techniques, semantic segmentation can understand the different regions of an image more accurately and match them to the corresponding semantic categories. Semantic segmentation therefore has broad application prospects in many visual scenarios.

In recent years, tongue image segmentation has become a research hotspot in the modernization of traditional Chinese medicine. Many scholars have studied tongue image segmentation and tried a variety of segmentation methods, and some research results have been achieved. The existing DeepLabV3+ model can perform semantic segmentation, but it has a large number of parameters and low edge segmentation accuracy, and cannot achieve a satisfactory segmentation result.

Therefore, how to improve existing models and thereby improve the effect of tongue image semantic segmentation has become an urgent problem to be solved.

Summary of the invention

In view of this, embodiments of the present invention provide a tongue image semantic segmentation method, device, equipment and medium to solve the problem of how to improve existing models and improve the effect of tongue image semantic segmentation.

In a first aspect, a tongue image semantic segmentation method is provided, comprising:

obtaining a target tongue image, and performing feature extraction preprocessing on the target tongue image to obtain a shallow feature map of the target tongue image;

performing multi-scale feature extraction on the shallow feature map to obtain a multi-scale feature map of the target tongue image, and performing channel dimensionality reduction on the multi-scale feature map to obtain an integrated feature map of the target tongue image;

performing channel dimensionality reduction on the shallow feature map to obtain a first feature map of the target tongue image, upsampling the integrated feature map to obtain a second feature map of the target tongue image, and fusing the first feature map with the second feature map to obtain a deep feature map of the target tongue image;

inputting the shallow feature map and the deep feature map into a preset image segmentation model for the semantic segmentation task to obtain a segmentation prediction result for the target tongue image.

In a second aspect, a tongue image semantic segmentation device is provided, comprising:

a shallow feature map acquisition module, configured to obtain a target tongue image and perform feature extraction preprocessing on the target tongue image to obtain a shallow feature map of the target tongue image;

an integrated feature map acquisition module, configured to perform multi-scale feature extraction on the shallow feature map to obtain a multi-scale feature map of the target tongue image, and to perform channel dimensionality reduction on the multi-scale feature map to obtain an integrated feature map of the target tongue image;

a deep feature map acquisition module, configured to perform channel dimensionality reduction on the shallow feature map to obtain a first feature map of the target tongue image, to upsample the integrated feature map to obtain a second feature map of the target tongue image, and to fuse the first feature map with the second feature map to obtain a deep feature map of the target tongue image;

a prediction result generation module, configured to input the shallow feature map and the deep feature map into a preset image segmentation model for the semantic segmentation task to obtain a segmentation prediction result for the target tongue image.

In a third aspect, embodiments of the present invention provide a computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the tongue image semantic segmentation method described in the first aspect.

In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the tongue image semantic segmentation method described in the first aspect.

In a fifth aspect, embodiments of the present invention provide a computer program product which, when run on a computer device, causes the computer device to execute the tongue image semantic segmentation method described in the first aspect.

Compared with the prior art, the present invention has the following beneficial effects: through the above steps, the invention can fully analyze the features of an image at different scales and better identify and classify objects; it reduces the complexity of the model and improves running speed and accuracy; it can better understand and recognize image content, effectively improves the semantic segmentation of tongue images, accurately segments the various parts of the tongue, provides a more accurate and objective basis for diagnosis, facilitates traditional Chinese medicine diagnosis and health monitoring, and reduces the influence of human factors on diagnostic results.

Description of the drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Figure 1 is a schematic diagram of an application environment of a tongue image semantic segmentation method provided by Embodiment 1 of the present invention;

Figure 2 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 2 of the present invention;

Figure 3 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 3 of the present invention;

Figure 4 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 4 of the present invention;

Figure 5 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 5 of the present invention;

Figure 6 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 6 of the present invention;

Figure 7 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 7 of the present invention;

Figure 8 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 8 of the present invention;

Figure 9 is a schematic diagram of the model architecture of a tongue image semantic segmentation method provided by Embodiment 9 of the present invention;

Figure 10 is a schematic structural diagram of a tongue image semantic segmentation device provided by Embodiment 10 of the present invention;

Figure 11 is a schematic structural diagram of a computer device provided by Embodiment 11 of the present invention.

Detailed description

In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present invention.

It should be understood that, when used in the specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.

It should also be understood that the term "and/or" used in the specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in the specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".

In addition, in the description of the specification and the appended claims, the terms "first", "second", "third", etc. are used only to distinguish the descriptions and shall not be understood as indicating or implying relative importance.

Reference in this specification to "one embodiment" or "some embodiments" and the like means that a particular feature, structure or characteristic described in connection with that embodiment is included in one or more embodiments of the present invention. Therefore, the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in still other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "comprising", "including", "having" and their variants all mean "including but not limited to", unless otherwise specifically emphasized.

It should be understood that the sequence numbers of the steps in the following embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

In order to illustrate the technical solution of the present invention, specific embodiments are described below.

The tongue image semantic segmentation method provided in Embodiment 1 of the present invention can be applied in the application environment shown in Figure 1, in which a client communicates with a server. The client includes, but is not limited to, handheld computers, desktop computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, cloud computing devices, personal digital assistants (PDA) and other computer devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.

Figure 2 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 2 of the present invention. The method can be applied to the client in Figure 1: a user uses the client to analyze a tongue image, and the method is executed in the client to output the tongue image semantic segmentation result. As shown in Figure 2, the tongue image semantic segmentation method may include the following steps:

Step S201: obtain a target tongue image, and perform feature extraction preprocessing on the target tongue image to obtain a shallow feature map of the target tongue image.

The target tongue image can be obtained from public databases or studies, acquired directly by imaging a patient's tongue with medical equipment, or obtained from a photograph taken and uploaded manually by the user.

The shallow feature map is the feature map obtained by preliminary processing of the target tongue image; it captures the detailed features of the original tongue image. "Shallow" here merely defines a name and does not impose any essential limitation on the feature map: the shallow feature map is simply the image obtained by performing feature extraction preprocessing on the captured image.

Feature extraction preprocessing of the acquired target tongue image may include steps such as resizing, color normalization and denoising in preparation for feature extraction; the extracted features typically include color, texture, shape and other information. After preprocessing, an image dominated by these features is obtained, which can concretely be expressed in matrix form.
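As an illustrative sketch only (not part of the claimed method), such preprocessing could be implemented as follows with PyTorch/torchvision; the 512×512 input size and the normalization statistics are assumptions, and the denoising step is omitted:

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical preprocessing pipeline: resize and color normalization.
# The target size and normalization statistics are assumed values.
preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),          # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("tongue.jpg").convert("RGB")   # hypothetical file name
x = preprocess(image).unsqueeze(0)                # shape: (1, 3, 512, 512)
```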

Step S202: perform multi-scale feature extraction on the shallow feature map to obtain a multi-scale feature map of the target tongue image, and perform channel dimensionality reduction on the multi-scale feature map to obtain an integrated feature map of the target tongue image.

Multi-scale feature extraction means processing the shallow feature map several times, each time with a different scale or different parameters, and finally fusing the different results to obtain the multi-scale feature map of the target tongue image. During multi-scale feature extraction, the shallow feature map can be shrunk or enlarged to different degrees to obtain image feature information at different scales; the structural parameters of the network, such as its depth, convolution kernel size and stride, can also be adjusted to change the feature representation.

The multi-scale feature extraction process may produce a large number of feature channels, which increases the complexity and computational burden of subsequent processing. Channel dimensionality reduction is therefore needed to reduce the number of feature channels while retaining the key information. The integrated feature map contains the important information of the tongue image at different scales and channels and can provide a comprehensive feature representation for subsequent analysis.

Channel dimensionality reduction refers to reducing the number of channels of the data, where channels usually refer to the number of feature maps in the input. The purpose of channel dimensionality reduction is to reduce the complexity and the number of parameters of the model, thereby improving training and inference efficiency, while also effectively preventing overfitting.
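Channel dimensionality reduction of this kind is commonly realized with a 1×1 convolution. A minimal sketch, with assumed channel counts:

```python
import torch
import torch.nn as nn

# A 1x1 convolution mixes channels without changing spatial resolution,
# so it can compress, e.g., 1280 channels down to 256 (assumed numbers).
reduce = nn.Conv2d(in_channels=1280, out_channels=256, kernel_size=1, bias=False)

feat = torch.randn(1, 1280, 32, 32)   # dummy multi-scale feature map
out = reduce(feat)                    # shape: (1, 256, 32, 32)
```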

Step S203: perform channel dimensionality reduction on the shallow feature map to obtain a first feature map of the target tongue image, upsample the integrated feature map to obtain a second feature map of the target tongue image, and fuse the first feature map with the second feature map to obtain a deep feature map of the target tongue image.

The first feature map is the image or feature representation obtained after channel dimensionality reduction of the shallow feature map. For example, if the shallow feature map has 3 channels, channel dimensionality reduction can reduce it to 2 channels or 1 channel, yielding the corresponding first feature map.

The second feature map is the image or feature representation obtained by upsampling the integrated feature map, the integrated feature map being the result of channel dimensionality reduction of the multi-scale feature map. It should be understood that the integrated feature map is smaller than the first feature map, so it must be upsampled; a second feature map of the same size as the first feature map is obtained by interpolation or transposed convolution.

Fusing two feature maps of the same size combines the information of the first and second feature maps into a richer, multi-dimensional deep feature map. The fusion may be additive fusion, concatenation fusion, weighted fusion, etc., chosen according to the actual requirements.

The deep feature map contains not only the texture and color information of the original target tongue image, but may also contain semantic information obtained through multi-level processing and fusion.
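A minimal sketch of this reduce-upsample-concatenate step, using bilinear interpolation and concatenation fusion; all shapes and channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

shallow = torch.randn(1, 256, 128, 128)     # assumed shallow feature map
integrated = torch.randn(1, 256, 32, 32)    # assumed integrated feature map

reduce_shallow = nn.Conv2d(256, 48, kernel_size=1)    # channel reduction
first = reduce_shallow(shallow)                       # (1, 48, 128, 128)

# Upsample the integrated map to the first feature map's spatial size
second = F.interpolate(integrated, size=first.shape[-2:],
                       mode="bilinear", align_corners=False)

# Concatenation fusion along the channel dimension -> deep feature map
deep = torch.cat([first, second], dim=1)              # (1, 304, 128, 128)
```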

Step S204: input the shallow feature map and the deep feature map into a preset image segmentation model for the semantic segmentation task to obtain a segmentation prediction result for the target tongue image.

The image segmentation model may be a pre-trained model, such as a machine learning model or a neural network model, trained with collected tongue images as the training set so that the trained model can be used for tongue segmentation. In this embodiment this concretely means accurately segmenting the different regions of the tongue image, for example separating the tongue body from the background, extracting the tongue substance, and so on.

Inputting the deep feature map into the image segmentation model yields a preliminary segmentation prediction. Because segmentation based on the deep features may fail to take global features into account, features can be missing; to avoid this, the shallow feature map is used to refine the generated segmentation prediction, improving the segmentation result and the completeness of the tongue image.

With the tongue image semantic segmentation method of this embodiment, multi-scale extraction gives a full understanding of the features of an image at different scales and better identification and classification of objects; the use of dimensionality reduction lowers the complexity of the model and improves running speed and accuracy; combining shallow and deep information enables better understanding and recognition of image content, effectively improving the semantic segmentation of tongue images, accurately segmenting the various parts of the tongue and providing a more accurate and objective basis for diagnosis, which facilitates traditional Chinese medicine diagnosis and health monitoring; automating the analysis of the target tongue image with machine learning and deep learning reduces the influence of human factors on the diagnostic result.

Figure 3 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 3 of the present invention. As shown in Figure 3, performing multi-scale feature extraction on the shallow feature map in step S202 to obtain the multi-scale feature map of the target tongue image includes:

Step S301: pass the shallow feature map through a pooling layer and through at least one convolutional layer.

The shallow feature map passes through one pooling layer and through one or more convolutional layers. The pooling layer is usually used for downsampling, which reduces the dimensionality of the feature map while retaining the important information, lowers the amount of computation and the risk of overfitting, and yields higher-level features; the convolutional layers capture the detailed information in the image, and several convolutional layers can be connected to extract further features, which helps to better learn and understand the complex features in tongue images.

Step S302: when the number of convolutional layers is greater than two, the convolutional layers apply dilated convolutions with different dilation rates to the shallow feature map to obtain the convolutional feature maps of the target tongue image.

Dilated convolution is a special convolution operation: by introducing a dilation rate into the convolution, the spacing with which the convolution kernel is spread over the input feature map can be controlled. With multiple convolutional layers, different dilation rates can be set as needed so that each layer expands its receptive field in a different way, extracting features from different perspectives and scales. The dilated convolution at each dilation rate produces one convolutional feature map.

Step S303: the pooling layer applies average pooling to the shallow feature map to obtain the pooled feature map of the target tongue image.

Applying global average pooling to the shallow feature map helps to further reduce the dimensionality of the feature map while retaining the important information, providing a more compact and efficient feature representation for the subsequent image segmentation and classification tasks.

Step S304: perform feature fusion on the convolutional feature maps and the pooled feature map to obtain the multi-scale feature map.

Feature fusion combines feature maps from different levels to obtain a multi-scale feature representation: the processed feature maps are fused so that the features they embody are merged into a single feature map covering multiple scales. Concretely, the feature maps of different layers can be concatenated along the channel dimension, or their sizes can first be matched by upsampling and downsampling.

As an example, four convolutional layers and one pooling layer can be used: a 1×1 convolutional layer and three 3×3 dilated convolutional layers with dilation rates of 3, 6 and 9 respectively; each convolutional or pooling layer produces one processed feature map.
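A minimal sketch of this five-branch multi-scale block (an ASPP-style module; the channel counts are assumed, and the CBAM attention described in Embodiment 4 is omitted here for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBlock(nn.Module):
    """1x1 conv + three dilated 3x3 convs (rates 3/6/9) + average pooling."""
    def __init__(self, in_ch=256, out_ch=256):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, 1)
        self.dil3 = nn.Conv2d(in_ch, out_ch, 3, padding=3, dilation=3)
        self.dil6 = nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6)
        self.dil9 = nn.Conv2d(in_ch, out_ch, 3, padding=9, dilation=9)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.pool_conv = nn.Conv2d(in_ch, out_ch, 1)
        self.project = nn.Conv2d(5 * out_ch, out_ch, 1)  # channel reduction

    def forward(self, x):
        h, w = x.shape[-2:]
        branches = [self.conv1x1(x), self.dil3(x), self.dil6(x), self.dil9(x)]
        p = self.pool_conv(self.pool(x))                 # pooled branch
        branches.append(F.interpolate(p, size=(h, w), mode="bilinear",
                                      align_corners=False))
        multi_scale = torch.cat(branches, dim=1)         # fusion along channels
        return self.project(multi_scale)                 # integrated feature map

x = torch.randn(1, 256, 32, 32)
y = MultiScaleBlock()(x)   # (1, 256, 32, 32)
```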

The tongue image semantic segmentation method of this embodiment captures multi-scale features by combining pooling with convolutions of different dilation rates, enabling a better understanding and analysis of the tongue image; it reduces the dimensionality of the feature maps, saving computational resources and time and improving processing efficiency, and offers a certain flexibility in the choice of convolutional layers.

Figure 4 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 4 of the present invention. As shown in Figure 4, after the dilated convolutions with different dilation rates are applied to the shallow feature map in step S302 to obtain the convolutional feature maps of the target tongue image, the method further includes:

Step S401: for each convolutional layer, compute the channel-dimension attention weights of the convolutional feature map with a preset channel attention module, and compute the spatial-dimension attention weights of the convolutional feature map with a preset spatial attention module.

Each convolutional layer outputs a convolutional feature map. The output feature map is fed into an attention fusion module, which computes the attention weights of the feature map along the channel dimension and along the spatial dimension. Channel-dimension attention weights can be computed with, for example, global average pooling or a self-attention mechanism; spatial-dimension attention weights can be computed with, for example, position embeddings or spatial pyramid pooling.

Step S402: process the convolutional feature map with the channel-dimension attention weights and the spatial-dimension attention weights to obtain a weighted feature map of the convolutional feature map.

The computed attention weights are multiplied with the convolutional feature map, and these weights adjust the output of the convolutional layer, giving a weighted feature map. Concretely, the weights can be multiplied with the channel features to obtain weighted channel features; likewise, for each position in the feature map, the spatial-dimension attention weight of that position can be used to weight its features.

The tongue image semantic segmentation method of this embodiment uses attention weights to weight the convolutional feature maps, which captures the structure and semantic information of the input data better, helps the model understand and process the input, and improves the model's performance and accuracy.

Figure 5 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 5 of the present invention. As shown in Figure 5, processing the convolutional feature map with the channel-dimension attention weights in step S402 includes:

Step S501: apply average pooling to the convolutional feature map to obtain a first pooling result.

Average pooling of the convolutional feature map averages each local region of the feature map and outputs the mean of that region; each pixel of the feature map is replaced with an average value, thereby reducing the dimensionality of the feature map.

Step S502: apply max pooling to the convolutional feature map to obtain a second pooling result.

Max pooling selects the maximum value within the region covered by the pooling kernel as the output. Compared with average pooling, max pooling replaces each pixel with the maximum of its region and therefore focuses more on extracting abrupt changes in the feature map.

Step S503: feed the first pooling result and the second pooling result into a first multi-layer perceptron to obtain a first output feature map and a second output feature map respectively.

The first multi-layer perceptron (MLP) further processes and analyzes the first and second pooling results. Concretely: the input features are convolved and pooled layer by layer to extract deeper features; an activation function applies a nonlinear transformation to the result of each layer, increasing the expressive power of the model; at the last layer the features are flattened into a one-dimensional vector and passed through a fully connected operation to produce the output.

Step S504: add the first output feature map and the second output feature map to obtain the channel-dimension attention weights.

Merging the first and second output feature maps gives a richer feature fusion result; a weighted sum over each channel determines the attention weight of the channel dimension, emphasizing important channels and suppressing unimportant ones. The computed channel-dimension attention weights can further be used to guide how the model allocates attention over the feature map, improving the model's performance and generalization ability.

Step S505: multiply the channel-dimension attention weights with the convolutional feature map to obtain the channel-attention-fused feature map.

The channel attention mechanism lets the model focus on important channel information: by assigning a different weight to each channel, the model attends more to the important features, improving its performance. Multiplying the obtained channel-dimension attention weights with the convolutional feature map makes the model focus better on the important channels of the feature map and suppress irrelevant or redundant channels, improving the model's performance and generalization ability.
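A minimal sketch of this channel attention step, in the style of CBAM's channel module (the sigmoid squashing and the reduction ratio of 16 are assumptions borrowed from CBAM rather than details stated here):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        # Shared MLP applied to both pooling results (steps S501-S503)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                       # x: (N, C, H, W)
        n, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))      # first pooling result -> MLP
        mx = self.mlp(x.amax(dim=(2, 3)))       # second pooling result -> MLP
        w = torch.sigmoid(avg + mx)             # step S504: add, then squash
        return x * w.view(n, c, 1, 1)           # step S505: reweight channels

x = torch.randn(1, 256, 32, 32)
fused = ChannelAttention()(x)                   # channel-attention-fused map
```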

The tongue image semantic segmentation method of this embodiment performs attention fusion along the channel dimension, effectively improving the representation ability and classification accuracy of the model; the combination of average pooling and max pooling reduces the dimensionality of the feature map while retaining the important information in the image, leading to better segmentation of tongue images.

Figure 6 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 6 of the present invention. As shown in Figure 6, after the channel-attention-fused feature map is obtained in step S505, the method further includes:

Step S601: apply channel-wise max pooling and average pooling to the channel-attention-fused feature map to obtain a max pooling result and an average pooling result respectively.

Channel-wise max pooling and average pooling are common pooling methods: max pooling selects the maximum within the pooling window as the output, while average pooling computes the mean within the window as the output; both reduce the size of the feature map while retaining the important feature information. Concretely, the feature map is split along the channel direction so that each channel corresponds to one feature map, and max pooling and average pooling are applied to each channel's feature map to obtain new feature vectors, namely the max pooling result and the average pooling result.

Step S602: merge the max pooling result and the average pooling result to obtain a spatial-attention merged feature map.

Concretely, the max pooling result and the average pooling result are each flattened into a one-dimensional vector and the two vectors are concatenated into a single feature vector that contains the per-channel maximum and mean information of the channel-attention-fused feature map; this new feature vector is then reshaped into a new spatial-attention merged feature map.

Step S603: apply convolution and activation to the spatial-attention merged feature map to obtain the spatial-dimension attention weights.

The convolution over the spatial-attention merged feature map can use a single kernel for one convolution or several kernels for several convolutions; the convolution extracts local features of the feature map and generates a new feature map. After the convolution, an activation function, commonly the sigmoid function, applies a nonlinear transformation. Under the activation function, each pixel of the feature map is assigned a value that represents the attention weight of that pixel in the spatial dimension.

Step S604: multiply the spatial-dimension attention weights with the channel-attention-fused feature map to obtain the weighted feature map.

The computed spatial-dimension attention weights have the same size as the channel-attention-fused feature map, and the two are multiplied element-wise, i.e. the feature map of each channel is weighted, so that the model attends more to the important spatial features. The element-wise product yields the weighted feature map, in which the value of each pixel is the product of the corresponding channel attention weight and spatial-dimension attention weight.
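A minimal sketch of this spatial attention step, in the style of CBAM's spatial module (the 7×7 kernel size is an assumption borrowed from CBAM):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                        # x: (N, C, H, W)
        mx = x.amax(dim=1, keepdim=True)         # channel-wise max pooling
        avg = x.mean(dim=1, keepdim=True)        # channel-wise average pooling
        merged = torch.cat([mx, avg], dim=1)     # step S602: merge
        w = torch.sigmoid(self.conv(merged))     # step S603: conv + sigmoid
        return x * w                             # step S604: reweight positions

fused = torch.randn(1, 256, 32, 32)              # channel-attention output
weighted = SpatialAttention()(fused)             # weighted feature map
```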

The tongue image semantic segmentation method of this embodiment improves the representation ability of the feature maps; the combination of max pooling and average pooling reduces the amount of computation in subsequent convolutions and improves the running efficiency of the model, which can then handle a wider variety of image data, improving its generalization ability.

Figure 7 is a schematic flow chart of a tongue image semantic segmentation method provided by Embodiment 7 of the present invention. As shown in Figure 7, inputting the shallow feature map and the deep feature map into the preset image segmentation model for the semantic segmentation task in step S204 to obtain the segmentation prediction result of the target tongue image includes:

Step S701: perform convolutional feature extraction on the deep feature map to obtain the feature representation result of the deep feature map.

The deep feature map is processed further: convolutional feature extraction yields a more effective feature representation of the deep feature map, with local regions of each feature map convolved to extract more discriminative features.

Step S702: sample points around the pixels of the feature representation result to obtain a number of preset sampling points.

The preset sampling points are points of the feature representation result that are relatively ambiguous, whose semantic information cannot be determined clearly; they can be selected uniformly around the pixels. The sampling process can be designed according to actual needs, controlling the density and distribution of the sampling points so that the important information in the feature representation result is adequately covered. Note that the number and distribution of the sampling points need to be selected and tuned according to the characteristics of the specific task and dataset to ensure the best results.

Step S703: obtain the feature vector of each preset sampling point from the feature representation result.

It should be understood that the feature vector of each preset sampling point can be extracted from the feature representation result; an appropriate feature vector extraction method and parameter settings can be chosen according to the actual situation to improve the performance and accuracy of the classification task.

In one implementation, as shown in Figure 8, obtaining the feature vector of each preset sampling point includes:

Step S801: obtain the point coordinates of each preset sampling point from the feature representation result;

Step S802: sample the shallow feature map at the point coordinates to obtain the low-level feature vector of the preset sampling point;

Step S803: sample the feature representation result at the point coordinates to obtain the high-level feature vector of the preset sampling point;

Step S804: merge the low-level feature vector and the high-level feature vector to obtain the feature vector of the preset sampling point.

The positions of the preset sampling points (i.e. their point coordinates) are determined from the feature representation result. Since the shallow feature map has the same size as the feature representation result, the feature vector at the corresponding position of the shallow feature map, i.e. the low-level feature vector of the sampling point, can be obtained from the point coordinates; likewise, the feature vector at the corresponding position of the feature representation result, i.e. the high-level feature vector, can be obtained from the point coordinates. The low-level feature vector and the high-level feature vector of the same preset sampling point are merged into the feature vector of that sampling point, as sketched below.
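A minimal sketch of this per-point feature extraction, in the style of PointRend's point sampling (torch.nn.functional.grid_sample expects coordinates normalized to [-1, 1]; all shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def point_features(feature_map, points):
    """Sample per-point feature vectors from a (N, C, H, W) feature map.
    points: (N, P, 2) xy coordinates normalized to [-1, 1]."""
    grid = points.unsqueeze(2)                          # (N, P, 1, 2)
    sampled = F.grid_sample(feature_map, grid, align_corners=False)
    return sampled.squeeze(3).transpose(1, 2)           # (N, P, C)

shallow = torch.randn(1, 48, 128, 128)     # assumed shallow feature map
deep_repr = torch.randn(1, 256, 128, 128)  # assumed feature representation
pts = torch.rand(1, 64, 2) * 2 - 1         # 64 sampling points in [-1, 1]

low = point_features(shallow, pts)         # (1, 64, 48)  low-level vectors
high = point_features(deep_repr, pts)      # (1, 64, 256) high-level vectors
merged = torch.cat([low, high], dim=2)     # (1, 64, 304) per-point features
```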

Fusing high-level and low-level feature vectors yields feature information at different levels and a better understanding of the essential features of the image or data; it effectively improves the representation ability of the features, handles different tasks and datasets more flexibly, improves the comprehensiveness and accuracy of the feature representation, and enhances the generalization ability of the model.

Step S704: compute the segmentation prediction result from the feature vectors with a second multi-layer perceptron.

A multi-layer perceptron consists of multiple neurons, each of which receives input signals and outputs a value; through training, the multi-layer perceptron can learn the mapping from the input feature vectors to the output segmentation result. The training can be carried out with optimization algorithms such as backpropagation; by continuously adjusting the weight parameters of the second multi-layer perceptron, the error between the prediction and the actual segmentation result is minimized. Note that the structure and parameter settings of the second multi-layer perceptron need to be selected and tuned according to the characteristics of the specific task and dataset. After the computation, a segmentation prediction result of the same size as the original image is finally obtained.
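A minimal sketch of such a per-point prediction head (the layer widths and the two output classes, tongue versus background, are assumptions; the 304 input features match the merged point features in the sketch above):

```python
import torch
import torch.nn as nn

# Per-point MLP head: maps each merged point feature vector to class logits.
point_head = nn.Sequential(
    nn.Linear(304, 256),
    nn.ReLU(inplace=True),
    nn.Linear(256, 256),
    nn.ReLU(inplace=True),
    nn.Linear(256, 2),
)

merged = torch.randn(1, 64, 304)          # per-point features from above
logits = point_head(merged)               # (1, 64, 2) per-point predictions
```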

In one implementation, the obtained segmentation prediction result can additionally undergo iterative refinement, continuously optimizing the segmentation result to achieve a more accurate effect. By continuously updating the segmentation result and gradually approaching the optimal solution, the weight parameters of the second multi-layer perceptron can be adjusted according to the difference between the current segmentation prediction and the actual segmentation result to obtain a more accurate segmentation; the more accurate segmentation result is then used as a new input feature vector, and the second multi-layer perceptron is computed again to obtain a further optimized segmentation prediction. In this way the error between the segmentation prediction and the actual segmentation result is gradually reduced, improving the accuracy and stability of the segmentation. The number of refinement iterations and the parameter settings need to be chosen and tuned according to the characteristics of the specific task and dataset to achieve the best segmentation effect.

Figure 9 shows the model architecture of the tongue image semantic segmentation method. In Figure 9, the input passes through a lightweight MobileNet V2 network to obtain the shallow feature map. The shallow feature map is fed separately into a 1×1 convolutional layer (Conv), a 3×3 convolutional layer with dilation rate 3, a 3×3 convolutional layer with dilation rate 6, a 3×3 convolutional layer with dilation rate 9, and an average pooling layer; each branch is processed and outputs a feature map, and these feature maps are fused into the multi-scale feature map. After the shallow feature map passes through a convolutional layer, CBAM (Convolutional Block Attention Module) can be used to process the output feature map so that channel attention and spatial attention are incorporated; the feature map produced by the pooling branch does not need CBAM processing. Channel dimensionality reduction is applied to the multi-scale feature map to obtain the integrated feature map; the upsampled integrated feature map (i.e. the second feature map) is merged (Concat) with the channel-reduced shallow feature map (i.e. the first feature map) to obtain the deep feature map. The deep feature map after the convolutional layer and the shallow feature map are fed into the point rendering (PointRend) module, whose output is assembled into the segmentation prediction result of the target tongue image.

The tongue image semantic segmentation method of this embodiment selects a number of preset sampling points for feature vector computation, so the resulting segmentation prediction is more precise, improving the accuracy and stability of the segmentation and facilitating tongue image analysis in traditional Chinese medicine.

Referring to Figure 10, which is a schematic structural diagram of a tongue picture semantic segmentation device provided in Embodiment 10 of the present application, and based on the tongue picture semantic segmentation method described above, the tongue picture semantic segmentation device in Embodiment 10 includes: a shallow feature map acquisition module 101, used for acquiring a target tongue picture, and performing feature extraction processing on the target tongue picture to obtain a shallow feature map of the target tongue picture;

an integrated feature map acquisition module 102, used for performing multi-scale feature extraction processing on the shallow feature map to obtain a multi-scale feature map of the target tongue picture, and performing channel dimension reduction processing on the multi-scale feature map to obtain an integrated feature map of the target tongue picture;

a deep feature map acquisition module 103, used for performing channel dimension reduction processing on the shallow feature map to obtain a first feature map of the target tongue picture, up-sampling the integrated feature map to obtain a second feature map of the target tongue picture, and performing feature fusion on the first feature map and the second feature map to obtain a deep feature map of the target tongue picture;

a prediction result generation module 104, used for inputting the shallow feature map and the deep feature map into a preset image segmentation model for the semantic segmentation task to obtain a segmentation prediction result of the target tongue picture.

Optionally, the integrated feature map acquisition module 102 includes:

a feature map processing submodule, used for passing the shallow feature map through one pooling layer and at least one convolution layer respectively;

when the number of convolution layers is greater than two, each convolution layer is used for performing dilated convolution processing with a different dilation rate on the shallow feature map to obtain each convolution feature map of the target tongue picture;

the pooling layer is used for performing average pooling processing on the shallow feature map to obtain a pooled feature map of the target tongue picture;

and a feature fusion submodule, used for performing feature fusion on each convolution feature map and the pooled feature map to obtain the multi-scale feature map.

Optionally, the feature map processing submodule includes:

a weight calculation unit, used for calculating, for each convolution layer, a channel dimension attention weight of the convolution feature map according to a preset channel attention module, and calculating a spatial dimension attention weight of the convolution feature map according to a preset spatial attention module;

and a weighted feature map acquisition unit, used for processing the convolution feature map by using the channel dimension attention weight and the spatial dimension attention weight to obtain a weighted feature map of the convolution feature map.

Optionally, the weighted feature map acquisition unit includes:

an average pooling subunit, used for performing an average pooling operation on the convolution feature map to obtain a first pooling result;

a maximum pooling subunit, used for performing a maximum pooling operation on the convolution feature map to obtain a second pooling result;

a feature map output subunit, used for inputting the first pooling result and the second pooling result into the first multi-layer perceptron to obtain a first output feature map and a second output feature map respectively;

a channel attention weight calculation subunit, used for adding the first output feature map and the second output feature map to obtain the channel dimension attention weight;

and a channel attention fusion subunit, used for multiplying the channel dimension attention weight with the convolution feature map to obtain a channel attention mechanism fusion feature map.
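A compact sketch of this channel attention path follows, in the spirit of CBAM; the reduction ratio and the final sigmoid squashing are standard CBAM choices assumed here rather than details stated in this embodiment:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Average-pool and max-pool each channel, pass both results through a
    shared MLP (the "first multi-layer perceptron"), add, squash, and scale
    the convolution feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_out = self.mlp(nn.functional.adaptive_avg_pool2d(x, 1))  # first pooling result
        max_out = self.mlp(nn.functional.adaptive_max_pool2d(x, 1))  # second pooling result
        weight = torch.sigmoid(avg_out + max_out)  # channel dimension attention weight
        return x * weight                          # channel attention fusion feature map
```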

Optionally, after the channel attention fusion subunit, the unit further includes:

a channel pooling subunit, used for performing channel-based maximum pooling and average pooling operations on the channel attention mechanism fusion feature map to obtain a maximum pooling result and an average pooling result respectively;

a pooling merging subunit, used for merging the maximum pooling result and the average pooling result to obtain a spatial attention merged feature map;

a spatial attention weight acquisition subunit, used for performing convolution activation on the spatial attention merged feature map to obtain the spatial dimension attention weight;

and a weighted feature map acquisition subunit, used for multiplying the spatial dimension attention weight by the channel attention mechanism fusion feature map to obtain the weighted feature map.
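The spatial attention path can be sketched in the same way; the 7×7 convolution kernel and the sigmoid activation are conventional CBAM settings assumed here for illustration:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-based max and average pooling, merged along the channel axis,
    followed by a convolution and sigmoid ("convolution activation")."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_map, _ = x.max(dim=1, keepdim=True)        # channel-based max pooling
        avg_map = x.mean(dim=1, keepdim=True)          # channel-based average pooling
        merged = torch.cat([max_map, avg_map], dim=1)  # spatial attention merged map
        weight = torch.sigmoid(self.conv(merged))      # spatial dimension attention weight
        return x * weight                              # weighted feature map
```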

Optionally, the prediction result generation module 104 includes:

a feature extraction submodule, used for performing convolution feature extraction on the deep feature map to obtain a feature representation result of the deep feature map;

a point sampling submodule, used for performing point sampling around pixel points of the feature representation result to obtain a number of preset sampling points;

a feature vector acquisition submodule, used for obtaining a feature vector of each preset sampling point according to the feature representation result;

and a prediction result acquisition submodule, used for computing the feature vector through the second multi-layer perceptron to obtain the segmentation prediction result.
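The patent does not pin down how the preset sampling points are chosen beyond sampling around pixel points of the feature representation result. One common PointRend-style rule, offered here purely as an assumption, is to pick the least confident locations of the coarse prediction:

```python
import torch

def sample_uncertain_points(logits: torch.Tensor, num_points: int) -> torch.Tensor:
    """logits: coarse per-pixel class scores of shape (B, C, H, W), C >= 2.
    Returns (B, num_points, 2) normalized (x, y) coordinates of the points
    where the top two class probabilities are closest, i.e. least confident."""
    b, c, h, w = logits.shape
    top2 = logits.softmax(dim=1).topk(2, dim=1).values
    uncertainty = top2[:, 1] - top2[:, 0]      # negative margin; nearer 0 = less confident
    _, idx = uncertainty.view(b, -1).topk(num_points, dim=1)
    ys = (idx // w).float().add(0.5).div(h)    # pixel centers, normalized to [0, 1]
    xs = (idx % w).float().add(0.5).div(w)
    return torch.stack([xs, ys], dim=-1)
```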

Optionally, the feature vector acquisition submodule includes:

a point coordinate acquisition unit, used for acquiring point coordinates of each preset sampling point according to the feature representation result;

a low-level feature vector acquisition unit, used for sampling the shallow feature map according to the point coordinates to obtain a low-level feature vector of the preset sampling point;

a high-level feature vector acquisition unit, used for sampling the feature representation result according to the point coordinates to obtain a high-level feature vector of the preset sampling point;

and a feature vector acquisition unit, used for combining the low-level feature vector and the high-level feature vector to obtain the feature vector of the preset sampling point.
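Under the same assumptions, gathering a per-point feature vector can be sketched with bilinear grid sampling: a low-level vector from the shallow feature map and a high-level vector from the feature representation result are read at the same normalized coordinates and concatenated, then scored by a hypothetical second MLP (the channel sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def point_features(shallow: torch.Tensor, deep_repr: torch.Tensor,
                   coords: torch.Tensor) -> torch.Tensor:
    """shallow: (B, C1, H1, W1) shallow feature map; deep_repr: (B, C2, H2, W2)
    feature representation result; coords: (B, N, 2) normalized (x, y) points."""
    grid = (coords * 2 - 1).unsqueeze(2)   # grid_sample expects [-1, 1]; (B, N, 1, 2)
    low = F.grid_sample(shallow, grid, align_corners=False).squeeze(-1)     # (B, C1, N)
    high = F.grid_sample(deep_repr, grid, align_corners=False).squeeze(-1)  # (B, C2, N)
    return torch.cat([low, high], dim=1).transpose(1, 2)   # (B, N, C1 + C2)

# Hypothetical "second multi-layer perceptron": one shared MLP applied to
# every sampled point's combined feature vector.
point_mlp = nn.Sequential(nn.Linear(256 + 256, 128), nn.ReLU(), nn.Linear(128, 2))
# logits = point_mlp(point_features(shallow, deep_repr, coords))  # (B, N, 2)
```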

It should be noted that, since the information interaction and execution processes between the above modules are based on the same concept as the method embodiments of the present invention, their specific functions and technical effects can be found in the method embodiment section and will not be repeated here.

Figure 11 is a schematic structural diagram of a computer device provided in Embodiment 11 of the present invention. As shown in Figure 11, the computer device of this embodiment includes: at least one processor (only one is shown in Figure 11), a memory, and a computer program stored in the memory and executable on the at least one processor; when the processor executes the computer program, the steps in any of the above tongue picture semantic segmentation method embodiments are implemented.

The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art can understand that Figure 11 is merely an example of a computer device and does not constitute a limitation on the computer device; the computer device may include more or fewer components than shown, combine certain components, or use different components, and may, for example, also include a network interface, a display screen, an input device, and the like.

The processor may be a CPU, or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory includes a readable storage medium, an internal memory, and the like, where the internal memory may be the main memory of the computer device and provides an environment for running the operating system and computer-readable instructions in the readable storage medium. The readable storage medium may be a hard disk of the computer device; in other embodiments it may be an external storage device of the computer device, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the memory may also include both an internal storage unit of the computer device and an external storage device. The memory is used to store the operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory may also be used to temporarily store data that has been output or is to be output.

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions may be assigned to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not intended to limit the protection scope of the present invention.

For the specific working processes of the units and modules in the above device, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here. If an integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program may implement the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.

The present invention may also implement all or part of the processes in the methods of the above embodiments through a computer program product; when the computer program product runs on a computer device, the computer device, when executing it, implements the steps in the above method embodiments.

In the above embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed device/computer equipment and method may be implemented in other ways. For example, the device/computer equipment embodiments described above are merely illustrative; for instance, the division of modules or units is only a logical function division, and there may be other division methods in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

A unit described as a separate component may or may not be physically separate, and a component shown as a unit may or may not be a physical unit; that is, it may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent substitutions for some of the technical features therein; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

1. A tongue picture semantic segmentation method, characterized by comprising the following steps:
obtaining a target tongue picture, and performing feature extraction preprocessing on the target tongue picture to obtain a shallow feature map of the target tongue picture;
performing multi-scale feature extraction processing on the shallow feature map to obtain a multi-scale feature map of the target tongue picture, and performing channel dimension reduction processing on the multi-scale feature map to obtain an integrated feature map of the target tongue picture;
performing channel dimension reduction processing on the shallow feature map to obtain a first feature map of the target tongue picture, up-sampling the integrated feature map to obtain a second feature map of the target tongue picture, and performing feature fusion on the first feature map and the second feature map to obtain a deep feature map of the target tongue picture;
and inputting the shallow feature map and the deep feature map into a preset image segmentation model for semantic segmentation tasks to obtain a segmentation prediction result of the target tongue picture.
2. The tongue picture semantic segmentation method according to claim 1, wherein the performing multi-scale feature extraction processing on the shallow feature map to obtain a multi-scale feature map of the target tongue picture comprises:
passing the shallow feature map through a pooling layer and at least one convolution layer respectively;
when the number of the convolution layers is greater than two, each convolution layer is used for performing dilated convolution processing with different dilation rates on the shallow feature map to obtain each convolution feature map of the target tongue picture;
the pooling layer is used for performing average pooling processing on the shallow feature map to obtain a pooled feature map of the target tongue picture;
and performing feature fusion on each convolution feature map and the pooled feature map to obtain the multi-scale feature map.
3. The tongue picture semantic segmentation method according to claim 2, wherein after the performing dilated convolution processing with different dilation rates on the shallow feature map to obtain each convolution feature map of the target tongue picture, the method further comprises:
for each convolution layer, calculating a channel dimension attention weight of the convolution feature map according to a preset channel attention module, and calculating a spatial dimension attention weight of the convolution feature map according to a preset spatial attention module;
and processing the convolution feature map by using the channel dimension attention weight and the spatial dimension attention weight to obtain a weighted feature map of the convolution feature map.
4. The tongue picture semantic segmentation method according to claim 3, wherein the processing the convolution feature map by using the channel dimension attention weight comprises:
performing an average pooling operation on the convolution feature map to obtain a first pooling result;
performing a maximum pooling operation on the convolution feature map to obtain a second pooling result;
inputting the first pooling result and the second pooling result into a first multi-layer perceptron to obtain a first output feature map and a second output feature map respectively;
adding the first output feature map and the second output feature map to obtain the channel dimension attention weight;
and multiplying the channel dimension attention weight with the convolution feature map to obtain a channel attention mechanism fusion feature map.
5. The tongue picture semantic segmentation method according to claim 4, further comprising, after the obtaining the channel attention mechanism fusion feature map:
performing channel-based maximum pooling and average pooling operations on the channel attention mechanism fusion feature map to obtain a maximum pooling result and an average pooling result respectively;
combining the maximum pooling result and the average pooling result to obtain a spatial attention merged feature map;
performing convolution activation on the spatial attention merged feature map to obtain the spatial dimension attention weight;
and multiplying the spatial dimension attention weight by the channel attention mechanism fusion feature map to obtain the weighted feature map.
6. The tongue picture semantic segmentation method according to claim 1, wherein the step of inputting the shallow feature map and the deep feature map into a preset image segmentation model for a semantic segmentation task to obtain a segmentation prediction result of the target tongue picture comprises:
performing convolution feature extraction on the deep feature map to obtain a feature representation result of the deep feature map;
performing point sampling around the pixel points of the feature representation result to obtain a plurality of preset sampling points;
obtaining a feature vector of each preset sampling point according to the feature representation result;
and calculating the feature vector through a second multi-layer perceptron to obtain the segmentation prediction result.
7. The tongue picture semantic segmentation method according to claim 6, wherein the obtaining a feature vector of each preset sampling point comprises:
acquiring point coordinates of each preset sampling point according to the feature representation result;
sampling the shallow feature map according to the point coordinates to obtain a low-level feature vector of the preset sampling point;
sampling the feature representation result according to the point coordinates to obtain a high-level feature vector of the preset sampling point;
and combining the low-level feature vector and the high-level feature vector to obtain the feature vector of the preset sampling point.
8. A tongue picture semantic segmentation device, characterized by comprising:
a shallow feature map acquisition module, used for acquiring a target tongue picture, and performing feature extraction preprocessing on the target tongue picture to obtain a shallow feature map of the target tongue picture;
an integrated feature map acquisition module, used for performing multi-scale feature extraction processing on the shallow feature map to obtain a multi-scale feature map of the target tongue picture, and performing channel dimension reduction processing on the multi-scale feature map to obtain an integrated feature map of the target tongue picture;
a deep feature map acquisition module, used for performing channel dimension reduction processing on the shallow feature map to obtain a first feature map of the target tongue picture, up-sampling the integrated feature map to obtain a second feature map of the target tongue picture, and performing feature fusion on the first feature map and the second feature map to obtain a deep feature map of the target tongue picture;
and a prediction result generation module, used for inputting the shallow feature map and the deep feature map into a preset image segmentation model for a semantic segmentation task to obtain a segmentation prediction result of the target tongue picture.
9. A computer device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the tongue picture semantic segmentation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the tongue picture semantic segmentation method according to any one of claims 1 to 7.
CN202410063634.5A 2024-01-17 2024-01-17 Tongue picture semantic segmentation method, device, equipment and medium Pending CN117576405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410063634.5A CN117576405A (en) 2024-01-17 2024-01-17 Tongue picture semantic segmentation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117576405A true CN117576405A (en) 2024-02-20

Family

ID=89864846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410063634.5A Pending CN117576405A (en) 2024-01-17 2024-01-17 Tongue picture semantic segmentation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117576405A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022216521A1 (en) * 2021-11-10 2022-10-13 Innopeak Technology, Inc. Dual-flattening transformer through decomposed row and column queries for semantic segmentation
CN114549563A (en) * 2022-02-26 2022-05-27 福建工程学院 Real-time composite insulator segmentation method and system based on DeepLabV3+
CN114708493A (en) * 2022-02-26 2022-07-05 上海大学 Traditional Chinese medicine crack tongue diagnosis portable device and using method
CN114581432A (en) * 2022-03-18 2022-06-03 河海大学 Tongue appearance tongue image segmentation method based on deep learning
CN114998580A (en) * 2022-04-24 2022-09-02 长安大学 Crack Semantic Segmentation Method Based on DeepLabV3+ Network Model
CN116543448A (en) * 2022-07-14 2023-08-04 南京航空航天大学 Automatic human body tracking method under fixed background of passive terahertz image
CN115760866A (en) * 2022-11-29 2023-03-07 青岛农业大学 Crop drought detection method based on remote sensing image
CN116977631A (en) * 2023-06-30 2023-10-31 长春工业大学 Streetscape semantic segmentation method based on DeepLabV3+

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Xiaoming et al.: "Detection of cotter pin defects in catenary systems based on improved DeepLabv3+", Journal of East China Jiaotong University, vol. 40, no. 5, 31 October 2023 (2023-10-31), pages 120-126 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118570570A (en) * 2024-08-05 2024-08-30 杭州六智科技有限公司 Traditional Chinese medicine tongue picture identification system and method based on image identification technology
CN118645250A (en) * 2024-08-15 2024-09-13 杭州六智科技有限公司 Traditional Chinese Medicine Health Status Analysis System Based on Artificial Intelligence
CN119722461A (en) * 2025-02-26 2025-03-28 南昌工程学院 Super-resolution reconstruction method and system for motion blurred images in power transmission line inspection
CN119722461B (en) * 2025-02-26 2025-06-27 南昌工程学院 Super-resolution reconstruction method and system for transmission line inspection motion blurred image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20240220)