CN114399639A

CN114399639A - Semantic segmentation model training method, electronic equipment and storage medium

Info

Publication number: CN114399639A
Application number: CN202111644950.4A
Authority: CN
Inventors: 梁致远; 汪天才
Original assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd
Current assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd
Priority date: 2021-12-29
Filing date: 2021-12-29
Publication date: 2022-04-26

Abstract

The disclosure relates to a semantic segmentation model training method, an electronic device and a storage medium. The semantic segmentation model training method comprises the following steps: inputting the training image into a semantic segmentation model to obtain semantic segmentation features and a semantic segmentation prediction result; constructing a pseudo label of the training image based on a tree relation matrix, wherein the tree relation matrix comprises a first tree relation matrix determined based on the semantic segmentation features and a second tree relation matrix determined based on the training image; determining a first target loss based on the pseudo-label and the semantic segmentation prediction result; training the semantic segmentation model based on the first target loss until the semantic segmentation model converges. The method and the device for training the semantic segmentation model are efficient and fast.

Description

A semantic segmentation model training method, electronic device and storage medium

Technical Field

The present disclosure relates to the technical field of artificial intelligence, and in particular to a semantic segmentation model training method, an electronic device, and a storage medium.

Background

The semantic segmentation task aims to train a segmentation network so that it can generate dense class labels for an image.

In the related art, semantic segmentation of images requires a large amount of finely annotated training data, which makes annotation costly.

Summary of the Invention

To overcome the problems in the related art, the present disclosure provides a semantic segmentation model training method, an electronic device, and a storage medium.

According to a first aspect of the embodiments of the present disclosure, a semantic segmentation model training method is provided, including:

inputting a training image into a semantic segmentation model to obtain semantic segmentation features and a semantic segmentation prediction result; constructing a pseudo-label of the training image based on tree relationship matrices, the tree relationship matrices including a first tree relationship matrix determined based on the semantic segmentation features and a second tree relationship matrix determined based on the training image; determining a first target loss based on the pseudo-label and the semantic segmentation prediction result; and training the semantic segmentation model based on the first target loss until the semantic segmentation model converges.

In one embodiment, the tree relationship matrices are determined as follows: constructing a first planar grid graph based on the semantic segmentation features and a second planar grid graph based on the training image, where the first planar grid graph includes nodes in one-to-one correspondence with the pixels of the semantic segmentation features, the second planar grid graph includes nodes in one-to-one correspondence with the pixels of the training image, and the nodes within each planar grid graph are connected by undirected edges; determining a first minimum spanning tree of the first planar grid graph based on the similarity between its nodes, and a second minimum spanning tree of the second planar grid graph based on the similarity between its nodes, each minimum spanning tree consisting of nodes and undirected edges that form no loop; determining a distance matrix corresponding to the semantic segmentation features based on the nodes and undirected edges of the first minimum spanning tree, and a distance matrix corresponding to the training image based on the nodes and undirected edges of the second minimum spanning tree, where the distance between two nodes in a distance matrix is the accumulated node similarity value over the undirected edges on the shortest path between them; and applying a non-negative mapping to the distance matrix corresponding to the semantic segmentation features to obtain the first tree relationship matrix, and to the distance matrix corresponding to the training image to obtain the second tree relationship matrix.
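The last two steps of this embodiment (distance matrix from a spanning tree, then a non-negative mapping) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the `sigma` parameter, and the choice of `exp(-d / sigma)` as the non-negative mapping are assumptions, and the edge weights are simply taken as given per-edge values.

```python
import math
from collections import defaultdict

def tree_distance_matrix(num_nodes, tree_edges):
    """All-pairs path distance on a tree.

    tree_edges: list of (u, v, weight). In a tree the path between two
    nodes is unique, so it is also the shortest path.
    """
    adjacency = defaultdict(list)
    for u, v, w in tree_edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))

    dist = [[0.0] * num_nodes for _ in range(num_nodes)]
    for source in range(num_nodes):
        # Depth-first traversal from `source`, accumulating edge weights.
        stack = [(source, -1, 0.0)]
        while stack:
            node, parent, acc = stack.pop()
            dist[source][node] = acc
            for nxt, w in adjacency[node]:
                if nxt != parent:
                    stack.append((nxt, node, acc + w))
    return dist

def tree_relation_matrix(dist, sigma=1.0):
    """Non-negative mapping exp(-d / sigma); the mapping is an assumed choice."""
    return [[math.exp(-d / sigma) for d in row] for row in dist]

# A 4-node path-shaped tree: 0 -(1.0)- 1 -(2.0)- 2 -(0.5)- 3
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 0.5)]
D = tree_distance_matrix(4, edges)
A = tree_relation_matrix(D)
```

Each source node triggers one traversal of the tree, so the sketch runs in quadratic time in the number of pixels; it is meant to show the definition, not an efficient algorithm.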

In one embodiment, constructing the pseudo-label of the training image based on the tree relationship matrices includes: filtering the semantic segmentation prediction result using the first tree relationship matrix and the second tree relationship matrix, respectively, as filter kernel functions to obtain the pseudo-label of the training image.

In one embodiment, filtering the semantic segmentation prediction result using the first tree relationship matrix and the second tree relationship matrix, respectively, as filter kernel functions includes: performing initial filtering on the semantic segmentation prediction result with the second tree relationship matrix as the filter kernel function; and filtering the initially filtered semantic segmentation prediction result again with the first tree relationship matrix as the filter kernel function.
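The two-stage filtering can be sketched as below, assuming each tree relationship matrix is an N x N non-negative matrix over N pixels and that filtering means a row-normalized weighted average of the per-pixel class scores. The row normalization and all names are assumptions made for illustration.

```python
def normalize_rows(matrix):
    """Scale each row to sum to 1 so that filtering is a weighted average."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized

def tree_filter(affinity, prediction):
    """Filter per-pixel class scores with a tree relationship matrix.

    affinity:   N x N non-negative matrix (one row per pixel).
    prediction: N x C class scores (one row per pixel).
    """
    kernel = normalize_rows(affinity)
    num_classes = len(prediction[0])
    return [
        [sum(k * p[c] for k, p in zip(kernel_row, prediction))
         for c in range(num_classes)]
        for kernel_row in kernel
    ]

def build_pseudo_label(image_affinity, feature_affinity, prediction):
    """Cascade: image-based (second) kernel first, then feature-based (first)."""
    initial = tree_filter(image_affinity, prediction)
    return tree_filter(feature_affinity, initial)

# Three pixels, two classes; pixels 0 and 1 are strongly related.
image_affinity = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
feature_affinity = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
prediction = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
pseudo_label = build_pseudo_label(image_affinity, feature_affinity, prediction)
```

In this toy input, pixel 1 is pulled toward the confident class-0 score of its strongly related neighbor, pixel 0, which is the intended smoothing effect.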

In one embodiment, the method further includes: resizing the semantic segmentation features so that the resolution of the resized semantic segmentation features matches the resolution of the semantic segmentation prediction result; and resizing the training image so that the resolution of the resized training image matches the resolution of the semantic segmentation prediction result.
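As an illustration of the resizing step, the sketch below brings a 2-D grid to a target resolution by nearest-neighbor sampling. The text does not specify an interpolation method, so nearest-neighbor is an assumption (bilinear interpolation would be an equally plausible choice), and `resize_nearest` is a hypothetical helper name.

```python
def resize_nearest(grid, out_h, out_w):
    """Resize a 2-D grid (list of rows) by nearest-neighbor sampling."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

# Upsample a 2 x 2 feature map to the 4 x 4 resolution of a prediction.
feature = [[1, 2], [3, 4]]
resized = resize_nearest(feature, 4, 4)
```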

In one embodiment, the first target loss is determined by a formula defined over the following quantities: the first target loss function; the prediction result of the pixel at a given position index in the semantic segmentation prediction result; the label value at that position index in the pseudo-label; the set of unlabeled pixels in the pseudo-label; and the absolute value of the difference between the semantic segmentation prediction result of an unlabeled pixel and the pseudo-label.
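A minimal sketch of a loss built from these quantities follows. It assumes the absolute differences are summed over classes and averaged over the unlabeled pixel set; that normalization, the function name, and the argument names are assumptions, not details taken from the formula.

```python
def first_target_loss(prediction, pseudo_label, unlabeled):
    """Mean absolute difference over unlabeled pixels (the mean is assumed).

    prediction, pseudo_label: N x C per-pixel class scores.
    unlabeled: position indices of pixels without a label.
    """
    if not unlabeled:
        return 0.0
    total = 0.0
    for i in unlabeled:
        total += sum(abs(p - y) for p, y in zip(prediction[i], pseudo_label[i]))
    return total / len(unlabeled)

prediction = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
pseudo_label = [[0.8, 0.2], [0.6, 0.4], [0.2, 0.8]]
loss = first_target_loss(prediction, pseudo_label, unlabeled=[1, 2])
```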

In one embodiment, the method further includes: acquiring sparse labels of the training image and, according to the sparse labels, calculating a second target loss of the semantic segmentation prediction result using a cross-entropy loss function; and training the semantic segmentation model based on the second target loss until the semantic segmentation model converges.
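The second target loss can be illustrated as a cross-entropy evaluated only at pixels that carry a sparse label. The `IGNORE` marker for unlabeled pixels and the averaging over labeled pixels are assumptions made for this sketch.

```python
import math

IGNORE = -1  # hypothetical marker for pixels without a sparse label

def sparse_cross_entropy(prediction, sparse_labels):
    """Average of -log p[label] over labeled pixels only."""
    losses = [
        -math.log(probs[label])
        for probs, label in zip(prediction, sparse_labels)
        if label != IGNORE
    ]
    return sum(losses) / len(losses) if losses else 0.0

# Three pixels, two classes; only pixels 0 and 2 are annotated.
prediction = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
sparse_labels = [0, IGNORE, 1]
second_loss = sparse_cross_entropy(prediction, sparse_labels)
```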

According to a second aspect of the embodiments of the present disclosure, an image segmentation method is provided, including:

inputting an image to be segmented into a semantic segmentation model, the semantic segmentation model being pre-trained using any one of the semantic segmentation model training methods described above; and determining a segmentation result of the image to be segmented based on the output of the semantic segmentation model.

According to a third aspect of the embodiments of the present disclosure, a semantic segmentation model training apparatus is provided, including:

a processing unit configured to input a training image into a semantic segmentation model to obtain semantic segmentation features and a semantic segmentation prediction result, and further configured to construct a pseudo-label of the training image based on tree relationship matrices, the tree relationship matrices including a first tree relationship matrix determined based on the semantic segmentation features and a second tree relationship matrix determined based on the training image; and a determining unit configured to determine a first target loss based on the pseudo-label and the semantic segmentation prediction result; the processing unit being further configured to train the semantic segmentation model based on the first target loss until the semantic segmentation model converges.

In one embodiment, the processing unit determines the tree relationship matrices as follows: constructing a first planar grid graph based on the semantic segmentation features and a second planar grid graph based on the training image, where the first planar grid graph includes nodes in one-to-one correspondence with the pixels of the semantic segmentation features, the second planar grid graph includes nodes in one-to-one correspondence with the pixels of the training image, and the nodes within each planar grid graph are connected by undirected edges; determining a first minimum spanning tree of the first planar grid graph based on the similarity between its nodes, and a second minimum spanning tree of the second planar grid graph based on the similarity between its nodes, each minimum spanning tree consisting of nodes and undirected edges that form no loop; determining a distance matrix corresponding to the semantic segmentation features based on the nodes and undirected edges of the first minimum spanning tree, and a distance matrix corresponding to the training image based on the nodes and undirected edges of the second minimum spanning tree, where the distance between two nodes in a distance matrix is the accumulated node similarity value over the undirected edges on the shortest path between them; and applying a non-negative mapping to the distance matrix corresponding to the semantic segmentation features to obtain the first tree relationship matrix, and to the distance matrix corresponding to the training image to obtain the second tree relationship matrix.

In one embodiment, the processing unit constructs the pseudo-label of the training image based on the tree relationship matrices as follows: filtering the semantic segmentation prediction result using the first tree relationship matrix and the second tree relationship matrix, respectively, as filter kernel functions to obtain the pseudo-label of the training image.

In one embodiment, the processing unit filters the semantic segmentation prediction result using the first tree relationship matrix and the second tree relationship matrix, respectively, as filter kernel functions as follows: performing initial filtering on the semantic segmentation prediction result with the second tree relationship matrix as the filter kernel function; and filtering the initially filtered semantic segmentation prediction result again with the first tree relationship matrix as the filter kernel function.

In one embodiment, the processing unit is further configured to: resize the semantic segmentation features so that the resolution of the resized semantic segmentation features matches the resolution of the semantic segmentation prediction result; and resize the training image so that the resolution of the resized training image matches the resolution of the semantic segmentation prediction result.

In one embodiment, the processing unit is further configured to: acquire sparse labels of the training image and, according to the sparse labels, calculate a second target loss of the semantic segmentation prediction result using a cross-entropy loss function; and train the semantic segmentation model based on the second target loss until the semantic segmentation model converges.

According to a fourth aspect of the embodiments of the present disclosure, an image segmentation apparatus is provided, including:

an input unit configured to input an image to be segmented into a semantic segmentation model, the semantic segmentation model being pre-trained using any one of the semantic segmentation model training methods described above; and a determining unit configured to determine a segmentation result of the image to be segmented based on the output of the semantic segmentation model.

According to a fifth aspect of the embodiments of the present disclosure, an electronic device is provided, including:

a processor; and a memory for storing processor-executable instructions;

wherein the processor is configured to execute the semantic segmentation model training method described in the first aspect or any embodiment of the first aspect, or to execute the image segmentation method described in the second aspect or any embodiment of the second aspect.

According to a sixth aspect of the embodiments of the present disclosure, a storage medium is provided. The storage medium stores instructions that, when executed by a processor, enable the processor to execute the semantic segmentation model training method described in the first aspect or any embodiment of the first aspect, or the image segmentation method described in the second aspect or any embodiment of the second aspect.

According to a seventh aspect of the embodiments of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the semantic segmentation model training method described in the first aspect or any embodiment of the first aspect, or the image segmentation method described in the second aspect or any embodiment of the second aspect.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: tree relationship matrices can be generated separately for the training image and for the semantic segmentation features corresponding to the training image, and a pseudo-label for training the semantic segmentation model can then be constructed from the tree relationship matrices. Further, a target loss can be calculated from the pseudo-label and the semantic segmentation prediction result, and the semantic segmentation model can be trained with the calculated target loss. By constructing pseudo-labels in this way, the method enables fast training of the semantic segmentation model.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

Fig. 1 is a flowchart of a semantic segmentation model training method according to an exemplary embodiment.

Fig. 2 is a flowchart of a method for determining a tree relationship matrix according to an exemplary embodiment.

Fig. 3 is a schematic flow diagram of constructing a tree relationship matrix according to an exemplary embodiment.

Fig. 4 is a flowchart of a method for constructing a pseudo-label of a training image from tree relationship matrices according to an exemplary embodiment.

Fig. 5 is a flowchart of another semantic segmentation model training method according to an exemplary embodiment.

Fig. 6 is a flowchart of a method for constructing a pseudo-label of a training image from tree relationship matrices according to an exemplary embodiment.

Fig. 7 is a flowchart of a method for constructing a pseudo-label of a training image from tree relationship matrices according to an exemplary embodiment.

Fig. 8 is a schematic flow diagram of training a semantic segmentation model according to an exemplary embodiment.

Fig. 9 is a flowchart of an image segmentation method according to an exemplary embodiment.

Fig. 10 is a block diagram of a semantic segmentation model training apparatus according to an exemplary embodiment.

Fig. 11 is a block diagram of an image segmentation apparatus according to an exemplary embodiment.

Fig. 12 is a block diagram of an electronic device for image processing according to an exemplary embodiment.

Detailed Description

Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.

Throughout the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions. The described embodiments are some, but not all, of the embodiments of the present disclosure. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present disclosure and should not be construed as limiting it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure. The embodiments of the present disclosure are described in detail below with reference to the drawings.

The semantic segmentation model training method provided by the embodiments of the present disclosure can be applied to semantic segmentation of images, for example to semantic segmentation based on sparse labels.

The semantic segmentation task aims to train a segmentation network so that it can generate dense class labels for an image. In the related art, semantic segmentation of images requires a large amount of finely annotated training data, which makes annotation costly.

To reduce annotation cost while maintaining good segmentation quality, the related art has begun to study how to train a segmentation network with sparse labels (for example, point, scribble, or block annotations). In the related art, semantic segmentation methods based on sparse labels mainly take the following approaches: (1) constructing auxiliary tasks, in which an additional task branch is added to detect semantic boundaries and the detected information assists the semantic segmentation; (2) obtaining pseudo-labels through multi-stage iterative training, in which an algorithm is designed to complete the original sparse labels and thereby increase the number of supervised samples; (3) regularizing the loss function, in which low-level visual information of the image (for example, pixel color and position coordinates) is used to design a loss function that constrains the segmentation result.

In the related art, semantic segmentation based on sparse labels has the following problems: (1) the prediction error of the auxiliary task branch usually affects segmentation quality, so the final segmentation result is poor; (2) pseudo-label generation usually requires multi-stage iterative training, so the time cost of training is high; (3) semantic segmentation features are high-level semantic information, so a loss function designed from low-level visual information yields poor segmentation results because of the gap between low-level visual information and high-level semantic information.

In view of this, the present disclosure provides an efficient and fast semantic segmentation model training method, in which tree relationship matrices are generated separately for the training image and for the semantic segmentation features corresponding to the training image, and a pseudo-label for training the semantic segmentation model is then constructed from the tree relationship matrices. Because the training method provided by the embodiments of the present disclosure does not need to construct or rely on auxiliary tasks during training, the converged semantic segmentation model can produce better semantic segmentation results. In addition, because the semantic segmentation features are high-level structured information and the color information in the training image is low-level structured information, a pseudo-label obtained from both the training image and the semantic segmentation features is less affected by the difference in information level. Furthermore, because the pseudo-label is generated from the tree relationship matrices, it has high semantic segmentation accuracy and can provide a better reference value for the semantic segmentation model during training. Finally, training the semantic segmentation model with the first target loss, which is determined from the pseudo-label and the semantic segmentation prediction result, allows the converged model to achieve high semantic segmentation accuracy.

For ease of description, in the following the tree relationship matrix determined from the semantic segmentation features is referred to as the first tree relationship matrix, the tree relationship matrix determined from the training image is referred to as the second tree relationship matrix, and the loss value obtained from the pseudo-label and the semantic segmentation prediction result is referred to as the first target loss.

Fig. 1 is a flowchart of a semantic segmentation model training method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps.

In step S11, a training image is input into the semantic segmentation model to obtain semantic segmentation features and a semantic segmentation prediction result.

The semantic segmentation prediction result may be obtained, for example, by classifying the semantic segmentation features.

In step S12, a pseudo-label of the training image is constructed based on the tree relationship matrices.

In the embodiments of the present disclosure, the tree relationship matrices include the first tree relationship matrix, determined from the semantic segmentation features, and the second tree relationship matrix, determined from the training image.

For example, determining the second tree relationship matrix from the training image can be understood as determining it from the color information contained in the training image. The color information may be, for example, the red-green-blue (RGB) color space value at each pixel position in the training image.

In the embodiments of the present disclosure, the first tree relationship matrix can represent the relationship between the semantic segmentation features at different pixel positions, and the second tree relationship matrix can represent the relationship between the color information at different pixel positions in the training image. In other words, the first and second tree relationship matrices together capture both low-level color information and high-level semantic segmentation features. Constructing the pseudo-label of the training image from both matrices gives the pseudo-label high semantic segmentation accuracy.

In step S13, a first target loss is determined based on the pseudo-label and the semantic segmentation prediction result.

In step S14, the semantic segmentation model is trained based on the first target loss until the semantic segmentation model converges.

The semantic segmentation model training method provided by the embodiments of the present disclosure yields a semantic segmentation model with high segmentation accuracy. Performing semantic segmentation of images with the converged model then produces segmentation results that meet practical needs.

示例的，可以通过如下方式确定第一树关系矩阵或第二树关系矩阵。本公开以下为便于描述将由语义分割特征构建的平面网格图称为第一平面网格图，将由训练图像构建的平面网格图称为第二平面网格图，将通过第一平面网格图得到的最小生成树称为第一最小生成树，将通过第二平面网格图得到的最小生成树称为第二最小生成树。Exemplarily, the first tree relationship matrix or the second tree relationship matrix may be determined in the following manner. In the present disclosure, for the convenience of description, the plane grid graph constructed by the semantic segmentation feature is referred to as the first plane grid graph, and the plane grid graph constructed from the training image is referred to as the second plane grid graph. The minimum spanning tree obtained from the graph is called the first minimum spanning tree, and the minimum spanning tree obtained from the second plane grid graph is called the second minimum spanning tree.

图2是根据一示例性实施例示出的一种确定树关系矩阵的方法流程图，如图2所示，包括以下步骤S21至步骤S24。此外，图3是根据一示例性实施例示出的构建树关系矩阵的流程示意图。为便于理解，以下参照图3，对图2中各步骤进行解释说明。Fig. 2 is a flowchart of a method for determining a tree relationship matrix according to an exemplary embodiment. As shown in Fig. 2 , the method includes the following steps S21 to S24. In addition, FIG. 3 is a schematic flowchart of constructing a tree relationship matrix according to an exemplary embodiment. For ease of understanding, each step in FIG. 2 will be explained below with reference to FIG. 3 .

在步骤S21中，基于语义分割特征构建第一平面网格图，基于训练图像构建第二平面网格图。In step S21, a first plane grid map is constructed based on the semantic segmentation feature, and a second plane grid map is constructed based on the training image.

In the embodiments of the present disclosure, the first planar grid graph includes nodes in one-to-one correspondence with the pixels of the semantic segmentation features, and the second planar grid graph includes nodes in one-to-one correspondence with the pixels of the training image. For example, a planar grid graph is shown at ① in Fig. 3. The graph at ① is a four-connected planar grid graph; other types, such as an eight-connected planar grid graph, may also be used, and the present disclosure does not limit the specific type of planar grid graph. The nodes are shown as n1 to n16 at ①, and the nodes of a planar grid graph are connected by undirected edges. That is, the nodes of the first planar grid graph are connected by undirected edges, and so are the nodes of the second planar grid graph. For example, a line segment connecting two nodes (such as the segment between n15 and n16) is such an undirected edge.
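For example, the construction of a four-connected planar grid graph may be sketched as follows. This is only an illustrative sketch: the function name `grid_edges` and the row-by-row node numbering starting at 0 are assumptions and do not appear in the disclosure.

```python
def grid_edges(height, width):
    """Return the undirected edges of a four-connected grid graph.

    Nodes are numbered row by row from 0 to height * width - 1
    (an assumed numbering, chosen only for illustration).
    """
    edges = []
    for r in range(height):
        for c in range(width):
            node = r * width + c
            if c + 1 < width:       # edge to the right neighbor
                edges.append((node, node + 1))
            if r + 1 < height:      # edge to the neighbor below
                edges.append((node, node + width))
    return edges


edges = grid_edges(4, 4)
print(len(edges))  # 24 edges for a 4 x 4 grid of 16 nodes
```

A 4 x 4 grid corresponds to the sixteen nodes n1 to n16 at ① in Fig. 3; each pair of horizontally or vertically adjacent nodes contributes exactly one undirected edge.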

In step S22, the first minimum spanning tree of the first planar grid graph is determined based on the similarity between the nodes in the first planar grid graph, and the second minimum spanning tree of the second planar grid graph is determined based on the similarity between the nodes in the second planar grid graph.

In the embodiments of the present disclosure, the similarity between any two nodes of a planar grid graph can be determined. Starting from the undirected edge that connects the two nodes with the largest similarity value, undirected edges are deleted one by one until the planar grid graph contains no loop, which yields the minimum spanning tree. For example, the minimum spanning tree determined from the planar grid graph is shown at ② in Fig. 3. Here, "no loop" can be understood as meaning that the planar grid graph contains no quadrilateral formed by four undirected edges. For the first planar grid graph, for instance, edges are deleted according to the similarity between its nodes until no quadrilateral formed by four undirected edges remains, which gives the first minimum spanning tree. The second minimum spanning tree is determined in a similar way, which is not repeated here.
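For example, a minimum spanning tree may be computed as sketched below. The sketch makes two assumptions that are not stated in the disclosure. First, the "similarity" value of an edge is treated as an edge weight, where a larger value means the two nodes are less alike, so that the heaviest edges are the ones removed. Second, Kruskal's algorithm is used in place of the edge deletion procedure described above; when all edge weights are distinct, both produce the same tree. The function name and the example weights are hypothetical.

```python
def minimum_spanning_tree(num_nodes, weighted_edges):
    """Kruskal's algorithm. weighted_edges holds (weight, u, v) tuples."""
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:        # adding this edge creates no loop
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# 2 x 2 grid with nodes 0..3 and hypothetical edge weights.
edges = [(1.0, 0, 1), (4.0, 0, 2), (2.0, 1, 3), (3.0, 2, 3)]
tree = minimum_spanning_tree(4, edges)
print(tree)  # [(1.0, 0, 1), (2.0, 1, 3), (3.0, 2, 3)]
```

In this example the single quadrilateral of the 2 x 2 grid is broken by dropping the edge with the largest weight, which matches the outcome of the deletion procedure.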

In step S23, the distance matrix corresponding to the semantic segmentation features is determined based on the nodes and undirected edges included in the first minimum spanning tree, and the distance matrix corresponding to the training image is determined based on the nodes and undirected edges included in the second minimum spanning tree.

In the embodiments of the present disclosure, the distance between two nodes in the distance matrix is the accumulated node similarity over the undirected edges on the shortest path between the two nodes. For example, as shown in Fig. 3, the distance between n3 and n7 is the accumulated node similarity of the undirected edges drawn as dotted lines at ②. The nodes of the minimum spanning tree are taken two at a time to enumerate all node pairs, and the distance is determined for each pair, which gives the final distance matrix.
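For example, the distance matrix may be computed by traversing the tree from every node and accumulating edge weights along the way, as sketched below. The function name, the `(weight, u, v)` edge representation, and the example weights are assumptions made for illustration.

```python
from collections import deque


def tree_distance_matrix(num_nodes, tree_edges):
    """All-pairs path distances on a tree given as (weight, u, v) tuples."""
    adjacency = [[] for _ in range(num_nodes)]
    for weight, u, v in tree_edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))

    distances = [[0.0] * num_nodes for _ in range(num_nodes)]
    for source in range(num_nodes):
        visited = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor, weight in adjacency[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    # The path between two tree nodes is unique,
                    # so this accumulated sum is final.
                    distances[source][neighbor] = distances[source][node] + weight
                    queue.append(neighbor)
    return distances


tree = [(1.0, 0, 1), (2.0, 1, 3), (3.0, 2, 3)]
D = tree_distance_matrix(4, tree)
print(D[0][2])  # 6.0 = 1.0 + 2.0 + 3.0 along the path 0 -> 1 -> 3 -> 2
```

Because a tree contains exactly one path between any two nodes, that path is also the shortest one, and no shortest-path search over alternatives is needed.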

In step S24, non-negative mapping is applied to the distance matrix corresponding to the semantic segmentation features to obtain the first tree relationship matrix, and non-negative mapping is applied to the distance matrix corresponding to the training image to obtain the second tree relationship matrix.

For example, for the distance matrix corresponding to the semantic segmentation features (denoted D^high ∈ R^(HW×HW)), the first tree relationship matrix (denoted A^high) can be obtained as A^high = exp(-D^high) ∈ R^(HW×HW). For the distance matrix corresponding to the training image (denoted D^low ∈ R^(HW×HW)), the second tree relationship matrix (denoted A^low) can be obtained as A^low = exp(-D^low/σ) ∈ R^(HW×HW). Here, R^(HW×HW) indicates the dimensions of the matrix, and σ is a preset constant. For example, for the minimum spanning tree shown at ② in Fig. 3, the corresponding tree relationship matrix is shown at ③ in Fig. 3.
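For example, the non-negative mapping may be sketched as an element-wise exponential. The function name and the value `sigma=0.5` are hypothetical; the disclosure only states that σ is a preset constant.

```python
import math


def tree_relation_matrix(distances, sigma=1.0):
    """Map a distance matrix to values in (0, 1] via exp(-d / sigma)."""
    return [[math.exp(-d / sigma) for d in row] for row in distances]


D = [[0.0, 1.0], [1.0, 0.0]]
A_high = tree_relation_matrix(D)             # sigma = 1 reproduces exp(-D)
A_low = tree_relation_matrix(D, sigma=0.5)   # hypothetical value of sigma
```

Every entry of the result is positive, a node has relationship value 1 with itself, and the value decays as the tree distance between two nodes grows.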

With the semantic segmentation model training method provided by the embodiments of the present disclosure, a minimum spanning tree can be built from a planar grid graph, and the distances between nodes can then be computed on the minimum spanning tree to obtain the final distance matrix. Applying non-negative mapping to this distance matrix then gives the tree relationship matrix used for computing the pseudo-label.

For example, the first tree relationship matrix (denoted A^high) and the second tree relationship matrix (denoted A^low) may each be used as a filter kernel function to filter the semantic segmentation prediction result, which gives the pseudo-label of the training image.

Fig. 4 is a flowchart of a method for constructing the pseudo-label of a training image from the tree relationship matrices according to an exemplary embodiment. As shown in Fig. 4, steps S31, S33 and S34 in this embodiment are implemented similarly to steps S11, S13 and S14 in Fig. 1, and are not repeated here.

In step S32, the first tree relationship matrix and the second tree relationship matrix are each used as a filter kernel function to filter the semantic segmentation prediction result, which gives the pseudo-label of the training image.

With the semantic segmentation model training method provided by the embodiments of the present disclosure, the semantic segmentation prediction result can be filtered by using the first tree relationship matrix and the second tree relationship matrix as filter kernel functions. The pseudo-label used for training the semantic segmentation model is obtained on this basis.

In one embodiment, the first tree relationship matrix and the second tree relationship matrix may each be used as a filter kernel function and applied one after the other, so that the semantic segmentation prediction result is filtered in cascade. This may be done, for example, in either of the following two ways.

Way 1: the first tree relationship matrix is first used as the filter kernel function to perform initial filtering on the semantic segmentation prediction result, and the second tree relationship matrix is then used as the filter kernel function to filter the initially filtered semantic segmentation prediction result again.

Way 2: the second tree relationship matrix is first used as the filter kernel function to perform initial filtering on the semantic segmentation prediction result, and the first tree relationship matrix is then used as the filter kernel function to filter the initially filtered semantic segmentation prediction result again.

The process of obtaining the pseudo-label by cascaded filtering is described below, taking Way 2 as an example.

Fig. 5 is a flowchart of another semantic segmentation model training method according to an exemplary embodiment. As shown in Fig. 5, steps S41, S44 and S45 in this embodiment are implemented similarly to steps S31, S33 and S34 in Fig. 4, and are not repeated here.

In step S42, initial filtering is performed on the semantic segmentation prediction result with the second tree relationship matrix as the filter kernel function.

For example, given the semantic segmentation prediction result (denoted P) and the second tree relationship matrix (denoted A^low), the initially filtered semantic segmentation prediction result can be expressed as [formula image missing], where [formula image missing] denotes the filter function used for the initial filtering.

In step S43, the initially filtered semantic segmentation prediction result is filtered again with the first tree relationship matrix as the filter kernel function, which gives the pseudo-label of the training image.

For example, given the initially filtered semantic segmentation prediction result (denoted [formula image missing]) and the first tree relationship matrix (denoted A^high), the result of the second filtering can be expressed as [formula image missing]. The initial filtering and the second filtering use the same filter function, which can be expressed as [formula image missing]. Moreover, since the result of the second filtering is the pseudo-label of the training image, the pseudo-label (denoted [formula image missing]) can be obtained as [formula image missing].

For example, for the initial filtering or the second filtering described above, the filter function may take the form [formula image missing], where Ω denotes the set of all pixels in the image to be processed and z_i denotes the regularization term. For the above filtering operation, the regularization term z_i can be expressed as [formula image missing]. In addition, * denotes the type of the tree relationship matrix; for the first tree relationship matrix and the second tree relationship matrix, * ∈ {low, high}. In other words, the tree relationship matrix A^* that is used is either the second tree relationship matrix A^low or the first tree relationship matrix A^high.
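For orientation, one plausible form of the filter function is given below. It is not taken from the disclosure, whose formula images are not reproduced in this text. It assumes that the filter is a weighted average of the predictions over all pixels in Ω, with the tree relationship matrix supplying the weights and z_i acting as the normalizing sum. The symbols \mathcal{F} for the filter function and \tilde{Y} for the pseudo-label are likewise assumed notation.

```latex
\mathcal{F}(P, A^{*})_i = \frac{1}{z_i} \sum_{j \in \Omega} A^{*}_{i,j} \, P_j,
\qquad
z_i = \sum_{j \in \Omega} A^{*}_{i,j},
\qquad
* \in \{\mathrm{low}, \mathrm{high}\}

\tilde{Y} = \mathcal{F}\bigl(\mathcal{F}(P, A^{\mathrm{low}}),\, A^{\mathrm{high}}\bigr)
```

Under these assumptions the second line expresses Way 2: the prediction P is first filtered with A^low, and the result is filtered again with A^high using the same filter function.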

For example, the filtered values can be computed as [formula image missing], which gives the initially filtered semantic segmentation prediction result. The second filtering is computed in a similar way, which is not repeated here.
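For example, the cascaded filtering of Way 2 may be sketched as follows. The sketch assumes that the filter is a weighted average in which row i of the tree relationship matrix supplies the weights and the row sum serves as z_i; this form, the function names, and the example matrices are assumptions and are not taken from the disclosure.

```python
def tree_filter(prediction, relation):
    """Weighted average of per-pixel predictions (assumed filter form).

    prediction: HW x C list of per-pixel class scores.
    relation:   HW x HW tree relationship matrix.
    """
    num_pixels = len(prediction)
    num_classes = len(prediction[0])
    filtered = []
    for i in range(num_pixels):
        z_i = sum(relation[i])      # assumed normalizing term for pixel i
        filtered.append([
            sum(relation[i][j] * prediction[j][c] for j in range(num_pixels)) / z_i
            for c in range(num_classes)
        ])
    return filtered


def cascade_filter(prediction, a_low, a_high):
    """Filter with the image-based matrix first, then the feature-based one."""
    return tree_filter(tree_filter(prediction, a_low), a_high)


P = [[1.0, 0.0], [0.0, 1.0]]
A_low = [[1.0, 0.5], [0.5, 1.0]]
A_high = [[1.0, 0.0], [0.0, 1.0]]
pseudo_label = cascade_filter(P, A_low, A_high)
```

With these example values the first pixel's scores move from (1, 0) toward its neighbor's scores, and each filtered row still sums to 1 because the weights are normalized.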

Since the second tree relationship matrix contains low-level structural information (color information) and the first tree relationship matrix contains high-level structural information (semantic segmentation features), cascaded filtering of the semantic segmentation prediction result in the above manner refines the prediction step by step. The result of the cascaded filtering is then used as the pseudo-label of the training image, and the semantic segmentation model is trained under the supervision of this pseudo-label.

In one embodiment, the semantic segmentation features and the training image may be resized before the planar grid graphs are constructed.

Fig. 6 is a flowchart of a method for constructing the pseudo-label of a training image from the tree relationship matrices according to an exemplary embodiment. As shown in Fig. 6, steps S52, S53, S54 and S55 in this embodiment are implemented similarly to steps S21, S22, S23 and S24 in Fig. 2, and are not repeated here.

In step S51, the semantic segmentation features are resized so that the resolution of the resized semantic segmentation features matches the resolution of the semantic segmentation prediction result, and the training image is resized so that the resolution of the resized training image matches the resolution of the semantic segmentation prediction result.

For example, the resizing may use a conventional technique from the related art. For instance, the semantic segmentation features or the training image may be resized by bilinear interpolation.
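For example, bilinear interpolation on a single-channel grid may be sketched as follows. The function name is hypothetical, and the convention that the corner samples of the input and the output coincide is an assumption; the disclosure does not specify a sampling convention.

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize a 2-D list of floats with bilinear interpolation.

    Corner samples of input and output are aligned (assumed convention).
    """
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for r in range(out_h):
        # Position of this output row in input coordinates.
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two enclosing rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


small = [[0.0, 2.0], [4.0, 6.0]]
print(bilinear_resize(small, 3, 3))
# [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```

For multi-channel features or color images, the same routine would be applied to each channel separately.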

With the semantic segmentation model training method provided by the embodiments of the present disclosure, bringing the semantic segmentation features, the training image, and the semantic segmentation prediction result to the same resolution aligns them pixel by pixel, which reduces the complexity of the subsequent operations.

In the embodiments of the present disclosure, the first target loss can be determined as [formula image missing], where [formula image missing] is the first target loss function, P_l is the prediction for the pixel with position index l in the semantic segmentation prediction result, [formula image missing] is the label value at position index l in the pseudo-label, Ω_U denotes the set of unlabeled pixels in the pseudo-label, and [formula image missing] is the absolute value of the difference between the semantic segmentation prediction for an unlabeled pixel and the pseudo-label. The first target loss function is used to compute the first target loss. For example, to match the pseudo-label, the first target loss function is a tree energy loss (TEL) function.
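For orientation, one plausible form of the first target loss function is given below. It is not taken from the disclosure, whose formula images are not reproduced in this text. It assumes that the absolute differences are averaged over the unlabeled pixel set; the symbols \mathcal{L}_{tree} for the loss and \tilde{Y}_l for the pseudo-label value at position index l are assumed notation.

```latex
\mathcal{L}_{tree}
  = \frac{1}{|\Omega_U|} \sum_{l \in \Omega_U} \left| P_l - \tilde{Y}_l \right|
```

Whether the sum is divided by |Ω_U| is an assumption; the text above only identifies the summand as the absolute difference between the prediction and the pseudo-label on unlabeled pixels.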

In one embodiment, the set of unlabeled pixels in the pseudo-label, the semantic segmentation prediction result, and the pseudo-label of the training image may be input into the first target loss function; the value output by the first target loss function is the first target loss.
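For example, a loss function taking these three inputs may be sketched as follows. The averaging over the unlabeled pixels, the summation over classes, the function name, and the example values are assumptions; the disclosure only describes the loss in terms of the absolute difference between the prediction and the pseudo-label on unlabeled pixels.

```python
def tree_energy_loss(prediction, pseudo_label, unlabeled):
    """Mean absolute difference over unlabeled pixels (assumed averaging).

    prediction, pseudo_label: HW x C lists of per-pixel class scores.
    unlabeled: position indices of pixels without a sparse annotation.
    """
    if not unlabeled:
        return 0.0
    total = 0.0
    for l in unlabeled:
        total += sum(abs(p - y) for p, y in zip(prediction[l], pseudo_label[l]))
    return total / len(unlabeled)


P = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
Y = [[0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
loss = tree_energy_loss(P, Y, unlabeled=[0, 2])
```

Here pixel 1 is treated as annotated and is left out, so only pixels 0 and 2 contribute to the loss.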

In the embodiments of the present disclosure, in addition to being trained under the supervision of the pseudo-label, the semantic segmentation model may also be trained under the supervision of sparse labels. The training process is described below, taking joint supervision by the pseudo-label and the sparse labels as an example. For ease of description, the loss value obtained by applying the cross-entropy loss function to the semantic segmentation prediction result is referred to below as the second target loss.

Fig. 7 is a flowchart of a method for constructing the pseudo-label of a training image from the tree relationship matrices according to an exemplary embodiment. As shown in Fig. 7, step S61 in this embodiment is implemented similarly to step S11 in Fig. 1, and is not repeated here.

In step S62a, the pseudo-label of the training image is constructed based on the tree relationship matrices, and the first target loss is determined based on the pseudo-label and the semantic segmentation prediction result.

In step S62b, the sparse labels of the training image are acquired, and the second target loss of the semantic segmentation prediction result is computed from the sparse labels using the cross-entropy loss function.

In step S63, the semantic segmentation model is trained based on the first target loss and based on the second target loss until the semantic segmentation model converges.

With the semantic segmentation model training method provided by the embodiments of the present disclosure, the first target loss is computed from the pseudo-label and the second target loss is computed from the sparse labels. The semantic segmentation model can then be trained with both the first target loss and the second target loss, so that it converges quickly.
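For example, the two losses may be combined as sketched below. The sketch assumes that the prediction holds per-pixel class probabilities, that the sparse labels are given as a mapping from pixel index to class index, and that the two losses are combined as a weighted sum. The disclosure does not state how the losses are combined, so the `weight` parameter and both function names are hypothetical.

```python
import math


def sparse_cross_entropy(prediction, sparse_labels):
    """Average cross-entropy over annotated pixels only.

    prediction:    HW x C list of per-pixel class probabilities.
    sparse_labels: dict mapping a pixel index to its class index.
    """
    if not sparse_labels:
        return 0.0
    total = 0.0
    for pixel, cls in sparse_labels.items():
        total += -math.log(prediction[pixel][cls])
    return total / len(sparse_labels)


def total_loss(first_target_loss, second_target_loss, weight=1.0):
    """Weighted sum of the two losses; the weight is a hypothetical knob."""
    return second_target_loss + weight * first_target_loss


P = [[0.5, 0.5], [0.25, 0.75]]
ce = sparse_cross_entropy(P, {0: 0})   # only pixel 0 carries an annotation
combined = total_loss(first_target_loss=0.3, second_target_loss=ce, weight=0.5)
```

The cross-entropy term only sees the annotated pixels, while the first target loss covers the unlabeled ones, so together the two terms supervise every pixel of the training image.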

Fig. 8 is a schematic flowchart of training a semantic segmentation model according to an exemplary embodiment. As shown in Fig. 8, the semantic segmentation model extracts features from the training image to obtain the semantic segmentation features. For the extracted semantic segmentation features, the first tree relationship matrix can be constructed by building a minimum spanning tree. Likewise, for the color information contained in the training image, the second tree relationship matrix can be constructed by building a minimum spanning tree. The second tree relationship matrix can then be used as the filter kernel function to perform initial filtering on the semantic segmentation prediction result, which gives a filtered segmentation result, and the first tree relationship matrix can be used as the filter kernel function to filter this result again, which gives the pseudo-label of the training image.

On this basis, the first target loss can be computed with the tree energy loss function from the pseudo-label and the semantic segmentation prediction result, and the second target loss can be computed with the standard cross-entropy loss function from the sparse labels and the semantic segmentation prediction result. The semantic segmentation model is then trained under the supervision of the first target loss and the second target loss, so that it converges quickly.

Based on the same concept, the embodiments of the present disclosure also provide an image segmentation method.

Fig. 9 is a flowchart of an image segmentation method according to an exemplary embodiment. As shown in Fig. 9, the method includes the following steps.

In step S71, the image to be segmented is input into the semantic segmentation model.

The semantic segmentation model used here may be obtained by the semantic segmentation model training method of any of the foregoing embodiments. For the training process, refer to any of the foregoing embodiments; it is not repeated here.

In step S72, the segmentation result of the image to be segmented is determined based on the output of the semantic segmentation model.
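For example, one common way to turn the model output into a segmentation result is to pick the highest-scoring class for each pixel, as sketched below. The disclosure does not specify how the segmentation result is derived from the output, so the per-pixel argmax and the function name are assumptions.

```python
def segmentation_from_scores(scores):
    """Pick, for each pixel, the class index with the highest score."""
    return [max(range(len(pixel)), key=pixel.__getitem__) for pixel in scores]


scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
print(segmentation_from_scores(scores))  # [1, 0, 1]
```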

The semantic segmentation model training method provided by the embodiments of the present disclosure yields a semantic segmentation model with relatively high accuracy. Using a model trained in this way to segment the image to be segmented therefore gives a relatively accurate segmentation result that meets practical requirements for image segmentation.

Based on the same concept, the embodiments of the present disclosure also provide a semantic segmentation model training apparatus.

It can be understood that, to implement the above functions, the semantic segmentation model training apparatus provided by the embodiments of the present disclosure includes hardware structures and/or software modules corresponding to each function. In combination with the units and algorithm steps of the examples disclosed herein, the embodiments of the present disclosure can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the technical solutions of the embodiments of the present disclosure.

Fig. 10 is a block diagram of a semantic segmentation model training apparatus 100 according to an exemplary embodiment. Referring to Fig. 10, the apparatus 100 includes a processing unit 101 and a determining unit 102.

The processing unit 101 is configured to input the training image into the semantic segmentation model to obtain the semantic segmentation features and the semantic segmentation prediction result. It is further configured to construct the pseudo-label of the training image based on tree relationship matrices, where the tree relationship matrices include a first tree relationship matrix determined based on the semantic segmentation features and a second tree relationship matrix determined based on the training image. The determining unit 102 is configured to determine the first target loss based on the pseudo-label and the semantic segmentation prediction result. The processing unit 101 is further configured to train the semantic segmentation model based on the first target loss until the semantic segmentation model converges.

In one embodiment, the processing unit 101 determines the tree relationship matrices as follows. A first planar grid graph is constructed based on the semantic segmentation features, and a second planar grid graph is constructed based on the training image. The first planar grid graph includes nodes in one-to-one correspondence with the pixels of the semantic segmentation features, and the second planar grid graph includes nodes in one-to-one correspondence with the pixels of the training image. The nodes of the first planar grid graph are connected by undirected edges, and the nodes of the second planar grid graph are connected by undirected edges. The first minimum spanning tree of the first planar grid graph is determined based on the similarity between the nodes in the first planar grid graph, and the second minimum spanning tree of the second planar grid graph is determined based on the similarity between the nodes in the second planar grid graph; each minimum spanning tree consists of nodes and undirected edges that form no loop. The distance matrix corresponding to the semantic segmentation features is determined based on the nodes and undirected edges included in the first minimum spanning tree, and the distance matrix corresponding to the training image is determined based on the nodes and undirected edges included in the second minimum spanning tree. The distance between two nodes in a distance matrix is the accumulated node similarity over the undirected edges on the shortest path between the two nodes. Non-negative mapping is applied to the distance matrix corresponding to the semantic segmentation features to obtain the first tree relationship matrix, and to the distance matrix corresponding to the training image to obtain the second tree relationship matrix.

In one embodiment, the processing unit 101 constructs the pseudo-label of the training image based on the tree relationship matrices as follows: the first tree relationship matrix and the second tree relationship matrix are each used as a filter kernel function to filter the semantic segmentation prediction result, which gives the pseudo-label of the training image.

In one embodiment, the processing unit 101 uses the first tree relationship matrix and the second tree relationship matrix as filter kernel functions to filter the semantic segmentation prediction result as follows: initial filtering is performed on the semantic segmentation prediction result with the second tree relationship matrix as the filter kernel function, and the initially filtered semantic segmentation prediction result is then filtered again with the first tree relationship matrix as the filter kernel function.

In one embodiment, the processing unit 101 is further configured to resize the semantic segmentation features so that the resolution of the resized semantic segmentation features matches the resolution of the semantic segmentation prediction result, and to resize the training image so that the resolution of the resized training image matches the resolution of the semantic segmentation prediction result.

In one embodiment, the first target loss is determined by the formula [formula image missing], where [formula image missing] is the first target loss function, P_l is the prediction for the pixel with position index l in the semantic segmentation prediction result, [formula image missing] is the label value at position index l in the pseudo-label, Ω_U denotes the set of unlabeled pixels in the pseudo-label, and [formula image missing] is the absolute value of the difference between the semantic segmentation prediction for an unlabeled pixel and the pseudo-label.

In one embodiment, the processing unit 101 is further configured to acquire the sparse labels of the training image, to compute the second target loss of the semantic segmentation prediction result from the sparse labels using the cross-entropy loss function, and to train the semantic segmentation model based on the second target loss until the semantic segmentation model converges.

Fig. 11 is a block diagram of an image segmentation apparatus 200 according to an exemplary embodiment. Referring to Fig. 11, the apparatus 200 includes an input unit 201 and a determining unit 202.

The input unit is configured to input the image to be segmented into a semantic segmentation model that has been trained in advance by any one of the above semantic segmentation model training methods.

The determining unit is configured to determine the segmentation result of the image to be segmented based on the output of the semantic segmentation model.

As for the apparatuses in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.

FIG. 12 is a block diagram of an electronic device 300 for image processing according to an exemplary embodiment.

As shown in FIG. 12, one embodiment of the present disclosure provides an electronic device 300. The electronic device 300 includes a memory 301, a processor 302, and an input/output (I/O) interface 303. The memory 301 is configured to store instructions. The processor 302 is configured to invoke the instructions stored in the memory 301 to execute the semantic segmentation model training method of the embodiments of the present disclosure. The processor 302 is connected to the memory 301 and to the I/O interface 303, for example through a bus system and/or another form of connection mechanism (not shown). The memory 301 may store programs and data, including a program for the semantic segmentation model training method of the embodiments of the present disclosure, and the processor 302 performs the various functional applications and data processing of the electronic device 300 by running the programs stored in the memory 301.

In the embodiments of the present disclosure, the processor 302 may be implemented in at least one hardware form among a digital signal processor (DSP), a field programmable gate array (FPGA), and a programmable logic array (PLA). The processor 302 may be a central processing unit (CPU), another form of processing unit having data processing capability and/or instruction execution capability, or a combination of several of these.

The memory 301 in the embodiments of the present disclosure may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

In the embodiments of the present disclosure, the I/O interface 303 may be used to receive input instructions (for example, numeric or character information, and key signal inputs related to user settings and function control of the electronic device 300), and may also output various information to the outside (for example, images or sounds). The I/O interface 303 may include one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a mouse, a joystick, a trackball, a microphone, a speaker, and a touch panel.

In some embodiments, the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, perform any of the methods described above.

In some embodiments, the present disclosure provides a computer program product including a computer program that, when executed by a processor, performs any of the methods described above.

Although operations are depicted in the drawings in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in serial order, or that all of the operations shown be performed, in order to obtain the desired results. In certain circumstances, multitasking and parallel processing may be advantageous.

The methods and apparatus of the present disclosure can be implemented using standard programming techniques, with rule-based logic or other logic used to carry out the various method steps. It should also be noted that the words "apparatus" and "module", as used herein and in the claims, are intended to cover implementations using one or more lines of software code, hardware implementations, and/or devices for receiving input.

Any step, operation, or procedure described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented using a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor to perform any or all of the described steps, operations, or procedures.

The foregoing description of implementations of the present disclosure has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed; various variations and modifications are possible in light of the above teachings or may be acquired from practice of the present disclosure. The embodiments were chosen and described in order to explain the principles of the present disclosure and its practical application, so that those skilled in the art can utilize the present disclosure in various embodiments and with various modifications suited to the particular use contemplated.

It should be understood that, in the present disclosure, "a plurality of" means two or more, and other quantifiers are to be read similarly. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should further be understood that the terms "first", "second", and the like are used to describe various information, but the information should not be limited by these terms. The terms are used only to distinguish information of the same type from one another and do not indicate a particular order or degree of importance. In fact, expressions such as "first" and "second" are fully interchangeable. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information and, similarly, second information may also be referred to as first information.

It should further be understood that, unless otherwise specified, "connection" includes both a direct connection in which no other component lies between the two parts and an indirect connection in which other elements lie between them.

It should further be understood that, although operations in the embodiments of the present disclosure are depicted in the drawings in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in serial order, or that all of the operations shown be performed, in order to obtain the desired results. In certain circumstances, multitasking and parallel processing may be advantageous.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and that include common general knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A semantic segmentation model training method, comprising:

inputting a training image into a semantic segmentation model to obtain semantic segmentation features and a semantic segmentation prediction result;

constructing a pseudo label of the training image based on a tree relation matrix, wherein the tree relation matrix comprises a first tree relation matrix determined based on the semantic segmentation features and a second tree relation matrix determined based on the training image;

determining a first target loss based on the pseudo label and the semantic segmentation prediction result; and

training the semantic segmentation model based on the first target loss until the semantic segmentation model converges.

2. The method of claim 1, wherein the tree relation matrix is determined as follows:

constructing a first planar grid graph based on the semantic segmentation features, and constructing a second planar grid graph based on the training image, wherein the first planar grid graph comprises nodes in one-to-one correspondence with the pixels of the semantic segmentation features, and the second planar grid graph comprises nodes in one-to-one correspondence with the pixels of the training image; the nodes in the first planar grid graph are connected by undirected edges, and the nodes in the second planar grid graph are connected by undirected edges;

determining a first minimum spanning tree of the first planar grid graph based on the similarity between the nodes in the first planar grid graph, and determining a second minimum spanning tree of the second planar grid graph based on the similarity between the nodes in the second planar grid graph, wherein each minimum spanning tree comprises nodes and undirected edges and contains no loops;

determining a distance matrix corresponding to the semantic segmentation features based on the nodes and the undirected edges of the first minimum spanning tree, and determining a distance matrix corresponding to the training image based on the nodes and the undirected edges of the second minimum spanning tree, wherein the distance between two nodes in a distance matrix is the accumulated node similarity value over the undirected edges on the shortest path between the two nodes; and

performing non-negative mapping on the distance matrix corresponding to the semantic segmentation features to obtain the first tree relation matrix, and performing non-negative mapping on the distance matrix corresponding to the training image to obtain the second tree relation matrix.
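The construction recited in claim 2 can be illustrated with a minimal pure-Python sketch. Several choices below are assumptions and are not stated in the source: the grid is 4-connected, the edge weight is the squared Euclidean distance between neighboring pixel vectors, the minimum spanning tree is found with Kruskal's algorithm, and the non-negative mapping is exp(-d / sigma^2). All function names are hypothetical.

```python
import math
from collections import deque


def grid_edges(feat):
    """Undirected 4-connected edges (weight, u, v) of an H x W grid of vectors."""
    h, w = len(feat), len(feat[0])

    def diss(a, b):  # assumed dissimilarity: squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    edges = []
    for r in range(h):
        for c in range(w):
            u = r * w + c
            if c + 1 < w:
                edges.append((diss(feat[r][c], feat[r][c + 1]), u, u + 1))
            if r + 1 < h:
                edges.append((diss(feat[r][c], feat[r + 1][c]), u, u + w))
    return edges, h * w


def minimum_spanning_tree(edges, n):
    """Kruskal's algorithm with union-find; returns adjacency lists of the tree."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = [[] for _ in range(n)]
    for wgt, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge creates no loop
            parent[ru] = rv
            tree[u].append((v, wgt))
            tree[v].append((u, wgt))
    return tree


def tree_distances(tree):
    """dist[i][j] is the sum of edge weights on the unique tree path from i to j."""
    n = len(tree)
    dist = [[0.0] * n for _ in range(n)]
    for src in range(n):
        seen = [False] * n
        seen[src] = True
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, wgt in tree[u]:
                if not seen[v]:
                    seen[v] = True
                    dist[src][v] = dist[src][u] + wgt
                    queue.append(v)
    return dist


def tree_relation_matrix(feat, sigma=1.0):
    """Map tree distances to non-negative affinities (assumed exponential mapping)."""
    edges, n = grid_edges(feat)
    dist = tree_distances(minimum_spanning_tree(edges, n))
    return [[math.exp(-d / sigma ** 2) for d in row] for row in dist]


# 2 x 2 single-channel image: the top row is dark, the bottom row is bright.
image = [[[0.0], [0.1]], [[1.0], [1.1]]]
A = tree_relation_matrix(image)
```

In this toy image the two top pixels receive an affinity close to 1, while pixels on opposite rows are linked only through a heavy edge and receive a much smaller affinity. The all-pairs traversal is quadratic in the number of pixels and is meant for illustration on small grids only.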

3. The method of claim 1 or 2, wherein constructing the pseudo label of the training image based on the tree relation matrix comprises:

filtering the semantic segmentation prediction result using the first tree relation matrix and the second tree relation matrix, respectively, as filtering kernel functions, to obtain the pseudo label of the training image.

4. The method according to claim 3, wherein filtering the semantic segmentation prediction result using the first tree relation matrix and the second tree relation matrix, respectively, as filtering kernel functions comprises:

performing initial filtering on the semantic segmentation prediction result using the second tree relation matrix as a filtering kernel function; and

filtering the initially filtered semantic segmentation prediction result again using the first tree relation matrix as a filtering kernel function.
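The two-stage filtering of claims 3 and 4 can be sketched as follows. The sketch assumes that filtering means a weighted average of the per-pixel predictions, with each row of the relation matrix normalized by its sum; that normalization, the hand-written toy matrices, and the names `tree_filter` and `pseudo_label` are illustrative assumptions.

```python
def tree_filter(affinity, pred):
    """Weighted average: out[i] = sum_j A[i][j] * pred[j] / sum_j A[i][j]."""
    n, k = len(pred), len(pred[0])
    out = []
    for i in range(n):
        z = sum(affinity[i])  # assumed row normalization
        out.append(
            [sum(affinity[i][j] * pred[j][c] for j in range(n)) / z for c in range(k)]
        )
    return out


def pseudo_label(pred, a_image, a_feature):
    """Filter with the image-based matrix first, then with the feature-based matrix."""
    return tree_filter(a_feature, tree_filter(a_image, pred))


# Three pixels, two classes. Pixels 0 and 1 are related; pixel 2 is isolated.
pred = [[0.9, 0.1], [0.4, 0.6], [0.1, 0.9]]
a_image = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]    # second tree relation matrix
a_feature = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]  # first tree relation matrix
labels = pseudo_label(pred, a_image, a_feature)
```

The uncertain pixel 1 is pulled toward the class of its confident neighbor, pixel 0, while the isolated pixel 2 keeps its original prediction.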

5. The method of claim 2, further comprising:

resizing the semantic segmentation features so that the resolution of the resized semantic segmentation features is consistent with the resolution of the semantic segmentation prediction result; and

resizing the training image so that the resolution of the resized training image is consistent with the resolution of the semantic segmentation prediction result.

6. The method of claim 1, wherein the first target loss is determined by a formula defined over the following quantities:

the first target loss function;

P_l, the prediction result of the pixel with position index l in the semantic segmentation prediction result;

the label value with position index l in the pseudo label;

Ω_U, the set of unlabeled pixels in the pseudo label; and

the absolute value of the difference between the semantic segmentation prediction result and the pseudo label at an unlabeled pixel.

7. The method of claim 1, further comprising:

acquiring a sparse label of the training image, and calculating a second target loss of the semantic segmentation prediction result by using a cross entropy loss function according to the sparse label;

training the semantic segmentation model based on the second target loss until the semantic segmentation model converges.

8. A method of image segmentation, the method comprising:

inputting an image to be segmented into a semantic segmentation model, wherein the semantic segmentation model is trained in advance using the method of any one of claims 1 to 7; and

determining a segmentation result of the image to be segmented based on the output result of the semantic segmentation model.

9. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the semantic segmentation model training method of any one of claims 1 to 7, or to perform the image segmentation method of claim 8.

10. A storage medium having stored therein instructions that, when executed by a processor, enable the processor to perform the semantic segmentation model training method of any one of claims 1 to 7 or the image segmentation method of claim 8.

11. A computer program product comprising a computer program which, when executed by a processor, implements the semantic segmentation model training method of any one of claims 1 to 7 or the image segmentation method of claim 8.