WO2022141969A1

WO2022141969A1 - Image segmentation method and apparatus, electronic device, storage medium, and program

Info

Publication number: WO2022141969A1
Application number: PCT/CN2021/088983
Authority: WO
Inventors: 胡含哲
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2020-12-29
Filing date: 2021-04-22
Publication date: 2022-07-07
Anticipated expiration: 2023-06-29
Also published as: CN112598676A; CN112598676B

Abstract

The present disclosure relates to an image segmentation method and apparatus, an electronic device, a computer storage medium, and a computer program. The method comprises: performing feature extraction on an image to be segmented to obtain a first feature of the image to be segmented, the image to be segmented comprising N pixel categories, and N being an integer greater than 1; fusing the first feature with M second features to obtain M first target features, the M second features and the M first target features both having one-to-one correspondence to M pixel categories, the M second features being determined and obtained on the basis of a first sample data set, the first sample data set comprising at least one sample image corresponding to each pixel category among the M pixel categories and annotation information of each sample image, M being greater than or equal to N, and the N pixel categories being a subset of the M pixel categories; and according to the M first target features, performing image segmentation on the image to be segmented to obtain a target segmentation result of the image to be segmented.

Description

Image segmentation method and device, electronic device, storage medium and program

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure claims priority to Chinese patent application No. 202011595659.8, filed on December 29, 2020 and entitled "Image segmentation method and apparatus, electronic device and storage medium", the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the field of computer technology, and in particular, but not exclusively, to an image segmentation method and apparatus, an electronic device, a computer storage medium, and a computer program.

Background

Semantic image segmentation is one of the basic tasks in computer vision and has important applications in many fields, such as autonomous driving and scene understanding. Training a neural network for semantic image segmentation requires samples annotated at the pixel level. Such finely annotated samples are hard to obtain, and for many categories samples are hard to obtain at all. In addition, a single image to be segmented may contain multiple pixel categories. How to achieve multi-category image segmentation with few samples is therefore a problem that urgently needs to be solved in practical applications.

SUMMARY OF THE INVENTION

The present disclosure provides technical solutions for an image segmentation method and apparatus, an electronic device, a computer storage medium, and a computer program.

An embodiment of the present disclosure provides an image segmentation method, including: performing feature extraction on an image to be segmented to obtain a first feature of the image to be segmented, where the image to be segmented includes N pixel categories and N is an integer greater than 1; fusing the first feature with M second features to obtain M first target features, where the M second features and the M first target features are each in one-to-one correspondence with M pixel categories, the M second features are determined based on a first sample data set, the first sample data set includes at least one sample image corresponding to each of the M pixel categories and annotation information of each sample image, M is greater than or equal to N, and the N pixel categories are a subset of the M pixel categories; and performing image segmentation on the image to be segmented according to the M first target features to obtain a target segmentation result of the image to be segmented.

In some embodiments of the present disclosure, fusing the first feature with the M second features to obtain the M first target features includes: for the i-th second feature among the M second features, performing feature multiplication on the first feature and the i-th second feature to obtain an i-th third feature, where 1≤i≤M; performing feature subtraction on the first feature and the i-th second feature to obtain an i-th fourth feature; and performing feature concatenation on the first feature, the i-th third feature and the i-th fourth feature to obtain the i-th first target feature among the M first target features. The i-th second feature, the i-th third feature, the i-th fourth feature and the i-th first target feature are all features corresponding to the i-th pixel category among the M pixel categories.

In some embodiments of the present disclosure, performing image segmentation on the image to be segmented according to the M first target features to obtain the target segmentation result of the image to be segmented includes: performing category-by-category prediction on the image to be segmented according to the M first target features to determine M segmentation sub-results corresponding to the image to be segmented, where the M segmentation sub-results are in one-to-one correspondence with the M pixel categories; and determining the target segmentation result according to the M segmentation sub-results.

In some embodiments of the present disclosure, performing category-by-category prediction on the image to be segmented according to the M first target features to determine the M segmentation sub-results includes: inputting the M first target features into a cosine classifier, and performing category-by-category prediction on the image to be segmented based on the cosine classifier and the M first target features to determine the M segmentation sub-results.
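As a rough illustration of category-by-category prediction with a cosine classifier, the sketch below scores every pixel of one first target feature by the scaled cosine similarity between that pixel's feature vector and a weight vector. The text above does not fix these details, so the flattened pixel-vector layout, the per-category weight vector `weight`, the `scale` factor and the function names are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between two vectors; eps guards against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def cosine_scores(target_feature: Sequence[Sequence[float]],
                  weight: Sequence[float],
                  scale: float = 10.0) -> List[float]:
    """Score each pixel vector of one target feature against a weight vector.

    target_feature: H*W pixel vectors (assumed layout, flattened row by row).
    weight: hypothetical classifier weight for one pixel category.
    scale: hypothetical temperature applied to the cosine similarity.
    """
    return [scale * cosine_similarity(pixel, weight) for pixel in target_feature]


# Three pixels with two channels each, scored against one weight vector.
scores = cosine_scores([[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]], weight=[1.0, 0.0])
```

Because the similarity depends only on direction, a pixel vector's magnitude does not change its score, which is one reason cosine classifiers are often used when only a few samples are available per category.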

In some embodiments of the present disclosure, performing category-by-category prediction on the image to be segmented according to the M first target features to determine the M segmentation sub-results includes: for the i-th first target feature among the M first target features, determining, according to the i-th first target feature, the i-th segmentation sub-result among the M segmentation sub-results corresponding to the image to be segmented, where the i-th segmentation sub-result includes the pixels of the image to be segmented whose pixel category is the i-th pixel category among the M pixel categories.

In some embodiments of the present disclosure, the image segmentation method is implemented by an image segmentation neural network.

In some embodiments of the present disclosure, the training samples of the image segmentation neural network include a first sample image to be segmented, segmentation annotation information of the first sample image to be segmented, and the first sample data set, where the first sample image to be segmented includes at least two of the M pixel categories. The method further includes: performing feature extraction on the first sample image to be segmented through the image segmentation neural network to obtain a fifth feature of the first sample image to be segmented, and performing feature extraction, through the image segmentation neural network, on a target sample image corresponding to each of the M pixel categories to obtain M sixth features, where the M sixth features are in one-to-one correspondence with the M pixel categories and the target sample image corresponding to each pixel category is any one of the at least one sample image corresponding to that pixel category; determining M seventh features according to the M sixth features and the annotation information of the target sample image corresponding to each of the M pixel categories, and fusing the fifth feature with the M seventh features to obtain M second target features, where the M seventh features and the M second target features are each in one-to-one correspondence with the M pixel categories; performing image segmentation on the first sample image to be segmented according to the M second target features to obtain a segmentation result of the first sample image to be segmented; determining a segmentation loss according to the segmentation result of the first sample image to be segmented and the segmentation annotation information; and training the image segmentation neural network according to the segmentation loss to obtain a trained image segmentation neural network.
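The paragraph above does not name the loss function. One common choice, assumed here purely for illustration, is pixel-wise cross-entropy over the M per-category scores of each pixel. The sketch below computes that loss from M flattened score maps and the annotated category of each pixel; the function name and the data layout are hypothetical.

```python
import math
from typing import Sequence


def pixelwise_cross_entropy(score_maps: Sequence[Sequence[float]],
                            labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy over all pixels.

    score_maps[i][p]: score of pixel p for category i (M maps, assumed layout).
    labels[p]: annotated 0-based category index of pixel p.
    """
    total = 0.0
    for p, label in enumerate(labels):
        scores = [class_map[p] for class_map in score_maps]
        peak = max(scores)  # subtract the maximum for numerical stability
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_sum - scores[label]
    return total / len(labels)


# Two categories, two pixels: an uninformative and a confident prediction.
uniform_loss = pixelwise_cross_entropy([[0.0, 0.0], [0.0, 0.0]], labels=[0, 1])
confident_loss = pixelwise_cross_entropy([[5.0, -5.0], [-5.0, 5.0]], labels=[0, 1])
```

With uniform scores over two categories the loss equals ln 2, and it shrinks toward zero as the scores of the annotated categories dominate.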

In some embodiments of the present disclosure, the annotation information of the target sample image corresponding to each of the M pixel categories is a mask. Determining the M seventh features according to the M sixth features and the annotation information of the target sample image corresponding to each of the M pixel categories includes: for the i-th sixth feature among the M sixth features, performing a masked average pooling operation according to the i-th sixth feature and the mask of the target sample image corresponding to the i-th pixel category among the M pixel categories to obtain the i-th seventh feature among the M seventh features, where the i-th sixth feature and the i-th seventh feature are both features corresponding to the i-th pixel category among the M pixel categories.
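Masked average pooling can be sketched as follows: the feature vectors of the pixels selected by the mask are averaged into a single category vector. The sketch assumes a binary mask that has already been resized to the resolution of the feature map, and a feature map stored as a flat list of per-pixel vectors; both the layout and the function name are illustrative assumptions.

```python
from typing import List, Sequence


def masked_average_pooling(feature_map: Sequence[Sequence[float]],
                           mask: Sequence[int]) -> List[float]:
    """Average the pixel vectors of feature_map where mask is 1.

    feature_map: H*W pixel vectors with C channels each (assumed layout).
    mask: H*W values in {0, 1}, aligned with feature_map.
    """
    if len(feature_map) != len(mask):
        raise ValueError("feature_map and mask must cover the same pixels")
    area = sum(mask)
    if area == 0:
        raise ValueError("mask selects no pixels")
    channels = len(feature_map[0])
    return [
        sum(m * pixel[c] for pixel, m in zip(feature_map, mask)) / area
        for c in range(channels)
    ]


# Four pixels with two channels; the mask keeps the first and third pixel.
prototype = masked_average_pooling(
    [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0], [7.0, 40.0]],
    [1, 0, 1, 0],
)
```

The result is one C-dimensional vector per category, which is the form in which it can later be multiplied with, and subtracted from, the feature of an image to be segmented.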

In some embodiments of the present disclosure, before the image segmentation neural network is trained according to the first sample image to be segmented, the segmentation annotation information of the first sample image to be segmented, and the first sample data set, the method further includes: pre-training the image segmentation neural network according to a second sample image to be segmented, segmentation annotation information of the second sample image to be segmented, and a second sample data set, where the second sample data set includes a plurality of sample images corresponding to each of P pixel categories and annotation information of each sample image, the M pixel categories are new pixel categories other than the P pixel categories, and the second sample image to be segmented includes at least two of the P pixel categories.

In some embodiments of the present disclosure, the method further includes: determining the M second features according to the first sample data set and the trained image segmentation neural network.

In some embodiments of the present disclosure, the annotation information of the sample image corresponding to each of the M pixel categories is a mask. Determining the M second features according to the first sample data set and the trained image segmentation neural network includes: performing feature extraction, through the trained image segmentation neural network, on the sample image corresponding to each of the M pixel categories to obtain M eighth features; and, for the i-th eighth feature among the M eighth features, performing a masked average pooling operation according to the i-th eighth feature and the mask of the sample image corresponding to the i-th pixel category among the M pixel categories to obtain the i-th second feature among the M second features. The i-th eighth feature and the i-th second feature are both features corresponding to the i-th pixel category among the M pixel categories.

An embodiment of the present disclosure further provides an image segmentation apparatus, including: a feature extraction module configured to perform feature extraction on an image to be segmented to obtain a first feature of the image to be segmented, where the image to be segmented includes N pixel categories and N is an integer greater than 1; a feature fusion module configured to fuse the first feature with M second features to obtain M first target features, where the M second features and the M first target features are each in one-to-one correspondence with M pixel categories, the M second features are determined based on a first sample data set, the first sample data set includes at least one sample image corresponding to each of the M pixel categories and annotation information of each sample image, M is greater than or equal to N, and the N pixel categories are a subset of the M pixel categories; and an image segmentation module configured to perform image segmentation on the image to be segmented according to the M first target features to obtain a target segmentation result of the image to be segmented.

An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory configured to store processor-executable instructions; where the processor is configured to invoke the instructions stored in the memory to execute the above method.

An embodiment of the present disclosure further provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the above method.

An embodiment of the present disclosure further provides a computer program including computer-readable code, where, when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above method.

In the embodiments of the present disclosure, feature extraction is performed on an image to be segmented that includes N pixel categories to obtain a first feature of the image, and the first feature is fused with M second features, determined from a first sample data set based on M pixel categories, to obtain M first target features. Because the M second features can represent the category characteristics of the M pixel categories, and the N pixel categories are a subset of the M pixel categories, performing image segmentation on the image to be segmented according to the M first target features can quickly and accurately yield the target segmentation result, thereby achieving fast segmentation of multiple pixel categories in the image to be segmented.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure.

FIG. 1a is a schematic diagram of an application scenario of an embodiment of the present disclosure;

FIG. 1b is a flowchart of an image segmentation method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an image segmentation neural network according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of category-by-category prediction according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of training an image segmentation neural network according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of two-stage training of an image segmentation neural network according to an embodiment of the present disclosure;

FIG. 6 is a block diagram of an image segmentation apparatus according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device according to an embodiment of the present disclosure;

FIG. 8 is a block diagram of another electronic device according to an embodiment of the present disclosure.

Detailed description

Various exemplary embodiments, features and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C.

In addition, numerous specific details are given in the following detailed description in order to better explain the present disclosure. Those skilled in the art will understand that the present disclosure may be practiced without some of these details. In some instances, methods, means, components and circuits well known to those skilled in the art are not described in detail, so as not to obscure the subject matter of the present disclosure.

Semantic image segmentation is an important research topic in computer vision. Its goal is to divide an image into regions with different semantic information and to label each region with the corresponding semantic label. For example, after semantic segmentation, semantic labels (e.g., cat, table, chair, wall) can be attached to the objects in an image. It can be applied in fields such as autonomous driving and scene understanding. At present, the mainstream approach to semantic image segmentation is the deep convolutional neural network (CNN). CNNs learn from a large number of sample images with pixel-level annotation information (the pixel category of each pixel in a sample image is annotated, and different pixel categories carry different semantic information; for example, the pixel categories include cat, table, chair, wall, etc.) and thereby learn a semantic feature representation of each pixel category in the sample images. The trained CNNs can then output a pixel-level segmentation result for an input image to be segmented of any size. In practice, however, sample images with pixel-level annotation information are hard to obtain, and sample images of many pixel categories are hard to obtain at all, so most practical image segmentation tasks are few-shot semantic segmentation scenarios. In addition, a single image to be segmented may contain multiple pixel categories with different semantic information that need to be segmented. The image segmentation method according to the embodiments of the present disclosure can be applied to few-shot semantic segmentation scenarios and can perform multi-category segmentation of an image to be segmented that contains multiple pixel categories with different semantic information.

An application scenario of the present disclosure is described below with reference to the accompanying drawings. FIG. 1a is a schematic diagram of an application scenario of an embodiment of the present disclosure. As shown in FIG. 1a, the image to be segmented 1 may be a road condition image. The image to be segmented 1 may be input into the image segmentation apparatus 60, which performs segmentation processing on it to obtain the target segmentation result of the image. For example, in the field of autonomous driving, a road condition image of the road ahead is captured by an autonomous vehicle. The road condition image may include multiple pixel categories such as road, other vehicles and pedestrians. The image segmentation method of the embodiments of the present disclosure is used to segment the road condition image and obtain segmentation results for the multiple pixel categories, so that the current road conditions can be analyzed according to the segmentation results and a driving decision can be made.

The image segmentation method according to the embodiments of the present disclosure is described in detail below.

FIG. 1b is a flowchart of an image segmentation method according to an embodiment of the present disclosure. The image segmentation method may be executed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The image segmentation method may be implemented by a processor invoking computer-readable instructions stored in a memory. Alternatively, the image segmentation method may be executed by a server. As shown in FIG. 1b, the image segmentation method may include:

In step S11, feature extraction is performed on the image to be segmented to obtain a first feature of the image to be segmented, where the image to be segmented includes N pixel categories and N is an integer greater than 1.

In step S12, the first feature is fused with M second features to obtain M first target features, where the M second features and the M first target features are each in one-to-one correspondence with M pixel categories, the M second features are determined based on a first sample data set, the first sample data set includes at least one sample image corresponding to each of the M pixel categories and annotation information of each sample image, M is greater than or equal to N, and the N pixel categories are a subset of the M pixel categories.

In step S13, image segmentation is performed on the image to be segmented according to the M first target features to obtain a target segmentation result of the image to be segmented.

In some embodiments of the present disclosure, the image segmentation method is implemented by an image segmentation neural network.

Using the image segmentation neural network, multiple pixel categories in the image to be segmented can be segmented quickly.

FIG. 2 is a schematic diagram of an image segmentation neural network according to an embodiment of the present disclosure. As shown in FIG. 2, the image segmentation neural network includes a feature extractor, a segmentation module, a category-sensitive reshaping module and a cosine classifier.

As shown in FIG. 2, the feature extractor in the image segmentation neural network performs feature extraction on the image to be segmented to obtain a ninth feature of the image to be segmented; the ninth feature then passes through the segmentation module in the image segmentation neural network to yield the first feature of the image to be segmented. For example, a ResNet-50 network structure with atrous (dilated) convolution may be used as the feature extractor.

In some embodiments of the present disclosure, fusing the first feature with the M second features to obtain the M first target features includes: for the i-th second feature among the M second features, performing feature multiplication on the first feature and the i-th second feature to obtain an i-th third feature, where 1≤i≤M; performing feature subtraction on the first feature and the i-th second feature to obtain an i-th fourth feature; and performing feature concatenation on the first feature, the i-th third feature and the i-th fourth feature to obtain the i-th first target feature among the M first target features. The i-th second feature, the i-th third feature, the i-th fourth feature and the i-th first target feature are all features corresponding to the i-th pixel category among the M pixel categories.

Because the M second features corresponding to the M pixel categories are introduced while the image to be segmented is being segmented, a multi-feature aggregation method is used to avoid the noise that the M second features may introduce. The first feature of the image to be segmented, the i-th third feature obtained by feature multiplication of the first feature and the i-th second feature, and the i-th fourth feature obtained by feature subtraction of the first feature and the i-th second feature are concatenated. This yields the i-th first target feature, corresponding to the i-th pixel category, among the M first target features, which can improve the accuracy of segmentation prediction.

Still taking FIG. 2 as an example, the category-sensitive reshaping module in the image segmentation neural network performs the multi-feature fusion method described above: feature multiplication is performed on the first feature of the image to be segmented and each of the M second features to obtain M third features; feature subtraction is performed on the first feature and each of the M second features to obtain M fourth features; and feature concatenation is performed on the first feature, the M third features and the M fourth features to obtain the M first target features.

For example, if the first feature of the image to be segmented is F and the i-th second feature among the M second features is ω_i, the i-th first target feature F_i among the M first target features may be determined by the following formula (1):
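Formula (1) is not reproduced in this text. One form that would be consistent with the multiplication, subtraction, and concatenation steps described above is given below as a reconstruction only. The element-wise product symbol ⊙, the operator name Concat, and the order of the concatenated terms are assumptions.

```latex
F_i = \operatorname{Concat}\bigl(F,\; F \odot \omega_i,\; F - \omega_i\bigr), \qquad 1 \le i \le M
```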

In a possible implementation, performing image segmentation on the image to be segmented according to the M first target features to obtain the target segmentation result of the image to be segmented includes: performing category-by-category prediction on the image to be segmented according to the M first target features to determine M segmentation sub-results corresponding to the image to be segmented, the M segmentation sub-results being in one-to-one correspondence with the M pixel categories; and determining the target segmentation result according to the M segmentation sub-results.

Because the M first target features combine the category features of the M pixel categories with the first feature of the image to be segmented, performing category-by-category prediction on the image to be segmented according to the M first target features yields M segmentation sub-results in one-to-one correspondence with the M pixel categories. Combining the M segmentation sub-results then yields the target segmentation result of the image to be segmented.

For example, in the field of autonomous driving, a road condition image of the road ahead captured by an autonomous vehicle is obtained. The road condition image may include multiple pixel categories such as road, other vehicles, and pedestrians, and the pixel categories included in the road condition image are a subset of the M pixel categories. With the image segmentation method of the embodiments of the present disclosure, feature extraction is performed on the road condition image to obtain the first feature, the first feature is fused with the M second features to obtain the M first target features, and category-by-category prediction is performed on the road condition image according to the M first target features to obtain M segmentation sub-results (a segmentation sub-result corresponding to the road pixel category, a segmentation sub-result corresponding to the other-vehicle pixel category, a segmentation sub-result corresponding to the pedestrian pixel category, and so on). The M segmentation sub-results are combined to obtain the segmentation result of the road condition image, so that the current road condition can be analyzed according to the segmentation result and a driving decision can be made.

In some embodiments of the present disclosure, performing category-by-category prediction on the image to be segmented according to the M first target features to determine the M segmentation sub-results corresponding to the image to be segmented includes: inputting the M first target features into a cosine classifier, and performing category-by-category prediction on the image to be segmented based on the cosine classifier and the M first target features to determine the M segmentation sub-results.

Because a cosine classifier can reduce intra-category variation and achieve better classification performance, performing category-by-category prediction on the image to be segmented with the cosine classifier and the M first target features can effectively yield the M segmentation sub-results corresponding to the image to be segmented, and in turn the target segmentation result of the image to be segmented.

Still taking FIG. 2 as an example, as shown in FIG. 2, the M first target features are input into the same cosine classifier in the image segmentation neural network, and the cosine classifier can then output the target segmentation result of the image to be segmented.
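As a non-limiting illustration, a cosine classifier may score a pixel by the cosine similarity between the pixel's feature vector and a weight vector for each category. The following is a minimal sketch under that assumption. The function name `cosine_scores`, the `scale` factor, and the `eps` stabilizer are hypothetical and are not taken from the present disclosure.

```python
import math
from typing import List


def _norm(vector: List[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def cosine_scores(pixel: List[float],
                  class_weights: List[List[float]],
                  scale: float = 1.0,
                  eps: float = 1e-12) -> List[float]:
    """Score one pixel feature against every category weight vector.

    Each score is scale * cos(angle between pixel and weight), so it
    depends on direction only and not on the magnitude of either vector.
    """
    pixel_norm = _norm(pixel)
    scores = []
    for weight in class_weights:
        dot = sum(p * w for p, w in zip(pixel, weight))
        scores.append(scale * dot / (pixel_norm * _norm(weight) + eps))
    return scores
```

Because both vectors are normalized, features of the same category that differ only in magnitude receive the same score, which is one way to read the statement that a cosine classifier reduces intra-category variation.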

In some embodiments of the present disclosure, performing category-by-category prediction on the image to be segmented according to the M first target features to determine the M segmentation sub-results corresponding to the image to be segmented includes: for the i-th first target feature among the M first target features, determining, according to the i-th first target feature, the i-th segmentation sub-result among the M segmentation sub-results corresponding to the image to be segmented, where the i-th segmentation sub-result includes the pixels in the image to be segmented whose pixel category is the i-th pixel category among the M pixel categories.

Because the i-th first target feature is mainly used for segmenting the i-th pixel category among the M pixel categories, the pixels in the image to be segmented whose pixel category is the i-th pixel category can be effectively segmented according to the i-th first target feature, yielding the i-th segmentation sub-result among the M segmentation sub-results corresponding to the image to be segmented.

FIG. 3 is a schematic diagram of category-by-category prediction according to an embodiment of the present disclosure. As shown in FIG. 3, for the i-th first target feature among the M first target features, after the i-th first target feature passes through the cosine classifier in the image segmentation neural network, the cosine classifier may output M segmentation results corresponding to the i-th first target feature, where each segmentation result includes the pixels in the image to be segmented whose pixel category is the respective one of the M pixel categories.

For example, when M=3 and i=1, there are 3 pixel categories and 3 first target features in one-to-one correspondence with them. For the 1st first target feature among the 3 first target features (corresponding to the 1st pixel category among the 3 pixel categories), after the 1st first target feature passes through the cosine classifier in the image segmentation neural network, the cosine classifier outputs 3 segmentation results corresponding to the 1st first target feature: the 1st segmentation result includes the pixels in the image to be segmented whose pixel category is the 1st pixel category; the 2nd segmentation result includes the pixels in the image to be segmented whose pixel category is the 2nd pixel category; and the 3rd segmentation result includes the pixels in the image to be segmented whose pixel category is the 3rd pixel category. Because the 1st first target feature is mainly used for segmenting the 1st pixel category, only the 1st of the 3 segmentation results corresponding to the 1st first target feature is extracted and determined as the 1st segmentation sub-result corresponding to the image to be segmented.

For example, when M=3, there are 3 pixel categories and 3 first target features in one-to-one correspondence with them. The 1st segmentation sub-result, which includes the pixels in the image to be segmented whose pixel category is the 1st pixel category, may be determined according to the 1st first target feature (corresponding to the 1st pixel category among the 3 pixel categories); the 2nd segmentation sub-result, which includes the pixels whose pixel category is the 2nd pixel category, may be determined according to the 2nd first target feature (corresponding to the 2nd pixel category); and the 3rd segmentation sub-result, which includes the pixels whose pixel category is the 3rd pixel category, may be determined according to the 3rd first target feature (corresponding to the 3rd pixel category). Finally, the target segmentation result of the image to be segmented is obtained according to the 1st, 2nd, and 3rd segmentation sub-results.
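As a non-limiting illustration, the selection of the i-th result for the i-th first target feature and the combination of the M segmentation sub-results may be sketched as follows. The sketch assumes that each segmentation result is a list of per-pixel scores and that the sub-results are combined by taking, at each pixel, the category with the highest score. The present disclosure does not specify the combination rule, so the arg-max step and the names `select_sub_results` and `combine` are assumptions.

```python
from typing import List

ScoreMap = List[float]  # one score per pixel (assumed representation)


def select_sub_results(results: List[List[ScoreMap]]) -> List[ScoreMap]:
    """Keep, for the i-th first target feature, only its i-th result.

    results[i][k] is the score map for category k that the classifier
    produced from the i-th first target feature.
    """
    return [results[i][i] for i in range(len(results))]


def combine(sub_results: List[ScoreMap]) -> List[int]:
    """Assign each pixel the category whose sub-result scores highest.

    This arg-max rule is an assumed way of combining the sub-results.
    """
    num_pixels = len(sub_results[0])
    categories = range(len(sub_results))
    return [max(categories, key=lambda i: sub_results[i][p])
            for p in range(num_pixels)]
```

With M=3, `select_sub_results` keeps the diagonal entries `results[0][0]`, `results[1][1]`, and `results[2][2]`, matching the example in which only the 1st of the 3 results is kept for the 1st first target feature.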

Before the image segmentation neural network is used to quickly segment multiple pixel categories in the image to be segmented, the image segmentation neural network needs to be trained. Training the image segmentation neural network means training the feature extractor, the segmentation module, the category-sensitive reshaping module, and the cosine classifier in the image segmentation neural network.

The training process of the image segmentation neural network is described in detail below.

In some embodiments of the present disclosure, the training samples of the image segmentation neural network include a first sample image to be segmented, segmentation annotation information of the first sample image to be segmented, and the first sample data set, where the first sample image to be segmented includes at least two pixel categories among the M pixel categories. The image segmentation method further includes: performing feature extraction on the first sample image to be segmented through the image segmentation neural network to obtain a fifth feature of the first sample image to be segmented, and performing feature extraction, through the image segmentation neural network, on the target sample image corresponding to each pixel category among the M pixel categories to obtain M sixth features, where the M sixth features are in one-to-one correspondence with the M pixel categories and the target sample image corresponding to each pixel category is any one of the at least one sample image corresponding to that pixel category; determining M seventh features according to the M sixth features and the annotation information of the target sample image corresponding to each pixel category among the M pixel categories, and fusing the fifth feature with the M seventh features to obtain M second target features, where the M seventh features and the M second target features are each in one-to-one correspondence with the M pixel categories; performing image segmentation on the first sample image to be segmented according to the M second target features to obtain a segmentation result of the first sample image to be segmented; determining a segmentation loss according to the segmentation result and the segmentation annotation information of the first sample image to be segmented; and training the image segmentation neural network according to the segmentation loss to obtain the trained image segmentation neural network.

The image segmentation neural network is trained to quickly segment at least two pixel categories by using the first sample data set, which includes at least one sample image corresponding to each pixel category among the M pixel categories and the annotation information of each sample image, together with the first sample image to be segmented, which includes at least two pixel categories among the M pixel categories, and the segmentation annotation information of the first sample image to be segmented. The trained image segmentation neural network can therefore quickly and accurately produce the target segmentation result of an image to be segmented that includes at least two pixel categories among the M pixel categories, thereby achieving fast segmentation of at least two pixel categories.

FIG. 4 is a schematic diagram of training an image segmentation neural network according to an embodiment of the present disclosure. As shown in FIG. 4, the first sample image to be segmented, the target sample image corresponding to each pixel category among the M pixel categories, and the annotation information of each target sample image are input into the image segmentation neural network. The first sample image to be segmented and the target sample images corresponding to the M pixel categories share the feature extractor in the image segmentation neural network.

The shared feature extractor performs feature extraction on the first sample image to be segmented and on the target sample image corresponding to each pixel category among the M pixel categories, to obtain a tenth feature of the first sample image to be segmented and the M sixth features. After the tenth feature of the first sample image to be segmented passes through the segmentation module in the image segmentation neural network, the fifth feature of the first sample image to be segmented is obtained.

In some embodiments of the present disclosure, the annotation information of the target sample image corresponding to each pixel category among the M pixel categories is a mask. Determining the M seventh features according to the M sixth features and the annotation information of the target sample image corresponding to each pixel category among the M pixel categories includes: for the i-th sixth feature among the M sixth features, performing a masked average pooling operation according to the i-th sixth feature and the mask of the target sample image corresponding to the i-th pixel category among the M pixel categories, to obtain the i-th seventh feature among the M seventh features, where the i-th sixth feature and the i-th seventh feature are both features corresponding to the i-th pixel category among the M pixel categories.

Directly fusing the M sixth features extracted by the feature extractor with the fifth feature of the first sample image to be segmented would require a large amount of computation. Therefore, the masked average pooling operation is performed on the M sixth features using the corresponding masks to obtain the M seventh features, so that the subsequent fusion of the M seventh features with the fifth feature of the first sample image to be segmented requires less computation and the M second target features can be obtained quickly.

Still taking FIG. 4 as an example, as shown in FIG. 4, for the i-th sixth feature among the M sixth features, the category-sensitive reshaping module in the image segmentation neural network performs the masked average pooling operation according to the i-th sixth feature and the mask of the target sample image corresponding to the i-th pixel category among the M pixel categories, to obtain the i-th seventh feature, corresponding to the i-th pixel category, among the M seventh features.
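As a non-limiting illustration, the masked average pooling operation may be sketched as averaging the feature vectors at the pixel positions that the mask marks as belonging to the category. The sketch assumes that the feature map is flattened into a list of per-pixel vectors and that the mask is a binary list of the same length. The function name `masked_average_pooling` is hypothetical.

```python
from typing import List


def masked_average_pooling(feature: List[List[float]],
                           mask: List[int]) -> List[float]:
    """Average the feature vectors at positions where the mask is 1.

    feature[p] is the feature vector at pixel p and mask[p] is 1 when
    pixel p belongs to the category of interest, otherwise 0.
    """
    selected = [pixel for pixel, keep in zip(feature, mask) if keep]
    if not selected:
        raise ValueError("mask selects no pixel, so the average is undefined")
    channels = len(selected[0])
    return [sum(pixel[c] for pixel in selected) / len(selected)
            for c in range(channels)]
```

The result is a single vector regardless of the spatial size of the feature map, which is consistent with the reduced computation noted above for the subsequent fusion.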

For example, if the target sample image corresponding to the i-th pixel category is S, the i-th sixth feature is F_{S,i}, and the mask of the target sample image corresponding to the i-th pixel category is M_i, the i-th seventh feature ω_i, corresponding to the i-th pixel category, among the M seventh features may be determined by the following formula (2):

Here, (x, y) is a pixel position in the feature map corresponding to the i-th sixth feature F_{S,i}, and 1[·] is an indicator function whose value is 1 when its condition holds and 0 when its condition does not hold.
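Formula (2) and the condition inside the indicator function are not reproduced in this text. A commonly used form of masked average pooling that is consistent with the symbols defined above is given below as a reconstruction only. The condition M_i(x, y) = 1, meaning that the mask marks position (x, y) as belonging to the i-th pixel category, is an assumption.

```latex
\omega_i = \frac{\sum_{(x,y)} F_{S,i}(x,y)\,\mathbb{1}\bigl[M_i(x,y) = 1\bigr]}{\sum_{(x,y)} \mathbb{1}\bigl[M_i(x,y) = 1\bigr]}
```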

In some embodiments of the present disclosure, fusing the fifth feature with the M seventh features to obtain the M second target features includes: for the i-th seventh feature among the M seventh features, performing feature multiplication on the fifth feature and the i-th seventh feature to obtain an i-th eleventh feature, where 1≤i≤M; performing feature subtraction on the fifth feature and the i-th seventh feature to obtain an i-th twelfth feature; and performing feature concatenation on the fifth feature, the i-th eleventh feature, and the i-th twelfth feature to obtain the i-th second target feature among the M second target features. The i-th seventh feature, the i-th eleventh feature, the i-th twelfth feature, and the i-th second target feature are all features corresponding to the i-th pixel category among the M pixel categories.

Because the target sample image corresponding to each pixel category among the M pixel categories is randomly selected in each training pass, a multi-feature aggregation method is adopted to avoid the noise introduced by the random selection. Still taking FIG. 4 as an example, as shown in FIG. 4, the category-sensitive reshaping module in the image segmentation neural network performs the above multi-feature fusion method: it performs feature multiplication on the fifth feature of the first sample image to be segmented and the M seventh features to obtain M eleventh features, performs feature subtraction on the fifth feature and the M seventh features to obtain M twelfth features, and performs feature concatenation on the fifth feature, the M eleventh features, and the M twelfth features to obtain the M second target features. The specific manner of feature concatenation may be similar to formula (1) above and is not repeated here.

Still taking FIG. 4 as an example, as shown in FIG. 4, the M second target features are input into the same cosine classifier in the image segmentation neural network, and the cosine classifier performs category-by-category prediction on the first sample image to be segmented and outputs the segmentation result of the first sample image to be segmented. The specific segmentation process is similar to the segmentation process for the image to be segmented described above and is not repeated here.

Because the training samples include the segmentation annotation information of the first sample image to be segmented, the segmentation loss of the image segmentation neural network can be determined according to the segmentation result and the segmentation annotation information of the first sample image to be segmented. The network parameters of the image segmentation neural network (that is, the network parameters of the feature extractor, the segmentation module, the category-sensitive reshaping module, and the cosine classifier) are then adjusted according to the segmentation loss to complete the current training pass. A trained image segmentation neural network that meets preset requirements is obtained through multiple training iterations.

In some embodiments of the present disclosure, the segmentation loss may be determined with a cross-entropy loss function or with another loss function, which is not specifically limited in the present disclosure.
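As a non-limiting illustration, a per-pixel cross-entropy segmentation loss may be sketched as follows. The sketch assumes that the segmentation result is available as one list of category scores per pixel and that the segmentation annotation information gives one integer category index per pixel. The function name `cross_entropy` is hypothetical.

```python
import math
from typing import List


def cross_entropy(scores: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over pixels.

    scores[p][k] is the unnormalized score of category k at pixel p and
    labels[p] is the annotated category index of pixel p.
    """
    total = 0.0
    for logits, label in zip(scores, labels):
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[label]
    return total / len(labels)
```

The loss is small when the annotated category receives a much higher score than the other categories and grows as the scores become less decisive or point to a wrong category.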

In some embodiments of the present disclosure, before the image segmentation neural network is trained according to the first sample image to be segmented, the segmentation annotation information of the first sample image to be segmented, and the first sample data set, the image segmentation method further includes: pre-training the image segmentation neural network according to a second sample image to be segmented, segmentation annotation information of the second sample image to be segmented, and a second sample data set, where the second sample data set includes multiple sample images corresponding to each pixel category among P pixel categories and the annotation information of each sample image, the M pixel categories are new pixel categories other than the P pixel categories, and the second sample image to be segmented includes at least two pixel categories among the P pixel categories.

The P pixel categories are base categories, that is, each pixel category among the P pixel categories corresponds to multiple sample images. The M pixel categories are new pixel categories other than the P pixel categories, that is, each pixel category among the M pixel categories corresponds to only a few sample images. For example, each pixel category corresponds to only 1 sample image (1-shot), or each pixel category corresponds to only 5 sample images (5-shot). In the embodiments of the present disclosure, the number of sample images corresponding to each pixel category among the M pixel categories may also be extended to 10-shot or to a larger number of shots, which is not specifically limited in the present disclosure.

First-stage training is first performed on the image segmentation neural network using the second sample data set corresponding to the P pixel categories, the second sample image to be segmented, and the segmentation annotation information of the second sample image to be segmented, so that the image segmentation neural network after first-stage training is capable of quickly segmenting multiple pixel categories. Second-stage training is then performed on the image segmentation neural network using the first sample data set corresponding to the M pixel categories, the first sample image to be segmented, and the segmentation annotation information of the first sample image to be segmented, so that the image segmentation neural network after second-stage training is capable of quickly segmenting multiple pixel categories among the M pixel categories, which are new categories.

The specific process of the first-stage training, which uses the second sample data set corresponding to the P pixel categories, the second sample image to be segmented, and the segmentation annotation information of the second sample image to be segmented, is similar to the specific process of the second-stage training described above, which uses the first sample data set corresponding to the M pixel categories, the first sample image to be segmented, and the segmentation annotation information of the first sample image to be segmented, and is not repeated here.

FIG. 5 is a schematic diagram of two-stage training of an image segmentation neural network according to an embodiment of the present disclosure. As shown in FIG. 5, first-stage training is first performed on the image segmentation neural network using the second sample data set corresponding to the P pixel categories, which are base categories, the second sample image to be segmented, and the segmentation annotation information of the second sample image to be segmented. Second-stage training is then performed on the image segmentation neural network after first-stage training using the first sample data set corresponding to the M pixel categories, which are new categories, the first sample image to be segmented, and the segmentation annotation information of the first sample image to be segmented, to obtain the final trained image segmentation neural network.

In some embodiments of the present disclosure, the image segmentation method further includes: determining the M second features according to the first sample data set and the trained image segmentation neural network.

The image segmentation neural network obtained through the above two-stage training can be used to determine the M second features for subsequently segmenting the M pixel categories, which are new categories.

In some embodiments of the present disclosure, determining the M second features according to the first sample data set and the trained image segmentation neural network includes: performing feature extraction, through the trained image segmentation neural network, on the sample image corresponding to each pixel category among the M pixel categories to obtain M eighth features; and, for the i-th eighth feature among the M eighth features, performing the masked average pooling operation according to the i-th eighth feature and the mask of the sample image corresponding to the i-th pixel category among the M pixel categories, to obtain the i-th second feature among the M second features. The i-th eighth feature and the i-th second feature are both features corresponding to the i-th pixel category among the M pixel categories.

For the i-th pixel category among the M pixel categories, the feature extractor in the image segmentation neural network after two-stage training performs feature extraction on the sample image corresponding to the i-th pixel category to obtain the i-th eighth feature (corresponding to the i-th pixel category among the M pixel categories). The category-sensitive reshaping module in the image segmentation neural network after two-stage training then performs the masked average pooling operation according to the i-th eighth feature and the mask of the sample image corresponding to the i-th pixel category, to obtain the i-th second feature (corresponding to the i-th pixel category among the M pixel categories). The specific processing of the feature extractor and the category-sensitive reshaping module is similar to the training process described above and is not repeated here.

In some embodiments of the present disclosure, when each pixel category among the M pixel categories corresponds to only one sample image (1-shot), the above feature extraction and masked average pooling operations are performed only once to obtain the M second features used for segmenting the M pixel categories. When each pixel category among the M pixel categories corresponds to multiple sample images, the above feature extraction and masked average pooling operations are repeated multiple times to obtain the M second features used for segmenting the M pixel categories.

For example, in a 5-shot scenario, the above feature extraction and masked average pooling operations are repeated 5 times, with a different sample image selected each time for the same pixel category. The second features obtained for each pixel category among the M pixel categories over the 5 repetitions are averaged to obtain the final M second features.
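As a non-limiting illustration, the averaging over repetitions described above may be sketched as follows. The sketch assumes that each repetition yields one vector per pixel category and that the final second feature of a category is the element-wise mean of its vectors over all repetitions. The function name `average_second_features` is hypothetical.

```python
from typing import List


def average_second_features(per_shot: List[List[List[float]]]) -> List[List[float]]:
    """Average the per-category vectors over K repetitions.

    per_shot[k][i] is the vector obtained for category i in the k-th
    repetition; the return value holds one averaged vector per category.
    """
    num_shots = len(per_shot)
    num_categories = len(per_shot[0])
    averaged = []
    for i in range(num_categories):
        channels = len(per_shot[0][i])
        averaged.append([
            sum(per_shot[k][i][c] for k in range(num_shots)) / num_shots
            for c in range(channels)
        ])
    return averaged
```

In the 1-shot case the list holds a single repetition and the function returns its vectors unchanged.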

After the M second features, which represent the category features of the M pixel categories, are determined using the image segmentation neural network after two-stage training and the first sample data set corresponding to the M pixel categories, the first sample data set no longer needs to be input into the image segmentation neural network during subsequent actual image segmentation. Only the image to be segmented and the M second features need to be input into the image segmentation neural network to quickly segment multiple pixel categories in the image to be segmented.

It can be understood that the method embodiments mentioned in the present disclosure can be combined with each other to form combined embodiments without violating principle and logic; due to space limitations, details are not repeated in the present disclosure. Those skilled in the art can understand that, in the above methods of the specific implementations, the specific execution order of the steps should be determined by their functions and possible internal logic.

In addition, the present disclosure further provides an image segmentation apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any image segmentation method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section; details are not repeated here.

FIG. 6 is a block diagram of an image segmentation apparatus according to an embodiment of the present disclosure. As shown in FIG. 6, the image segmentation apparatus 60 includes:

a feature extraction module 61, configured to perform feature extraction on an image to be segmented to obtain a first feature of the image to be segmented, where the image to be segmented includes N pixel categories and N is an integer greater than 1;

a feature fusion module 62, configured to fuse the first feature with M second features to obtain M first target features, where the M second features and the M first target features are each in one-to-one correspondence with M pixel categories, the M second features are determined based on a first sample data set, the first sample data set includes at least one sample image corresponding to each of the M pixel categories and annotation information of each sample image, M is greater than or equal to N, and the N pixel categories are a subset of the M pixel categories; and

an image segmentation module 63, configured to perform image segmentation on the image to be segmented according to the M first target features to obtain a target segmentation result of the image to be segmented.

In a possible implementation, the feature fusion module 62 includes:

a feature multiplication submodule, configured to, for the i-th second feature among the M second features, perform feature multiplication on the first feature and the i-th second feature to obtain the i-th third feature, where 1≤i≤M;

a feature subtraction submodule, configured to perform feature subtraction on the first feature and the i-th second feature to obtain the i-th fourth feature; and

a feature connection submodule, configured to perform feature connection on the first feature, the i-th third feature, and the i-th fourth feature to obtain the i-th first target feature among the M first target features;

where the i-th second feature, the i-th third feature, the i-th fourth feature, and the i-th first target feature are all features corresponding to the i-th pixel category among the M pixel categories.
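The multiplication, subtraction, and connection steps can be sketched in NumPy as below. This is only an illustration under stated assumptions: the text does not fix the tensor layout, the operand order of the subtraction, or the connection axis, so the sketch assumes a `(C, H, W)` first feature, `(M, C)` second features broadcast over spatial positions, element-wise operations, subtraction as first feature minus second feature, and concatenation along the channel axis. The name `fuse_features` is hypothetical.

```python
import numpy as np

def fuse_features(first_feature, second_features):
    """Fuse one query feature map with M per-category vectors.

    first_feature:   (C, H, W) first feature of the image to be segmented.
    second_features: (M, C) one second feature per pixel category.
    Returns (M, 3C, H, W): one first target feature per pixel category.
    """
    protos = second_features[:, :, None, None]   # (M, C, 1, 1)
    query = first_feature[None]                  # (1, C, H, W)
    third = query * protos                       # feature multiplication
    fourth = query - protos                      # feature subtraction (assumed order)
    query_rep = np.broadcast_to(query, third.shape)
    # Feature connection, assumed to be channel-wise concatenation.
    return np.concatenate([query_rep, third, fourth], axis=1)

# Toy example: C=2 channels, 3x3 feature map, M=2 pixel categories.
first = np.full((2, 3, 3), 2.0)
second = np.array([[1.0, 1.0], [3.0, 0.0]])
targets = fuse_features(first, second)
```

Because the M second features are fixed vectors, the fusion for all M categories is a single broadcast operation over the one first feature.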

In a possible implementation, the image segmentation module 63 includes:

a category-by-category prediction submodule, configured to perform category-by-category prediction on the image to be segmented according to the M first target features and determine M segmentation sub-results corresponding to the image to be segmented, where the M segmentation sub-results are in one-to-one correspondence with the M pixel categories; and

a determining submodule, configured to determine the target segmentation result according to the M segmentation sub-results.

In a possible implementation, the category-by-category prediction submodule is specifically configured to:

input the M first target features into a cosine classifier and, based on the cosine classifier and the M first target features, perform category-by-category prediction on the image to be segmented to determine the M segmentation sub-results.

For the i-th first target feature among the M first target features, the i-th segmentation sub-result among the M segmentation sub-results corresponding to the image to be segmented is determined according to the i-th first target feature, where the i-th segmentation sub-result includes the pixels of the image to be segmented whose pixel category is the i-th pixel category among the M pixel categories.
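A cosine classifier over the M first target features, followed by one way of merging the M sub-results, might look like the sketch below. The text specifies neither the classifier's parameters nor how the sub-results are merged, so the single weight vector shared across categories, the `scale` factor, and the per-pixel argmax are assumptions; `cosine_scores` and `combine_sub_results` are hypothetical names.

```python
import numpy as np

def cosine_scores(target_features, weight, scale=10.0, eps=1e-8):
    """Score every pixel of every category by cosine similarity.

    target_features: (M, D, H, W) first target features.
    weight:          (D,) classifier weight vector (assumed shared across categories).
    Returns (M, H, W) scaled cosine similarities.
    """
    feat_norm = np.linalg.norm(target_features, axis=1, keepdims=True)
    unit_feat = target_features / (feat_norm + eps)
    unit_w = weight / (np.linalg.norm(weight) + eps)
    return scale * np.einsum("mdhw,d->mhw", unit_feat, unit_w)

def combine_sub_results(scores):
    """Assign each pixel to the category with the highest score (an assumption)."""
    return scores.argmax(axis=0)

# Toy example: M=2 categories, D=2 channels, a single pixel.
toy_features = np.array([[[[1.0]], [[0.0]]],
                         [[[0.0]], [[1.0]]]])   # shape (2, 2, 1, 1)
toy_weight = np.array([1.0, 0.0])
toy_scores = cosine_scores(toy_features, toy_weight)
toy_labels = combine_sub_results(toy_scores)
```

Under this merging rule, the i-th segmentation sub-result is the set of pixels whose label equals i.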

In a possible implementation, the image segmentation method performed by the image segmentation apparatus 60 is implemented by an image segmentation neural network.

In a possible implementation, the training samples of the image segmentation neural network include a first sample image to be segmented, segmentation annotation information of the first sample image to be segmented, and the first sample data set, where the first sample image to be segmented includes at least two of the M pixel categories;

the feature extraction module 61 is further configured to perform feature extraction on the first sample image to be segmented through the image segmentation neural network to obtain a fifth feature of the first sample image to be segmented, and to perform feature extraction, through the image segmentation neural network, on the target sample image corresponding to each of the M pixel categories to obtain M sixth features, where the M sixth features are in one-to-one correspondence with the M pixel categories, and the target sample image corresponding to each pixel category is any one of the at least one sample image corresponding to that pixel category;

the feature fusion module 62 is further configured to determine M seventh features according to the M sixth features and the annotation information of the target sample image corresponding to each of the M pixel categories, and to fuse the fifth feature with the M seventh features to obtain M second target features, where the M seventh features and the M second target features are each in one-to-one correspondence with the M pixel categories;

the image segmentation module 63 is further configured to perform image segmentation on the first sample image to be segmented according to the M second target features to obtain a segmentation result of the first sample image to be segmented;

the image segmentation apparatus 60 further includes:

a segmentation loss determination module, configured to determine a segmentation loss according to the segmentation result and the segmentation annotation information of the first sample image to be segmented; and

a training module, configured to train the image segmentation neural network according to the segmentation loss to obtain the trained image segmentation neural network.
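The text does not state which segmentation loss is used. As one common choice, and purely as an assumption for illustration, a pixel-wise cross-entropy between per-category scores and the segmentation annotation could be computed as follows; `pixel_cross_entropy` is a hypothetical name.

```python
import numpy as np

def pixel_cross_entropy(scores, labels):
    """Mean pixel-wise cross-entropy (an assumed loss, not taken from the text).

    scores: (M, H, W) per-category scores for the sample image to be segmented.
    labels: (H, W) integer annotation with values in [0, M).
    """
    # Numerically stable log-softmax over the category axis.
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    picked = log_probs[labels, rows, cols]   # log-probability of the true category
    return -picked.mean()

# Toy example: uniform scores over M=3 categories give a loss of log(3).
uniform_scores = np.zeros((3, 2, 2))
toy_annotation = np.zeros((2, 2), dtype=int)
toy_loss = pixel_cross_entropy(uniform_scores, toy_annotation)
```

In training, a loss of this kind would be back-propagated through the network by a deep learning framework; the NumPy version only illustrates the forward computation.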

In a possible implementation, the annotation information of the target sample image corresponding to each of the M pixel categories is a mask;

the feature fusion module 62 includes:

a mask average pooling submodule, configured to, for the i-th sixth feature among the M sixth features, perform a mask average pooling operation according to the i-th sixth feature and the mask of the target sample image corresponding to the i-th pixel category among the M pixel categories, to obtain the i-th seventh feature among the M seventh features, where the i-th sixth feature and the i-th seventh feature are both features corresponding to the i-th pixel category among the M pixel categories.

In a possible implementation, the image segmentation apparatus 60 further includes:

a pre-training module, configured to pre-train the image segmentation neural network according to a second sample image to be segmented, segmentation annotation information of the second sample image to be segmented, and a second sample data set, before the image segmentation neural network is trained according to the first sample image to be segmented, the segmentation annotation information of the first sample image to be segmented, and the first sample data set, where the second sample data set includes multiple sample images corresponding to each of P pixel categories and annotation information of each sample image, the M pixel categories are new pixel categories other than the P pixel categories, and the second sample image to be segmented includes at least two of the P pixel categories.

In a possible implementation, the image segmentation apparatus 60 further includes:

a determining module, configured to determine the M second features according to the first sample data set and the trained image segmentation neural network.

In a possible implementation, the annotation information of the sample image corresponding to each of the M pixel categories is a mask;

the determining module is specifically configured to:

perform feature extraction, through the trained image segmentation neural network, on the sample image corresponding to each of the M pixel categories to obtain M eighth features; and

for the i-th eighth feature among the M eighth features, perform a mask average pooling operation according to the i-th eighth feature and the mask of the sample image corresponding to the i-th pixel category among the M pixel categories, to obtain the i-th second feature among the M second features;

where the i-th eighth feature and the i-th second feature are both features corresponding to the i-th pixel category among the M pixel categories.

In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments. For the specific implementation, refer to the description of the above method embodiments; for brevity, details are not repeated here.

An embodiment of the present disclosure further provides a computer-readable storage medium on which computer program instructions are stored, where the computer program instructions implement the above method when executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium.

An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory configured to store instructions executable by the processor, where the processor is configured to invoke the instructions stored in the memory to execute the above method.

An embodiment of the present disclosure further provides a computer program product including computer-readable code, where, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the image segmentation method provided by any of the above embodiments.

An embodiment of the present disclosure further provides another computer program product for storing computer-readable instructions, where the instructions, when executed, cause a computer to perform the operations of the image segmentation method provided by any of the above embodiments.

The electronic device may be provided as a terminal, a server, or a device in another form.

FIG. 7 is a structural block diagram of an electronic device 800 according to an embodiment of the present disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.

Referring to FIG. 7, the electronic device 800 may include one or more of the following components: a first processing component 802, a first memory 804, a first power supply component 806, a multimedia component 808, an audio component 810, a first input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The first processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communications, camera operations, and recording operations. The first processing component 802 may include one or more processors 820 to execute instructions to perform all or some of the steps of the methods described above. In addition, the first processing component 802 may include one or more modules to facilitate interaction between the first processing component 802 and other components. For example, the first processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the first processing component 802.

The first memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. The first memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a Static Random-Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk.

The first power supply component 806 provides power to the various components of the electronic device 800. The first power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes at least one of a front-facing camera and a rear-facing camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, at least one of the front-facing camera and the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or may have focusing and optical zoom capability.

The audio component 810 is configured to perform at least one of the following operations: outputting an audio signal and inputting an audio signal. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the first memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The first input/output interface 812 provides an interface between the first processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect the on/off state of the electronic device 800 and the relative positioning of components, such as the display and the keypad of the electronic device 800. The sensor component 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, second-generation mobile communication technology (2G), third-generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing any of the above methods.

In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, such as the first memory 804 including computer program instructions, where the computer program instructions are executable by the processor 820 of the electronic device 800 to perform any of the above methods.

FIG. 8 is a schematic structural diagram of another electronic device according to an embodiment of the present disclosure. As shown in FIG. 8, the electronic device 1900 may be provided as a server. Referring to FIG. 8, the electronic device 1900 includes a second processing component 1922, which further includes one or more processors, and memory resources represented by a second memory 1932 for storing instructions executable by the second processing component 1922, such as application programs. The application programs stored in the second memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the second processing component 1922 is configured to execute the instructions to perform any of the above methods.

The electronic device 1900 may further include a second power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and a second input/output interface 1958. The electronic device 1900 may operate based on an operating system stored in the second memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system introduced by Apple (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, such as the second memory 1932 including computer program instructions, where the computer program instructions are executable by the second processing component 1922 of the electronic device 1900 to perform any of the above methods.

The present disclosure may be a system, a method, or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.

A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a ROM, an erasable programmable read-only memory (EPROM or flash memory), an SRAM, a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as a radio wave or another freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or another transmission medium (for example, a light pulse passing through a fiber optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or an external storage device over a network, such as the Internet, a local area network, a wide area network, or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In a scenario involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, an FPGA, or a Programmable Logic Array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器，从而生产出一种机器，使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时，产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中，这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作，从而，存储有指令的计算机可读介质则包括一个制造品，其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer or other programmable data processing apparatus to produce a machine that causes the instructions when executed by the processor of the computer or other programmable data processing apparatus , resulting in means for implementing the functions/acts specified in one or more blocks of the flowchart and/or block diagrams. These computer readable program instructions can also be stored in a computer readable storage medium, these instructions cause a computer, programmable data processing apparatus and/or other equipment to operate in a specific manner, so that the computer readable medium on which the instructions are stored includes An article of manufacture comprising instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowchart and/or block diagrams.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.

The computer program product may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

Various embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Numerous modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Industrial Applicability

Embodiments of the present disclosure provide an image segmentation method and apparatus, an electronic device, a computer storage medium, and a computer program. The method includes: performing feature extraction on an image to be segmented to obtain a first feature of the image to be segmented, where the image to be segmented includes N pixel categories and N is an integer greater than 1; fusing the first feature with M second features to obtain M first target features, where the M second features and the M first target features are each in one-to-one correspondence with M pixel categories, the M second features are determined based on a first sample data set, the first sample data set includes at least one sample image corresponding to each of the M pixel categories and annotation information of each sample image, M is greater than or equal to N, and the N pixel categories are a subset of the M pixel categories; and performing image segmentation on the image to be segmented according to the M first target features to obtain a target segmentation result of the image to be segmented. The embodiments of the present disclosure can achieve fast segmentation of multiple pixel categories in the image to be segmented.

Claims

An image segmentation method, applied in an electronic device, the method comprising:

performing feature extraction on an image to be segmented to obtain a first feature of the image to be segmented, wherein the image to be segmented includes N pixel categories, and N is an integer greater than 1;

fusing the first feature with M second features to obtain M first target features, wherein the M second features and the M first target features are each in one-to-one correspondence with M pixel categories, the M second features are determined based on a first sample data set, the first sample data set includes at least one sample image corresponding to each of the M pixel categories and annotation information of each sample image, M is greater than or equal to N, and the N pixel categories are a subset of the M pixel categories; and

performing image segmentation on the image to be segmented according to the M first target features to obtain a target segmentation result of the image to be segmented.

The method according to claim 1, wherein fusing the first feature with the M second features to obtain the M first target features comprises:

for an i-th second feature among the M second features, performing feature multiplication on the first feature and the i-th second feature to obtain an i-th third feature, where 1≤i≤M;

performing feature subtraction on the first feature and the i-th second feature to obtain an i-th fourth feature; and

performing feature concatenation on the first feature, the i-th third feature, and the i-th fourth feature to obtain an i-th first target feature among the M first target features;

wherein the i-th second feature, the i-th third feature, the i-th fourth feature, and the i-th first target feature are all features corresponding to the i-th pixel category among the M pixel categories.
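As a non-limiting illustration, the multiply, subtract, and concatenate fusion recited above might be sketched as follows. The claims do not fix tensor shapes, so this sketch assumes the first feature is a `(C, H, W)` map, each second feature is a length-`C` vector broadcast over all spatial positions, and multiplication and subtraction are element-wise. The function name `fuse_features` is hypothetical.

```python
import numpy as np

def fuse_features(first_feature, second_features):
    """Fuse one query feature map with M per-category feature vectors.

    first_feature:   (C, H, W) array (assumed layout).
    second_features: (M, C) array, one vector per pixel category (assumed layout).
    Returns an (M, 3C, H, W) array: one fused target feature per category.
    """
    protos = second_features[:, :, None, None]   # (M, C, 1, 1), broadcast over H, W
    query = first_feature[None, :, :, :]         # (1, C, H, W), broadcast over M
    third = query * protos                       # element-wise feature multiplication
    fourth = query - protos                      # element-wise feature subtraction
    query_rep = np.broadcast_to(query, third.shape)
    # Channel-wise concatenation of the first, third, and fourth features
    return np.concatenate([query_rep, third, fourth], axis=1)

rng = np.random.default_rng(0)
first = rng.standard_normal((4, 5, 6))    # C=4, H=5, W=6
second = rng.standard_normal((3, 4))      # M=3 categories
targets = fuse_features(first, second)
print(targets.shape)  # (3, 12, 5, 6)
```

Under these assumptions, each of the M fused features has three times the channel count of the first feature, which is what a downstream per-category classifier would need to accept.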

The method according to claim 1 or 2, wherein performing image segmentation on the image to be segmented according to the M first target features to obtain the target segmentation result of the image to be segmented comprises:

performing category-by-category prediction on the image to be segmented according to the M first target features, and determining M segmentation sub-results corresponding to the image to be segmented, wherein the M segmentation sub-results are in one-to-one correspondence with the M pixel categories; and

determining the target segmentation result according to the M segmentation sub-results.

The method according to claim 3, wherein performing category-by-category prediction on the image to be segmented according to the M first target features and determining the M segmentation sub-results corresponding to the image to be segmented comprises:

inputting the M first target features into a cosine classifier, and performing category-by-category prediction on the image to be segmented based on the cosine classifier and the M first target features to determine the M segmentation sub-results.

For the i-th first target feature among the M first target features, the i-th segmentation sub-result among the M segmentation sub-results corresponding to the image to be segmented is determined according to the i-th first target feature, wherein the i-th segmentation sub-result includes the pixels in the image to be segmented whose pixel category is the i-th pixel category among the M pixel categories.
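As a non-limiting illustration, category-by-category prediction with a cosine classifier might be sketched as below. The claims do not specify the classifier's parameters or how the M sub-results are merged, so this sketch assumes one weight vector per category, scores each pixel by cosine similarity, and merges by a per-pixel argmax. The names `cosine_scores` and `merge_sub_results` are hypothetical.

```python
import numpy as np

def cosine_scores(target_features, class_weights, eps=1e-8):
    """Per-category cosine similarity maps.

    target_features: (M, D, H, W), the i-th slice is the i-th first target feature.
    class_weights:   (M, D), one assumed classifier weight vector per category.
    Returns an (M, H, W) array of scores in [-1, 1].
    """
    dots = np.einsum("mdhw,md->mhw", target_features, class_weights)
    feat_norm = np.linalg.norm(target_features, axis=1)             # (M, H, W)
    w_norm = np.linalg.norm(class_weights, axis=1)[:, None, None]   # (M, 1, 1)
    return dots / (feat_norm * w_norm + eps)

def merge_sub_results(scores):
    """Assumed merge rule: each pixel takes the category with the highest score."""
    return np.argmax(scores, axis=0)

# Tiny example: M=2 categories, D=2 channels, a 1x2 image
feats = np.zeros((2, 2, 1, 2))
feats[0, :, 0, 0] = [1.0, 0.0]
feats[0, :, 0, 1] = [0.0, 1.0]
feats[1, :, 0, 0] = [0.0, 2.0]
feats[1, :, 0, 1] = [3.0, 0.0]
weights = np.array([[1.0, 0.0], [1.0, 0.0]])

scores = cosine_scores(feats, weights)
label_map = merge_sub_results(scores)
sub_result_0 = label_map == 0   # binary mask of pixels assigned to category 0
```

Because cosine similarity normalizes both the feature and the weight vector, the scores depend only on direction, not magnitude; in the example the second category's features have norms 2 and 3 but still score exactly like unit vectors.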

The method according to any one of claims 1 to 5, wherein the image segmentation method is implemented by an image segmentation neural network.

The method according to claim 6, wherein training samples of the image segmentation neural network include a first sample image to be segmented, segmentation annotation information of the first sample image to be segmented, and the first sample data set, and the first sample image to be segmented includes at least two of the M pixel categories;

the method further comprises:

performing, through the image segmentation neural network, feature extraction on the first sample image to be segmented to obtain a fifth feature of the first sample image to be segmented, and performing, through the image segmentation neural network, feature extraction on a target sample image corresponding to each of the M pixel categories to obtain M sixth features, wherein the M sixth features are in one-to-one correspondence with the M pixel categories, and the target sample image corresponding to each pixel category is any one of the at least one sample image corresponding to that pixel category;

determining M seventh features according to the M sixth features and the annotation information of the target sample image corresponding to each of the M pixel categories, and fusing the fifth feature with the M seventh features to obtain M second target features, wherein the M seventh features and the M second target features are each in one-to-one correspondence with the M pixel categories;

performing image segmentation on the first sample image to be segmented according to the M second target features to obtain a segmentation result of the first sample image to be segmented;

determining a segmentation loss according to the segmentation result of the first sample image to be segmented and the segmentation annotation information; and

training the image segmentation neural network according to the segmentation loss to obtain a trained image segmentation neural network.
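As a non-limiting illustration of the segmentation loss step, the sketch below compares per-category scores with the segmentation annotation. The claims do not name a loss function; pixel-wise cross-entropy over a softmax of the M score maps is assumed here only because it is a common choice. The function name `segmentation_loss` is hypothetical.

```python
import numpy as np

def segmentation_loss(scores, labels):
    """Mean pixel-wise cross-entropy (an assumed loss, not specified by the claims).

    scores: (M, H, W) per-category scores for the sample image to be segmented.
    labels: (H, W) integer annotation, each value in [0, M).
    """
    # Numerically stable log-softmax over the category axis
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Pick the log-probability of the annotated category at every pixel
    h_idx, w_idx = np.indices(labels.shape)
    return float(-log_probs[labels, h_idx, w_idx].mean())

scores = np.zeros((4, 2, 3))              # uninformative scores for M=4 categories
labels = np.array([[0, 1, 2], [3, 0, 1]])
loss = segmentation_loss(scores, labels)  # equals log(4) for uniform predictions
```

In an actual training loop this scalar would be backpropagated through the network by an automatic differentiation framework; NumPy is used here only to show the arithmetic.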

The method according to claim 7, wherein the annotation information of the target sample image corresponding to each of the M pixel categories is a mask;

determining the M seventh features according to the M sixth features and the annotation information of the target sample image corresponding to each of the M pixel categories comprises:

for an i-th sixth feature among the M sixth features, performing a mask average pooling operation according to the i-th sixth feature and the mask of the target sample image corresponding to the i-th pixel category among the M pixel categories to obtain an i-th seventh feature among the M seventh features, wherein the i-th sixth feature and the i-th seventh feature are both features corresponding to the i-th pixel category among the M pixel categories.
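As a non-limiting illustration, a mask average pooling operation of the kind recited above might be sketched as follows. The sketch assumes the mask is binary and has already been resized to the spatial resolution of the feature map; the function name `masked_average_pooling` and the small `eps` guard against empty masks are assumptions.

```python
import numpy as np

def masked_average_pooling(feature_map, mask, eps=1e-8):
    """Average the feature vectors at positions where the mask is 1.

    feature_map: (C, H, W) feature of a sample image (assumed layout).
    mask:        (H, W) binary mask for one pixel category, at feature resolution.
    Returns a length-C vector representing that pixel category.
    """
    mask = mask.astype(feature_map.dtype)
    weighted_sum = (feature_map * mask[None]).sum(axis=(1, 2))  # sum over masked pixels
    return weighted_sum / (mask.sum() + eps)                    # divide by masked area

feature = np.arange(8, dtype=float).reshape(2, 2, 2)  # C=2, H=2, W=2
mask = np.array([[1, 0], [0, 1]])
prototype = masked_average_pooling(feature, mask)     # approximately [1.5, 5.5]
```

The pooled vector discards background positions, so it summarizes only the region annotated as the given pixel category.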

The method according to any one of claims 7 to 8, wherein, before training the image segmentation neural network according to the first sample image to be segmented, the segmentation annotation information of the first sample image to be segmented, and the first sample data set, the method further comprises:

pre-training the image segmentation neural network according to a second sample image to be segmented, segmentation annotation information of the second sample image to be segmented, and a second sample data set, wherein the second sample data set includes multiple sample images corresponding to each of P pixel categories and annotation information of each sample image, the M pixel categories are new pixel categories other than the P pixel categories, and the second sample image to be segmented includes at least two of the P pixel categories.

The method according to any one of claims 7 to 9, wherein the method further comprises:

determining the M second features according to the first sample data set and the trained image segmentation neural network.

The method according to claim 10, wherein the annotation information of the sample image corresponding to each of the M pixel categories is a mask;

determining the M second features according to the first sample data set and the trained image segmentation neural network comprises:

performing, through the trained image segmentation neural network, feature extraction on the sample image corresponding to each of the M pixel categories to obtain M eighth features; and

for an i-th eighth feature among the M eighth features, performing a mask average pooling operation according to the i-th eighth feature and the mask of the sample image corresponding to the i-th pixel category among the M pixel categories to obtain an i-th second feature among the M second features;

wherein the i-th eighth feature and the i-th second feature are both features corresponding to the i-th pixel category among the M pixel categories.

An image segmentation apparatus, comprising:

a feature extraction module, configured to perform feature extraction on an image to be segmented to obtain a first feature of the image to be segmented, wherein the image to be segmented includes N pixel categories, and N is an integer greater than 1;

a feature fusion module, configured to fuse the first feature with M second features to obtain M first target features, wherein the M second features and the M first target features are each in one-to-one correspondence with M pixel categories, the M second features are determined based on a first sample data set, the first sample data set includes at least one sample image corresponding to each of the M pixel categories and annotation information of each sample image, M is greater than or equal to N, and the N pixel categories are a subset of the M pixel categories; and

an image segmentation module, configured to perform image segmentation on the image to be segmented according to the M first target features to obtain a target segmentation result of the image to be segmented.

The apparatus according to claim 12, wherein the feature fusion module comprises:

a feature multiplication submodule, configured to, for an i-th second feature among the M second features, perform feature multiplication on the first feature and the i-th second feature to obtain an i-th third feature, where 1≤i≤M;

a feature subtraction submodule, configured to perform feature subtraction on the first feature and the i-th second feature to obtain an i-th fourth feature; and

a feature concatenation submodule, configured to perform feature concatenation on the first feature, the i-th third feature, and the i-th fourth feature to obtain an i-th first target feature among the M first target features;

wherein the i-th second feature, the i-th third feature, the i-th fourth feature, and the i-th first target feature are all features corresponding to the i-th pixel category among the M pixel categories.

The apparatus according to claim 12 or 13, wherein the image segmentation module 63 comprises:

a category-by-category prediction submodule, configured to perform category-by-category prediction on the image to be segmented according to the M first target features and determine M segmentation sub-results corresponding to the image to be segmented, wherein the M segmentation sub-results are in one-to-one correspondence with the M pixel categories; and

a determining submodule, configured to determine the target segmentation result according to the M segmentation sub-results.

The apparatus according to claim 14, wherein the category-by-category prediction submodule is specifically configured to:

input the M first target features into a cosine classifier, and perform category-by-category prediction on the image to be segmented based on the cosine classifier and the M first target features to determine the M segmentation sub-results.

For the i-th first target feature among the M first target features, the i-th segmentation sub-result among the M segmentation sub-results corresponding to the image to be segmented is determined according to the i-th first target feature, wherein the i-th segmentation sub-result includes the pixels in the image to be segmented whose pixel category is the i-th pixel category among the M pixel categories.

The apparatus according to any one of claims 12 to 16, wherein the image segmentation method performed by the apparatus is implemented by an image segmentation neural network.

The apparatus according to claim 17, wherein training samples of the image segmentation neural network include a first sample image to be segmented, segmentation annotation information of the first sample image to be segmented, and the first sample data set, and the first sample image to be segmented includes at least two of the M pixel categories;

the feature extraction module is further configured to perform, through the image segmentation neural network, feature extraction on the first sample image to be segmented to obtain a fifth feature of the first sample image to be segmented, and to perform, through the image segmentation neural network, feature extraction on a target sample image corresponding to each of the M pixel categories to obtain M sixth features, wherein the M sixth features are in one-to-one correspondence with the M pixel categories, and the target sample image corresponding to each pixel category is any one of the at least one sample image corresponding to that pixel category;

the feature fusion module is further configured to determine M seventh features according to the M sixth features and the annotation information of the target sample image corresponding to each of the M pixel categories, and to fuse the fifth feature with the M seventh features to obtain M second target features, wherein the M seventh features and the M second target features are each in one-to-one correspondence with the M pixel categories;

the image segmentation module is further configured to perform image segmentation on the first sample image to be segmented according to the M second target features to obtain a segmentation result of the first sample image to be segmented;

the apparatus further comprises:

a segmentation loss determination module, configured to determine a segmentation loss according to the segmentation result of the first sample image to be segmented and the segmentation annotation information; and

a training module, configured to train the image segmentation neural network according to the segmentation loss to obtain a trained image segmentation neural network.

The apparatus according to claim 18, wherein the annotation information of the target sample image corresponding to each of the M pixel categories is a mask;

the feature fusion module comprises:

a mask average pooling submodule, configured to, for an i-th sixth feature among the M sixth features, perform a mask average pooling operation according to the i-th sixth feature and the mask of the target sample image corresponding to the i-th pixel category among the M pixel categories to obtain an i-th seventh feature among the M seventh features, wherein the i-th sixth feature and the i-th seventh feature are both features corresponding to the i-th pixel category among the M pixel categories.

The apparatus according to any one of claims 18 to 19, wherein the apparatus further comprises:

a pre-training module, configured to, before the image segmentation neural network is trained according to the first sample image to be segmented, the segmentation annotation information of the first sample image to be segmented, and the first sample data set, pre-train the image segmentation neural network according to a second sample image to be segmented, segmentation annotation information of the second sample image to be segmented, and a second sample data set, wherein the second sample data set includes multiple sample images corresponding to each of P pixel categories and annotation information of each sample image, the M pixel categories are new pixel categories other than the P pixel categories, and the second sample image to be segmented includes at least two of the P pixel categories.

The apparatus according to any one of claims 18 to 20, wherein the apparatus further comprises:

a determining module, configured to determine the M second features according to the first sample data set and the trained image segmentation neural network.

The apparatus according to claim 21, wherein the annotation information of the sample image corresponding to each of the M pixel categories is a mask;

the determining module is specifically configured to:

for an i-th eighth feature among the M eighth features, perform a mask average pooling operation according to the i-th eighth feature and the mask of the sample image corresponding to the i-th pixel category among the M pixel categories to obtain an i-th second feature among the M second features;

wherein the i-th eighth feature and the i-th second feature are both features corresponding to the i-th pixel category among the M pixel categories.

An electronic device comprising:

a processor; and

a memory configured to store processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 11.

A computer-readable storage medium having computer program instructions stored thereon, the computer program instructions implementing the method of any one of claims 1 to 11 when executed by a processor.

A computer program comprising computer readable code, wherein, when the computer readable code runs in an electronic device, a processor in the electronic device executes the method of any one of claims 1 to 11.