CN117132776A - A multi-organ image segmentation model construction method and segmentation method - Google Patents
A multi-organ image segmentation model construction method and segmentation method
- Publication number
- CN117132776A (application CN202311174265.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- image
- encoder
- layer
- training
- Prior art date
- 2023-09-12
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7747—Organisation of the process, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/945—User interactive design; Environments; Toolboxes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Apparatus For Radiation Diagnosis (AREA)
Abstract
The application discloses a multi-organ image segmentation model construction method and a segmentation method. With a single model, segmentation of various organs across a wide range of 3D medical images can be completed from point or circumscribed-rectangle prompts, providing a zero-shot segmentation scheme that requires no retraining for the segmentation task of any particular organ. The construction method comprises the following steps: acquiring a pre-trained image segmentation model; adding a control module to the pre-trained model to obtain a first model; and freezing the original embedding layers of the pre-trained model in the first model, inputting a fine-tuning training set into the first model, and training the first model to obtain the fine-tuned multi-organ image segmentation model. The segmentation method comprises: inputting a medical image to be segmented into the constructed multi-organ image segmentation model; acquiring a point or rectangular-box operation prompt on the medical image to be segmented; and obtaining the segmentation result of the prompted region according to the operation prompt.
Description
Technical Field
The application relates to the technical field of medical image processing, and in particular to medical image segmentation.
Background
As an image processing technique, medical image segmentation has made tremendous contributions in the medical field. It extracts regions of interest (e.g., tissues and organs, lesions) and can assist doctors in diagnosing diseases, customizing treatment plans, and so on.
Conventional image segmentation techniques are generally based on algorithms such as threshold segmentation, edge detection, or region growing, but these methods often fail to achieve ideal results on medical images, which suffer from heavy noise, low contrast, and complex structures. Therefore, with the development of deep learning, neural-network-based medical image processing has become the mainstream approach.
Deep learning models such as convolutional neural networks are widely used for medical image segmentation tasks. By training on large amounts of labeled data, these models automatically learn image features and perform accurate pixel-level segmentation for different tasks.
However, annotation for medical image segmentation is time-consuming and expensive, and must be performed by imaging physicians, since only those with specialized medical knowledge are qualified for such tasks. Moreover, a new segmentation model must be re-annotated and re-trained for images of different tasks and different modalities, and the annotations cannot be reused.
There are two existing approaches to the segmentation problem. One is interactive segmentation, which can segment objects of any class but requires a person to iteratively refine the mask. The other is automatic segmentation, which can segment specific predefined objects but whose training requires a large number of manually annotated samples. In short, neither approach provides a versatile, fully automated segmentation solution.
Disclosure of Invention
The application aims to overcome the above problems in medical image segmentation and provides a multi-organ image segmentation model construction method and a segmentation method: with a single model, segmentation of various organs across a wide range of 3D medical images can be completed from point or circumscribed-rectangle prompts, with no retraining required for the segmentation task of any particular organ (a zero-shot segmentation scheme).
The application provides a multi-organ image segmentation model construction method, which comprises the following steps:
acquiring a pre-trained image segmentation model, wherein the pre-trained model is based on the Transformer architecture and comprises an encoder, a decoder, and the embeddings output by the encoder; the encoder maps the image to be segmented and the input prompt to their corresponding feature spaces and outputs the corresponding embeddings, where the prompt comprises a point or a circumscribed rectangular box; the encoder comprises a multi-head attention module and a fully connected module;
adding a control module to the pre-trained model to obtain a first model, wherein the control module is connected to the multi-head attention module in the encoder and processes the features of the multi-head attention module before fusing them with those of the subsequent fully connected module;
collecting public datasets and preprocessing them to obtain a fine-tuning training set, wherein the public datasets comprise three-dimensional medical images from multiple vendors, multiple phases, multiple organs, and multiple modalities, together with the corresponding segmentation annotations;
and freezing the original embedding layers belonging to the pre-trained model in the first model, inputting the fine-tuning training set into the first model, and training the first model to obtain the fine-tuned multi-organ image segmentation model.
The application also provides a multi-organ image segmentation method, which is realized based on the multi-organ image segmentation model constructed by the method, and comprises the following steps:
inputting a medical image to be segmented into the multi-organ image segmentation model constructed as described above;
acquiring a point or rectangular-box operation prompt on the medical image to be segmented, and feeding it to the multi-organ image segmentation model as the prompt;
and performing inference with the multi-organ image segmentation model, and outputting the segmentation result for the organ selected by the operation prompt.
The technical scheme of the application addresses the core problem in the prior art that three-dimensional segmentation tasks for different organs each require dedicated model training and annotation. The scheme fine-tunes an existing pre-trained model on existing sample sets, so no manual annotation is needed.
During fine-tuning, a control module connected to the multi-head attention module is added while the original embedding layers are frozen. This protects the original model, places the learned parameters in the control module, and accelerates training while new information is learned.
In practical applications, the multi-organ image segmentation model constructed and trained by this scheme requires no separate, dedicated training or annotation for the image segmentation task of any particular organ. Moreover, given a point or rectangular-box operation prompt, promptable interactive segmentation can be performed for the prompted organ region, which greatly improves the application value of image segmentation technology in the medical field.
Drawings
FIG. 1 is a flow chart of a method for constructing a multi-organ image segmentation model according to the present application.
Fig. 2 is a schematic diagram of a model constructed by a multi-organ image segmentation model construction method of the present application.
FIG. 3 is a schematic diagram of each Transformer block in the encoder in the multi-organ image segmentation model construction method of the present application.
Fig. 4 is a schematic diagram of a control module in a multi-organ image segmentation model construction method according to the present application.
Fig. 5 is a flowchart of a multi-organ image segmentation method according to the present application.
Fig. 6 is an exemplary diagram of a multi-organ image segmentation method according to the present application.
Description of the embodiments
The application is further described below with reference to the drawings and detailed description.
Example 1
The application relates to a multi-organ image segmentation model construction method; fig. 1 shows the flow chart of the method, and fig. 2 shows a schematic diagram of the constructed multi-organ image segmentation model. The method comprises the following steps.
Step S101: acquire a pre-trained image segmentation model.
The pre-trained model is based on the Transformer architecture and comprises an encoder, a decoder, and the embeddings output by the encoder; the encoder maps the image to be segmented and the input prompt to their corresponding feature spaces and outputs the corresponding embeddings.
Specifically, any Transformer-based pre-trained image segmentation model can be adopted, such as the SAM (Segment Anything Model) proposed by Meta. The pre-trained model further comprises:
an image encoder for mapping the image to be segmented into the image feature space;
a prompt encoder for mapping the input prompt into the prompt feature space, where the prompt comprises a point or a circumscribed rectangular box;
and a mask decoder for integrating the two embeddings output respectively by the image encoder and the prompt encoder, and then decoding the final segmentation mask from the feature map; a minimal sketch of this wiring is given below.
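The following PyTorch sketch shows how these three components could be wired together. The module and parameter names (PromptableSegmenter, image_encoder, prompt_encoder, mask_decoder) are illustrative assumptions, not SAM's actual API.

```python
import torch
import torch.nn as nn

class PromptableSegmenter(nn.Module):
    """Minimal sketch of the encoder/decoder wiring described above."""
    def __init__(self, image_encoder: nn.Module, prompt_encoder: nn.Module,
                 mask_decoder: nn.Module):
        super().__init__()
        self.image_encoder = image_encoder    # 2D slice -> image embedding
        self.prompt_encoder = prompt_encoder  # point / box -> prompt embedding
        self.mask_decoder = mask_decoder      # fuse both embeddings -> mask logits

    def forward(self, image: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        image_emb = self.image_encoder(image)     # e.g. (B, C, H', W')
        prompt_emb = self.prompt_encoder(prompt)  # e.g. (B, N, C)
        return self.mask_decoder(image_emb, prompt_emb)  # segmentation mask logits
```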
Step S102: add a control module to the encoder of the pre-trained model to obtain the first model. The control module is connected to the multi-head attention module in the encoder; it processes the features of the multi-head attention module and fuses them with those of the subsequent fully connected module.
In the Transformer architecture, the encoder includes a multi-head attention module and a fully connected module. The encoder in this embodiment includes an image encoder and a prompt encoder, each formed by connecting n Transformer blocks in series; fig. 3 is a schematic diagram of each Transformer block in the encoder.
As shown in fig. 4, the control module adopts a bottleneck structure comprising a first linear layer, a GeLU activation layer, and a second linear layer connected in sequence: the first linear layer projects the features of the multi-head attention module down to 1/4 of their dimension, and after processing by the GeLU activation layer, the second linear layer projects them back up to the original dimension.
The method freezes the parameters of the multi-head attention module and adds a control module to fine-tune the additional parameters. In the subsequent structure, another control module is added so that its updated output features are fused with the features of the frozen linear (fully connected) module. In this way, control modules attached to specific modules fine-tune the SAM architecture without altering the original parameters, improving its capability for medical image segmentation; a sketch follows.
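A PyTorch sketch of the control module and its attachment to a frozen Transformer block is given below. The bottleneck structure (down-projection to 1/4, GeLU, up-projection) follows the description above; the additive fusion and the attn/mlp interfaces are our assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class ControlModule(nn.Module):
    """Bottleneck adapter: project down to 1/4 of the dimension, GeLU, project back up."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)  # first linear layer: 1/4 projection
        self.act = nn.GELU()                          # GeLU activation layer
        self.up = nn.Linear(dim // reduction, dim)    # second linear layer: original dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    """Transformer block with frozen attention/MLP and trainable control modules."""
    def __init__(self, attn: nn.Module, mlp: nn.Module, dim: int):
        super().__init__()
        self.attn, self.mlp = attn, mlp
        for p in self.attn.parameters():
            p.requires_grad = False           # freeze pre-trained attention weights
        for p in self.mlp.parameters():
            p.requires_grad = False           # freeze pre-trained fully connected module
        self.ctrl_attn = ControlModule(dim)   # processes multi-head attention features
        self.ctrl_mlp = ControlModule(dim)    # fused with the frozen linear module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn(x)
        h = h + self.ctrl_attn(h)              # adapter output added to attention features
        return self.mlp(h) + self.ctrl_mlp(h)  # fuse adapter with frozen MLP features
```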
Step S103: collect public datasets and preprocess them to obtain the fine-tuning training set. The public datasets include three-dimensional medical images from multiple vendors, multiple phases, multiple organs, and multiple modalities, together with the corresponding segmentation annotations.
The public datasets include: the MICCAI 2022 FLARE abdominal scan dataset, containing 2300 CT images from more than 20 medical institutions with segmentation annotations of 13 anatomical parts; the AMOS multi-organ dataset, comprising 600 CT and MRI scans with annotations for 15 organ segmentations; the BraTS 2021 brain tumor dataset, containing 660 cases in four modalities (T1, T1Gd, T2, T2-FLAIR); the ACDC cardiac segmentation dataset, comprising 200 annotated left- and right-heart segmentation results; and AbdomenCT-1K, a three-dimensional medical image segmentation dataset with more than 1000 CT scans from 12 medical centers covering multi-phase, multi-vendor, multi-organ cases.
Specifically, collecting the public datasets and preprocessing them to obtain the fine-tuning training set comprises the following steps.
The window width and window level of the medical images in the public datasets are normalized to the interval 0-255.
The volumes and segmentation masks are sliced along the z-axis to obtain 2D slice images and 2D slice masks; the bounding-box (bbox) tensor is obtained by taking the rectangle circumscribing each 2D slice mask; and the results are saved into npz files to form the fine-tuning training set. A sketch of this preprocessing is given below.
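The following sketch illustrates these preprocessing steps in NumPy. The function name, the (z, H, W) array layout, and the npz key names are illustrative assumptions.

```python
import numpy as np

def preprocess_case(volume: np.ndarray, mask: np.ndarray,
                    window_level: float, window_width: float,
                    out_path: str) -> None:
    """Window/normalize to 0-255, slice along z, compute bbox per slice, save npz."""
    lo = window_level - window_width / 2
    hi = window_level + window_width / 2
    vol = np.clip(volume, lo, hi)
    vol = ((vol - lo) / (hi - lo) * 255.0).astype(np.uint8)  # normalize to 0-255

    images, masks, bboxes = [], [], []
    for z in range(vol.shape[0]):               # slice along the z-axis
        m2d = mask[z]
        if not m2d.any():
            continue                            # skip slices without the target organ
        ys, xs = np.where(m2d > 0)
        # rectangle circumscribing the 2D slice mask: (x_min, y_min, x_max, y_max)
        bboxes.append(np.array([xs.min(), ys.min(), xs.max(), ys.max()]))
        images.append(vol[z])
        masks.append(m2d)

    if images:  # save only if the case contains annotated slices
        np.savez(out_path, imgs=np.stack(images),
                 gts=np.stack(masks), bboxes=np.stack(bboxes))
```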
Step S104: freeze the original embedding layers belonging to the pre-trained model in the first model, input the fine-tuning training set into the first model, and train the first model to obtain the fine-tuned multi-organ image segmentation model.
The specific training steps are as follows:
and a data iterator is made and used for iterating and returning the 2D slice image, the 2D slice mask and the corresponding bbox frame tensor from the training set in a batch-by-batch mode and inputting the first model.
The 2D slice images and 2D slice masks of the same batch are scaled to the same size, and the corresponding bboxes are resized accordingly; a random shift of up to 20 pixels is added to the bbox tensor as a perturbation.
The bbox information is encoded as the prompt and fed into the prompt encoder.
The original embedding layers belonging to the pre-trained model in the first model are frozen, the data iterator feeds the fine-tuning training set into the first model for training, and the learned parameters are placed in the control modules. During training of the first model, the combination of Dice loss and cross-entropy loss is used as the training loss, with a learning rate of 1e-5. A training-loop sketch is given below.
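The sketch below shows one way to implement this loop in PyTorch. The model interface, the "ctrl" parameter-name convention for the control modules, the optimizer choice, and the use of binary cross-entropy as the CE term are assumptions; only the combined Dice + CE loss and the 1e-5 learning rate come from the description above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class DiceLoss(nn.Module):
    """Soft Dice loss for binary mask logits (assumes target matches logits' shape)."""
    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum()
        return 1 - (2 * inter + 1) / (prob.sum() + target.sum() + 1)

def finetune(model: nn.Module, train_set, epochs: int = 10) -> None:
    # Freeze everything except the control modules (assumed to be named "ctrl*").
    for name, p in model.named_parameters():
        p.requires_grad = "ctrl" in name
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=1e-5)       # learning rate 1e-5
    dice, ce = DiceLoss(), nn.BCEWithLogitsLoss()    # Dice + cross-entropy terms

    loader = DataLoader(train_set, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for img, gt, bbox in loader:                 # batch-by-batch iteration
            jitter = torch.randint(-20, 21, bbox.shape)  # random <=20 px perturbation
            logits = model(img, bbox + jitter)       # bbox encoded as the prompt
            loss = dice(logits, gt.float()) + ce(logits, gt.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
```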
To recap, the control module first uses the first linear layer to project the features of the multi-head attention module down to 1/4 of their dimension, then processes them with the GeLU activation layer, and finally projects them back up to the original dimension through the second linear layer, thereby fine-tuning the SAM architecture. Without large-scale retraining of the pre-trained model, inserting control modules protects the original model; this fine-tuning scheme places the learned parameters in the control modules and accelerates training while new information is learned.
Example 2
As shown in fig. 5, the present application also provides a multi-organ image segmentation method, which is implemented based on the multi-organ image segmentation model constructed by the above method, and includes the following steps:
step S201, inputting the medical image to be segmented into the multi-organ image segmentation model constructed by the multi-organ image segmentation model construction method.
Step S202: acquire a point or rectangular-box operation prompt on the medical image to be segmented, and feed it to the multi-organ image segmentation model as the prompt.
Step S203: the multi-organ image segmentation model performs inference and outputs the segmentation result for the organ selected by the operation prompt.
The step of inputting the medical image to be segmented into the multi-organ image segmentation model further comprises a step of preprocessing, specifically:
the three-dimensional volume of the medical image to be segmented is sliced along the z-axis into 2D slice images;
and the 2D slice images, as the preprocessed data of the medical image to be segmented, are input into the multi-organ image segmentation model; a minimal inference sketch is given below.
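The following sketch illustrates prompted inference over a volume. Slicing along z follows the description above; reusing the same box prompt for every slice and the 0.5 threshold are our assumptions.

```python
import numpy as np
import torch

def segment_volume(model: torch.nn.Module, volume: np.ndarray,
                   bbox: np.ndarray) -> np.ndarray:
    """Slice the 3D volume along z, segment each slice with the box prompt,
    and stack the 2D masks back into a 3D segmentation result."""
    model.eval()
    masks = []
    with torch.no_grad():
        for z in range(volume.shape[0]):                           # z-axis slicing
            img = torch.from_numpy(volume[z]).float()[None, None]  # (1, 1, H, W)
            prompt = torch.from_numpy(bbox).float()[None]          # (1, 4) box prompt
            logits = model(img, prompt)
            masks.append((torch.sigmoid(logits) > 0.5).squeeze().cpu().numpy())
    return np.stack(masks)                                         # (Z, H, W) mask volume
```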
The technical scheme of the application addresses the core problem in the prior art that three-dimensional segmentation tasks for different organs each require dedicated model training and annotation. The scheme fine-tunes an existing pre-trained model on existing sample sets, so no manual annotation is needed.
During fine-tuning, a control module connected to the multi-head attention module is added while the original embedding layers are frozen. This protects the original model, places the learned parameters in the control module, and accelerates training while new information is learned.
In practical applications, the multi-organ image segmentation model constructed and trained by this scheme requires no separate, dedicated training or annotation for the image segmentation task of any particular organ. As shown in the example of FIG. 6, given a point or rectangular-box operation prompt, promptable interactive segmentation can be performed for the prompted organ region, which greatly improves the application value of image segmentation technology in the medical field.
Claims (10)
1. A multi-organ image segmentation model construction method, characterized by comprising the following steps:
acquiring a pre-trained image segmentation model, wherein the pre-trained model is based on the Transformer architecture and comprises an encoder, a decoder, and the embeddings output by the encoder; the encoder maps the image to be segmented and the input prompt to their corresponding feature spaces and outputs the corresponding embeddings, where the prompt comprises a point or a circumscribed rectangular box; the encoder comprises a multi-head attention module and a fully connected module;
adding a control module to the pre-trained model to obtain a first model, wherein the control module is connected to the multi-head attention module in the encoder and processes the features of the multi-head attention module before fusing them with those of the subsequent fully connected module;
acquiring public datasets and preprocessing them to obtain a fine-tuning training set, wherein the public datasets comprise three-dimensional medical images from multiple vendors, multiple phases, multiple organs, and multiple modalities, together with the corresponding segmentation annotations;
and freezing the original embedding layers belonging to the pre-trained model in the first model, inputting the fine-tuning training set into the first model, and training the first model to obtain the fine-tuned multi-organ image segmentation model.
2. The method of claim 1, wherein the encoder in the pre-trained model comprises an image encoder and a prompt encoder, wherein:
the image encoder maps the image to be segmented into the image feature space;
the prompt encoder maps the input prompt into the prompt feature space, wherein the prompt comprises points or circumscribed rectangular boxes;
and the decoder integrates the two embeddings output respectively by the image encoder and the prompt encoder, and then decodes the final segmentation mask from the feature map.
3. The method for constructing a multi-organ image segmentation model according to claim 2, wherein collecting the public datasets and preprocessing them to obtain the fine-tuning training set further comprises the steps of:
normalizing the window width and window level of the medical images in the public datasets to the range 0-255;
slicing the volumes and segmentation masks along the z-axis to obtain 2D slice images and 2D slice masks, obtaining the bbox tensor by taking the rectangle circumscribing each 2D slice mask, and saving the results into npz files to obtain the fine-tuning training set.
4. The method for constructing a multi-organ image segmentation model according to claim 3, wherein freezing the original embedding layers belonging to the pre-trained model in the first model, inputting the fine-tuning training set into the first model, and training the first model to obtain the fine-tuned multi-organ image segmentation model further comprises the steps of:
building a data iterator that iterates over the fine-tuning training set batch by batch, returns the 2D slice images, 2D slice masks, and corresponding bbox tensors, and feeds them into the first model;
scaling the 2D slice images and 2D slice masks of the same batch to the same size and resizing the corresponding bboxes accordingly, wherein a random shift of up to 20 pixels is added to the bbox tensor as a perturbation;
encoding the bbox information as the prompt and feeding it into the prompt encoder;
and freezing the original embedding layers belonging to the pre-trained model in the first model, inputting the fine-tuning training set into the first model through the data iterator for training, and placing the learned training parameters in the control module.
5. The method for constructing a multi-organ image segmentation model according to claim 1, wherein during training of the first model, the combination of Dice loss and cross-entropy loss is used as the training loss, with a learning rate of 1e-5.
6. The method for constructing a multi-organ image segmentation model according to claim 1, wherein the control module adopts a bottleneck structure comprising a first linear layer, a GeLU activation layer, and a second linear layer connected in sequence; the first linear layer projects the features of the multi-head attention module down to 1/4 of their dimension, and after processing by the GeLU activation layer, the second linear layer projects them back up to the original dimension.
7. The method for constructing a multi-organ image segmentation model according to claim 1, wherein the pre-trained model is a SAM model.
8. The method of claim 1, wherein the public datasets are three-dimensional medical image segmentation datasets comprising multi-phase, multi-vendor, multi-organ cases, including the MICCAI 2022 FLARE abdominal scan dataset, the AMOS multi-organ dataset, the BraTS 2021 brain tumor dataset, the ACDC cardiac segmentation dataset, and the AbdomenCT-1K dataset.
9. A method of multi-organ image segmentation, the method comprising the steps of:
inputting a medical image to be segmented into the multi-organ image segmentation model constructed according to any one of claims 1 to 8;
acquiring a point or rectangular-box operation prompt on the medical image to be segmented, and feeding it to the multi-organ image segmentation model as the prompt;
and performing inference with the multi-organ image segmentation model, and outputting the segmentation result for the organ selected by the operation prompt.
10. The method of claim 9, wherein the step of inputting the medical image to be segmented into the multi-organ image segmentation model constructed according to any one of claims 1 to 8 further comprises:
slicing the three-dimensional volume of the medical image to be segmented along the z-axis into 2D slice images;
and inputting the 2D slice images, as the preprocessed data of the medical image to be segmented, into the multi-organ image segmentation model constructed according to any one of claims 1 to 8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311174265.9A CN117132776A (en) | 2023-09-12 | 2023-09-12 | A multi-organ image segmentation model construction method and segmentation method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311174265.9A CN117132776A (en) | 2023-09-12 | 2023-09-12 | A multi-organ image segmentation model construction method and segmentation method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117132776A true CN117132776A (en) | 2023-11-28 |
Family
ID=88859925
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311174265.9A (CN117132776A, pending) | A multi-organ image segmentation model construction method and segmentation method | 2023-09-12 | 2023-09-12 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117132776A (en) |
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190223725A1 (en) * | 2018-01-25 | 2019-07-25 | Siemens Healthcare Gmbh | Machine Learning-based Segmentation for Cardiac Medical Imaging |
| CN110136828A (en) * | 2019-05-16 | 2019-08-16 | 杭州健培科技有限公司 | A method of medical image multitask auxiliary diagnosis is realized based on deep learning |
| CN113808146A (en) * | 2021-10-18 | 2021-12-17 | 山东大学 | Medical image multi-organ segmentation method and system |
| CN115018865A (en) * | 2022-06-30 | 2022-09-06 | 西安理工大学 | Medical image segmentation method based on transfer learning |
| GB202307321D0 (en) * | 2023-05-17 | 2023-06-28 | Samsung Electronics Co Ltd | Method and apparatus for fine-tuning foundation models |
| CN116309650A (en) * | 2023-05-22 | 2023-06-23 | 湖南大学 | Medical image segmentation method and system based on dual-branch embedded attention mechanism |
Non-Patent Citations (1)
| Title |
|---|
| JUNDE WU et al.: "Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation", 13 May 2023 (2023-05-13), pages 1-13 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117788981A (en) * | 2024-02-02 | 2024-03-29 | 深圳须弥云图空间科技有限公司 | Zero-shot image segmentation model training method and device based on multimodality |
| CN118072378A (en) * | 2024-03-11 | 2024-05-24 | 珠海全一科技有限公司 | Dynamic decision image segmentation method based on SAM basic model |
| CN118447038A (en) * | 2024-04-17 | 2024-08-06 | 中国科学院自动化研究所 | Interactive image segmentation method and device, electronic equipment and storage medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112465827B (en) | Contour perception multi-organ segmentation network construction method based on class-by-class convolution operation | |
| CN113674253B | Automatic segmentation method for rectal cancer CT image based on U-Transformer | |
| JP7707299B2 (en) | Automatic liver CT segmentation method based on deep shape learning | |
| CN114693933B (en) | Medical image segmentation device based on generative adversarial network and multi-scale feature fusion | |
| CN117132776A (en) | A multi-organ image segmentation model construction method and segmentation method | |
| CN110310287B (en) | Automatic organ-at-risk delineation method, equipment and storage medium based on neural network | |
| Zhang et al. | Automatic segmentation of the cardiac MR images based on nested fully convolutional dense network with dilated convolution | |
| WO2025050542A1 (en) | Diffusion model and generative adversarial network-based image segmentation method and apparatus | |
| CN113554669A (en) | Unet network brain tumor MRI image segmentation method for improving attention module | |
| CN113706486B (en) | Pancreatic tumor image segmentation method based on dense connection network migration learning | |
| WO2021203795A1 (en) | Pancreas ct automatic segmentation method based on saliency dense connection expansion convolutional network | |
| CN111179237A (en) | A kind of liver and liver tumor image segmentation method and device | |
| CN117710681A (en) | Semi-supervised medical image segmentation method based on data enhancement strategy | |
| Zeng et al. | Liver segmentation in magnetic resonance imaging via mean shape fitting with fully convolutional neural networks | |
| CN113160229A (en) | Pancreas segmentation method and device based on hierarchical supervision cascade pyramid network | |
| CN107516314B (en) | Medical image hyper-voxel segmentation method and device | |
| CN119992083A (en) | Abdominal CT multi-organ segmentation method based on LoRA vertical domain MedSAM large model | |
| CN118762177A (en) | An automatic segmentation method for pancreatic tumors based on CT images | |
| CN116402831A (en) | Partially supervised method and device for automatic segmentation of multiple organs in abdominal CT sequence images | |
| CN120525806A (en) | Interactive recognition and measurement method and device based on AI non-enhanced CT images | |
| CN120072213A (en) | Semi-supervised medical image segmentation method based on cross-image learning and shape fusion | |
| Gibou et al. | Partial differential equations-based segmentation for radiotherapy treatment planning | |
| CN119379710A (en) | A semi-supervised medical image segmentation method based on mutual correction of suspicious pixels | |
| CN113538451B (en) | Method and device for segmenting magnetic resonance image of deep vein thrombosis, electronic equipment and storage medium | |
| Bransby et al. | POLYCORE: Polygon-based contour refinement for improved Intravascular Ultrasound Segmentation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |