CN117370903A - Disease classification method, model training method, device, electronic equipment and medium - Google Patents
- Publication number
- CN117370903A (application CN202311332672.8A)
- Authority
- CN
- China
- Prior art keywords
- disease
- preset
- image
- sub
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/43—Editing text-bitmaps, e.g. alignment, spacing; Semantic analysis of bitmaps of text without OCR
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Public Health (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The disclosure relates to the technical field of image classification and discloses a disease classification method, a model training method, a device, electronic equipment and a medium. The disease classification method comprises the following steps: acquiring a plurality of sub-images of a target scan image; determining a disease description text related to the target scan image; and obtaining a prediction probability of each preset disease based on the plurality of sub-images, the disease description text and a disease classification model. The method converts one target scan image of larger size into a plurality of sub-images of smaller size, and the disease classification model can process the data of each sub-image in turn, so the image processing does not exceed the limits of the hardware. At the same time, converting the target scan image into a plurality of sub-images does not significantly reduce image resolution, so the disease classification model can classify diseases based on higher-resolution images, which improves the accuracy with which the disease classification model classifies diseases.
Description
Technical Field
The present disclosure relates to the technical field of image classification, for example, to a disease classification method, a model training method, an apparatus, an electronic device, and a medium.
Background
Computer-aided diagnosis technology based on deep learning is developing rapidly. In the related art, a number of labeled scan images (such as X-ray chest radiographs) and disease description texts (such as clinical reports or medical history data) can be used as training data to train a neural network model, yielding a disease classification model capable of classifying diseases in scan images. Such a model can classify multiple diseases in scan images, reducing the workload of doctors while improving the accuracy of disease classification.
However, due to limitations of hardware conditions or of training data quality, the disease classification model is prone to inaccurate classification; hence, the related art classifies diseases with low accuracy.
It should be noted that the information disclosed in the foregoing background section is only intended to enhance understanding of the background of the present application and may therefore include information that does not constitute prior art already known to those of ordinary skill in the art.
Disclosure of Invention
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview, and is intended to neither identify key/critical elements nor delineate the scope of such embodiments, but is intended as a prelude to the more detailed description that follows.
The embodiment of the disclosure provides a disease classification method, a model training method, a device, electronic equipment and a medium, which can improve the accuracy of disease classification by a disease classification model.
According to a first aspect of embodiments of the present disclosure, there is provided a disease classification method comprising:
acquiring a plurality of sub-images of a target scanning image;
determining a disease description text related to the target scan image;
and obtaining a prediction probability of each preset disease based on the plurality of sub-images, the disease description text and a disease classification model.
In some embodiments, obtaining the prediction probability of each preset disease based on the plurality of sub-images, the disease description text and the disease classification model comprises:
acquiring multi-modal information of the plurality of sub-images and the disease description text;
and obtaining the prediction probability of each preset disease based on the multi-modal information and the disease classification model.
In some embodiments, acquiring the multi-modal information of the plurality of sub-images and the disease description text comprises:
acquiring an image encoding of each sub-image;
acquiring a text encoding corresponding to the disease description text;
and concatenating the plurality of image encodings with the text encoding to obtain the multi-modal information of the plurality of sub-images and the disease description text.
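The concatenation step above can be sketched as follows. This is a minimal illustration only; the encoding dimension, the number of sub-images, and the images-first/text-last token ordering are assumptions, not taken from the patent:

```python
import numpy as np

def build_multimodal_info(image_codes, text_code):
    """Stack per-sub-image encodings and the text encoding into one
    token sequence (hypothetical layout: image tokens first, text last)."""
    tokens = list(image_codes) + [text_code]
    return np.stack(tokens, axis=0)  # shape: (num_sub_images + 1, dim)

# e.g. four 8-dimensional sub-image encodings plus one text encoding
image_codes = [np.random.rand(8) for _ in range(4)]
text_code = np.random.rand(8)
multimodal = build_multimodal_info(image_codes, text_code)
print(multimodal.shape)  # (5, 8)
```

In practice the two modalities would come from separate image and text encoders; stacking them into one sequence lets a downstream attention mechanism mix them.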
In some embodiments, the multi-modal information includes the image encoding of each sub-image and the text encoding of the disease description text; obtaining the prediction probability of each preset disease based on the multi-modal information and the disease classification model comprises:
extracting features from the image encodings and the text encoding using a cross-attention mechanism;
and inputting the extracted features into the disease classification model to obtain the prediction probability of each preset disease.
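A single-head scaled dot-product cross-attention can illustrate the feature-extraction step. This is a bare sketch without the learned query/key/value projection matrices a real model would have, and all shapes are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query token attends
    to all key/value tokens from the other modality."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

# text tokens attend to sub-image tokens
text_feats = np.random.rand(3, 8)    # 3 text tokens, dim 8
image_feats = np.random.rand(5, 8)   # 5 sub-image tokens, dim 8
fused = cross_attention(text_feats, image_feats, image_feats)
print(fused.shape)  # (3, 8)
```

Running the attention in the other direction (image queries over text keys/values) is equally plausible; the patent text does not fix the direction.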
In some embodiments, acquiring a plurality of sub-images of a target scan image includes:
sliding a sliding window over the target scan image with a preset step size to obtain the plurality of sub-images, wherein the preset step size is smaller than the side length of the sliding window.
In some embodiments, before acquiring the plurality of sub-images of the target scan image, the method further comprises:
adjusting the resolution of an original scan image to a preset resolution;
and normalizing the pixel values of the original scan image having the preset resolution to a preset pixel value range to obtain the target scan image.
According to a second aspect of embodiments of the present disclosure, there is provided a model training method for training to obtain a disease classification model of the first aspect, the model training method comprising:
The following training procedure is iteratively performed:
obtaining a prediction probability of each preset disease based on a training sample and a classification model, wherein the training sample comprises a sample scan image and a sample disease description text related to the sample scan image;
for a preset disease with a positive label, obtaining a loss value from its prediction probability using a first loss function; for a preset disease with a negative label, obtaining a loss value from its prediction probability using a second loss function, wherein the independent variable of the first loss function is the prediction probability of the preset disease, the independent variable of the second loss function is a correction value of that prediction probability, and the correction value is smaller than the corresponding prediction probability;
optimizing parameters of the classification model based on the loss values;
and ending the training process when the training process meets a preset condition, determining the classification model with the last optimized parameters as the disease classification model.
In some embodiments, the first loss function includes a term raising the difference between a reference value and the prediction probability of the preset disease to the power γ⁺, and a logarithmic term of the prediction probability of the preset disease; the second loss function includes a term raising the difference between the reference value and the correction value of the prediction probability of the preset disease to the power γ⁻, and a logarithmic term of the correction value of the prediction probability, where γ⁺ and γ⁻ are both positive numbers.
In some embodiments, when the difference between the prediction probability of the preset disease and a preset offset value is greater than 0, the correction value of the prediction probability is that difference; when the difference is not greater than 0, the correction value is 0.
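The patent's exact formulas are garbled in this translation; the sketch below is one plausible instantiation in the asymmetric focal-loss family the text describes (a γ-power focusing term times a logarithmic term, with a probability shift for negatives), assuming a reference value of 1. The exponents γ⁺, γ⁻ and the offset m are hypothetical values:

```python
import math

def corrected_probability(p, m=0.05):
    """Correction value: shift the predicted probability down by a
    preset offset m and clamp at 0 (m = 0.05 is a hypothetical value)."""
    return max(p - m, 0.0)

def positive_loss(p, gamma_pos=1.0):
    """Positive-label loss: (1 - p)^gamma+ times -log(p),
    assuming the reference value is 1."""
    return ((1.0 - p) ** gamma_pos) * (-math.log(p))

def negative_loss(p, gamma_neg=4.0, m=0.05):
    """Negative-label loss on the corrected probability p_m:
    p_m^gamma- times -log(1 - p_m); easy negatives (p <= m) contribute 0."""
    p_m = corrected_probability(p, m)
    if p_m == 0.0:
        return 0.0
    return (p_m ** gamma_neg) * (-math.log(1.0 - p_m))

print(negative_loss(0.03))  # 0.0
```

The asymmetry matters for class-imbalanced training: the larger γ⁻ and the offset m together suppress the gradient from the abundant easy negatives, so the rare positive classes are not drowned out.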
In some embodiments, obtaining the predicted probability for each preset disease based on the training samples and the classification model comprises:
acquiring a plurality of sub-sample images of a sample scan image;
and obtaining the prediction probability of each preset disease based on the plurality of sub-sample images, the sample disease description text and the classification model.
According to a third aspect of embodiments of the present disclosure, there is provided a disease classification apparatus including an image acquisition module, a text determination module, and a disease classification module;
the image acquisition module is configured to acquire a plurality of sub-images of the target scan image;
the text determination module is configured to determine a disease description text associated with the target scan image;
the disease classification module is configured to obtain a predictive probability for each preset disease based on the plurality of sub-images, the disease description text, and the disease classification model.
According to a fourth aspect of embodiments of the present disclosure, there is provided a model training apparatus for training to obtain a disease classification model of the first aspect, the model training apparatus including a model training module and a model determining module;
the model training module is configured to iteratively perform the following training process:
obtaining a prediction probability of each preset disease based on a training sample and a classification model, wherein the training sample comprises a sample scan image and a sample disease description text related to the sample scan image;
for a preset disease with a positive label, obtaining a loss value from its prediction probability using a first loss function; for a preset disease with a negative label, obtaining a loss value from its prediction probability using a second loss function; when the same probability value is substituted into the first loss function and the second loss function, the loss value produced by the first loss function is larger than that produced by the second loss function;
optimizing parameters of the classification model based on the loss values;
the model determining module is configured to end the training process upon determining that the training process meets a preset condition, and to determine the classification model with the last optimized parameters as the disease classification model.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic device comprising a processor and a memory storing program instructions; the processor is configured to perform the disease classification method provided in the first aspect or to perform the model training method provided in the second aspect when executing the program instructions.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium storing program instructions that, when executed, perform the disease classification method provided in the first aspect, or perform the model training method provided in the second aspect.
The disease classification method, the model training method, the device, the electronic equipment and the medium provided by the embodiment of the disclosure can realize the following technical effects:
according to the embodiment of the disclosure, one target scanning image with a larger size is converted into a plurality of sub-images with smaller sizes, and the disease classification model can sequentially process the data of each sub-image, so that the limitation of the image processing process beyond the limitation of hardware conditions can be avoided. Meanwhile, the resolution of the images is not greatly reduced by converting the target scanning image into a plurality of sub-images, so that the disease classification model can classify the diseases based on the images with higher resolution, and the accuracy of classifying the diseases by the disease classification model is improved.
The foregoing general description and the following description are exemplary and explanatory only and are not intended to limit the present disclosure.
Drawings
One or more embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:
FIG. 1 is a schematic diagram of a method of disease classification provided by an embodiment of the present disclosure;
FIG. 2 is a schematic illustration of another disease classification method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic illustration of another disease classification method provided by an embodiment of the present disclosure;
FIG. 4 is an exemplary implementation flow diagram of a disease classification method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a model training method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a disease classification apparatus provided in an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a model training apparatus provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of the present disclosure;
fig. 9 is a schematic diagram of another electronic device according to an embodiment of the disclosure.
Detailed Description
So that the features and technical content of the embodiments of the present disclosure may be understood in more detail, a more particular description of the embodiments is given below with reference to the accompanying drawings, which are not intended to limit the embodiments of the disclosure. In the following description, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments; however, one or more embodiments may still be practiced without these details. In other instances, well-known structures and devices are shown in simplified form in order to simplify the drawings.
The terms "first", "second" and the like in the description and claims of the embodiments of the disclosure and in the above-described figures are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It is to be understood that the data so used may be interchanged where appropriate in describing embodiments of the present disclosure. Furthermore, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion.
The term "plurality" means two or more, unless otherwise indicated.
In the embodiments of the present disclosure, the character "/" indicates an "or" relationship between the objects before and after it. For example, A/B represents: A or B.
The term "and/or" describes an association between objects and indicates that three relationships may exist. For example, A and/or B represents: A, or B, or both A and B.
The term "corresponding" may refer to an association or binding relationship; a correspondence between A and B refers to an association or binding relationship between A and B.
Computer-aided diagnosis technology based on deep learning is developing rapidly. In the related art, a number of labeled scan images (such as X-ray chest radiographs) and disease description texts (such as clinical reports or medical history data) can be used as training data to train a neural network model, yielding a disease classification model capable of classifying diseases in scan images. Such a model can classify multiple diseases in scan images, reducing the workload of doctors while improving the accuracy of disease classification. However, due to limitations of hardware conditions or of training data quality, the disease classification model is prone to inaccurate classification.
In particular, when a disease classification model is run, hardware memory limitations make it difficult for the model to process higher-resolution scan images; it is therefore often necessary to convert a higher-resolution scan image into a lower-resolution one, and the disease classification model classifies diseases based on the lower-resolution image. Reducing the resolution of the scan image may lose some of its detail, which can prevent the disease classification model from accurately classifying some disease types.
Furthermore, the classification accuracy of a disease classification model depends heavily on its training data. When classifying multiple types of diseases, the incidence of each disease differs, so the data volume for some disease types is small, and it is difficult for the collected training data to achieve class balance. A disease classification model trained on class-imbalanced data has poor classification ability for the under-represented disease types and easily misses them. It will be appreciated that both of the above conditions affect the ability of the disease classification model to classify diseases; therefore, the related art classifies diseases with low accuracy.
The embodiment of the disclosure provides an electronic device, which may be a computer, a terminal, a server, or other devices with computing capabilities. The electronic device includes a processor that may obtain a predictive probability for each of the preset diseases based on a plurality of sub-images of the target scan image, a disease description text associated with the target scan image, and a disease classification model.
In combination with the electronic device provided by the embodiment of the present disclosure, the embodiment of the present disclosure provides a disease classification method, as shown in fig. 1, including the following steps:
s101, the processor acquires a plurality of sub-images of the target scan image.
Here, the target scan image is an image obtained after scanning a body part of a patient using a scanning device. The specific type of target scan image is related to the body part being scanned and the type of scanning device. For example, the target scan image may be an X-ray chest image, an X-ray abdomen image, or the like.
A sub-image is extracted from the target scan image: it is an image of a sub-region of the target scan image, and its size is smaller than that of the target scan image. The number of sub-images may be determined according to actual design requirements; in general, the smaller the sub-images, the greater their number. It should be noted that the data volume of an individual sub-image must not exceed the processing capacity of the hardware resources.
Alternatively, the target scan image may be segmented to obtain a plurality of sub-images.
Alternatively, a plurality of sub-images may be acquired in the target scan image based on a sliding window.
In some embodiments, multiple sub-images may be acquired in the entire region of the target scan image, the multiple sub-images containing all of the information of the target scan image.
In some embodiments, an effective area may be determined in the target scan image, and the plurality of sub-images may be acquired within the effective area, where the plurality of sub-images contain all information of the effective area.
S102, the processor determines a disease description text related to the target scan image.
Here, both the target scan image and the condition description text correspond to the same patient. The disease description text records disease information of the patient, and the disease description text can be obtained based on clinical reports, medical history data and other files.
S103, the processor obtains the prediction probability of each preset disease based on the plurality of sub-images, the disease description text and the disease classification model.
The disease classification model in the embodiments of the present disclosure is trained from a neural network model with image classification capability, which may include, for example, a Classifier. The classifier comprises an average pooling layer, a fully connected layer, a Sigmoid activation layer and the like; Sigmoid normalizes the output values to the interval [0, 1], and the normalized values can then be used as classification probabilities.
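The classifier head described above (average pooling, then a fully connected layer, then Sigmoid) can be sketched as follows. The token count, feature dimension, and number of preset diseases are illustrative, and the weights would be learned in practice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classifier_head(features, weights, bias):
    """Average-pool token features, apply a fully connected layer,
    then Sigmoid to get one probability in [0, 1] per preset disease."""
    pooled = features.mean(axis=0)    # average pooling over tokens
    logits = weights @ pooled + bias  # fully connected layer
    return sigmoid(logits)            # per-disease probabilities

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))   # 5 fused tokens, dim 8
weights = rng.normal(size=(14, 8))   # 14 preset diseases (illustrative)
bias = np.zeros(14)
probs = classifier_head(features, weights, bias)
print(probs.shape)  # (14,)
```

Because Sigmoid is applied per output rather than Softmax over all outputs, several preset diseases can have high probability at once, which suits multi-label diagnosis.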
The embodiments of the present disclosure may configure a corresponding plurality of types of preset diseases for each type of target scan image. The disease classification model may output a predictive probability for each preset disease based on the plurality of sub-images and the condition description text. It will be appreciated that the predicted probability for each preset disease is indicative of the probability that the patient is present with such preset disease.
In some embodiments, the predicted probability of each preset disease may be compared with a set first probability threshold to determine whether the patient suffers from that preset disease. For example, if the predicted probability of a preset disease is greater than the first probability threshold, it may be determined that the patient has the preset disease; if it is not greater than the first probability threshold, it may be determined that the patient does not have the preset disease.
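A minimal sketch of this threshold comparison; the 0.5 threshold and the disease names are hypothetical:

```python
def diagnose(probs, disease_names, threshold=0.5):
    """Return the preset diseases whose predicted probability exceeds
    the first probability threshold (0.5 is a hypothetical choice)."""
    return [name for name, p in zip(disease_names, probs) if p > threshold]

names = ["pneumonia", "effusion", "nodule"]
print(diagnose([0.91, 0.12, 0.55], names))  # ['pneumonia', 'nodule']
```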
According to the disease classification method provided by the embodiments of the present disclosure, one target scan image of larger size is converted into a plurality of sub-images of smaller size, and the disease classification model can process the data of each sub-image in turn, so the image processing does not exceed the limits of the hardware. At the same time, converting the target scan image into a plurality of sub-images does not significantly reduce image resolution, so the disease classification model can classify diseases based on higher-resolution images, which improves the accuracy with which the disease classification model classifies diseases.
In some embodiments, acquiring the plurality of sub-images of the target scan image includes: sliding a sliding window over the target scan image with a preset step size to obtain the plurality of sub-images, wherein the preset step size is smaller than the side length of the sliding window.
It will be appreciated that the size of the sliding window is the same as the size of a sub-image. The side length of the sliding window may be determined according to the sub-image size, and the preset step size may be determined according to the side length of the sliding window; for example, a side length of 224 with a preset step size of 112.
Since the preset step size is smaller than the side length of the sliding window, adjacent sub-images acquired with the sliding window overlap. This overlap ensures, on the one hand, that the plurality of sub-images completely contain the important details of the target scan image, preventing loss of important information; on the other hand, it associates the sub-images with one another, so that the disease classification model can accurately classify each preset disease based on the associated sub-images.
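The sliding-window extraction can be sketched as below. Note that this simple version drops a partial window at the right and bottom edges when the image size minus the window is not a multiple of the step; a production implementation would pad the image or clamp the last window to the edge:

```python
def sliding_window_crops(image, window=224, step=112):
    """Return the top-left positions of overlapping square crops.
    step < window guarantees adjacent crops overlap. `image` is a
    2-D array-like indexed as image[row][col]."""
    h, w = len(image), len(image[0])
    positions = []
    for top in range(0, h - window + 1, step):
        for left in range(0, w - window + 1, step):
            # the crop itself would be image[top:top+window, left:left+window]
            positions.append((top, left))
    return positions

# a 1024x1024 target image with window 224 and step 112 yields
# floor((1024-224)/112)+1 = 8 positions per axis, i.e. 64 sub-images
positions = sliding_window_crops([[0] * 1024 for _ in range(1024)])
print(len(positions))  # 64
```

With the 1024x1024 / 224 / 112 values used as examples in this document, each sub-image shares half its width or height with its neighbor.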
In some embodiments, before acquiring the plurality of sub-images of the target scan image, the method further comprises: adjusting the resolution of an original scan image to a preset resolution; and normalizing the pixel values of the original scan image having the preset resolution to a preset pixel value range to obtain the target scan image.
Here, the original scan image is an image obtained by scanning a body part of a patient with a scanning device, and the target scan image is obtained by adjusting the resolution and pixel values of the original scan image. The target scan image obtained in this way retains the features of the original scan image.
The specific type of original scanned image is related to the body part being scanned and the type of scanning device. For example, the original scan image may be an X-ray chest image, an X-ray abdomen image, or the like.
The preset resolution and preset pixel value range may be determined according to actual design requirements. After adjusting the resolution of the original scan image to the preset resolution, it should be ensured that the details of the original scan image are not significantly lost. For example, the preset resolution may be 1024×1024 and the pixel value range [0, 1]; that is, the pixel values of the original scan image at the preset resolution may be normalized to between 0 and 1.
By adjusting the resolution and pixel values of original scan images, the embodiments of the present disclosure can convert original scan images of different specifications into target scan images of a uniform specification, so that the target scan images can be processed in a uniform manner, improving the efficiency of image processing.
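The preprocessing steps (resize to the preset resolution, then normalize to [0, 1]) can be sketched as follows. Nearest-neighbor resampling and min-max normalization are illustrative choices; the patent does not specify the resampling or normalization method:

```python
import numpy as np

def preprocess(original, target_size=(1024, 1024)):
    """Resize to the preset resolution via nearest-neighbor sampling
    (a real pipeline would likely use a library resampler), then
    normalize pixel values to the preset range [0, 1]."""
    src = np.asarray(original, dtype=np.float64)
    h, w = src.shape
    th, tw = target_size
    rows = (np.arange(th) * h) // th      # nearest source row per target row
    cols = (np.arange(tw) * w) // tw      # nearest source col per target col
    resized = src[np.ix_(rows, cols)]
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo) if hi > lo else np.zeros_like(resized)

img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
out = preprocess(img, target_size=(8, 8))
print(out.shape, out.min(), out.max())  # (8, 8) 0.0 1.0
```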
In some embodiments, obtaining the predicted probability for each preset disease based on the plurality of sub-images, the disease description text, and the disease classification model comprises: acquiring multi-mode information of a plurality of sub-images and illness state description texts; based on the multimodal information and the disease classification model, a predictive probability for each preset disease is obtained.
According to the embodiment of the disclosure, the multiple sub-images and the illness state description text are fused to obtain the multi-mode information, so that the image information and the text information can be mutually supplemented, the image information and the text information can be mutually enhanced, and the classification accuracy of the illness classification model on the illness can be improved.
In connection with the electronic device provided in the embodiments of the present disclosure, another disease classification method is provided in the embodiments of the present disclosure, as shown in fig. 2, and the method includes the following steps:
S201, the processor adjusts the resolution of the original scan image to a preset resolution.
S202, the processor normalizes the pixel value of the original scanned image with the preset resolution to a preset pixel value range to obtain a target scanned image.
S203, the processor slides a sliding window in the target scan image according to a preset step size to obtain a plurality of sub-images.
Here, the preset step size is smaller than the side length of the sliding window.
S204, the processor determines a disease description text related to the target scan image.
S205, the processor obtains the prediction probability of each preset disease based on the plurality of sub-images, the disease description text and the disease classification model.
In connection with the electronic device provided in the embodiments of the present disclosure, another disease classification method is provided in the embodiments of the present disclosure, as shown in fig. 3, and the method includes the following steps:
S301, the processor acquires a plurality of sub-images of the target scan image.
S302, the processor determines a disease description text related to the target scan image.
S303, the processor acquires multi-mode information of a plurality of sub-images and a disease description text.
S304, the processor obtains the prediction probability of each preset disease based on the multi-mode information and the disease classification model.
In some embodiments, acquiring multimodal information of a plurality of sub-images and condition description text includes: acquiring an image code of each sub-image; acquiring a text code corresponding to the illness state description text; and splicing the plurality of image codes and the text codes to obtain multi-mode information of the plurality of sub-images and the illness state description text.
According to the embodiment of the disclosure, the sub-images and the illness state description text are converted into the corresponding coding information, so that the sub-images and the illness state description text are conveniently fused, the multi-mode information in a coding form is obtained, and the multi-mode information in the coding form is easy to process by the illness classification model.
Alternatively, embodiments of the present disclosure may acquire image encoding of a sub-image using an image encoder, where the image encoder may use a ResNet101 network architecture. Specifically, the sub-image may be input to an image encoder, resulting in image encoding of the sub-image. Here, the image coding is a coding vector having a predetermined dimension.
Optionally, embodiments of the present disclosure may split the condition description text into a plurality of word segments using a word segmentation algorithm. In some embodiments, the number of word segments may be required to reach a specified number. Specifically, if the number of word segments obtained by splitting the condition description text is less than the specified number, placeholder characters may be used for padding; if the number of word segments is greater than the specified number, only the first specified number of word segments may be used.
After the plurality of word segments corresponding to the condition description text is obtained, each word segment can be converted into a dictionary code according to a coding dictionary. A text encoder is then used to convert the dictionary codes of the individual word segments into the text code corresponding to the condition description text, where the text encoder may use a BERT network structure. Specifically, each dictionary code may be input to the text encoder to obtain the text code corresponding to the condition description text. Here, the text code is a coding vector having a predetermined dimension.
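The split-pad-lookup step can be sketched as follows; a whitespace split stands in for a real word segmentation algorithm, and `vocab` is a hypothetical coding dictionary, neither taken from this disclosure:

```python
def text_to_dictionary_codes(text, vocab, n_text=128, pad="[PAD]", unk="[UNK]"):
    """Split the condition description text, pad or truncate to n_text
    word segments, and map each segment to its dictionary code."""
    segments = text.split()[:n_text]                 # keep only the first n_text segments
    segments += [pad] * (n_text - len(segments))     # pad short texts with placeholders
    return [vocab.get(seg, vocab[unk]) for seg in segments]

# Hypothetical coding dictionary for illustration only.
vocab = {"[PAD]": 0, "[UNK]": 1, "mild": 2, "opacity": 3, "left": 4, "lung": 5}
codes = text_to_dictionary_codes("mild opacity left lung", vocab)
```

The resulting fixed-length code sequence is what the text encoder consumes.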
In some embodiments, after the image encoding and text encoding of each sub-image are obtained, the plurality of image encodings and text encodings are stitched to obtain multi-modal information.
In some embodiments, after the image code of each sub-image is obtained, a category code and a position code may be appended to the image code, where the category code characterizes that the code belongs to a sub-image, and the position code characterizes the position, in the target scan image, of the sub-image corresponding to the image code. Likewise, after the text code is obtained, a category code and a position code may be appended to each word-segment code in the text code, where the category code characterizes that the code belongs to a word segment, and the position code characterizes the position, in the condition description text, of the word segment corresponding to the word-segment code.
After the category codes and the position codes are respectively appended to the image codes and the text code, the plurality of image codes and the text code are spliced to obtain the multi-mode information.
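A sketch of the splicing step, under the simplifying assumption that the category and position codes are appended as scalar channels; a real model would typically add learned embeddings of the same width instead:

```python
import numpy as np

def splice_multimodal(e_image, e_text):
    """Append a category code (1 = image, 0 = text) and a per-modality
    position code to every coded vector, then splice both sequences."""
    n_img, n_txt = len(e_image), len(e_text)
    img = np.concatenate([e_image,
                          np.ones((n_img, 1)),                      # category code 1
                          np.arange(n_img, dtype=float)[:, None]],  # positions 0..n_img-1
                         axis=1)
    txt = np.concatenate([e_text,
                          np.zeros((n_txt, 1)),                     # category code 0
                          np.arange(n_txt, dtype=float)[:, None]],  # positions 0..n_txt-1
                         axis=1)
    return np.concatenate([img, txt], axis=0)

# 64 image codes and 128 word-segment codes, each 768-dimensional.
mm = splice_multimodal(np.zeros((64, 768)), np.zeros((128, 768)))
```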
In some embodiments, the multimodal information includes an image encoding of each sub-image and a text encoding of the condition description text. Based on the multimodal information and the disease classification model, obtaining a predictive probability for each preset disease, comprising: extracting features for image encoding and text encoding using a cross-attention mechanism; the extracted features are input into a disease classification model, and the disease classification model is utilized to obtain the prediction probability of each preset disease.
The embodiment of the disclosure extracts the characteristics for the image coding and the text coding by using a cross attention mechanism, can enhance the attention of a model to certain important characteristics, and can enable the image information and the text information to be mutually enhanced and supplemented more fully.
Optionally, embodiments of the present disclosure may extract features for the image codes and the text code using a cross-attention module, where the cross-attention module includes multiple layers of Transformer units. Specifically, for the input image codes and text code, the Transformer units may calculate the association between each pair of codes and extract a plurality of features using a multi-head attention mechanism.
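The core computation of such a unit can be sketched as single-head scaled dot-product attention; multi-head projection and the stacked Transformer layers of the full module are omitted for brevity:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(queries, keys, values):
    """Each query (e.g. a text code) is refined by a weighted mixture of the
    values (e.g. image codes); the weights express cross-modal association."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores)          # each row sums to 1
    return weights @ values, weights

rng = np.random.default_rng(0)
out, w = cross_attention(rng.normal(size=(128, 64)),   # 128 text codes
                         rng.normal(size=(64, 64)),    # 64 image codes (keys)
                         rng.normal(size=(64, 64)))    # 64 image codes (values)
```

Running the same operation in the other direction (image queries over text keys/values) lets the two modalities enhance each other.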
Fig. 4 is an exemplary implementation flow chart of a disease classification method according to an embodiment of the present disclosure; the method is implemented specifically as follows:
(1) The processor adjusts the resolution of the original scanning image to 1024×1024, and normalizes the pixel value of the original scanning image with the resolution of 1024×1024 to the [0,1] interval to obtain the target scanning image.
(2) The processor slides a sliding window with a side length of 224 in the target scan image with a step size of 112 to obtain N_image sub-images.
(3) The processor splits the condition description text into a plurality of words using a word segmentation algorithm.
If the number of word segments obtained by splitting the condition description text is less than N_text, placeholder characters may be used for padding; if the number of word segments is greater than N_text, only the first N_text word segments are used. Here, N_text may take the value 128.
(4) The processor converts each word segment into a dictionary code according to the coding dictionary, and forms the N_text dictionary codes into a dictionary code sequence S_text.
Here, the dictionary code sequence S_text may have a dimension of N_text×300.
(5) The processor inputs the N_image sub-images into the image encoder to obtain a coded vector sequence E_image, and inputs the N_text dictionary codes into the text encoder to obtain a coded vector sequence E_text.
In step (5), the image encoder encodes each sub-image into a 768-dimensional coded vector e_image, and all the coded vectors e_image are spliced into a coded vector sequence E_image of dimension N_image×768.
Also in step (5), the text encoder encodes each dictionary code of the dictionary code sequence S_text into a coded vector e_text, yielding a coded vector sequence E_text of dimension N_text×768.
(6) The processor appends a category code and a position code to each coded vector e_image in the coded vector sequence E_image, and appends a category code and a position code to each coded vector e_text in the coded vector sequence E_text. The category code appended to each e_image is 1, and the category code appended to each e_text is 0; the position codes appended to the coded vectors e_image run from 0 to N_image-1, and the position codes appended to the coded vectors e_text run from 0 to N_text-1.
(7) The processor inputs the coded vector sequence E_image and the coded vector sequence E_text into the cross-attention module, calculates the association between each pair of codes using the Transformer units in the cross-attention module, and extracts a plurality of features using the multi-head attention mechanism.
(8) The processor inputs the extracted features to a classifier, and obtains a prediction probability of each preset disease based on the input features using the classifier.
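Step (8) can be sketched as a multi-label classifier head: one independent sigmoid per preset disease, so each predicted probability lies in (0, 1) without competing through a softmax. The weights, bias, and the choice of 14 preset diseases below are illustrative, not from this disclosure:

```python
import numpy as np

def predict_disease_probabilities(features, weights, bias):
    """Map extracted features to a per-disease predicted probability via
    independent sigmoids (multi-label classification)."""
    logits = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(1)
probs = predict_disease_probabilities(rng.normal(size=(768,)),        # pooled features
                                      rng.normal(size=(768, 14)) * 0.01,
                                      np.zeros(14))
```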
The embodiment of the disclosure provides another electronic device, which may be a computer, a terminal, a server, or the like, with computing capabilities. The electronic device comprises a processor that can train the classification model based on the training samples to derive the disease classification model in the above-described embodiments.
In combination with the electronic device provided by the embodiments of the present disclosure, the embodiments of the present disclosure provide a model training method for training the disease classification model in the above embodiments. In the model training method, a training process needs to be performed iteratively, and the model parameters are adjusted once in each pass of the training process. As shown in fig. 5, the method includes the following steps:
S501, the processor obtains the prediction probability of each preset disease based on the training samples and the classification model.
Here, the training sample includes a sample scan image, and sample condition descriptive text associated with the sample scan image.
S502, aiming at a preset disease with a positive label, the processor obtains a loss value of a prediction probability of the preset disease by using a first loss function; and obtaining a loss value of the predicted probability of the preset disease by using a second loss function aiming at the preset disease with the negative label.
Here, positive and negative labels of each preset disease corresponding to each sample scan image are set for each sample scan image. If the sample scanning image contains a certain preset disease, setting the label of the preset disease to be positive for the sample scanning image; if the sample scan image does not contain a certain preset disease, the label of the preset disease is set to be negative for the sample scan image.
For a sample scan image, if the label of a certain preset disease is positive, the first loss function is used to obtain the loss value of the predicted probability of that preset disease; if the label of a certain preset disease is negative, the second loss function is used to obtain the loss value of the predicted probability of that preset disease.
Here, the argument of the first loss function is a predicted probability of a preset disease, and the argument of the second loss function is a correction value of the predicted probability of the preset disease, the correction value being smaller than the corresponding predicted probability. Wherein the correction value of the predicted probability of the preset disease is obtained based on the predicted probability of the preset disease.
S503, the processor optimizes parameters of the classification model based on the loss values.
It should be noted that S501 to S503 are a training process, and the processor iteratively executes S501 to S503, in other words, the processor repeatedly executes S501 to S503 a plurality of times, so as to iteratively optimize the parameters of the classification model. The classification model in the first training process is the original model; after the first training process, the classification model in each training process is the classification model after the parameters are adjusted in the previous training process.
In the process of iteratively executing the training process, it may be determined whether the training process meets the preset condition, and when it is determined that the training process meets the preset condition, the training process is ended, and S504 is continuously executed. The preset conditions may be determined according to actual design requirements.
Optionally, a loss threshold may be preset. In one pass of the training process, after the loss value of each predicted probability is obtained, a fusion loss value may be determined based on these loss values. The fusion loss value may be, for example, the average, absolute average, sum, or median of all the loss values. In this case, the preset condition includes: the fusion loss value is less than the loss threshold. It can be appreciated that when the fusion loss value is less than the loss threshold, the training process is determined to satisfy the preset condition.
Optionally, an iteration threshold may be preset, and the number of iterations is recorded in real time while the training process is iteratively executed. In this case, the preset condition includes: the number of iterations is greater than the iteration threshold. It can be appreciated that when the recorded number of iterations is greater than the iteration threshold, the training process is determined to satisfy the preset condition.
S504, when determining that the training process satisfies the preset condition, the processor ends the training process and determines the classification model obtained after the last parameter optimization as the disease classification model.
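The iteration-with-stopping-conditions loop of S501-S504 can be sketched as follows; `training_step` is a stand-in for one full pass of prediction, loss computation, and parameter optimization, and the threshold values are illustrative:

```python
def run_training(training_step, loss_threshold=0.05, max_iterations=1000):
    """Iterate the training process until a preset condition holds: the
    fused loss drops below the loss threshold, or the iteration count
    exceeds the iteration threshold."""
    iterations = 0
    while True:
        fused_loss = training_step()   # one pass of S501-S503, returns fused loss
        iterations += 1
        if fused_loss < loss_threshold or iterations > max_iterations:
            return iterations, fused_loss

# A dummy step whose fused loss halves each pass, for illustration only.
state = {"loss": 1.0}
def dummy_step():
    state["loss"] *= 0.5
    return state["loss"]

iters, final_loss = run_training(dummy_step)
```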
For disease types that account for a relatively small proportion of cases, the negative labels in the sample scan images greatly outnumber the positive labels. Thus, for such disease types, the number of times the model outputs a predicted probability of the disease not greater than the first probability threshold may be significantly greater than the number of times it outputs a predicted probability greater than the first probability threshold.
In this case, for the same preset disease, the first loss function is used to obtain the true loss value of its predicted probability when its label is positive, and the second loss function is used when its label is negative, where the loss value obtained with the second loss function is smaller than the corresponding true loss value. The loss value corresponding to the predicted probability of the preset disease when its label is negative is therefore always smaller than the corresponding true loss value. This offsets, to a certain extent, the data imbalance caused by the negative labels of the preset disease outnumbering the positive labels, improves the classification capability of the disease classification model for rarer diseases, and thus improves the overall classification accuracy of the disease classification model.
In some embodiments, the first loss function includes a power term, with exponent γ+, of the difference between a reference value and the predicted probability of the preset disease, and a logarithmic term of the predicted probability of the preset disease. The second loss function includes a power term, with exponent γ-, of the difference between the reference value and the correction value of the predicted probability of the preset disease, and a logarithmic term of the correction value of the predicted probability, where γ+ and γ- are both positive numbers.
Optionally, the expression of the first loss function is as follows:

Loss = -(1-p)^(γ+)·log(p)

In the above expression, (1-p)^(γ+) is the γ+ power term of the difference between the reference value and the predicted probability of the preset disease, log(p) is the logarithmic term of the predicted probability of the preset disease, Loss is the loss value of the predicted probability, p is the predicted probability, and the value of γ+ is greater than 0.
Optionally, the expression of the second loss function is as follows:

Loss = -(1-p_m)^(γ-)·log(p_m)

In the above expression, (1-p_m)^(γ-) is the γ- power term of the difference between the reference value and the correction value of the predicted probability of the preset disease, log(p_m) is the logarithmic term of the correction value of the predicted probability, Loss is the loss value of the predicted probability, p_m is the correction value of the predicted probability of the preset disease, whose value is not less than 0, and the value of γ- is greater than 0.
In some embodiments, in the case where the difference between the predicted probability of the preset disease and the preset offset value is greater than 0, the correction value of the predicted probability of the preset disease is the difference between the predicted probability of the preset disease and the preset offset value; in the case where the difference between the predicted probability of the preset disease and the preset offset value is not greater than 0, the correction value of the predicted probability of the preset disease is 0.
Optionally, the correction value of the predicted probability of the preset disease may be calculated by the following formula: p_m = max(p-m, 0), where max(p-m, 0) denotes the larger of the two values (p-m) and 0. Specifically, when p is greater than m, the correction value p_m equals the difference between p and m; when p is not greater than m, the correction value p_m equals 0.
Here, γ+, γ- and m are all positive numbers, and their values may be determined according to actual design requirements. For example, γ+ may take the value 1, and γ- and m may each take the value 0.1.
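A sketch of the two loss functions following the terms stated above (the power term of the complemented probability times the negative log term, with the shifted correction value for negative labels); the eps guard against log(0) is our addition, not part of this disclosure:

```python
import math

def first_loss(p, gamma_pos=1.0):
    """Positive-label loss: the gamma+ power of (1 - p) times -log(p)."""
    return -((1.0 - p) ** gamma_pos) * math.log(p)

def second_loss(p, gamma_neg=0.1, m=0.1, eps=1e-12):
    """Negative-label loss on the correction value p_m = max(p - m, 0);
    eps guards the logarithm when p <= m drives p_m to 0."""
    p_m = max(p - m, 0.0)
    return -((1.0 - p_m) ** gamma_neg) * math.log(p_m + eps)

confident = first_loss(0.9)   # confident correct positive prediction
uncertain = first_loss(0.1)   # poor positive prediction
```

The probability shift m makes all predictions with p at or below m contribute identically, discounting very easy negatives.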
In some embodiments, obtaining the predicted probability for each preset disease based on the training samples and the classification model comprises: acquiring a plurality of sub-sample images of a sample scan image; determining sample condition descriptive text associated with the sample scan image; based on the plurality of sub-sample images, the sample condition description text and the disease classification model, a prediction probability of each preset disease is obtained.
In some embodiments, obtaining the predictive probability for each preset disease based on the plurality of subsampled images, the sample condition descriptive text, and the disease classification model includes:
acquiring multi-mode information of a plurality of sub-sample images and sample illness state description texts;
Based on the multimodal information and the disease classification model, a predictive probability for each preset disease is obtained.
In some embodiments, obtaining multimodal information of a plurality of subsampled images and sample condition description text includes:
acquiring an image code of each sub-sample image;
acquiring a text code corresponding to a sample illness state description text;
and splicing the plurality of image codes and the text codes to obtain multi-mode information of a plurality of sub-sample images and sample illness state description texts.
In some embodiments, the multimodal information includes an image encoding of each sub-sample image and a text encoding of sample condition description text; based on the multimodal information and the disease classification model, obtaining a predictive probability for each preset disease, comprising:
extracting features for image encoding and text encoding using a cross-attention mechanism;
the extracted features are input into a disease classification model, and the disease classification model is utilized to obtain the prediction probability of each preset disease.
In some embodiments, acquiring a plurality of sub-sample images of a sample scan image includes: sliding in the sample scanning image according to a preset step length by using a sliding window to obtain a plurality of sub-sample images; the preset step length is smaller than the side length of the sliding window.
In some embodiments, prior to acquiring the plurality of sub-sample images of the sample scan image, further comprising: the resolution of the original scanned image is adjusted to be a preset resolution; and normalizing the pixel value of the original scanning image with the preset resolution to a preset pixel value range to obtain a sample scanning image.
As shown in connection with fig. 6, an embodiment of the present disclosure provides a disease classification apparatus 600, the disease classification apparatus 600 including an image acquisition module 601, a text determination module 602, and a disease classification module 603.
The image acquisition module 601 is configured to acquire a plurality of sub-images of a target scan image;
the text determination module 602 is configured to determine a condition descriptive text associated with the target scan image;
the disease classification module 603 is configured to obtain a predictive probability for each preset disease based on the plurality of sub-images, the disease description text, and the disease classification model.
According to the disease classification apparatus provided by the embodiments of the present disclosure, one target scan image of larger size is converted into a plurality of sub-images of smaller size, and the disease classification model can process the data of each sub-image in turn, so that the image processing does not exceed the limits of the hardware. Meanwhile, converting the target scan image into a plurality of sub-images does not greatly reduce the resolution of the images, so the disease classification model can classify diseases based on images of higher resolution, which improves the classification accuracy of the disease classification model.
In some embodiments, the disease classification module 603 is configured to: acquiring multi-mode information of a plurality of sub-images and illness state description texts; based on the multimodal information and the disease classification model, a predictive probability for each preset disease is obtained.
In some embodiments, the disease classification module 603 is configured to: acquiring an image code of each sub-image; acquiring a text code corresponding to the illness state description text; and splicing the plurality of image codes and the text codes to obtain multi-mode information of the plurality of sub-images and the illness state description text.
In some embodiments, the multimodal information includes an image encoding of each sub-image and a text encoding of the condition description text; the disease classification module 603 is configured to: extracting features for image encoding and text encoding using a cross-attention mechanism; the extracted features are input into a disease classification model, and the disease classification model is utilized to obtain the prediction probability of each preset disease.
In some embodiments, the image acquisition module 601 is configured to: sliding in the target scanning image according to a preset step length by using a sliding window to obtain a plurality of sub-images; the preset step length is smaller than the side length of the sliding window.
In some embodiments, the image acquisition module 601 is configured to: the resolution of the original scanned image is adjusted to be a preset resolution; and normalizing the pixel value of the original scanning image with the preset resolution to a preset pixel value range to obtain a target scanning image.
Referring to fig. 7, an embodiment of the disclosure provides a model training apparatus 700, where the model training apparatus 700 is used to train a disease classification model in the above embodiment. Model training apparatus 700 includes a model training module 701 and a model determination module 702.
The model training module 701 is configured to iteratively perform the following training process:
obtaining the prediction probability of each preset disease based on a training sample and a classification model, wherein the training sample comprises a sample scanning image and a sample illness state description text related to the sample scanning image;
aiming at a preset disease with a positive label, obtaining a loss value of a prediction probability of the preset disease by using a first loss function; obtaining a loss value of the predicted probability of the preset disease by using a second loss function aiming at the preset disease with the negative label; the independent variable of the first loss function is the predicted probability of the preset disease, the independent variable of the second loss function is the corrected value of the predicted probability of the preset disease, and the corrected value is smaller than the corresponding predicted probability;
Optimizing parameters of the classification model based on the loss values;
the model determination module 702 is configured to: ending the training process when the training process meets the preset conditions, and determining the classification model after the last optimization parameter as a disease classification model.
In some embodiments, the first loss function comprises gamma, which is the difference between the reference value and the predicted probability of the predetermined disease + A power term and a logarithmic term of the prediction probability of the preset disease; the second loss function comprises gamma which is the difference between the reference value and the correction value of the predicted probability of the preset disease - A power term, a logarithmic term of a correction value of a predictive probability of a preset disease, wherein gamma + And gamma - All are positive numbers.
In some embodiments, in the case where the difference between the predicted probability of the preset disease and the preset offset value is greater than 0, the correction value of the predicted probability of the preset disease is the difference value; in the case where the difference between the predicted probability of the preset disease and the preset offset value is not greater than 0, the correction value of the predicted probability of the preset disease is 0.
In some embodiments, model training module 701 is configured to: acquiring a plurality of sub-sample images of a sample scan image; determining sample condition descriptive text associated with the sample scan image; based on the plurality of sub-sample images, the sample condition description text and the disease classification model, a prediction probability of each preset disease is obtained.
In some embodiments, model training module 701 is configured to: acquiring multi-mode information of a plurality of sub-sample images and sample illness state description texts; based on the multimodal information and the disease classification model, a predictive probability for each preset disease is obtained.
In some embodiments, model training module 701 is configured to: acquiring an image code of each sub-sample image; acquiring a text code corresponding to a sample illness state description text; and splicing the plurality of image codes and the text codes to obtain multi-mode information of a plurality of sub-sample images and sample illness state description texts.
In some embodiments, the multimodal information includes an image encoding of each sub-sample image and a text encoding of sample condition description text. The model training module 701 is configured to: extracting features for image encoding and text encoding using a cross-attention mechanism; the extracted features are input into a disease classification model, and the disease classification model is utilized to obtain the prediction probability of each preset disease.
In some embodiments, model training module 701 is configured to: sliding in the sample scanning image according to a preset step length by using a sliding window to obtain a plurality of sub-sample images; the preset step length is smaller than the side length of the sliding window.
In some embodiments, model training module 701 is configured to: the resolution of the original scanned image is adjusted to be a preset resolution; and normalizing the pixel value of the original scanning image with the preset resolution to a preset pixel value range to obtain a sample scanning image.
Fig. 8 shows a schematic diagram of an electronic device 800 provided by an embodiment of the disclosure, and in combination with fig. 8, the electronic device 800 includes a processor (processor) 801 and a memory (memory) 802. Optionally, the apparatus may further comprise a communication interface (Communication Interface) 803 and a bus 804. The processor 801, the communication interface 803, and the memory 802 may communicate with each other via the bus 804. The communication interface 803 may be used for information transfer. The processor 801 may call logic instructions in the memory 802 to perform the disease classification method of the corresponding embodiment described above.
Further, the logic instructions in the memory 802 described above may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product.
The memory 802 is a computer-readable storage medium that can be used to store a software program, a computer-executable program, and program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 801 executes functional applications and data processing by executing program instructions/modules stored in the memory 802, that is, implements the disease classification method of the respective embodiments described above.
Memory 802 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the terminal device, etc. In addition, memory 802 may include high-speed random access memory, and may also include non-volatile memory.
It should be noted that the processor 801 may also invoke logic instructions in the memory 802 to execute the model training method of the corresponding embodiments described above.
Fig. 9 shows a schematic diagram of another electronic device 900 provided by an embodiment of the disclosure. As shown in Fig. 9, the electronic device 900 includes a processor (processor) 901 and a memory (memory) 902. Optionally, the electronic device 900 may further include a communication interface (Communication Interface) 903 and a bus 904. The processor 901, the communication interface 903, and the memory 902 may communicate with one another via the bus 904. The communication interface 903 may be used for information transfer. The processor 901 may invoke logic instructions in the memory 902 to perform the model training method of the corresponding embodiments described above.
Further, the logic instructions in the memory 902 described above may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product.
The memory 902, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and the program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 901 runs the program instructions/modules stored in the memory 902 to execute functional applications and perform data processing, that is, to implement the model training method of the corresponding embodiments described above.
The memory 902 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application required for a function; the data storage area may store data created according to the use of the terminal device, and the like. In addition, the memory 902 may include high-speed random access memory and may also include non-volatile memory.
It should be noted that the processor 901 may also invoke logic instructions in the memory 902 to execute the disease classification method of the corresponding embodiments described above.
The disclosed embodiments provide a computer-readable storage medium storing computer-executable instructions configured to perform the above-described disease classification method.
Embodiments of the present disclosure provide a computer readable storage medium storing computer executable instructions configured to perform the model training method described above.
Embodiments of the present disclosure may be embodied in a software product stored on a storage medium and including one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of a method according to the embodiments of the present disclosure. The aforementioned storage medium may be any non-transitory storage medium capable of storing program code, including a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The above description and the drawings illustrate embodiments of the disclosure sufficiently to enable those skilled in the art to practice them. Other embodiments may involve structural, logical, electrical, process, and other changes; the embodiments represent only possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of others. Moreover, the terminology used in the present disclosure is for the purpose of describing embodiments only and is not intended to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this disclosure is meant to encompass any and all possible combinations of one or more of the associated listed items. Furthermore, when used in this disclosure, the terms "comprises," "comprising," and/or variations thereof specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, or apparatus that includes that element. In this document, each embodiment may be described with emphasis on its differences from the other embodiments, and identical or similar parts of the various embodiments may be referred to one another. For the methods, products, and the like disclosed in the embodiments, where they correspond to the method sections disclosed herein, the description of the method sections may be consulted for the relevant details.
Those skilled in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the solution. Skilled artisans may use different methods to achieve the described functionality for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present disclosure. It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working procedures of the systems, apparatuses, and units described above may refer to the corresponding procedures in the foregoing method embodiments and are not repeated here.
In the embodiments disclosed herein, the disclosed methods and articles of manufacture (including but not limited to devices and apparatuses) may be practiced in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units may be merely a logical functional division, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments. Furthermore, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, may each exist physically on its own, or two or more units may be integrated into one unit.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. In the description corresponding to the flowcharts and block diagrams, operations or steps corresponding to different blocks may likewise occur in orders other than those disclosed, and sometimes no fixed order exists between different operations or steps; for example, two consecutive operations or steps may actually be performed substantially in parallel or in reverse order, depending on the functions involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
Claims (13)
1. A method of classifying a disease, comprising:
acquiring a plurality of sub-images of a target scan image;
determining a condition description text associated with the target scan image;
and obtaining a prediction probability of each preset disease based on the plurality of sub-images, the condition description text, and a disease classification model.
2. The method of claim 1, wherein the obtaining a prediction probability of each preset disease based on the plurality of sub-images, the condition description text, and the disease classification model comprises:
acquiring multi-modal information of the plurality of sub-images and the condition description text;
and obtaining the prediction probability of each preset disease based on the multi-modal information and the disease classification model.
3. The method of claim 2, wherein the acquiring multi-modal information of the plurality of sub-images and the condition description text comprises:
acquiring an image encoding of each of the sub-images;
acquiring a text encoding corresponding to the condition description text;
and concatenating the image encodings and the text encoding to obtain the multi-modal information of the plurality of sub-images and the condition description text.
4. The method of claim 2, wherein the multi-modal information includes an image encoding of each of the sub-images and a text encoding of the condition description text, and the obtaining the prediction probability of each preset disease based on the multi-modal information and the disease classification model comprises:
extracting features from the image encodings and the text encoding using a cross-attention mechanism;
and inputting the extracted features into the disease classification model, and obtaining the prediction probability of each preset disease by using the disease classification model.
5. The method of any one of claims 1 to 4, wherein the acquiring a plurality of sub-images of the target scan image comprises:
sliding a sliding window over the target scan image at a preset step size to obtain the plurality of sub-images, wherein the preset step size is smaller than the side length of the sliding window.
6. The method of any one of claims 1 to 4, further comprising, prior to the acquiring the plurality of sub-images of the target scan image:
adjusting the resolution of an original scanned image to a preset resolution;
and normalizing the pixel values of the original scanned image at the preset resolution to a preset pixel value range to obtain the target scan image.
7. A model training method for training a disease classification model according to any one of claims 1 to 6, comprising:
the following training procedure is iteratively performed:
obtaining a prediction probability of each preset disease based on a training sample and a classification model, wherein the training sample includes a sample scan image and a sample condition description text associated with the sample scan image;
for a preset disease with a positive label, obtaining a loss value of the prediction probability of the preset disease by using a first loss function; for a preset disease with a negative label, obtaining a loss value of the prediction probability of the preset disease by using a second loss function; wherein the independent variable of the first loss function is the prediction probability of the preset disease, the independent variable of the second loss function is a correction value of the prediction probability of the preset disease, and the correction value is smaller than the corresponding prediction probability;
optimizing parameters of the classification model based on each of the loss values;
and ending the training process when the training process meets a preset condition, and determining the classification model after the final parameter optimization as the disease classification model.
8. The method of claim 7, wherein the first loss function comprises a power term, with exponent γ+, of the difference between a reference value and the prediction probability of the preset disease, and a logarithmic term of the prediction probability of the preset disease;
and the second loss function comprises a power term, with exponent γ-, of the difference between the reference value and the correction value of the prediction probability of the preset disease, and a logarithmic term of the correction value of the prediction probability of the preset disease, wherein γ+ and γ- are both positive numbers.
9. The method of claim 7, wherein, in the case where the difference between the prediction probability of the preset disease and a preset offset value is greater than 0, the correction value of the prediction probability of the preset disease is that difference;
and in the case where the difference between the prediction probability of the preset disease and the preset offset value is not greater than 0, the correction value of the prediction probability of the preset disease is 0.
10. The method of claim 7, wherein the obtaining a prediction probability of each preset disease based on the training sample and the classification model comprises:
acquiring a plurality of sub-sample images of the sample scan image;
determining a sample condition description text associated with the sample scan image;
and obtaining the prediction probability of each preset disease based on the plurality of sub-sample images, the sample condition description text, and the classification model.
11. A disease classification device, comprising:
an image acquisition module configured to acquire a plurality of sub-images of a target scan image;
a text determination module configured to determine a condition description text associated with the target scan image;
and a disease classification module configured to obtain a prediction probability of each preset disease based on the plurality of sub-images, the condition description text, and a disease classification model.
12. A model training apparatus for training a disease classification model according to any one of claims 1 to 6, comprising:
the model training module is configured to iteratively execute the following training process:
obtaining a prediction probability of each preset disease based on a training sample and a classification model, wherein the training sample includes a sample scan image and a sample condition description text associated with the sample scan image;
for a preset disease with a positive label, obtaining a loss value of the prediction probability of the preset disease by using a first loss function; for a preset disease with a negative label, obtaining a loss value of the prediction probability of the preset disease by using a second loss function; wherein the independent variable of the first loss function is the prediction probability of the preset disease, the independent variable of the second loss function is a correction value of the prediction probability of the preset disease, and the correction value is smaller than the corresponding prediction probability;
and optimizing parameters of the classification model based on each of the loss values;
and a model determination module configured to end the training process when the training process meets a preset condition, and to determine the classification model after the final parameter optimization as the disease classification model.
13. An electronic device comprising a processor and a memory storing program instructions, wherein the processor is configured to perform the disease classification method of any one of claims 1 to 6 or the model training method of any one of claims 7 to 10 when the program instructions are executed.
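Claims 7 to 9 describe per-label losses of an asymmetric, focal-like form: for positive labels, a power of the difference between a reference value and the prediction probability multiplies a logarithmic term of that probability; for negative labels, the same form is applied to a corrected probability max(p - m, 0). The following is a minimal NumPy sketch, assuming the reference value is 1; the function name `asymmetric_loss` and the values of `gamma_pos`, `gamma_neg`, and `margin` are illustrative and not fixed by the claims.

```python
import numpy as np

def asymmetric_loss(p, y, gamma_pos=1.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Sketch of an asymmetric, focal-style multi-label loss.

    Positive labels (y == 1): -(1 - p)**gamma_pos * log(p)
    Negative labels (y == 0): p is first shifted to a correction value
    p_m = max(p - margin, 0), then -(p_m)**gamma_neg * log(1 - p_m).
    gamma_pos, gamma_neg, and margin are illustrative hyperparameters.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y)
    p_m = np.clip(p - margin, 0.0, None)  # corrected probability, as in claim 9
    loss_pos = -((1.0 - p) ** gamma_pos) * np.log(p)
    loss_neg = -(p_m ** gamma_neg) * np.log(np.clip(1.0 - p_m, eps, None))
    return float(np.where(y == 1, loss_pos, loss_neg).mean())
```

Because p_m collapses to 0 for negatives predicted below the margin, easy negatives contribute no gradient, which is consistent with the claims' motivation of treating positive and negative labels with separate loss functions.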
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311332672.8A CN117370903A (en) | 2023-10-13 | 2023-10-13 | Disease classification method, model training method, device, electronic equipment and medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117370903A true CN117370903A (en) | 2024-01-09 |
Family
ID=89405332
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311332672.8A Pending CN117370903A (en) | 2023-10-13 | 2023-10-13 | Disease classification method, model training method, device, electronic equipment and medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117370903A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||