CN111816306A

CN111816306A - Medical data processing method, and prediction model training method and device

Info

Publication number: CN111816306A
Application number: CN202010957988.6A
Authority: CN
Inventors: 贺云鹏
Original assignee: Yibao Medical Technology Shanghai Co ltd
Current assignee: Yibao Medical Technology Shanghai Co ltd
Priority date: 2020-09-14
Filing date: 2020-09-14
Publication date: 2020-10-23
Anticipated expiration: 2040-09-14
Also published as: CN111816306B

Abstract

The invention discloses a medical data processing method, a prediction model training method and a prediction model training device. The medical data processing method comprises the following steps: acquiring a first medical data parameter of a first object and a second medical data parameter of a second object; respectively performing parameter expansion on the first medical data parameter and the second medical data parameter based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters; determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state; and screening the expanded medical data parameters according to the sampling result to determine effective medical data parameters. Invalid medical data parameters are removed, the number of samples required in the process of training the prediction model is reduced, and small sample training of the prediction model is achieved.

Description

Medical data processing method, and prediction model training method and device

Technical Field

The embodiment of the invention relates to the technical field of medical data processing, in particular to a medical data processing method, a prediction model training method and a prediction model training device.

Background

With the rapid development of information science, big data processing approaches based on artificial intelligence have been widely applied, in particular intelligent model-based approaches such as deep neural network models.

At present, data processing generally involves inputting collected data into an artificial intelligence model, which identifies, screens and processes the input data. A large amount of sample data is therefore needed to train the artificial intelligence model. For small-sample data, and especially small-sample medical data, samples are difficult to acquire, which in turn results in poor training precision of the artificial intelligence model.

Disclosure of Invention

The invention provides a medical data processing method, a prediction model training method and a prediction model training device, which are used for meeting the training requirement of a model through processing of medical data.

In a first aspect, an embodiment of the present invention provides a medical data processing method, including:

acquiring a first medical data parameter of a first object and a second medical data parameter of a second object;

respectively performing parameter expansion on the first medical data parameter and the second medical data parameter based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters;

determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state;

and screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.

In a second aspect, an embodiment of the present invention further provides a method for training a prediction model, including:

acquiring sample data formed by effective medical data parameters corresponding to a target prediction function, wherein the effective medical data parameters are determined according to a medical data processing method provided by the embodiment of the invention;

and training the prediction model to be trained based on the sample data to obtain the prediction model with the target prediction function.

In a third aspect, an embodiment of the present invention further provides a medical data processing apparatus, including:

a medical data parameter acquisition module for acquiring a first medical data parameter of a first object and a second medical data parameter of a second object;

the parameter extension module is used for respectively performing parameter extension on the first medical data parameter and the second medical data parameter based on a first extension rule and performing associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule to obtain an extended medical data parameter;

the iterative sampling module is used for determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter and performing iterative sampling on each expanded medical data parameter based on the distribution state;

and the effective data determining module is used for screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.

In a fourth aspect, an embodiment of the present invention further provides a device for training a prediction model, where the device includes:

a sample data acquisition module for acquiring sample data formed by effective medical data parameters corresponding to a target prediction function, wherein the effective medical data parameters are determined according to the medical data processing method provided by the embodiment of the invention;

and the model training module is used for training the prediction model to be trained based on the sample data to obtain the prediction model with the target prediction function.

In a fifth aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the processor implements a medical data processing method according to an embodiment of the present invention or a training method of a prediction model according to an embodiment of the present invention.

In a sixth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a medical data processing method according to the embodiment of the present invention or a training method of a prediction model according to the embodiment of the present invention.

According to the technical scheme provided by the invention, the first medical data parameter of the first object and the second medical data parameter of the second object are each expanded individually and expanded in association with each other, which improves the diversity of the medical data parameters. Because the expansion relations among the initial data parameters are preset, the prediction model no longer has to combine and expand the input initial data parameters during training, which simplifies the training process, reduces the training difficulty, improves the training effect, and lowers the requirement on training samples. Furthermore, effective medical data parameters are screened from the expanded medical data parameters in advance, which replaces the screening of invalid medical data parameters during training, reduces their interference, speeds up the convergence of the prediction model, and further reduces the number of samples required for training.

Drawings

FIG. 1 is a schematic flow chart of a medical data processing method according to a first embodiment of the present invention;

FIG. 2 is a flow chart of a medical data processing method according to a second embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for training a prediction model according to a third embodiment of the present invention;

FIG. 4 is a graphical representation of roc_auc values for various models provided by embodiments of the present invention;

FIG. 5 is a flowchart illustrating a method for training a prediction model according to a fourth embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a medical data processing apparatus according to a fifth embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a training apparatus for a prediction model according to a sixth embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a schematic flow chart of a medical data processing method according to an embodiment of the present invention, which is applicable to a case where medical data is processed, and the method can be executed by a medical data processing apparatus according to an embodiment of the present invention, which can be integrated into an electronic device such as a computer or a server. The method specifically comprises the following steps:

S110, acquiring a first medical data parameter of the first object and a second medical data parameter of the second object.

S120, performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters.

S130, determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state.

S140, screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.

The first object and the second object may be human objects or animal objects. Alternatively, the first subject and the second subject may be two subjects undergoing organ transplantation, e.g., the first subject is an organ donor and the second subject is an organ recipient. The organ transplant may be, but is not limited to, liver transplant, heart transplant, cornea transplant, kidney transplant, and the like.

The first medical data parameter and the second medical data parameter are medical data parameters that can be directly acquired or collected, for example, parameters collected by instrument detection and analysis, or attribute parameters of the first object and the second object. The attribute parameters may include, but are not limited to, gender, age, weight, height, and the like. At present, in the training process of a prediction model, the initial data parameters are input into the prediction model and the model determines the relationships between the initial data parameters during training, so a large number of training samples and a long training period are required. Here, the initial data parameters are the first medical data parameters and the second medical data parameters before expansion. Optionally, the prediction model may be a prediction model for predicting organ transplantation function.

In this embodiment, parameter expansion may be performed on the first medical data parameter or the second medical data parameter, or associated parameter expansion may be performed on the first medical data parameter and the second medical data parameter, so as to obtain a plurality of expanded medical data parameters. The mining of the initial data is realized, and the expanded medical data parameters comprise the initial data parameters and the expanded medical data parameters obtained by mining. The prediction model is trained through the medical data parameters obtained by mining, so that the process of exploring the relation between the input parameters by the prediction model is simplified, the convergence speed of the prediction model in the training process is accelerated, and the quantity and the training period of the sample data are further reduced.

It should be noted that, after acquiring the first medical data parameters of the first object and the second medical data parameters of the second object and before performing the parameter expansion, the method further includes preprocessing the initial data parameters, which may include data cleaning, deduplication, normalization, and the like. Data cleaning may delete initial data parameters that have missing data, and deduplication may remove similar parameters from among initial data parameters with high similarity. Optionally, the deduplication process may include: calculating the similarity between any two initial data parameters, and eliminating one of the similar parameters when the similarity exceeds a preset similarity. Initial data parameters with high similarity have a similar influence on the target prediction function, so removing either of them reduces redundant parameters, lowers the complexity of medical data parameter processing, and improves processing efficiency.

Wherein, the similarity between any two initial data parameters can be calculated through the Pearson correlation coefficient. Taking the liver transplantation data parameters as an example, see table 1, where table 1 is an illustration of the similarity of the initial data parameters.

TABLE 1
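The deduplication step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `drop_similar_parameters`, the column names, the 0.95 threshold, and the greedy keep-first strategy are all assumptions, as is the use of the absolute value of the Pearson coefficient as the similarity.

```python
import numpy as np


def drop_similar_parameters(data, names, threshold=0.95):
    """Greedily keep parameters whose |Pearson r| with every kept one is <= threshold.

    data: 2-D array, one row per sample and one column per initial data parameter.
    names: column names, in the same order as the columns of data.
    """
    corr = np.corrcoef(data, rowvar=False)  # pairwise Pearson correlation of columns
    kept = []
    for j in range(len(names)):
        # Assumption: similarity is the absolute Pearson coefficient.
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Hypothetical toy data: param_b is an exact linear function of param_a.
samples = np.array([
    [1.0, 3.0, 5.0],
    [2.0, 5.0, 1.0],
    [3.0, 7.0, 4.0],
    [4.0, 9.0, 2.0],
    [5.0, 11.0, 3.0],
])
kept_names = drop_similar_parameters(samples, ["param_a", "param_b", "param_c"])
```

In this toy data `param_b` duplicates the information in `param_a` and is removed, while the weakly correlated `param_c` is kept.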

The initial data parameters obtained by screening are normalized, which reduces the influence of scale differences between parameters on the prediction model. Different initial data parameters may correspond to different normalization modes, and the normalization mode of each initial data parameter may be preset. Referring to Table 2, Table 2 shows the normalization of some of the liver transplantation data parameters.

TABLE 2

On the basis of the above embodiment, parameter type expansion is performed on the preprocessed first medical data parameter and/or second medical data parameter. Optionally, the parameter type expansion performed on the initial data parameters includes at least one of: an expansion based on each initial data parameter, an expansion based on a parameter group formed by associated initial data parameters, an expansion based on initial data parameters corresponding to different objects, an expansion based on a parameter group formed by at least one initial data parameter of the same object, and an expansion based on parameter differences of the initial data parameters.

Optionally, the performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on the first expansion rule includes: for any initial data parameter in the first medical data parameter or the second medical data parameter, determining a standard parameter range corresponding to the initial data parameter, and determining a nominal value based on the standard parameter range; and performing parameter expansion based on the difference value between the initial data parameter and the nominal value to obtain at least one expanded medical data parameter corresponding to the initial data parameter.

The first medical data parameter and the second medical data parameter each include a plurality of initial data parameters. For each initial data parameter, the nominal value may be any one of the median, the mean, and the mode of the standard range of that parameter. The standard range includes the maximum value and the minimum value of the initial data parameter in a standard state, and the median or the mean may be determined from the maximum value and the minimum value. Accordingly, the difference between the initial data parameter and the nominal value may be determined; the difference may be used directly as an extended medical data parameter, or a preset operation may be applied to the difference to obtain the extended medical data parameter. For example, the preset operation may be a weighted calculation or a nonlinear calculation. For example, the extended medical data parameter may be Weight × |feature − standard_feature|, where Weight is a weight coefficient, feature is the initial data parameter, and standard_feature is the nominal value of the initial data parameter; the extended medical data parameter may also be e^(w × |feature − standard_feature|), where w is a weight coefficient.

Illustratively, for the initial data parameter serum sodium, the standard range of human serum sodium is 135-145 mmol/L. Correspondingly, the extended medical data parameter may be na − (145 + 135)/2, Weight × |na − (145 + 135)/2|, or e^(w × |na − (145 + 135)/2|), where na is the serum sodium in the first medical data parameter and/or the second medical data parameter.
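The serum sodium example can be worked through in a few lines. This is a sketch under stated assumptions: the function name `expand_by_nominal`, the dictionary keys, and the sample values of `weight` and `w` are illustrative, and the nominal value is taken as the midpoint of the standard range.

```python
import math


def expand_by_nominal(value, low, high, weight=1.0, w=0.01):
    """Expand one initial data parameter against the midpoint of its standard range."""
    nominal = (low + high) / 2.0  # nominal value taken as the midpoint of the range
    diff = value - nominal
    return {
        "diff": diff,                             # feature - standard_feature
        "weighted_abs_diff": weight * abs(diff),  # Weight x |feature - standard_feature|
        "exp_abs_diff": math.exp(w * abs(diff)),  # e^(w x |feature - standard_feature|)
    }


# Serum sodium of 150 mmol/L against the 135-145 mmol/L standard range.
features = expand_by_nominal(150.0, 135.0, 145.0, weight=0.5, w=0.01)
```

With a nominal value of 140 mmol/L, the three expanded parameters are 10, 0.5 × 10 = 5, and e^0.1.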

Alternatively, there may be different ranges of standard parameters for an initial data parameter under different conditions, such as but not limited to weight status, age status, and gender status. Illustratively, for BMI (body mass index), different sexes correspond to different standard ranges. The parameter expansion can be performed separately in different states.

In this embodiment, each parameter is expanded using its standard range, which introduces medical prior experience for that parameter, so that the expanded medical data parameters obtained by mining carry medical prior experience. Because the training process of a prediction model operates only on numerical values, the distribution rule of the parameters is usually learned through a large amount of supervised training, which requires a large number of training samples. In this embodiment, medical prior experience is given to the expanded medical data parameters in the data mining stage, which replaces learning the distribution rule through many training iterations, reduces the required number of samples, and facilitates small-sample training on medical data.

For example, referring to table 3, table 3 is an example of independent extension of initial data parameters according to an embodiment of the present invention.

TABLE 3

Optionally, the performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on the first expansion rule includes: determining a parameter group for parameter expansion in an initial data parameter of the first medical data parameter or the second medical data parameter, wherein the parameter group comprises at least two initial data parameters determined according to business requirements, or at least two initial data parameters with an association relationship; and performing at least one type of extension operation on at least two initial data parameters in the parameter group to obtain extended medical data parameters.

For the first medical data parameter of the first object, a parameter group is formed from at least two initial data parameters determined by business requirements, and for each parameter group an expansion operation is performed on the at least two initial data parameters in the group. The expansion operation may be, but is not limited to, a sum, a mean, a variance, and the like. For example, the business requirement may be, but is not limited to, a surgical time requirement. Different business requirements correspond to different parameter groups: the correspondence between each business requirement and its parameters may be determined in advance, the business requirement input by the user is obtained, and the at least two initial data parameters corresponding to that business requirement are retrieved to form the parameter group. Referring to Table 4, Table 4 is an example of an expansion manner of the associated initial data parameters among the liver transplantation data parameters.

It should be noted that, when determining the parameter group, the above expansion may be performed not only on the first medical data parameter and the second medical data parameter but also on intraoperative information, that is, medical data parameters recorded during the operation on the first object and the second object, so as to obtain expanded medical data parameters.

TABLE 4
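The parameter-group expansion described above (sum, mean, and variance over the parameters that belong to one business requirement) might look like the following sketch. The parameter names and the grouping by surgical time are hypothetical and are not taken from Table 4; population variance is an assumption, since the text does not say which variance is meant.

```python
from statistics import mean, pvariance


def expand_parameter_group(record, group):
    """Apply sum, mean and population variance to the parameters of one group."""
    values = [record[name] for name in group]
    return {
        "sum": sum(values),
        "mean": mean(values),
        "variance": pvariance(values),  # assumption: population variance
    }


# Hypothetical surgical-time group; the names are illustrative only.
record = {"phase_1_minutes": 60.0, "phase_2_minutes": 90.0, "phase_3_minutes": 120.0}
surgical_time_group = ["phase_1_minutes", "phase_2_minutes", "phase_3_minutes"]
expanded = expand_parameter_group(record, surgical_time_group)
```

Each business requirement would map to its own list of parameter names, so the same function can be reused for every group.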

The at least two initial data parameters in the parameter group may also have a correlation, which may be a positive correlation or a negative correlation. The correlation between the initial data parameters may be preset or may be obtained statistically from the medical data parameters of a large number of subjects. For example, among liver transplantation parameters, the lower the BMI of the recipient, the higher the total bilirubin, which is a negative correlation. In some embodiments, the parameter group contains two initial data parameters having an association relationship.

The expansion operation on the at least two initial data parameters having the association relationship may be a sum operation, a difference operation, a ratio operation, a derivative operation of the ratio, and the like. Illustratively, referring to Table 5, Table 5 is an example of an expansion manner of the associated initial data parameters among the liver transplantation data parameters.

TABLE 5

In this embodiment, a parameter group is formed, based on medical prior experience, from a plurality of parameters that have an association relationship or belong to the same business requirement. An expansion operation is performed on the initial data parameters in the same parameter group to form extended medical data parameters that carry the meaning of the parameter association. Because the associations among the initial data parameters are mined in advance based on medical prior experience, the process of learning the association relations among parameters during prediction model training is simplified and, correspondingly, the required number of training samples is reduced.

Optionally, performing associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule, including: determining a medical data parameter pair of the same type in the first medical data parameter and the second medical data parameter, and performing numerical operation on each medical data parameter pair to obtain an extended medical data parameter corresponding to the medical data parameter pair.

The expansion based on the initial data parameters corresponding to different objects may be based on initial data parameters of the same type from the different objects. For example, the expansion may include determining whether the same-type initial data parameters of the different objects match, and determining a difference, a sum, a quotient, a product, and the like of the same-type initial data parameters of the different objects. Referring to Table 6, Table 6 is an example of an expansion manner of the associated initial data parameters among the liver transplantation data parameters.

In this embodiment, the prediction model is used to predict organ transplantation data between the first object and the second object and to determine an organ transplantation function. When training the prediction model, associated expansion is performed on the corresponding initial data parameters of the first object and the second object to obtain extended medical data parameters that capture the association between the two objects, and the prediction model is trained based on these extended medical data parameters. This simplifies the mining of parameter associations between different objects during training, improves training efficiency, and further reduces the required number of training samples.

TABLE 6
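The same-type pair expansion between the two objects can be sketched as below. The function name `expand_pair`, the result keys, and the convention of returning `None` for an undefined quotient are assumptions made for illustration.

```python
def expand_pair(donor_value, recipient_value):
    """Numerical operations on one same-type parameter pair from two objects."""
    features = {
        "match": float(donor_value == recipient_value),  # 1.0 if the values match
        "difference": donor_value - recipient_value,
        "sum": donor_value + recipient_value,
        "product": donor_value * recipient_value,
    }
    # Assumption: leave the quotient undefined when the divisor is zero.
    features["quotient"] = (
        donor_value / recipient_value if recipient_value != 0 else None
    )
    return features


# Example: donor age 40 and recipient age 50.
age_features = expand_pair(40.0, 50.0)
```

Calling the function once per same-type pair (age, weight, BMI, and so on) yields one set of associated expanded parameters per pair.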

Optionally, performing associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule, including: determining an object group corresponding to the matching state of the target parameters based on the matching states of the target parameters of the first object and the second object; and performing parameter expansion based on the first medical data parameter and/or the second medical data parameter and parameter mean values of different objects in the object group to obtain an expanded medical data parameter.

In this embodiment, the medical data parameters of the plurality of subjects may be grouped based on the matching status of the target parameter, for example, the target parameter may be age, gender, and graft type. Illustratively, the sexes of two subjects subjected to organ transplantation are female and male, and the sexes of the two subjects are not matched; if the ages of both subjects subjected to organ transplantation are 50 years, the ages of the two subjects are matched. Illustratively, the object groupings of historical objects may include, but are not limited to, age mismatch, age match, gender mismatch, graft type match, and the like.

The matching state is determined based on the target parameters of the first object and the second object. For example, if the target parameters of the first object are gender female and age 20, and the target parameters of the second object are gender female and age 50, then the ages of the first object and the second object are determined to be unmatched and the genders matched. An age-unmatched group and a gender-matched group are then determined respectively, and parameter expansion is performed based on the parameter means in the age-unmatched group and the gender-matched group, where the parameter means may include a donor mean, a recipient mean, and an overall mean. Accordingly, the extended medical data parameter may be the difference between the first medical data parameter or the second medical data parameter and the corresponding parameter mean.
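A minimal sketch of the group-mean expansion follows, assuming historical donor/recipient pairs are stored as nested dictionaries. The record layout, the function name `group_mean_features`, and the BMI values are hypothetical; only the idea of subtracting the donor, recipient, and overall means of the matching-state group comes from the text.

```python
from statistics import mean


def group_mean_features(history, new_pair, key, target="gender"):
    """Differences between a new pair's parameter and the means of its object group."""
    def matched(pair):
        return pair["donor"][target] == pair["recipient"][target]

    # Object group: historical pairs with the same matching state as the new pair.
    # statistics.mean raises StatisticsError if this group is empty.
    group = [p for p in history if matched(p) == matched(new_pair)]
    donor_mean = mean(p["donor"][key] for p in group)
    recipient_mean = mean(p["recipient"][key] for p in group)
    overall_mean = mean(
        v for p in group for v in (p["donor"][key], p["recipient"][key])
    )
    return {
        "donor_minus_donor_mean": new_pair["donor"][key] - donor_mean,
        "recipient_minus_recipient_mean": new_pair["recipient"][key] - recipient_mean,
        "donor_minus_overall_mean": new_pair["donor"][key] - overall_mean,
        "recipient_minus_overall_mean": new_pair["recipient"][key] - overall_mean,
    }


# Hypothetical history: two gender-matched pairs and one unmatched pair.
history = [
    {"donor": {"gender": "F", "bmi": 20.0}, "recipient": {"gender": "F", "bmi": 24.0}},
    {"donor": {"gender": "M", "bmi": 22.0}, "recipient": {"gender": "M", "bmi": 26.0}},
    {"donor": {"gender": "F", "bmi": 30.0}, "recipient": {"gender": "M", "bmi": 18.0}},
]
new_pair = {"donor": {"gender": "F", "bmi": 23.0}, "recipient": {"gender": "F", "bmi": 27.0}}
bmi_features = group_mean_features(history, new_pair, key="bmi")
```

The new pair is gender-matched, so only the first two historical pairs contribute to the means (donor mean 21, recipient mean 25, overall mean 23).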

By expanding the first medical data parameters and the second medical data parameters, the expansion relations between the initial data parameters are determined in advance, which reduces the combination and expansion of the input initial data parameters during prediction model training, improves convergence efficiency, and reduces the required number of training samples. Meanwhile, medical prior experience is introduced in the parameter expansion process, which addresses the problem that a prediction model trained only on numerical values may not conform to medical standards, and further improves the training precision of the prediction model.

After the expanded medical data parameters are determined, effective medical data parameters are screened from them, which reduces the interference of invalid medical data parameters during prediction model training, improves training efficiency, and further reduces the required number of training samples.

The prior distribution of each medical data parameter may be different and may be determined according to the medical data parameter type. Illustratively, the prior distribution of a medical data parameter may be, but is not limited to, a Cauchy distribution, a uniform distribution, a t distribution, an exponential distribution, or a beta distribution. The weight of each medical data parameter is determined through the prior distribution: the larger the weight, the larger the influence on the target prediction function, and the smaller the weight, the smaller the influence. When the weight of a medical data parameter is zero or less than a preset weight value, the medical data parameter is determined to be an invalid medical data parameter.

In some embodiments, the weights of the medical data parameters may be determined based on the prior distribution of the medical data parameters and a Bayesian algorithm. The prior distribution of a medical data parameter represents the distribution that its weight follows; for example, when the prior distribution is a Cauchy distribution, the weight of the medical data parameter satisfies the Cauchy distribution. Iterative sampling is performed on each expanded medical data parameter to obtain a sampling result, the weight of each expanded medical data parameter is determined based on the sampling result, and either the medical data parameters whose weight is not zero or the medical data parameters whose weight is larger than a preset weight value are determined to be effective medical data parameters. Sample data is then determined based on the effective medical data parameters, and the prediction model to be trained is trained to obtain the prediction model with the target prediction function.
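One possible reading of this screening step is sketched below. The source names neither the likelihood nor the sampler, so the logistic likelihood, the random-walk Metropolis sampler, the zero starting point, the 0.5 weight threshold, and the synthetic data are all assumptions made purely for illustration; only the Cauchy prior on the weights and the threshold-based screening come from the text.

```python
import numpy as np


def log_posterior(w, X, y, scale=1.0):
    """Unnormalized log posterior: logistic likelihood plus a Cauchy prior on w."""
    logits = X @ w
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -np.sum(np.log1p((w / scale) ** 2))  # Cauchy density up to a constant
    return log_lik + log_prior


def sample_weights(X, y, n_iter=3000, burn_in=1000, step=0.1, seed=0):
    """Random-walk Metropolis sampling of the weights (an assumed sampler)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])  # assumption: start at zero rather than a prior draw
    current = log_posterior(w, X, y)
    samples = []
    for i in range(n_iter):
        proposal = w + rng.normal(0.0, step, size=w.shape)
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            w, current = proposal, candidate
        if i >= burn_in:
            samples.append(w.copy())
    return np.array(samples)


def screen_parameters(samples, names, min_weight=0.5):
    """Keep parameters whose mean sampled weight exceeds the preset weight value."""
    mean_weight = samples.mean(axis=0)
    return [n for n, m in zip(names, mean_weight) if abs(m) > min_weight]


# Synthetic data: only the first column influences the outcome.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(1000, 2))
y = (data_rng.uniform(size=1000) < 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))).astype(float)
weight_samples = sample_weights(X, y)
effective = screen_parameters(weight_samples, ["informative", "noise"])
```

In this toy setup the sampled weight of the uninformative column concentrates near zero, so it falls below the threshold and is screened out, while the informative column is retained.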

According to the technical scheme of this embodiment, the first medical data parameter of the first object and the second medical data parameter of the second object are each expanded individually and expanded in association with each other, which improves the diversity of the medical data parameters. Because the expansion relations among the initial data parameters are preset, the prediction model no longer has to combine and expand the input initial data parameters during training, which simplifies the training process, reduces the training difficulty, improves the training effect, and lowers the requirement on training samples. Furthermore, effective medical data parameters are screened from the expanded medical data parameters in advance, which replaces the screening of invalid medical data parameters during training, reduces their interference, speeds up the convergence of the prediction model, and further reduces the number of samples required for training.

Example two

Fig. 2 is a schematic flow chart of a medical data processing method provided by the second embodiment of the invention. This embodiment is optimized on the basis of the foregoing embodiment, and the method includes:

S210, acquiring a first medical data parameter of the first object and a second medical data parameter of the second object.

S220, performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters.

S230, determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state.

S240, randomly sampling the weight of the expanded medical data parameters according to the prior distribution of the expanded medical data parameters to obtain the initial state of the medical data parameters;

S250, carrying out iterative processing on the initial state based on a predetermined transfer matrix to obtain a stable distribution state of each expanded medical data parameter, and carrying out iterative sampling on each expanded medical data parameter based on the stable distribution state.

S260, screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.

The types of medical data parameters related to the target prediction function are numerous, and they may also include expanded medical data parameters obtained by expanding the existing medical data parameters. Not all of these medical data parameters influence the target prediction function. In current training processes, all medical data parameters are input into the prediction model to be trained and the model itself screens the parameters during training, so the training period is long and a large number of samples is required.

In this embodiment, whether each medical data parameter has an effective influence on the target prediction function is determined through the prior distribution of each medical data parameter, so as to delete an invalid parameter from a large number of medical data parameters, and obtain a medical data parameter effective for the target prediction function, where the valid medical data parameter can be used to train a prediction model with the target prediction function. By means of eliminating invalid medical data parameters, the training difficulty of the prediction model is reduced, the number of samples required in the process of training the prediction model is further reduced, and small sample training of the prediction model can be achieved on the basis of guaranteeing the training precision of the prediction model.

Optionally, determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter includes: randomly sampling the weight of each expanded medical data parameter according to the prior distribution of each expanded medical data parameter to obtain the initial state of each medical data parameter; and carrying out iterative processing on the initial state based on a predetermined transfer matrix to obtain the stable distribution state of each expanded medical data parameter.

The prior distribution of the medical data parameters represents the manner in which the weights of the medical data parameters are distributed; for example, when the prior distribution is a Cauchy distribution, the weights of the medical data parameters follow the Cauchy distribution. The weight of each medical data parameter is randomly sampled, the probability value of each sampled value under the prior distribution is determined, and the probability values corresponding to the randomly sampled weight values of the medical data parameters form the initial state of the medical data parameters. The initial state may be presented in the form of a sampling matrix.
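As a minimal sketch of this step, assuming a Cauchy prior with a hypothetical location `x0` and scale `gamma` (neither value is specified in this embodiment), the weights may be drawn by inverse-transform sampling and paired with their prior probability values to form an initial state. The function names and the row layout of the sampling matrix are illustrative assumptions.

```python
import math
import random


def cauchy_pdf(w, x0=0.0, gamma=1.0):
    """Density of the Cauchy distribution at weight w."""
    z = (w - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))


def sample_cauchy(rng, x0=0.0, gamma=1.0):
    """Draw one weight by inverse-transform sampling of the Cauchy CDF."""
    u = rng.random()  # uniform in [0, 1)
    return x0 + gamma * math.tan(math.pi * (u - 0.5))


def initial_state(param_names, rng, x0=0.0, gamma=1.0):
    """Return one row per parameter: (name, sampled weight, prior probability value)."""
    rows = []
    for name in param_names:
        w = sample_cauchy(rng, x0, gamma)
        rows.append((name, w, cauchy_pdf(w, x0, gamma)))
    return rows
```

Each row of the returned list corresponds to one expanded medical data parameter, so the list as a whole can be read as a simple sampling matrix.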

In this embodiment, state transitions are performed iteratively starting from the initial state of the medical data parameters, and when the state transitions become stable, the stable distribution state of each medical data parameter is obtained. The state transition is applied to the initial state through a predetermined transition matrix. The transition matrix can be determined through a Markov chain: it is the transition matrix obtained when the reversible Markov chain satisfies the detailed balance equation.

The initial state is iteratively processed based on the transition matrix. When the number of iterations reaches a preset number of iterations, the transition state obtained at that preset iteration is determined as the stable distribution state; alternatively, when the transition state obtained by the iterative processing converges, the converged transition state is determined as the stable distribution state.
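The two stopping conditions above can be illustrated with a short sketch. It assumes, for illustration only, that the state is a row vector of probabilities and that the transition matrix is row-stochastic; the tolerance `tol` and the function names are hypothetical.

```python
def step(state, transition):
    """Multiply a row-vector state by a row-stochastic transition matrix."""
    n = len(transition)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]


def stable_distribution(initial, transition, max_iter=1000, tol=1e-10):
    """Iterate until the state stops changing (convergence) or max_iter is reached.

    Returns the final state and the number of iterations performed.
    """
    state = list(initial)
    for iteration in range(1, max_iter + 1):
        nxt = step(state, transition)
        change = max(abs(a - b) for a, b in zip(nxt, state))
        state = nxt
        if change < tol:
            return state, iteration
    return state, max_iter
```

Setting `tol=0.0` disables the convergence check, so the state after exactly `max_iter` transitions is returned, which corresponds to the preset-iteration-count variant.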

Specifically, iteratively processing the initial state based on a predetermined transition matrix to obtain the stable distribution state of each medical data parameter includes: performing a state transition on the initial state based on the transition matrix to obtain a transition state; judging the transition state based on a preset suggestion distribution and a verification threshold; when the state requirement is met, iteratively performing a further state transition on the transition state; and when the state requirement is not met, re-executing the step of randomly sampling the weights of the medical data parameters according to the prior distribution of the medical data parameters.

The state transition of the initial state may be performed by multiplying the transition matrix by the initial state of the medical data parameters to obtain the transition state. The transition state is judged by an acceptance-rejection algorithm based on a preset suggestion distribution and a verification threshold. The suggestion distribution may be, but is not limited to, a symmetric distribution, a normal distribution or an independent distribution, and can be set as required. The verification threshold may be a fixed threshold or a random number drawn from a preset interval, which may be (0, 1).

Optionally, judging the transition state based on a preset suggestion distribution and a verification threshold includes: determining an acceptance probability of the transition state based on the preset suggestion distribution; and determining that the transition state satisfies the state requirement when the acceptance probability is greater than or equal to the verification threshold.

Wherein, the acceptance probability of the transition state can be calculated based on the following formula:

$$\alpha = \min\left(1,\ \frac{\pi(\theta^{*})\,Q(\theta_{t} \mid \theta^{*})}{\pi(\theta_{t})\,Q(\theta^{*} \mid \theta_{t})}\right)$$

wherein $\pi(\theta^{*})$ is the posterior distribution of $\theta^{*}$, $\pi(\theta_{t})$ is the probability of $\theta_{t}$, $Q(\theta^{*} \mid \theta_{t})$ is the transition probability of making the transition from $\theta_{t}$ to $\theta^{*}$ based on the suggestion distribution $Q$, $Q(\theta_{t} \mid \theta^{*})$ is the transition probability of making the transition from $\theta^{*}$ to $\theta_{t}$ based on the suggestion distribution $Q$, $\theta_{t}$ is the current state, and $\theta^{*}$ is the next sampling state.

When the state requirement is met, it is determined whether the current number of iterations has reached the preset number or whether the current transition state has converged. If not, a further state transition is performed on the current transition state based on the transition matrix; if so, the current transition state is determined as the stable distribution state. When the acceptance probability is smaller than the verification threshold, it is determined that the state requirement is not met; the weights are then sampled again based on the prior distribution of the medical data parameters to determine a new initial state, and the above process is repeated until a stable distribution state is obtained.
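The judgment loop described above may be sketched as follows. This is an illustrative sketch only: the callables `target`, `q`, `propose` and `sample_prior`, the one-dimensional state, and the `max_restarts` guard are all assumptions introduced here, and the stopping rule shown is the preset-iteration-count variant.

```python
import math
import random


def acceptance_probability(target, q, current, candidate):
    """min(1, target(candidate) * q(current | candidate)
              / (target(current) * q(candidate | current))).

    q(to, frm) is the suggestion-distribution density of moving from frm to to.
    """
    numerator = target(candidate) * q(current, candidate)
    denominator = target(current) * q(candidate, current)
    if denominator == 0.0:
        return 1.0
    return min(1.0, numerator / denominator)


def run_until_stable(target, q, propose, sample_prior, n_steps, rng, max_restarts=10_000):
    """Follow the accept / re-sample loop described in the text.

    On acceptance the chain moves to the candidate state; on rejection a new
    initial state is drawn from the prior and the step counter restarts.
    """
    state, steps, restarts = sample_prior(rng), 0, 0
    while steps < n_steps:
        candidate = propose(state, rng)
        alpha = acceptance_probability(target, q, state, candidate)
        threshold = rng.random()  # verification threshold drawn in [0, 1)
        if alpha >= threshold:
            state, steps = candidate, steps + 1
        else:
            restarts += 1
            if restarts > max_restarts:  # hypothetical guard against endless restarts
                raise RuntimeError("no stable state reached")
            state, steps = sample_prior(rng), 0
    return state
```

The sketch follows the re-sampling behaviour described in this embodiment; a textbook Metropolis-Hastings sampler would instead retain the current state when a candidate is rejected.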

The stable distribution state comprises the distribution probability of each medical data parameter when the medical data parameters as a whole are sampled. Based on the probability values of the medical data parameters in the stable distribution state, the weights of the medical data parameters are sampled a preset number of times. The preset number of times may be, for example, 100 or 1000, and can be set as required.

Through sampling for the preset number of times, the sampling result of the weight of each expanded medical data parameter is determined, and the weight of any expanded medical data parameter is determined based on the distribution of the sampling result of that medical data parameter. The manner of determining the weights may depend on the prior distribution. When the prior distribution is a Cauchy distribution, the value corresponding to the distribution peak of the sampling result of a medical data parameter is determined as the weight of that medical data parameter.

A medical data parameter whose weight is not zero is determined to be an effective medical data parameter, or alternatively a medical data parameter whose weight is larger than a preset weight value is determined to be an effective medical data parameter. Sample data is then determined based on the effective medical data parameters, and the prediction model to be trained is trained on the sample data to obtain the prediction model with the target prediction function.
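The peak-based weight determination and the threshold screening can be sketched as below. Estimating the distribution peak from a fixed-width histogram, the bin count of 10 and the function names are assumptions made for illustration; the embodiment does not state how the peak is located.

```python
def peak_value(samples, bins=10):
    """Return the centre of the most populated histogram bin of the samples."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return lo  # all samples identical, so the peak is that value
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        idx = min(int((s - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    best = counts.index(max(counts))
    return lo + (best + 0.5) * width


def screen_parameters(sampled_weights, threshold=0.0):
    """Keep parameters whose peak weight exceeds the preset weight value."""
    weights = {name: peak_value(s) for name, s in sampled_weights.items()}
    return {name: w for name, w in weights.items() if w > threshold}
```

Here `sampled_weights` maps each expanded medical data parameter to the list of weight samples drawn for it, and the returned dictionary contains only the effective parameters together with their weights.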

According to the technical scheme of the embodiment, influence weights of the expanded medical data parameters in the target prediction function are determined based on prior distribution of the medical data parameters, the medical data parameters effective to the target prediction function are screened based on the weights, invalid medical data parameters are eliminated, the training difficulty of the prediction model is reduced, the convergence speed of the prediction model is improved, the required number of samples in the process of training the prediction model is further reduced, and small sample training of the prediction model can be realized on the basis of ensuring the training precision of the prediction model.

Example three

Fig. 3 is a flowchart of a method for training a predictive model according to a third embodiment of the present invention, where the method is used to train a predictive model with a target prediction function, and the method includes:

S310, obtaining sample data formed by effective medical data parameters corresponding to the target prediction function, wherein the effective medical data parameters are determined according to the medical data processing method provided by the foregoing embodiments.

S320, training the prediction model to be trained based on the sample data to obtain the prediction model with the target prediction function.

In this embodiment, samples are collected based on the effective medical data parameters obtained in the above embodiments, so as to obtain sample data. For example, the effective medical data parameters may be extracted from the parameter sets of a plurality of sample objects. Because the prediction model to be trained is trained on sample data formed from the effective medical data parameters, the training process is simplified and a large amount of sample data is not needed, which reduces the number of samples and improves the training efficiency.
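Extracting the effective parameters from the parameter sets of several sample objects might look like the following sketch. The dictionary representation of a sample object and the choice to skip objects with missing values are assumptions for illustration only.

```python
def build_sample_data(sample_objects, valid_params):
    """Keep only the effective parameters of each sample object, in a fixed order.

    Objects that lack any effective parameter are skipped (an assumption).
    """
    rows = []
    for obj in sample_objects:
        if all(name in obj for name in valid_params):
            rows.append([obj[name] for name in valid_params])
    return rows
```

Each returned row is one training sample whose columns follow the order of `valid_params`.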

On the basis of the above embodiment, before training the prediction model to be trained based on the sample data, the method further includes: verifying at least two models to be trained based on a preset number of groups of sample data, and determining from them the model to be trained for the target prediction function training. The at least two models to be trained may include, but are not limited to, at least two of a logistic regression model L1, a logistic regression model L2, a support vector machine, a K-nearest neighbors (KNN) model, a convolutional neural network (CNN) deep learning model, a random forest model (RandomForest), and a LightGBM (gradient boosting decision tree) model. Each model is trained on the preset number of groups (for example, 20 groups) of sample data, the trained model is verified and its prediction accuracy is measured, and the model with the highest prediction accuracy is determined as the model to be trained for the target prediction function training.

Specifically, an evaluation value is obtained for the prediction result produced from each input group of sample data. The evaluation value may be a roc_auc value, that is, the area under the ROC curve, where the vertical axis of the curve is the true positive rate (TPR), i.e., the ratio of samples predicted to be positive and actually positive to all actually positive samples, and the horizontal axis is the false positive rate (FPR), i.e., the ratio of samples predicted to be positive but actually negative to all actually negative samples. The evaluation values of the preset number of groups of sample data are then processed, for example by determining their mean and variance, and the model to be trained for the target prediction function training is screened according to the obtained mean and variance. The model to be trained for the target prediction function training may satisfy the following conditions: the variance is minimum and the mean is maximum. In some embodiments, the variance and the mean may be weighted, and the model to be trained for the target prediction function training is screened according to the weighted result, so that both the variance and the mean are taken into account; the weights of the variance and the mean may be determined as required.
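A minimal sketch of this screening is given below. It computes TPR as the share of actually positive samples that are predicted positive and FPR as the share of actually negative samples that are predicted positive, and it combines mean and variance as a weighted difference; that combination, the default weights and the model names used in the usage note are assumptions, since the embodiment leaves the weighting open. `tpr_fpr` also assumes that both classes are present in the labels.

```python
from statistics import mean, pvariance


def tpr_fpr(labels, predictions):
    """True positive rate and false positive rate for binary labels (1 = positive)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def select_model(auc_by_model, mean_weight=1.0, variance_weight=1.0):
    """Pick the model with the best weighted trade-off of high mean and low variance."""
    def score(name):
        values = auc_by_model[name]
        return mean_weight * mean(values) - variance_weight * pvariance(values)
    return max(auc_by_model, key=score)
```

For example, `select_model({"lr_l1": [0.80, 0.82, 0.81], "knn": [0.70, 0.90, 0.60]})` favours the hypothetical `lr_l1` entry, whose roc_auc values are both higher on average and less dispersed.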

Illustratively, referring to Fig. 4, Fig. 4 is a schematic diagram of the roc_auc values of various models provided by an embodiment of the present invention. As can be seen from Fig. 4, the logistic regression model L1 is the model to be trained for the target prediction function training.

In this embodiment, the training efficiency and the prediction precision are improved by screening out the model to be trained that is suitable for the target prediction function.

Example four

Fig. 5 is a schematic flow chart of a training method of a prediction model according to a fourth embodiment of the present invention, which is detailed on the basis of the foregoing embodiment, and the method includes:

S410, acquiring a first medical data parameter of a first object subjected to organ transplantation and a second medical data parameter of a second object.

S420, performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters.

S430, determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state.

S440, screening the expanded medical data parameters according to the sampling result to determine effective medical data parameters.

S450, obtaining sample data formed by the effective medical data parameters, and training the prediction model to be trained based on the sample data to obtain a prediction model with the function of predicting organ function after organ transplantation.

According to the technical scheme of this embodiment, the organ transplantation may be liver transplantation. By expanding the initial data parameters for the function of predicting liver function after liver transplantation, the diversity and comprehensiveness of the liver transplantation parameters are improved. Corresponding weights are determined according to the prior distribution of the medical data parameters so as to screen out the effective medical data parameters. Illustratively, see Table 7, which gives an example of effective medical data parameters in liver transplantation and their corresponding weights.

TABLE 7

Wherein, groupby(graft type)[donor platelet].mean() is the mean value of the donor platelet with the graft type as the grouping mode, and the meanings of groupby(graft type)[graft weight].mean(), groupby(tumor or not)[total operation time].mean() and groupby(sex matching)[donor sodium].mean() follow by analogy.
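The grouped mean written above as groupby(...)[...].mean() can be illustrated without any external library. In the sketch below, the record layout is hypothetical, and reading the `_mean_div` suffix as "the value divided by its group mean" is an assumption rather than something this embodiment states; group means are also assumed to be non-zero.

```python
from collections import defaultdict
from statistics import mean


def group_mean(records, group_key, value_key):
    """Mean of value_key within each group, like groupby(group_key)[value_key].mean()."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    return {g: mean(vals) for g, vals in groups.items()}


def add_mean_div(records, group_key, value_key):
    """Add '<group>_<value>_mean_div': each value divided by the mean of its group."""
    means = group_mean(records, group_key, value_key)
    name = f"{group_key}_{value_key}_mean_div"
    return [{**rec, name: rec[value_key] / means[rec[group_key]]} for rec in records]
```

With records grouped by "graft type", a donor platelet value equal to its group mean yields an expanded value of 1.0, and values below or above the group mean yield values below or above 1.0.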

As can be seen from Table 7, graft type_donor BMI_mean_div, graft type_graft weight_mean_div, whether tumor_total surgery time_mean_div, sex matched_donor sodium_mean_div, recipient BMI/time from beginning of lavage to bag, and graft type_donor platelet_mean_div are effective medical data parameters.

Sample data is determined based on the determined effective medical data parameters, the sample data being small-sample data, and the prediction model to be trained is trained on it to obtain a prediction model with the function of predicting liver dysfunction after liver transplantation. This reduces the amount of sample data required while ensuring the accuracy of the prediction model, simplifies the training process of the prediction model, and improves its training efficiency.

Example five

Fig. 6 is a schematic structural diagram of a medical data processing apparatus according to a fifth embodiment of the present invention, where the apparatus includes:

a medical data parameter acquisition module 510 for acquiring a first medical data parameter of a first object and a second medical data parameter of a second object;

a parameter extension module 520, configured to perform parameter extension on the first medical data parameter and the second medical data parameter respectively based on a first extension rule, and perform associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule, so as to obtain extended medical data parameters;

an iterative sampling module 530, configured to determine a distribution state of each expanded medical data parameter based on prior distribution of each expanded medical data parameter, and perform iterative sampling on each expanded medical data parameter based on the distribution state;

and the valid data determining module 540 is configured to screen the expanded medical data parameters according to a sampling result to determine valid medical data parameters, where sample data formed by the valid medical data parameters is used to train a prediction model with the target prediction function.

Optionally, the parameter expanding module 520 includes:

a nominal value determining unit, configured to determine, for any initial data parameter of the first medical data parameter or the second medical data parameter, a standard parameter range corresponding to the initial data parameter, and determine a nominal value based on the standard parameter range;

a first expansion unit, configured to perform parameter expansion based on a difference between the initial data parameter and the nominal value, to obtain at least one expanded medical data parameter corresponding to the initial data parameter.

Optionally, the parameter expanding module 520 includes:

a parameter set determining unit, configured to determine, in an initial data parameter of the first medical data parameter or the second medical data parameter, a parameter set for parameter extension, where the parameter set includes at least two initial data parameters determined according to a service requirement, or at least two initial data parameters having an association relationship;

and the second extension unit is used for carrying out at least one extension operation on at least two initial data parameters in the parameter group to obtain extended medical data parameters.

Optionally, the parameter expanding module 520 includes:

an object grouping determination unit, configured to determine, based on a matching state of target parameters of the first object and the second object, an object grouping corresponding to the matching state of the target parameters;

and the third expansion unit is used for performing parameter expansion on the basis of the first medical data parameter and/or the second medical data parameter and the parameter mean values of different objects in the object group to obtain an expanded medical data parameter.

Optionally, the parameter expanding module 520 includes:

and the fourth extension unit is used for determining medical data parameter pairs of the same type in the first medical data parameters and the second medical data parameters, and performing numerical operation on each medical data parameter pair to obtain the extended medical data parameters corresponding to the medical data parameter pairs.
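As an illustration of the fourth extension unit, the sketch below pairs parameters of the same type from the first and second medical data parameters and applies numerical operations to each pair. The choice of difference and ratio as the operations, the dictionary representation and the generated names are assumptions; the unit itself only requires some numerical operation on each pair.

```python
def expand_pairs(first_params, second_params):
    """For each parameter type present in both records, add difference and ratio features."""
    expanded = {}
    for name in sorted(first_params.keys() & second_params.keys()):
        a, b = first_params[name], second_params[name]
        expanded[f"{name}_diff"] = a - b
        if b != 0:  # skip the ratio when it is undefined
            expanded[f"{name}_ratio"] = a / b
    return expanded
```

Parameter types that appear in only one of the two records form no pair and therefore produce no expanded parameter.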

Optionally, the first object is an organ donor, the second object is an organ recipient, and the target prediction function of the prediction model is the prediction of organ function after organ transplantation.

Optionally, the iterative sampling module 530 includes:

an initial state determining unit, configured to randomly sample the weight of the expanded medical data parameter according to the prior distribution of the expanded medical data parameter, so as to obtain an initial state of the medical data parameter;

and the stable distribution state determining unit is used for performing iterative processing on the initial state based on a predetermined transfer matrix to obtain the stable distribution state of each expanded medical data parameter.

Optionally, the stable distribution state determining unit includes:

the state transfer subunit is used for carrying out state transfer on the initial state based on the transfer matrix to obtain a transfer state;

and the transition state judgment subunit is used for judging the transition state based on a preset suggestion distribution and a verification threshold, iteratively performing a further state transition on the transition state when the state requirement is met, and re-executing the step of randomly sampling the weight of the expanded medical data parameter according to the prior distribution of the expanded medical data parameter when the state requirement is not met.

Optionally, the transition state determining subunit is configured to:

determining an acceptance probability of the transition state based on the preset suggestion distribution;

determining that the transition state satisfies a state requirement when the acceptance probability is greater than or equal to the verification threshold.

Optionally, the stable distribution state determining unit is configured to:

when the number of iterations reaches the preset number of iterations, determining the transition state obtained at the preset iteration as a stable distribution state; or,

and when the transition state obtained by the iterative processing is converged, determining the transition state in the converged state as a stable distribution state.

Optionally, the valid data determining module 540 is configured to:

determining a numerical value corresponding to a distribution peak value of the sampling result of any expanded medical data parameter as the weight of any expanded medical data parameter;

and determining the medical data parameter with the weight larger than the preset threshold value as the effective medical data parameter.

The medical data processing device can execute the medical data processing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the medical data processing method.

Example six

Fig. 7 is a schematic structural diagram of a training apparatus for a prediction model according to a sixth embodiment of the present invention, where the apparatus includes:

the sample data acquiring module 610 is configured to acquire sample data formed by valid medical data parameters corresponding to a target prediction function, where the valid medical data parameters are determined according to the medical data processing method provided in the embodiment of the present invention;

and the model training module 620 is configured to train the prediction model to be trained based on the sample data to obtain a prediction model with a target prediction function.

Optionally, the apparatus further comprises:

and the model screening module is used for verifying at least two models to be trained based on a preset group of sample data before training the prediction model to be trained based on the sample data, and determining the model to be trained for training the target prediction function.

The training device of the prediction model can execute the training method of the prediction model provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the training method of the prediction model.

Example seven

Fig. 8 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention. Fig. 8 illustrates a block diagram of an electronic device 412 suitable for use in implementing embodiments of the present invention. The electronic device 412 shown in Fig. 8 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present invention. The device 412 is typically an electronic device that undertakes medical data processing or prediction model training functions.

As shown in fig. 8, the electronic device 412 is in the form of a general purpose computing device. The components of the electronic device 412 may include, but are not limited to: one or more processors 416, a storage device 428, and a bus 418 that couples the various system components including the storage device 428 and the processors 416.

Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Electronic device 412 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 412 and includes both volatile and nonvolatile media, removable and non-removable media.

Storage 428 may include computer system readable media in the form of volatile Memory, such as Random Access Memory (RAM) 430 and/or cache Memory 432. The electronic device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 434 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, and commonly referred to as a "hard drive"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact disk-Read Only Memory (CD-ROM), a Digital Video disk (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 418 by one or more data media interfaces. Storage 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

Program 436, having a set (at least one) of program modules 426, may be stored, for example, in storage 428. Such program modules 426 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment. Program modules 426 generally perform the functions and/or methodologies of embodiments of the invention as described herein.

The electronic device 412 may also communicate with one or more external devices 414 (e.g., keyboard, pointing device, camera, display 424, etc.), with one or more devices that enable a user to interact with the electronic device 412, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 412 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 422. Also, the electronic device 412 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 420. As shown, network adapter 420 communicates with the other modules of electronic device 412 over bus 418. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems, to name a few.

The processor 416 executes programs stored in the storage device 428 to perform various functional applications and data processing, such as implementing a medical data processing method provided by an embodiment of the present invention or a training method of a prediction model provided by an embodiment of the present invention.

Example eight

An eighth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a medical data processing method according to an embodiment of the present invention or a training method of a prediction model according to an embodiment of the present invention.

Of course, in the computer-readable storage medium provided by the embodiment of the present invention, the computer program stored thereon is not limited to the method operations described above, and can also execute the medical data processing method or the training method of the prediction model provided in any embodiment of the present invention.

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in some detail by way of the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method of medical data processing, comprising:

2. The method according to claim 1, wherein the performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on a first expansion rule comprises:

for any initial data parameter in the first medical data parameter or the second medical data parameter, determining a standard parameter range corresponding to the initial data parameter, and determining a nominal value based on the standard parameter range;

and performing parameter expansion based on the difference value between the initial data parameter and the nominal value to obtain at least one expanded medical data parameter corresponding to the initial data parameter.
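By way of non-limiting illustration only, the expansion described in claim 2 might be sketched as follows. The midpoint rule for the nominal value, the choice of derived quantities, and all function and key names are assumptions made for this sketch and are not stated in the claims.

```python
def nominal_value(lower, upper):
    # Assumption: the nominal value is taken as the midpoint of the
    # standard parameter range; the claims do not fix this rule.
    return (lower + upper) / 2.0


def expand_by_nominal(value, lower, upper):
    """Derive expanded parameters from the difference between an
    initial data parameter and the nominal value of its standard range."""
    diff = value - nominal_value(lower, upper)
    return {
        "diff": diff,              # signed deviation from the nominal value
        "abs_diff": abs(diff),     # magnitude of the deviation
        "squared_diff": diff * diff,
    }


# Hypothetical measurement of 7.0 with a standard range of 4.0 to 6.0.
expanded = expand_by_nominal(7.0, 4.0, 6.0)
```

Here one initial data parameter yields three expanded medical data parameters; any other function of the difference value could be substituted.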

3. The method according to claim 1, wherein the performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on a first expansion rule comprises:

determining a parameter group for parameter expansion in an initial data parameter of the first medical data parameter or the second medical data parameter, wherein the parameter group comprises at least two initial data parameters determined according to business requirements, or at least two initial data parameters with an association relationship;

and performing at least one type of expansion operation on at least two initial data parameters in the parameter group to obtain expanded medical data parameters.
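By way of non-limiting illustration only, the group-based expansion of claim 3 might be sketched as follows. The three operations (sum, product, quotient) and the parameter names are assumptions chosen for the sketch; the claims only require at least one type of operation on at least two initial data parameters.

```python
from itertools import combinations


def expand_group(group):
    """Apply several operations to every pair of parameters in a group.

    `group` maps a parameter name to its value and must hold at least
    two initial data parameters.
    """
    if len(group) < 2:
        raise ValueError("a parameter group needs at least two parameters")
    expanded = {}
    # Sort by name so the generated parameter names are deterministic.
    for (a, va), (b, vb) in combinations(sorted(group.items()), 2):
        expanded[f"{a}_plus_{b}"] = va + vb
        expanded[f"{a}_times_{b}"] = va * vb
        if vb != 0:  # skip the quotient when it is undefined
            expanded[f"{a}_over_{b}"] = va / vb
    return expanded


# Hypothetical group of two associated parameters.
features = expand_group({"weight_kg": 70.0, "height_m": 2.0})
```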

4. The method according to claim 1, wherein the performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule comprises:

determining an object group corresponding to the matching state of the target parameters based on the matching states of the target parameters of the first object and the second object;

and performing parameter expansion based on the first medical data parameter and/or the second medical data parameter and parameter mean values of different objects in the object group to obtain an expanded medical data parameter.
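By way of non-limiting illustration only, the object-group expansion of claim 4 might be sketched as follows. Grouping by equality of a single target parameter and subtracting the group mean are assumptions of this sketch, as are the record layout and all names.

```python
from statistics import mean


def group_key(first, second, target):
    # Assumption: the matching state is simple equality of the target parameter.
    return "matched" if first[target] == second[target] else "mismatched"


def expand_with_group_mean(pairs, target, param):
    """Group (first object, second object) pairs by matching state, then
    express each object's parameter relative to its group mean."""
    groups = {}
    for first, second in pairs:
        groups.setdefault(group_key(first, second, target), []).append((first, second))
    expanded = []
    for key, members in groups.items():
        first_mean = mean(f[param] for f, _ in members)
        second_mean = mean(s[param] for _, s in members)
        for f, s in members:
            expanded.append({
                "group": key,
                "first_minus_group_mean": f[param] - first_mean,
                "second_minus_group_mean": s[param] - second_mean,
            })
    return expanded


# Hypothetical records; "blood" is the target parameter, "age" the expanded one.
pairs = [
    ({"blood": "A", "age": 30}, {"blood": "A", "age": 40}),
    ({"blood": "B", "age": 50}, {"blood": "B", "age": 60}),
    ({"blood": "A", "age": 20}, {"blood": "O", "age": 25}),
]
rows = expand_with_group_mean(pairs, target="blood", param="age")
```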

5. The method according to claim 1, wherein the performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule comprises:

determining medical data parameter pairs of the same type in the first medical data parameter and the second medical data parameter, and performing a numerical operation on each medical data parameter pair to obtain an expanded medical data parameter corresponding to the medical data parameter pair.
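By way of non-limiting illustration only, the pairwise expansion of claim 5 might be sketched as follows. Matching parameters by identical name and using difference and ratio as the numerical operations are assumptions of this sketch.

```python
def expand_pairs(first, second):
    """Pair parameters of the same type (here: the same name) from two
    objects and derive a difference and a ratio for each pair."""
    expanded = {}
    for name in sorted(first.keys() & second.keys()):
        a, b = first[name], second[name]
        expanded[f"{name}_diff"] = a - b
        if b != 0:  # skip the ratio when it is undefined
            expanded[f"{name}_ratio"] = a / b
    return expanded


# Hypothetical first-object and second-object parameters.
donor = {"age": 30.0, "weight_kg": 80.0, "donor_only": 1.0}
recipient = {"age": 60.0, "weight_kg": 64.0}
paired = expand_pairs(donor, recipient)
```

Parameters present for only one object (such as the hypothetical `donor_only`) form no pair and produce no expanded parameter.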

6. The method of any one of claims 1-5, wherein the first object is an organ donor, the second object is an organ recipient, and the target prediction function of the prediction model is function prediction for organ transplantation.

7. The method according to claim 1, wherein the determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter comprises:

randomly sampling the weight of each expanded medical data parameter according to the prior distribution of each expanded medical data parameter to obtain the initial state of each medical data parameter;

and carrying out iterative processing on the initial state based on a predetermined transition matrix to obtain the stable distribution state of each expanded medical data parameter.

8. The method according to claim 7, wherein the iteratively processing the initial state based on a predetermined transition matrix to obtain a stable distribution state of the expanded medical data parameters comprises:

performing state transition on the initial state based on the transition matrix to obtain a transition state;

judging the transition state based on a preset proposal distribution and a verification threshold;

when the state requirement is met, iteratively performing state transition on the transition state;

and when the state requirement is not met, re-executing the step of randomly sampling the weight of each expanded medical data parameter according to the prior distribution of each expanded medical data parameter.
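By way of non-limiting illustration only, the sampling loop of claims 7 and 8 resembles a Markov chain Monte Carlo procedure. The sketch below is a textbook random-walk Metropolis sampler, substituted here as a stand-in and not the claimed procedure itself: on rejection it keeps the current state, whereas claim 8 re-samples from the prior. The standard normal prior, the Gaussian proposal, and all names are assumptions.

```python
import math
import random


def log_prior(w):
    # Assumption: standard normal prior on a single parameter weight
    # (log density up to an additive constant).
    return -0.5 * w * w


def metropolis(n_steps, step=1.0, seed=0):
    rng = random.Random(seed)
    w = rng.gauss(0.0, 1.0)  # initial state drawn from the prior
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal plays the role of the state transition.
        candidate = w + rng.gauss(0.0, step)
        log_ratio = log_prior(candidate) - log_prior(w)
        # A uniform draw serves as the acceptance threshold; 1 - random()
        # lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            w = candidate  # accept the transition state
        samples.append(w)  # on rejection the current state is kept
    return samples


samples = metropolis(5000)
```

After enough iterations the retained states approximate draws from the stationary distribution, which is the sense in which a stable distribution state is reached.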

9. The method of claim 8, wherein the judging the transition state based on a preset proposal distribution and a verification threshold comprises:

10. The method according to claim 8, wherein the obtaining the stable distribution state of each expanded medical data parameter comprises:

11. The method according to claim 8, wherein the obtaining the stable distribution state of each expanded medical data parameter comprises:

12. A method for training a predictive model, comprising:

acquiring sample data formed by valid medical data parameters corresponding to a target prediction function, wherein the valid medical data parameters are determined according to the medical data processing method of any one of claims 1-11;

13. The method of claim 12, wherein prior to training a predictive model to be trained based on the sample data, the method further comprises:

verifying at least two candidate models based on a preset group of sample data, and determining, from the candidate models, the model to be trained for the target prediction function.

14. A medical data processing apparatus, characterized by comprising:

15. An apparatus for training a predictive model, comprising:

a sample data obtaining module, configured to obtain sample data formed by valid medical data parameters corresponding to a target prediction function, where the valid medical data parameters are determined according to the medical data processing method according to any one of claims 1 to 11;

16. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the medical data processing method of any one of claims 1-11 or the training method of the predictive model of any one of claims 12-13 when executing the program.

17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of processing medical data as claimed in any one of claims 1 to 11 or a method of training a predictive model as claimed in any one of claims 12 to 13.