CN111816306A - Medical data processing method, and prediction model training method and device - Google Patents
- Publication number
- CN111816306A, CN202010957988.6A
- Authority
- CN
- China
- Prior art keywords
- medical data
- parameter
- data parameter
- state
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention discloses a medical data processing method, a prediction model training method and a prediction model training device. The medical data processing method comprises the following steps: acquiring a first medical data parameter of a first object and a second medical data parameter of a second object; respectively performing parameter expansion on the first medical data parameter and the second medical data parameter based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters; determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state; and screening the expanded medical data parameters according to the sampling result to determine effective medical data parameters. Invalid medical data parameters are removed, the number of samples required in the process of training the prediction model is reduced, and small sample training of the prediction model is achieved.
Description
Technical Field
The embodiment of the invention relates to the technical field of medical data processing, in particular to a medical data processing method, a prediction model training method and a prediction model training device.
Background
With the rapid development of information science, big data processing methods based on artificial intelligence are widely applied, in particular intelligent model processing methods such as deep neural network models.
At present, such data processing generally inputs the collected data into an artificial intelligence model, which identifies, screens and processes the input data. A large amount of sample data is therefore needed to train the artificial intelligence model. For small sample data, especially small sample medical data, samples are difficult to acquire, which results in poor training precision of the artificial intelligence model.
Disclosure of Invention
The invention provides a medical data processing method, a prediction model training method and a prediction model training device, which are used for meeting the training requirement of a model through processing of medical data.
In a first aspect, an embodiment of the present invention provides a medical data processing method, including:
acquiring a first medical data parameter of a first object and a second medical data parameter of a second object;
respectively performing parameter expansion on the first medical data parameter and the second medical data parameter based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters;
determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state;
and screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.
In a second aspect, an embodiment of the present invention further provides a method for training a prediction model, including:
acquiring sample data formed by effective medical data parameters corresponding to a target prediction function, wherein the effective medical data parameters are determined according to a medical data processing method provided by the embodiment of the invention;
and training the prediction model to be trained based on the sample data to obtain the prediction model with the target prediction function.
In a third aspect, an embodiment of the present invention further provides a medical data processing apparatus, including:
a medical data parameter acquisition module for acquiring a first medical data parameter of a first object and a second medical data parameter of a second object;
the parameter extension module is used for respectively performing parameter extension on the first medical data parameter and the second medical data parameter based on a first extension rule and performing associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule to obtain an extended medical data parameter;
the iterative sampling module is used for determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter and performing iterative sampling on each expanded medical data parameter based on the distribution state;
and the effective data determining module is used for screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.
In a fourth aspect, an embodiment of the present invention further provides a device for training a prediction model, where the device includes:
a sample data acquisition module, used for acquiring sample data formed by effective medical data parameters corresponding to the target prediction function, wherein the effective medical data parameters are determined according to the medical data processing method provided by the embodiment of the invention;
and the model training module is used for training the prediction model to be trained based on the sample data to obtain the prediction model with the target prediction function.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the processor implements a medical data processing method according to an embodiment of the present invention or a training method of a prediction model according to an embodiment of the present invention.
In a sixth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a medical data processing method according to the embodiment of the present invention or a training method of a prediction model according to the embodiment of the present invention.
According to the technical scheme provided by the invention, the first medical data parameter of the first object and the second medical data parameter of the second object are each expanded and are also expanded in association with each other, which improves the diversity of the medical data parameters. Because the expansion relations among the initial data parameters are preset, they replace the combination and expansion of the input initial data parameters that would otherwise take place while the prediction model is trained. This simplifies the training process of the prediction model, reduces the training difficulty, improves the training effect, and reduces the requirement on training samples. Furthermore, effective medical data parameters are screened from the expanded medical data parameters, which replaces the screening of invalid medical data parameters during training of the prediction model and reduces the interference of invalid medical data parameters, so that the prediction model converges quickly and the number of samples required to train it is further reduced.
Drawings
FIG. 1 is a schematic flow chart of a medical data processing method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a medical data processing method according to a second embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for training a prediction model according to a third embodiment of the present invention;
FIG. 4 is a graphical representation of roc_auc values for various models provided by embodiments of the present invention;
FIG. 5 is a flowchart illustrating a method for training a prediction model according to a fourth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a medical data processing apparatus according to a fifth embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a training apparatus for a prediction model according to a sixth embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a schematic flow chart of a medical data processing method according to an embodiment of the present invention, which is applicable to a case where medical data is processed, and the method can be executed by a medical data processing apparatus according to an embodiment of the present invention, which can be integrated into an electronic device such as a computer or a server. The method specifically comprises the following steps:
S110, acquiring a first medical data parameter of the first object and a second medical data parameter of the second object.
S120, performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters.
S130, determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state.
S140, screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.
The first object and the second object may be human objects or animal objects. Alternatively, the first subject and the second subject may be two subjects undergoing organ transplantation, e.g., the first subject is an organ donor and the second subject is an organ recipient. The organ transplant may be, but is not limited to, liver transplant, heart transplant, cornea transplant, kidney transplant, and the like.
The first medical data parameter and the second medical data parameter are medical data parameters that can be directly acquired or collected, for example, parameters collected by instrument detection and analysis, or attribute parameters of the first object and the second object. The attribute parameters may include, but are not limited to, gender, age, weight, height, and the like. At present, when a prediction model is trained, the initial data parameters are input into the prediction model, and the prediction model determines the relationships between the initial data parameters during training, so that a large number of training samples and a long training period are required. The initial data parameters are the first medical data parameters and the second medical data parameters before expansion. Alternatively, the prediction model may be a prediction model with an organ transplantation function prediction capability.
In this embodiment, parameter expansion may be performed on the first medical data parameter or the second medical data parameter, or associated parameter expansion may be performed on the first medical data parameter and the second medical data parameter, so as to obtain a plurality of expanded medical data parameters. The initial data is thereby mined, and the expanded medical data parameters comprise both the initial data parameters and the medical data parameters obtained by mining. Training the prediction model with the mined medical data parameters simplifies the process by which the prediction model explores the relations between the input parameters, accelerates the convergence of the prediction model during training, and further reduces the amount of sample data and the training period.
It should be noted that, after acquiring the first medical data parameters of the first object and the second medical data parameters of the second object and before performing the parameter expansion, the method further includes preprocessing the initial data parameters, which may include data cleaning, deduplication, normalization and the like. The data cleaning may delete initial data parameters with missing data, and the deduplication may remove similar parameters from among initial data parameters with high similarity. Optionally, the deduplication may include: calculating the similarity between any two initial data parameters, and eliminating one of the two when the similarity exceeds a preset similarity. Initial data parameters with high similarity have a similar influence on the target prediction function, so removing either one of them reduces redundant parameters, reduces the complexity of medical data parameter processing, and improves processing efficiency.
Wherein, the similarity between any two initial data parameters can be calculated through the Pearson correlation coefficient. Taking the liver transplantation data parameters as an example, see table 1, where table 1 is an illustration of the similarity of the initial data parameters.
TABLE 1
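The deduplication step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `pearson` and `drop_similar`, the threshold 0.95, the column names, and the sample values are all assumptions chosen for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / math.sqrt(var_x * var_y)

def drop_similar(columns, threshold=0.95):
    """Keep a parameter only if |r| with every already-kept parameter is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical initial data parameters, one list of values per parameter.
columns = {
    "weight_kg": [60.0, 72.0, 85.0, 55.0, 90.0],
    "weight_lb": [132.3, 158.7, 187.4, 121.3, 198.4],  # nearly a rescaling of weight_kg
    "age": [34, 61, 45, 52, 29],
}
kept = drop_similar(columns)
print(kept)
```

Which of two similar parameters is kept depends here on iteration order; the text only states that either one may be removed.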
The initial data parameters obtained by screening are normalized, which reduces the influence of scale differences between parameters on the prediction model. Different initial data parameters may correspond to different normalization modes, and the normalization mode of each initial data parameter may be preset. Referring to table 2, table 2 shows the normalization of some of the liver transplantation data parameters.
TABLE 2
On the basis of the above embodiment, parameter type extension is performed on the preprocessed first medical data parameter and/or second medical data parameter. Optionally, the parameter type extension of the initial data parameters includes at least one of: an extension based on each initial data parameter, an extension based on a parameter set formed by associated initial data parameters, an extension based on initial data parameters corresponding to different objects, an extension based on a parameter set formed by at least one initial data parameter of the same object, and an extension based on parameter differences of the initial data parameters.
Optionally, the performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on the first expansion rule includes: for any initial data parameter in the first medical data parameter or the second medical data parameter, determining a standard parameter range corresponding to the initial data parameter, and determining a nominal value based on the standard parameter range; and performing parameter expansion based on the difference value between the initial data parameter and the nominal value to obtain at least one expanded medical data parameter corresponding to the initial data parameter.
The first medical data parameter and the second medical data parameter each include a plurality of initial data parameters. For each initial data parameter, the nominal value may be any one of the median, mean, and mode of the standard range of that parameter. The standard range comprises the maximum and minimum values of the initial data parameter in a standard state, from which the median or mean can be determined. The difference between the initial data parameter and the nominal value is then determined; the difference may be used directly as an extended medical data parameter, or a preset operation may be applied to the difference to obtain the extended medical data parameter. For example, the preset operation may be a weighted calculation or a nonlinear calculation. For example, the extended medical data parameter may be Weight × |feature - standard_feature|, where Weight is a weight coefficient, feature is the initial data parameter, and standard_feature is the nominal value of the initial data parameter; the extended medical data parameter may also be e^(w × |feature - standard_feature|), where w is a weight coefficient.
Illustratively, for the initial data parameter serum sodium, the standard range of human serum sodium is 135-145 mmol/L; correspondingly, the extended medical data parameter may be na - (145 + 135)/2, Weight × |na - (145 + 135)/2|, or e^(w × |na - (145 + 135)/2|), where na is the serum sodium in the first medical data parameter and/or the second medical data parameter.
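The serum sodium example can be worked through in code. The sketch below takes the midpoint of the standard range as the nominal value and computes the three extended parameters named in the text; the function names, the dictionary keys, and the coefficient values (Weight = 2.0, w = 0.1) are assumptions for illustration only.

```python
import math

def nominal_value(low, high):
    """Midpoint of the standard range, used here as the nominal value."""
    return (low + high) / 2

def expand_by_nominal(value, low, high, weight=1.0, w=0.1):
    """Return the difference from the nominal value and two preset operations on it."""
    diff = value - nominal_value(low, high)
    return {
        "diff": diff,                              # feature - standard_feature
        "weighted_abs_diff": weight * abs(diff),   # Weight x |feature - standard_feature|
        "exp_abs_diff": math.exp(w * abs(diff)),   # e^(w x |feature - standard_feature|)
    }

# Serum sodium of 150 mmol/L against the standard range 135-145 mmol/L.
features = expand_by_nominal(150.0, 135.0, 145.0, weight=2.0, w=0.1)
print(features)
```

With these inputs the nominal value is 140, so the difference is 10, the weighted absolute difference is 20, and the exponential term is e^1.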
Alternatively, there may be different ranges of standard parameters for an initial data parameter under different conditions, such as but not limited to weight status, age status, and gender status. Illustratively, for BMI (body mass index), different sexes correspond to different standard ranges. The parameter expansion can be performed separately in different states.
In this embodiment, each parameter is expanded through its standard range, which introduces the medical prior experience of each parameter, so that the expanded medical data parameters obtained by mining carry that medical prior experience. Because a prediction model is trained only on numerical values, it usually learns the distribution rule of the parameters through a large amount of supervised training, which requires a large number of training samples. In this embodiment, medical prior experience is given to the expanded medical data parameters in the data mining stage. This replaces learning the distribution rule of the parameters through a large number of training iterations, reduces the requirement on the number of samples, and facilitates small sample training on medical data.
For example, referring to table 3, table 3 is an example of independent extension of initial data parameters according to an embodiment of the present invention.
TABLE 3
Optionally, the performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on the first expansion rule includes: determining a parameter group for parameter expansion in an initial data parameter of the first medical data parameter or the second medical data parameter, wherein the parameter group comprises at least two initial data parameters determined according to business requirements, or at least two initial data parameters with an association relationship; and performing at least one type of extension operation on at least two initial data parameters in the parameter group to obtain extended medical data parameters.
For a first medical data parameter of a first object, a parameter set is formed from at least two initial data parameters determined by business requirements, and for each parameter set, an expansion operation is performed on the at least two initial data parameters in the parameter set, wherein the expansion operation may be, but is not limited to, a sum, a mean, a variance, and the like. For example, the business requirement may be, but is not limited to, a surgical time requirement. Different business requirements correspond to different parameter sets: the correspondence between each business requirement and the parameters may be determined in advance, and when a business requirement input by the user is obtained, the at least two initial data parameters corresponding to that business requirement are retrieved to form the parameter set. Referring to table 4, table 4 is an example of an expansion manner of the associated initial data parameters among the liver transplantation data parameters.
It should be noted that, in the process of determining the parameter group, the above expansion may be performed not only on the first medical data parameter and the second medical data parameter but also on intraoperative information, that is, medical data parameters recorded during the operation performed on the first object and the second object, so as to obtain expanded medical data parameters.
TABLE 4
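A parameter group built from a business requirement can be expanded with the sum, mean and variance operations mentioned above. The following is a minimal sketch: the mapping `GROUPS_BY_REQUIREMENT`, the parameter names, and the values are hypothetical, and the population variance is used because the text does not say which variance is meant.

```python
from statistics import mean, pvariance

# Hypothetical preset correspondence between business requirements and parameters.
GROUPS_BY_REQUIREMENT = {
    "surgical_time": ["phase_a_min", "phase_b_min", "phase_c_min"],
}

def expand_group(record, requirement):
    """Apply sum, mean and (population) variance to the parameters of one group."""
    values = [record[name] for name in GROUPS_BY_REQUIREMENT[requirement]]
    return {
        f"{requirement}_sum": sum(values),
        f"{requirement}_mean": mean(values),
        f"{requirement}_var": pvariance(values),
    }

record = {"phase_a_min": 30.0, "phase_b_min": 60.0, "phase_c_min": 90.0}
group_features = expand_group(record, "surgical_time")
print(group_features)
```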
The at least two initial data parameters in the parameter set may also have a correlation, which may be a positive correlation or a negative correlation. The correlation between the initial data parameters may be preset or may be obtained statistically from the medical data parameters of a large number of subjects. For example, among liver transplantation parameters, the lower the BMI of the recipient, the higher the total bilirubin, so the two are negatively correlated. In some embodiments, the parameter group contains two initial data parameters having an association relationship.
The expansion operation on the at least two initial data parameters having the association relationship may be a sum operation, a difference operation, a ratio operation, a derivative operation of the ratio, and the like. Illustratively, referring to table 5, table 5 is an example of an expansion manner of the associated initial data parameters among the liver transplantation data parameters.
TABLE 5
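For two associated parameters, such as recipient BMI and total bilirubin, the sum, difference and ratio operations can be sketched as below. The function name, the output keys and the sample values are assumptions; the "derivative operation of the ratio" is left out because the text does not define it.

```python
def expand_pair(a, b):
    """Sum, difference and ratio of two associated initial data parameters."""
    return {
        "sum": a + b,
        "diff": a - b,
        # A zero denominator is reported as a missing value (assumption).
        "ratio": a / b if b != 0 else None,
    }

# Hypothetical values: recipient BMI 22.0 and total bilirubin 11.0.
pair_features = expand_pair(22.0, 11.0)
print(pair_features)
```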
In this embodiment, a parameter group is formed, based on medical prior experience, from a plurality of parameters that have an association relation or belong to the same business requirement. An expansion operation is performed on all initial data parameters in the same parameter group to form extended medical data parameters that carry the association between the parameters. Because the associated initial data parameters are mined based on medical prior experience, the process of learning the association relations among the parameters during training of the prediction model is simplified, and the requirement on the number of training samples is correspondingly reduced.
Optionally, performing associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule, including: determining a medical data parameter pair of the same type in the first medical data parameter and the second medical data parameter, and performing numerical operation on each medical data parameter pair to obtain an extended medical data parameter corresponding to the medical data parameter pair.
The expansion based on the initial data parameters corresponding to different objects may be based on initial data parameters of the same type from the different objects. For example, the expansion may include determining whether the same type of initial data parameter of the different objects matches, and determining a difference, a sum, a quotient, a product, and the like of the same type of initial data parameter of the different objects. Referring to table 6, table 6 is an example of an expansion manner of the associated initial data parameters among the liver transplantation data parameters.
In this embodiment, the prediction model is used to process organ transplantation data of a first object and a second object and to predict the organ transplantation function. Corresponding initial data parameters of the first object and the second object are expanded in association to obtain extended medical data parameters that carry the association between the two objects, and the prediction model is trained based on these extended medical data parameters. This simplifies the mining of parameter associations between different objects during training, improves training efficiency, and further reduces the requirement on the number of training samples.
TABLE 6
Optionally, performing associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule, including: determining an object group corresponding to the matching state of the target parameters based on the matching states of the target parameters of the first object and the second object; and performing parameter expansion based on the first medical data parameter and/or the second medical data parameter and parameter mean values of different objects in the object group to obtain an expanded medical data parameter.
In this embodiment, the medical data parameters of the plurality of subjects may be grouped based on the matching status of the target parameter, for example, the target parameter may be age, gender, and graft type. Illustratively, the sexes of two subjects subjected to organ transplantation are female and male, and the sexes of the two subjects are not matched; if the ages of both subjects subjected to organ transplantation are 50 years, the ages of the two subjects are matched. Illustratively, the object groupings of historical objects may include, but are not limited to, age mismatch, age match, gender mismatch, graft type match, and the like.
The matching state is determined based on the target parameters of the first subject and the second subject. For example, if the target parameters of the first subject are gender female and age 20, and the target parameters of the second subject are gender female and age 50, then the ages of the first subject and the second subject are determined to be unmatched and their genders to be matched. The age-unmatched group and the gender-matched group are then determined, and parameter expansion is performed based on the parameter mean values within the age-unmatched group and the gender-matched group, where the parameter mean values may include a donor mean, a recipient mean, and an overall mean. Accordingly, the extended medical data parameter may be the difference between the first medical data parameter or the second medical data parameter and the corresponding parameter mean.
By expanding the first medical data parameters and the second medical data parameters, the expansion relations among the initial data parameters are determined in advance, so the prediction model does not have to combine and expand the input initial data parameters during training; convergence efficiency is improved and the requirement on the number of training samples is reduced. Meanwhile, medical prior experience is introduced in the parameter expansion process, which avoids the problem of a prediction model that does not conform to medical standards because it was trained purely numerically, and further improves the training precision of the prediction model.
After the expanded medical data parameters are determined, effective medical data parameters are screened from the expanded medical data parameters, so that the interference of invalid medical data parameters during prediction model training is reduced, the training efficiency is improved, and the requirement on the number of training samples is further reduced.
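The group-mean expansion described above can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout, the field names `gender_match` and `donor_sodium`, the feature-name suffix, and all numeric values are hypothetical, and the difference from the group mean is used because that is the operation the text names.

```python
from statistics import mean

# Hypothetical historical records; values are made up for illustration.
history = [
    {"gender_match": True, "donor_sodium": 140.0},
    {"gender_match": True, "donor_sodium": 144.0},
    {"gender_match": False, "donor_sodium": 150.0},
    {"gender_match": False, "donor_sodium": 154.0},
]


def group_means(records, group_key, value_key):
    """Mean of value_key within each group defined by group_key."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record[value_key])
    return {key: mean(values) for key, values in groups.items()}


def expand_with_group_mean(sample, means, group_key, value_key):
    """Difference between a sample's value and the mean of its own group."""
    group_mean = means[sample[group_key]]
    name = f"{group_key}_{value_key}_mean_diff"
    return {name: sample[value_key] - group_mean}


means = group_means(history, "gender_match", "donor_sodium")
new_case = {"gender_match": True, "donor_sodium": 145.0}
feature = expand_with_group_mean(new_case, means, "gender_match", "donor_sodium")
```

The same helper could be called once per grouping (age match, graft type, and so on) and once per mean type (donor, recipient, overall) to build the full set of expanded parameters.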
Wherein the prior distribution of each medical data parameter may be different, and may be determined according to the medical data parameter type. Illustratively, the prior distribution of the medical data parameter may be, but is not limited to, a cauchy distribution, a uniform distribution, a t distribution, an exponential distribution, or a beta distribution. The weight of each medical data parameter is determined through prior distribution, the larger the weight is, the larger the influence on the target prediction function is, and the smaller the weight is, the smaller the influence on the target prediction function is. And when the weight of the medical data parameter is zero or less than the preset weight value, determining that the medical data parameter is an invalid medical data parameter.
In some embodiments, the weights of the medical data parameters may be determined based on a prior distribution of the medical data parameters and a bayesian algorithm. The prior distribution of the medical data parameters represents a weight distribution mode of the medical data parameters, for example, when the prior distribution is Cauchy distribution, the weight of the medical data parameters satisfies the Cauchy distribution. And performing iterative sampling on each expanded medical data parameter to obtain a sampling result, further determining the weight of each expanded medical data parameter based on the sampling result, and determining the medical data parameter with the weight not being zero as an effective medical data parameter, or determining the medical data parameter with the weight being larger than a preset weight value as the effective medical data parameter. And determining sample data based on the effective medical data parameters, and training the prediction model to be trained to obtain the prediction model with the target prediction function.
According to the technical scheme of the embodiment, the first medical data parameters of the first object and the second medical data parameters of the second object are respectively expanded and associated with each other, so that the diversity of the medical data parameters is improved, meanwhile, the expansion relation among various initial data parameters is preset, the combination and expansion of the input initial data parameters in the prediction model training process are replaced, the training process of the prediction model is simplified, the training difficulty of the model is reduced, the training effect of the prediction model is improved, and the requirement on a training sample is reduced. Furthermore, effective medical data parameters are screened from the expanded medical data parameters, so that the screening of invalid medical data parameters in the training process of the prediction model is replaced, the interference of the invalid medical data parameters is reduced, the convergence speed of the prediction model is high, and the quantity of samples required in the training process of the prediction model is further reduced.
Example two
Fig. 2 is a schematic flow chart of a medical data processing method provided by the second embodiment of the invention, which is optimized on the basis of the foregoing embodiment, and the method includes:
S210, acquiring a first medical data parameter of the first object and a second medical data parameter of the second object.
S220, performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters.
S230, determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state.
S240, randomly sampling the weight of the expanded medical data parameters according to the prior distribution of the expanded medical data parameters to obtain the initial state of the medical data parameters;
S250, carrying out iterative processing on the initial state based on a predetermined transition matrix to obtain a stable distribution state of each expanded medical data parameter, and carrying out iterative sampling on each expanded medical data parameter based on the stable distribution state.
S260, screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.
The medical data parameters related to the target prediction function are of many types, and may further include expanded medical data parameters obtained by expanding the existing medical data parameters. Not all of these medical data parameters influence the target prediction function. In the current training process, all medical data parameters are input into the prediction model to be trained, and the prediction model itself screens the parameters during training, so the training period of the prediction model is long and the number of samples required is large.
In this embodiment, whether each medical data parameter has an effective influence on the target prediction function is determined through the prior distribution of each medical data parameter, so as to delete an invalid parameter from a large number of medical data parameters, and obtain a medical data parameter effective for the target prediction function, where the valid medical data parameter can be used to train a prediction model with the target prediction function. By means of eliminating invalid medical data parameters, the training difficulty of the prediction model is reduced, the number of samples required in the process of training the prediction model is further reduced, and small sample training of the prediction model can be achieved on the basis of guaranteeing the training precision of the prediction model.
Wherein the prior distribution of each medical data parameter may be different, and may be determined according to the medical data parameter type. Illustratively, the prior distribution of the medical data parameter may be, but is not limited to, a cauchy distribution, a uniform distribution, a t distribution, an exponential distribution, or a beta distribution. The weight of each medical data parameter is determined through prior distribution, the larger the weight is, the larger the influence on the target prediction function is, and the smaller the weight is, the smaller the influence on the target prediction function is. And when the weight of the medical data parameter is zero or less than the preset weight value, determining that the medical data parameter is an invalid medical data parameter.
Optionally, determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter includes: randomly sampling the weight of each expanded medical data parameter according to the prior distribution of each expanded medical data parameter to obtain the initial state of each medical data parameter; and carrying out iterative processing on the initial state based on a predetermined transfer matrix to obtain the stable distribution state of each expanded medical data parameter.
The prior distribution of the medical data parameters represents a weight distribution mode of the medical data parameters, for example, when the prior distribution is Cauchy distribution, the weight of the medical data parameters satisfies the Cauchy distribution. The probability value of the sampling value in prior distribution is determined by randomly sampling the weight of the medical data parameter, and the probability value corresponding to the weighted random sampling value of each medical data parameter forms the initial state of the medical data parameter. The initial state may be presented in the form of a sampling matrix.
In this embodiment, the state transition is performed by iterating the initial state of the medical data parameter, and when the state transition is stable, the stable distribution state of each medical data parameter is obtained. And carrying out state transition on the initial state of the medical data parameters through a predetermined transition matrix, wherein the transition matrix can be determined through a Markov chain, and when the reversible Markov chain meets a detailed balance equation, the transition matrix is obtained.
Iterative processing is performed on the initial state based on the transition matrix. When the number of iterations reaches the preset number of iterations, the transition state obtained after the preset number of iterations is determined as the stable distribution state; or, when the transition state obtained by the iterative processing converges, the transition state in the converged state is determined as the stable distribution state.
Specifically, the iterative processing of the initial state based on a predetermined transition matrix to obtain the stable distribution state of each medical data parameter includes: performing state transition on the initial state based on the transition matrix to obtain a transition state; judging the transition state based on a preset suggestion distribution and a verification threshold; when the state requirement is met, iteratively performing state transition on the transition state; when the state requirement is not met, re-executing the step of randomly sampling the weights of the medical data parameters according to the prior distribution of the medical data parameters.
The state transition of the initial state may be performed by multiplying the transition matrix by the initial state of the medical data parameter to obtain the transition state. The transition state is judged using an acceptance-rejection algorithm based on a preset suggestion distribution and a verification threshold, where the suggestion distribution may be, but is not limited to, a symmetrical distribution, a normal distribution, or an independent distribution, and can be set as required. The verification threshold may be a fixed threshold or a random number drawn from a preset interval, which may be (0, 1).
Optionally, judging the transition state based on a preset suggestion distribution and a verification threshold includes: determining an acceptance probability of the transition state based on the preset suggestion distribution; and determining that the transition state satisfies the state requirement when the acceptance probability is greater than or equal to the verification threshold.
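The two stopping rules described above (a preset iteration count, or convergence of the transition state) can be sketched with a small discrete example. This is an illustration only: the 2x2 transition matrix, the tolerance `tol`, and the function names are hypothetical, and the sketch assumes a row-stochastic matrix applied to a probability row vector.

```python
def step(state, transition):
    """One state transition: row vector times row-stochastic matrix."""
    n = len(state)
    return [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]


def stable_state(initial, transition, max_iters=1000, tol=1e-10):
    """Iterate until the state stops changing or max_iters is reached."""
    state = list(initial)
    for iteration in range(1, max_iters + 1):
        next_state = step(state, transition)
        # Convergence rule: largest component change falls below tol.
        if max(abs(a - b) for a, b in zip(next_state, state)) < tol:
            return next_state, iteration
        state = next_state
    # Iteration-count rule: return the state after the preset number of steps.
    return state, max_iters


# Hypothetical two-state chain, for illustration only.
transition = [[0.9, 0.1], [0.5, 0.5]]
initial = [1.0, 0.0]
stable, iterations = stable_state(initial, transition)
```

For this chain the stationary distribution is (5/6, 1/6), and the convergence rule triggers long before the iteration cap is reached.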
Wherein, the accepting probability of the transition state can be calculated based on the following formula:
$$\alpha(\theta, \theta^{*}) = \min\left\{1,\ \frac{\pi(\theta^{*})\,Q(\theta \mid \theta^{*})}{\pi(\theta)\,Q(\theta^{*} \mid \theta)}\right\}$$

where $\pi(\theta^{*})$ is the posterior distribution of $\theta^{*}$, $\pi(\theta)$ is the probability of $\theta$, $Q(\theta^{*} \mid \theta)$ is the transition probability of transitioning from $\theta$ to $\theta^{*}$ based on the suggestion distribution Q, $Q(\theta \mid \theta^{*})$ is the transition probability of transitioning from $\theta^{*}$ to $\theta$ based on the suggestion distribution Q, $\theta$ is the current state, and $\theta^{*}$ is the next sampling state.
When the state requirement is met, determining whether the current iteration number meets a preset number or whether the current transfer state is in a convergence state, if not, further performing state transfer on the current transfer state based on a transfer matrix, and if so, determining the current transfer state as a stable distribution state; and when the acceptance probability is smaller than the verification threshold, determining that the state requirement is not met, sampling the weights again based on the prior distribution of the medical data parameters, determining a new initial state, and executing the process until a stable distribution state is obtained.
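The acceptance-rejection step can be illustrated with a standard Metropolis-Hastings sampler for a single weight. This is a sketch under several assumptions that are not taken from the text: a Cauchy(0, 1) prior on the weight, a one-parameter logistic likelihood, a symmetric normal suggestion distribution (so the two transition probabilities cancel in the acceptance probability), and made-up data. It also follows the standard algorithm in keeping the current state on rejection, whereas the text above describes re-sampling the weight from the prior in that case.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_posterior(w, xs, ys):
    """Unnormalised log posterior: Cauchy(0, 1) prior plus logistic likelihood."""
    log_prior = -math.log(math.pi * (1.0 + w * w))
    log_lik = sum(-softplus(-w * x) if y == 1 else -softplus(w * x)
                  for x, y in zip(xs, ys))
    return log_prior + log_lik


def metropolis_hastings(xs, ys, n_steps=2000, step_size=0.5, seed=0):
    rng = random.Random(seed)
    # Initial state: one random draw of the weight from the Cauchy prior.
    current = math.tan(math.pi * (rng.random() - 0.5))
    current_lp = log_posterior(current, xs, ys)
    samples = []
    for _ in range(n_steps):
        proposal = rng.gauss(current, step_size)  # symmetric suggestion
        proposal_lp = log_posterior(proposal, xs, ys)
        # Acceptance probability min(1, posterior ratio); Q terms cancel.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        threshold = rng.random()  # verification threshold drawn from (0, 1)
        if accept_prob >= threshold:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical data: larger x tends to go with label 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
samples = metropolis_hastings(xs, ys)
```

Drawing the verification threshold afresh at every step corresponds to the "random number drawn in a preset interval" option; a fixed threshold would turn the rule into a deterministic cut-off on the acceptance probability.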
The stable distribution state comprises a distribution probability of each medical data parameter during sampling of the overall medical data parameter. Based on the probability values of the medical data parameters in the stable distribution state, sampling the weights of the medical data parameters for preset times, wherein the preset times can be 100 or 1000, and the like, and can be set according to requirements.
And determining the sampling result of the weight of each expanded medical data parameter by sampling for a preset number of times, and determining the weight of any expanded medical data parameter based on the distribution of the sampling result of any medical data parameter. Wherein the weights may be determined in a manner related to the prior distribution. When the prior distribution is Cauchy distribution, determining a value corresponding to a distribution peak value of a sampling result of any medical data parameter as the weight of any medical data parameter.
And determining the medical data parameter with the weight not being zero as an effective medical data parameter, or determining the medical data parameter with the weight being larger than a preset weight value as the effective medical data parameter. And determining sample data based on the effective medical data parameters, and training the prediction model to be trained to obtain the prediction model with the target prediction function.
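The weight determination and screening steps above can be sketched as follows. This is a minimal illustration: the histogram bin width, the preset weight value, the parameter names, and the sampled values are all hypothetical, and comparing the absolute value of the weight against the threshold is an assumption made here so that negative weights are not discarded.

```python
def peak_value(samples, bin_width=0.1):
    """Value at the peak of the sample distribution (most populated bin)."""
    counts = {}
    for sample in samples:
        # Bins are centred on multiples of bin_width, so a peak at zero
        # is reported as exactly 0.0.
        index = round(sample / bin_width)
        counts[index] = counts.get(index, 0) + 1
    best = max(counts, key=counts.get)
    return best * bin_width


def screen_parameters(samples_by_parameter, min_weight=0.05, bin_width=0.1):
    """Keep parameters whose peak weight exceeds the preset weight value."""
    weights = {name: peak_value(samples, bin_width)
               for name, samples in samples_by_parameter.items()}
    valid = [name for name, weight in weights.items()
             if abs(weight) > min_weight]
    return weights, valid


# Hypothetical weight samples, for illustration only.
samples_by_parameter = {
    "donor_bmi_diff": [0.52, 0.48, 0.5, 0.61, 0.49, 0.3],
    "noise_feature": [0.01, -0.02, 0.0, 0.03, 0.4, -0.01],
}
weights, valid = screen_parameters(samples_by_parameter)
```

With a larger number of sampling rounds (the text suggests 100 or 1000), a narrower bin width gives a finer estimate of the peak.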
According to the technical scheme of the embodiment, influence weights of the expanded medical data parameters in the target prediction function are determined based on prior distribution of the medical data parameters, the medical data parameters effective to the target prediction function are screened based on the weights, invalid medical data parameters are eliminated, the training difficulty of the prediction model is reduced, the convergence speed of the prediction model is improved, the required number of samples in the process of training the prediction model is further reduced, and small sample training of the prediction model can be realized on the basis of ensuring the training precision of the prediction model.
EXAMPLE III
Fig. 3 is a flowchart of a method for training a predictive model according to a third embodiment of the present invention, where the method is used to train a predictive model with a target prediction function, and the method includes:
S310, obtaining sample data formed by effective medical data parameters corresponding to the target prediction function, wherein the effective medical data parameters are determined according to the medical data processing method provided by the embodiment.
S320, training the prediction model to be trained based on the sample data to obtain the prediction model with the target prediction function.
In this embodiment, sample sampling is performed based on the effective medical data parameters obtained in the above embodiment, so as to obtain sample data. For example, the valid medical data parameters may be extracted from a parameter set of a plurality of sample objects. The prediction model to be trained is trained on the sample data formed based on the effective medical data parameters, so that the process of training the model is simplified, a large amount of sample data is not needed, and the effects of reducing the number of samples and improving the training efficiency are achieved.
On the basis of the above embodiment, before training the prediction model to be trained based on the sample data, the method further includes: verifying at least two models to be trained based on a preset number of groups of sample data, and determining the model to be trained for the target prediction function training. The at least two models to be trained may include, but are not limited to, at least two of a logistic regression model L1, a logistic regression model L2, a support vector machine, a K-nearest neighbors (KNN) model, a deep learning model CNN, a random forest model (RandomForest), and a LightGBM model (gradient boosting decision tree). Each model is trained on the preset number of groups (for example, 20 groups) of sample data, the trained model is verified and its prediction accuracy is measured, and the model with the highest prediction accuracy is determined as the model to be trained for the target prediction function training.
Specifically, an evaluation value is obtained for the prediction result produced from each group of input sample data. The evaluation value may be the roc_auc value, that is, the area under the ROC curve, whose vertical axis is the true positive rate (TPR), i.e., the proportion of actually positive samples that are predicted to be positive, and whose horizontal axis is the false positive rate (FPR), i.e., the proportion of actually negative samples that are predicted to be positive. The evaluation values of the preset groups of sample data are then processed, for example by determining their mean and variance, and the model to be trained for the target prediction function training is screened according to the obtained mean and variance. The model to be trained for the target prediction function training may satisfy the following conditions: the variance is minimum and the mean is maximum. In some embodiments, the variance and the mean may be weighted, and the model to be trained for the target prediction function training is screened according to the weighted result, so that both the variance and the mean are taken into account, where the weights of the variance and the mean may be determined according to requirements.
Illustratively, referring to fig. 4, fig. 4 is a schematic diagram of roc_auc values of various models provided by embodiments of the present invention. As can be seen from fig. 4, the logistic regression model L1 is the model to be trained for the target prediction function training.
In this embodiment, the training efficiency and the prediction precision are improved by screening out the model to be trained that is suitable for the target prediction function.
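The mean-and-variance screening described above can be sketched as follows. This is an illustration under stated assumptions: the combined score (mean minus a weighted variance), the parameter `variance_weight`, the model names, and the roc_auc values are all hypothetical and are not results from the source.

```python
from statistics import mean, pvariance


def select_model(scores_by_model, variance_weight=1.0):
    """Pick the model with the best mean roc_auc penalised by its variance."""
    summary = {}
    for name, scores in scores_by_model.items():
        avg = mean(scores)
        var = pvariance(scores)
        summary[name] = {
            "mean": avg,
            "variance": var,
            # Higher mean is better, higher variance is worse.
            "score": avg - variance_weight * var,
        }
    best = max(summary, key=lambda name: summary[name]["score"])
    return best, summary


# Made-up roc_auc values per group of sample data, for illustration only.
scores_by_model = {
    "logistic_l1": [0.80, 0.82, 0.81, 0.79],
    "random_forest": [0.85, 0.60, 0.90, 0.70],
    "knn": [0.70, 0.71, 0.69, 0.70],
}
best, summary = select_model(scores_by_model)
```

Raising `variance_weight` favours models whose performance is stable across groups; setting it to zero reduces the rule to choosing the highest mean.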
Example four
Fig. 5 is a schematic flow chart of a training method of a prediction model according to a fourth embodiment of the present invention, which is detailed on the basis of the foregoing embodiment, and the method includes:
S410, acquiring a first medical data parameter of a first object subjected to organ transplantation and a second medical data parameter of a second object.
S420, performing parameter expansion on the first medical data parameter and the second medical data parameter respectively based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters.
S430, determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state.
S440, screening the expanded medical data parameters according to the sampling result to determine effective medical data parameters.
S450, obtaining sample data formed by effective medical data parameters, and training the prediction model to be trained based on the sample data to obtain the prediction model with the organ transplantation function.
According to the technical scheme of this embodiment, the organ transplantation may be liver transplantation, and the diversity and comprehensiveness of the liver transplantation parameters are improved by expanding the initial data parameters used for predicting liver function after liver transplantation. Corresponding weights are determined according to the prior distribution of the medical data parameters so as to screen effective medical data parameters. Illustratively, see Table 7, which is an example of valid medical data parameters and corresponding weights among the liver transplantation numerical parameters.
TABLE 7
Here, groupby(graft type)[donor platelet].mean() denotes the mean value of the donor platelet with the graft type as the grouping key, and groupby(graft type)[graft weight].mean(), groupby(tumor or not)[total operation time].mean(), and groupby(sex matching)[donor sodium].mean() are to be read in the same way.
From Table 7, graft type_donor BMI_mean_div, graft type_graft weight_mean_div, whether tumor_total surgery time_mean_div, sex matched_donor sodium_mean_div, recipient BMI/time from beginning of lavage to bag, and graft type_donor platelet_mean_div are valid medical data parameters.
And determining sample data based on the determined effective medical data parameters, wherein the sample data is small sample data, training the prediction model to be trained to obtain the prediction model with the function of predicting the liver dysfunction after liver transplantation, reducing the data requirement on the sample data on the basis of ensuring the accuracy of the prediction model, simplifying the training process of the prediction model and improving the training efficiency of the prediction model.
EXAMPLE five
Fig. 6 is a schematic structural diagram of a medical data processing apparatus according to a fifth embodiment of the present invention, where the apparatus includes:
a medical data parameter acquisition module 510 for acquiring a first medical data parameter of a first object and a second medical data parameter of a second object;
a parameter extension module 520, configured to perform parameter extension on the first medical data parameter and the second medical data parameter respectively based on a first extension rule, and perform associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule, so as to obtain extended medical data parameters;
an iterative sampling module 530, configured to determine a distribution state of each expanded medical data parameter based on prior distribution of each expanded medical data parameter, and perform iterative sampling on each expanded medical data parameter based on the distribution state;
and the valid data determining module 540 is configured to screen the expanded medical data parameters according to a sampling result to determine valid medical data parameters, where sample data formed by the valid medical data parameters is used to train a prediction model with the target prediction function.
Optionally, the parameter expanding module 520 includes:
a nominal value determining unit, configured to determine, for any initial data parameter of the first medical data parameter or the second medical data parameter, a standard parameter range corresponding to the initial data parameter, and determine a nominal value based on the standard parameter range;
a first expansion unit, configured to perform parameter expansion based on a difference between the initial data parameter and the nominal value, to obtain at least one expanded medical data parameter corresponding to the initial data parameter.
Optionally, the parameter expanding module 520 includes:
a parameter set determining unit, configured to determine, in an initial data parameter of the first medical data parameter or the second medical data parameter, a parameter set for parameter extension, where the parameter set includes at least two initial data parameters determined according to a service requirement, or at least two initial data parameters having an association relationship;
and the second extension unit is used for carrying out at least one extension operation on at least two initial data parameters in the parameter group to obtain extended medical data parameters.
Optionally, the parameter expanding module 520 includes:
an object grouping determination unit, configured to determine, based on a matching state of target parameters of the first object and the second object, an object grouping corresponding to the matching state of the target parameters;
and the third expansion unit is used for performing parameter expansion on the basis of the first medical data parameter and/or the second medical data parameter and the parameter mean values of different objects in the object group to obtain an expanded medical data parameter.
Optionally, the parameter expanding module 520 includes:
and the fourth extension unit is used for determining medical data parameter pairs of the same type in the first medical data parameters and the second medical data parameters, and performing numerical operation on each medical data parameter pair to obtain the extended medical data parameters corresponding to the medical data parameter pairs.
Optionally, the first subject is an organ donor, the second subject is an organ recipient, and the prediction model with the target prediction function is used for function prediction of organ transplantation.
Optionally, the iterative sampling module 530 includes:
an initial state determining unit, configured to randomly sample the weight of the expanded medical data parameter according to the prior distribution of the expanded medical data parameter, so as to obtain an initial state of the medical data parameter;
and the stable distribution state determining unit is used for performing iterative processing on the initial state based on a predetermined transfer matrix to obtain the stable distribution state of each expanded medical data parameter.
Optionally, the stable distribution state determining unit includes:
the state transfer subunit is used for carrying out state transfer on the initial state based on the transfer matrix to obtain a transfer state;
and the transition state judgment subunit is used for judging the transition state based on preset recommended distribution and a verification threshold, iterating the transition state to carry out state transition when the state requirement is met, and re-performing the step of randomly sampling the weight of the medical data parameter according to the expanded prior distribution of the medical data parameter when the state requirement is not met.
Optionally, the transition state determining subunit is configured to:
determining an acceptance probability of the transition state based on the preset suggestion distribution;
determining that the transition state satisfies a state requirement when the acceptance probability is greater than or equal to the verification threshold.
Optionally, the stable distribution state determining unit is configured to:
when the iteration times meet the preset iteration times, determining the transfer state of the preset times as a stable distribution state; or,
and when the transition state obtained by the iterative processing is converged, determining the transition state in the converged state as a stable distribution state.
Optionally, the valid data determining module 540 is configured to:
determining a numerical value corresponding to a distribution peak value of the sampling result of any expanded medical data parameter as the weight of any expanded medical data parameter;
and determining the medical data parameter with the weight larger than the preset threshold value as the effective medical data parameter.
The medical data processing device can execute the medical data processing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the medical data processing method.
EXAMPLE six
Fig. 7 is a schematic structural diagram of a training apparatus for a prediction model according to a sixth embodiment of the present invention, where the apparatus includes:
the sample data acquiring module 610 is configured to acquire sample data formed by valid medical data parameters corresponding to a target prediction function, where the valid medical data parameters are determined according to the medical data processing method provided in the embodiment of the present invention;
and the model training module 620 is configured to train the prediction model to be trained based on the sample data to obtain a prediction model with a target prediction function.
Optionally, the apparatus further comprises:
and the model screening module is used for verifying at least two models to be trained based on a preset group of sample data before training the prediction model to be trained based on the sample data, and determining the model to be trained for training the target prediction function.
The training device of the prediction model can execute the training method of the prediction model provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the training method of the prediction model.
EXAMPLE seven
Fig. 8 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention, showing a block diagram of an electronic device 412 suitable for implementing embodiments of the present invention. The electronic device 412 shown in Fig. 8 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present invention. The device 412 is typically an electronic device that undertakes medical data processing or prediction model training functions.
As shown in fig. 8, the electronic device 412 is in the form of a general purpose computing device. The components of the electronic device 412 may include, but are not limited to: one or more processors 416, a storage device 428, and a bus 418 that couples the various system components including the storage device 428 and the processors 416.
The electronic device 412 may also communicate with one or more external devices 414 (e.g., a keyboard, a pointing device, a camera, a display 424, etc.), with one or more devices that enable a user to interact with the electronic device 412, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 412 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 422. In addition, the electronic device 412 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 420. As shown, the network adapter 420 communicates with the other modules of the electronic device 412 over the bus 418. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, redundant array of independent disks (RAID) systems, tape drives, and data backup storage systems.
The processor 416 executes programs stored in the storage device 428 to perform various functional applications and data processing, such as implementing a medical data processing method provided by an embodiment of the present invention or a training method of a prediction model provided by an embodiment of the present invention.
EXAMPLE eight
An eighth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a medical data processing method according to an embodiment of the present invention or a training method of a prediction model according to an embodiment of the present invention.
Of course, in the computer-readable storage medium provided by the embodiment of the present invention, the computer program stored thereon is not limited to the method operations described above, and can also perform related operations in the medical data processing method or the training method of the prediction model provided in any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable source code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Source code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer source code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The source code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to the above embodiments and may include other equivalent embodiments without departing from the spirit of the present invention; the scope of the present invention is determined by the scope of the appended claims.
Claims (17)
1. A method of medical data processing, comprising:
acquiring a first medical data parameter of a first object and a second medical data parameter of a second object;
respectively performing parameter expansion on the first medical data parameter and the second medical data parameter based on a first expansion rule, and performing associated parameter expansion on the first medical data parameter and the second medical data parameter based on a second expansion rule to obtain expanded medical data parameters;
determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter, and performing iterative sampling on each expanded medical data parameter based on the distribution state;
and screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.
2. The method according to claim 1, wherein the parameter expanding the first medical data parameter and the second medical data parameter respectively based on a first expansion rule comprises:
for any initial data parameter in the first medical data parameter or the second medical data parameter, determining a standard parameter range corresponding to the initial data parameter, and determining a nominal value based on the standard parameter range;
and performing parameter expansion based on the difference value between the initial data parameter and the nominal value to obtain at least one expanded medical data parameter corresponding to the initial data parameter.
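As a non-limiting illustration of claim 2, the expansion around a nominal value might look like the sketch below. Taking the midpoint of the standard parameter range as the nominal value, and the particular derived parameters returned, are assumptions of this sketch; the claim only requires that a nominal value be determined from the range and that expansion use the difference from it.

```python
from typing import Dict, Tuple


def expand_against_nominal(
    value: float, standard_range: Tuple[float, float]
) -> Dict[str, float]:
    """Derive expanded parameters from the gap between a value and its nominal value."""
    low, high = standard_range
    if high <= low:
        raise ValueError("standard range must satisfy low < high")
    nominal = (low + high) / 2           # assumed: nominal value = range midpoint
    half_width = (high - low) / 2
    difference = value - nominal
    return {
        "difference": difference,                     # signed gap to the nominal value
        "abs_difference": abs(difference),            # size of the gap
        "scaled_difference": difference / half_width, # gap in half-range units
        "out_of_range": float(value < low or value > high),
    }


# Hypothetical laboratory value of 80.0 with a standard range of 0.0 to 40.0.
expanded = expand_against_nominal(80.0, (0.0, 40.0))
```

One initial data parameter thus yields several expanded medical data parameters, all anchored to the same nominal value.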
3. The method according to claim 1, wherein the parameter expanding the first medical data parameter and the second medical data parameter respectively based on a first expansion rule comprises:
determining a parameter group for parameter expansion in an initial data parameter of the first medical data parameter or the second medical data parameter, wherein the parameter group comprises at least two initial data parameters determined according to business requirements, or at least two initial data parameters with an association relationship;
and performing at least one type of extension operation on at least two initial data parameters in the parameter group to obtain extended medical data parameters.
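As a non-limiting illustration of claim 3, the sketch below applies several extension operations to every pair of parameters in a parameter group. The four arithmetic operations, the naming scheme of the outputs, and the example parameter names are hypothetical; the claim does not enumerate the operation types.

```python
import operator
from itertools import combinations
from typing import Callable, Dict, Sequence


def safe_divide(a: float, b: float) -> float:
    """Division that yields NaN instead of raising when the divisor is zero."""
    return a / b if b != 0 else float("nan")


OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "over": safe_divide,
}


def expand_group(
    parameters: Dict[str, float], group: Sequence[str]
) -> Dict[str, float]:
    """Apply every operation to every pair of initial parameters in the group."""
    if len(group) < 2:
        raise ValueError("a parameter group needs at least two initial data parameters")
    expanded: Dict[str, float] = {}
    for first, second in combinations(group, 2):
        for op_name, op in OPERATIONS.items():
            expanded[f"{first}_{op_name}_{second}"] = op(
                parameters[first], parameters[second]
            )
    return expanded


# Hypothetical initial data parameters; only "ast" and "alt" form the group.
initial = {"ast": 40.0, "alt": 20.0, "age": 50.0}
expanded = expand_group(initial, ["ast", "alt"])
```

Parameters outside the group, such as `age` here, are left untouched by this rule.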
4. The method according to claim 1, wherein the associated parameter extension of the first medical data parameter and the second medical data parameter based on a second extension rule comprises:
determining an object group corresponding to the matching state of the target parameters based on the matching states of the target parameters of the first object and the second object;
and performing parameter expansion based on the first medical data parameter and/or the second medical data parameter and parameter mean values of different objects in the object group to obtain an expanded medical data parameter.
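As a non-limiting illustration of claim 4, the sketch below groups object pairs by whether a target parameter matches between the first and second object, then expands a parameter of the first object relative to its group mean. Using equality as the matching state, subtracting the group mean as the expansion, and the example keys `blood_type` and `bilirubin` are all assumptions made for this sketch.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def expand_by_match_group(
    pairs: Sequence[Tuple[Record, Record]], target_key: str, value_key: str
) -> List[Dict[str, Any]]:
    """Group pairs by matching state and expand a value against its group mean."""
    groups: Dict[bool, List[int]] = {}
    for index, (first, second) in enumerate(pairs):
        matched = first[target_key] == second[target_key]  # assumed matching state
        groups.setdefault(matched, []).append(index)

    expanded: List[Dict[str, Any]] = [{} for _ in pairs]
    for matched, members in groups.items():
        group_mean = sum(pairs[i][0][value_key] for i in members) / len(members)
        for i in members:
            expanded[i] = {
                "matched": matched,
                "group_mean": group_mean,
                "difference_from_group_mean": pairs[i][0][value_key] - group_mean,
            }
    return expanded


# Hypothetical (first object, second object) records.
pairs = [
    ({"blood_type": "A", "bilirubin": 10.0}, {"blood_type": "A"}),
    ({"blood_type": "A", "bilirubin": 30.0}, {"blood_type": "B"}),
    ({"blood_type": "O", "bilirubin": 20.0}, {"blood_type": "O"}),
]
expanded = expand_by_match_group(pairs, "blood_type", "bilirubin")
```

The same pattern applies to parameters of the second object by reading `pairs[i][1]` instead of `pairs[i][0]`.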
5. The method according to claim 1, wherein the associated parameter extension of the first medical data parameter and the second medical data parameter based on a second extension rule comprises:
determining a medical data parameter pair of the same type in the first medical data parameter and the second medical data parameter, and performing numerical operation on each medical data parameter pair to obtain an extended medical data parameter corresponding to the medical data parameter pair.
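As a non-limiting illustration of claim 5, the sketch below pairs parameters of the same type by name and applies two numerical operations to each pair. Identifying "same type" by a shared key, and choosing difference and ratio as the operations, are assumptions of this sketch.

```python
from typing import Dict


def expand_pairs(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Compute difference and ratio for every parameter present in both objects."""
    expanded: Dict[str, float] = {}
    for key in sorted(first.keys() & second.keys()):
        a, b = first[key], second[key]
        expanded[f"{key}_difference"] = a - b
        if b != 0:  # skip the ratio when it is undefined
            expanded[f"{key}_ratio"] = a / b
    return expanded


# Hypothetical parameters of the first and second object.
first_params = {"age": 40.0, "weight": 80.0, "first_only": 1.0}
second_params = {"age": 50.0, "weight": 64.0}
expanded = expand_pairs(first_params, second_params)
```

Parameters that exist for only one object, such as `first_only` here, form no pair and produce no expanded parameter under this rule.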
6. The method of any one of claims 1-5, wherein the first object is an organ donor, the second object is an organ recipient, and the target prediction function of the prediction model is functional prediction for organ transplantation.
7. The method according to claim 1, wherein the determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter comprises:
randomly sampling the weight of each expanded medical data parameter according to the prior distribution of each expanded medical data parameter to obtain the initial state of each expanded medical data parameter;
and carrying out iterative processing on the initial state based on a predetermined transition matrix to obtain the stable distribution state of each expanded medical data parameter.
8. The method according to claim 7, wherein the iteratively processing the initial state based on a predetermined transition matrix to obtain a stable distribution state of the expanded medical data parameters comprises:
performing state transition on the initial state based on the transition matrix to obtain a transition state;
judging the transition state based on a preset suggestion distribution and a verification threshold;
when the state requirement is met, iteratively performing state transition on the transition state;
and when the state requirement is not met, re-executing the step of randomly sampling the weight of each expanded medical data parameter according to the prior distribution of each expanded medical data parameter.
9. The method of claim 8, wherein the judging the transition state based on a preset suggestion distribution and a verification threshold comprises:
determining an acceptance probability of the transition state based on the preset suggestion distribution;
determining that the transition state satisfies a state requirement when the acceptance probability is greater than or equal to the verification threshold.
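As a non-limiting illustration of claims 7 to 9, the loop below draws an initial weight from a prior, proposes a transition, and compares an acceptance probability with a fixed verification threshold; a rejected transition triggers a fresh draw from the prior. This follows the wording of the claims and differs from textbook Metropolis-Hastings sampling, which compares the acceptance probability with a uniform random number and keeps the current state on rejection. The single scalar weight, the Gaussian prior, the random-walk transition standing in for the transition matrix, the target density `log_target`, and the Metropolis-style ratio used as the acceptance probability are all assumptions of this sketch.

```python
import math
import random
from typing import Callable, List


def accept_or_resample(
    sample_prior: Callable[[random.Random], float],
    transition: Callable[[float, random.Random], float],
    acceptance_probability: Callable[[float, float], float],
    verification_threshold: float,
    iterations: int,
    rng: random.Random,
) -> List[float]:
    """Return the sequence of states visited, starting from a prior draw."""
    state = sample_prior(rng)              # initial state from the prior
    chain = [state]
    for _ in range(iterations):
        candidate = transition(state, rng)  # transition state
        if acceptance_probability(state, candidate) >= verification_threshold:
            state = candidate               # state requirement met: keep iterating
        else:
            state = sample_prior(rng)       # not met: sample the prior again
        chain.append(state)
    return chain


def log_target(weight: float) -> float:
    """Hypothetical unnormalised log-density centred on 1.0 with spread 0.5."""
    return -0.5 * ((weight - 1.0) / 0.5) ** 2


def metropolis_ratio(current: float, candidate: float) -> float:
    """Acceptance probability min(1, p(candidate) / p(current)), symmetric transition."""
    diff = log_target(candidate) - log_target(current)
    return 1.0 if diff >= 0 else math.exp(diff)


chain = accept_or_resample(
    sample_prior=lambda r: r.gauss(0.0, 1.0),
    transition=lambda w, r: w + r.gauss(0.0, 0.3),
    acceptance_probability=metropolis_ratio,
    verification_threshold=0.5,
    iterations=200,
    rng=random.Random(0),
)
```

The returned `chain` plays the role of the sampling result from which a stable distribution state is later read off.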
10. The method according to claim 8, wherein the obtaining the stable distribution state of each expanded medical data parameter comprises:
when the number of iterations reaches the preset number of iterations, determining the transition state obtained at the preset number of iterations as the stable distribution state; or,
when the transition state obtained by the iterative processing has converged, determining the converged transition state as the stable distribution state.
11. The method according to claim 8, wherein the obtaining the stable distribution state of each expanded medical data parameter comprises:
when the number of iterations reaches the preset number of iterations, determining the transition state obtained at the preset number of iterations as the stable distribution state; or,
when the transition state obtained by the iterative processing has converged, determining the converged transition state as the stable distribution state.
12. A method for training a predictive model, comprising:
acquiring sample data formed by valid medical data parameters corresponding to a target prediction function, wherein the valid medical data parameters are determined according to the medical data processing method of any one of claims 1-11;
and training the prediction model to be trained based on the sample data to obtain the prediction model with the target prediction function.
13. The method of claim 12, wherein prior to training a predictive model to be trained based on the sample data, the method further comprises:
and verifying at least two models to be trained based on a preset group of sample data, and determining, from among them, the model to be trained for the target prediction function.
14. A medical data processing apparatus, characterized by comprising:
a medical data parameter acquisition module for acquiring a first medical data parameter of a first object and a second medical data parameter of a second object;
the parameter extension module is used for respectively performing parameter extension on the first medical data parameter and the second medical data parameter based on a first extension rule and performing associated parameter extension on the first medical data parameter and the second medical data parameter based on a second extension rule to obtain an extended medical data parameter;
the iterative sampling module is used for determining the distribution state of each expanded medical data parameter based on the prior distribution of each expanded medical data parameter and performing iterative sampling on each expanded medical data parameter based on the distribution state;
and the effective data determining module is used for screening the expanded medical data parameters according to a sampling result to determine effective medical data parameters, wherein sample data formed by the effective medical data parameters is used for training a prediction model with a target prediction function.
15. An apparatus for training a predictive model, comprising:
a sample data obtaining module, configured to obtain sample data formed by valid medical data parameters corresponding to a target prediction function, where the valid medical data parameters are determined according to the medical data processing method according to any one of claims 1 to 11;
and the model training module is used for training the prediction model to be trained based on the sample data to obtain the prediction model with the target prediction function.
16. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the medical data processing method of any one of claims 1-11 or the training method of the predictive model of any one of claims 12-13 when executing the program.
17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of processing medical data as claimed in any one of claims 1 to 11 or a method of training a predictive model as claimed in any one of claims 12 to 13.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010957988.6A CN111816306B (en) | 2020-09-14 | 2020-09-14 | Medical data processing method, and prediction model training method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111816306A | 2020-10-23 |
| CN111816306B | 2020-12-22 |
Family
ID=72859256
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN202010957988.6A | Active | CN111816306B (en) | 2020-09-14 | 2020-09-14 | Medical data processing method, and prediction model training method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111816306B (en) |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111816306B (en) | 2020-12-22 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |