WO2023123149A1 - Virtual molecular screening system and method, electronic device and computer-readable storage medium - Google Patents
- Publication number
- WO2023123149A1 (PCT/CN2021/142815)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- virtual
- molecule
- molecular
- molecules
- evaluation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
- G16C20/64—Screening of libraries
Definitions
- The present application relates to the technical field of molecular design, and in particular to a virtual molecular screening system and method, an electronic device, and a computer-readable storage medium.
- Virtual screening based on molecular structure has been widely used in the early stages of drug discovery. Its role is to select, from a large-scale virtual molecular library, potential virtual molecules that can bind to target proteins.
- Commonly used virtual screening algorithms evaluate the interaction between virtual molecules and target proteins. A virtual molecule with a high score has more potential to become a candidate molecule and then enters the next stage of development.
- A larger virtual molecular library can yield better screening results.
- However, a virtual molecular library typically contains on the order of 1 billion virtual molecules, so screening it requires a huge amount of computing power, and the computational overhead increases further across different virtual molecular libraries.
- In addition, there are various algorithms for molecular evaluation, and the parameter optimization and application adaptation of these algorithms complicate molecular design; this problem needs to be resolved.
- A first aspect of the present application provides a virtual molecular screening system. The system includes an active learning scheduler, a molecular evaluation surrogate model, a molecule selection module, a molecular evaluation module, and a molecule recommendation module, wherein:
- the active learning scheduler is used to schedule the molecular evaluation surrogate model, the molecule selection module, and the molecular evaluation module to perform their respective functions in an iterative scheduling order according to preset loop operation logic;
- the molecular evaluation module is used to evaluate the first virtual molecules satisfying the first evaluation condition in the preset virtual molecule library, obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of the first virtual molecules;
- the molecular evaluation surrogate model is used to receive at least one score-molecule pair sent by the molecular evaluation module, train itself on the at least one score-molecule pair, and, after the training, evaluate the virtual molecules in the preset virtual molecule library to obtain the predicted score values of all virtual molecules;
- the molecule selection module is configured to select a second virtual molecule from the preset virtual molecule library according to the score-molecule pair and/or the predicted score value, and use the second virtual molecule as the first virtual molecule;
- the molecule recommendation module is used to determine a target virtual molecule from all the virtual molecules evaluated by the molecular evaluation module.
- When evaluating the first virtual molecules that satisfy the first evaluation condition in the preset virtual molecule library, obtaining the score of each of the first virtual molecules, and generating a score-molecule pair for each of the first virtual molecules, the molecular evaluation module is used to:
- when the molecular evaluation module evaluates virtual molecules for the first time, randomly select a first number of virtual molecules from the preset virtual molecule library as the first virtual molecules, evaluate the first virtual molecules to obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of the first virtual molecules;
- when the molecular evaluation module is not evaluating virtual molecules for the first time, receive the second virtual molecule sent by the molecule selection module, use the second virtual molecule as the first virtual molecule, evaluate the first virtual molecule to obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of the first virtual molecules.
- The molecular evaluation surrogate model is also used to:
- update itself according to the predicted score value and the score that the molecular evaluation module obtained for the first virtual molecule, so that the absolute value of the difference between the predicted score value that the molecular evaluation surrogate model obtains for the first virtual molecule and the score that the molecular evaluation module obtains for the first virtual molecule is reduced.
- When selecting the second virtual molecule from the preset virtual molecule library according to the score-molecule pair and/or the predicted score value, the molecule selection module is used to:
- randomly select, according to the score-molecule pairs, a second number of virtual molecules that have not been evaluated by the molecular evaluation module from the preset virtual molecule library as the second virtual molecules.
- The virtual molecular screening system further includes:
- a policy information configuration module, used to store the scheduling parameters of the active learning scheduler, the feature extraction parameters and training parameters of the molecular evaluation surrogate model, the selection strategy of the molecule selection module, the evaluation parameters of the molecular evaluation module, and the loop termination condition.
- The active learning scheduler may also be used to:
- expand and replace the molecular evaluation module, the molecular evaluation surrogate model, and the molecule selection module through a preset virtual interface.
- A second aspect of the present application provides a virtual molecular screening method, which includes:
- Step S1: evaluating the first virtual molecules meeting the first evaluation condition in the preset virtual molecule library, obtaining the score of each of the first virtual molecules, and generating a score-molecule pair for each of the first virtual molecules;
- Step S2: using the at least one score-molecule pair to train a preset molecular evaluation surrogate model, and, after the training, evaluating the virtual molecules in the preset virtual molecule library to obtain the predicted score values of all virtual molecules;
- Step S3: selecting a second virtual molecule from the preset virtual molecule library according to the score-molecule pair and/or the predicted score value, using the second virtual molecule as the first virtual molecule, and cyclically performing step S1, step S2, and step S3 until the total number of evaluated virtual molecules meets the loop termination condition;
- Step S4: determining a target virtual molecule from all the virtual molecules evaluated in step S1.
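Steps S1 to S4 can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not the implementation claimed by the application: the function name `screen`, its parameters, the greedy choice in step S3, and the use of a simple count threshold as the loop termination condition are all hypothetical, and the evaluator and surrogate trainer are passed in as plain callables.

```python
import random
from typing import Callable, Dict, List


def screen(library: List[str],
           evaluate: Callable[[str], float],
           fit_surrogate: Callable[[Dict[str, float]], Callable[[str], float]],
           first_number: int,
           batch_size: int,
           max_evaluated: int,
           top_k: int,
           seed: int = 0) -> List[str]:
    """Hypothetical sketch of the S1-S4 loop; all names are illustrative."""
    rng = random.Random(seed)
    # First pass of S1: the initial batch is chosen at random.
    batch = rng.sample(library, first_number)
    scored: Dict[str, float] = {}  # score-molecule pairs, keyed by molecule
    while True:
        # S1: evaluate the current first virtual molecules.
        for molecule in batch:
            scored[molecule] = evaluate(molecule)
        # Assumed termination condition: a cap on evaluated molecules.
        if len(scored) >= max_evaluated:
            break
        # S2: train the surrogate, then predict a score for every molecule.
        predict = fit_surrogate(scored)
        predicted = {m: predict(m) for m in library}
        # S3: pick the next batch (greedy here) among unevaluated molecules.
        candidates = [m for m in library if m not in scored]
        if not candidates:
            break
        candidates.sort(key=lambda m: predicted[m], reverse=True)
        batch = candidates[:batch_size]
    # S4: recommend the best-scoring molecules among those evaluated.
    return sorted(scored, key=lambda m: scored[m], reverse=True)[:top_k]
```

Only the molecules in `scored` ever pass through the expensive evaluator; the surrogate is what touches the whole library, which is where the saving in computing power would come from.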
- Evaluating the first virtual molecules satisfying the first evaluation condition in the preset virtual molecule library, obtaining the score of each of the first virtual molecules, and generating a score-molecule pair for each of the first virtual molecules includes:
- when virtual molecules are evaluated for the first time, randomly selecting a first number of virtual molecules from the preset virtual molecule library as the first virtual molecules, evaluating the first virtual molecules to obtain the score of each of the first virtual molecules, and generating a score-molecule pair for each of the first virtual molecules;
- when virtual molecules are not being evaluated for the first time, receiving the second virtual molecule, using the second virtual molecule as the first virtual molecule, evaluating the first virtual molecule to obtain the score of each of the first virtual molecules, and generating a score-molecule pair for each of the first virtual molecules.
- Step S2 further includes:
- updating the molecular evaluation surrogate model according to the predicted score value and the score that the molecular evaluation module obtained for the first virtual molecule, so that the absolute value of the difference between the predicted score value that the surrogate model obtains for the first virtual molecule and the score that the molecular evaluation module obtains for the first virtual molecule is reduced.
- Selecting the second virtual molecule from the preset virtual molecule library according to the score-molecule pair and/or the predicted score value includes:
- randomly selecting, according to the score-molecule pairs, a second number of virtual molecules that have not been evaluated by the molecular evaluation module from the preset virtual molecule library as the second virtual molecules.
- A third aspect of the present application provides an electronic device, including:
- a processor; and
- a memory on which executable code is stored, the executable code, when executed by the processor, causing the processor to perform the method described above.
- A fourth aspect of the present application provides a computer-readable storage medium on which executable code is stored; when the executable code is executed by a processor of an electronic device, the processor is caused to execute the method described above.
- The virtual molecular screening system provided by the embodiment of the present application uses a modular overall framework design to separate molecular evaluation, molecule selection, and the molecular evaluation surrogate model, and uses a single active learning scheduler to connect the processes of the modules and unify their interfaces. Molecular evaluation, molecule selection, and the choice of molecular evaluation surrogate model are thereby decoupled from the overall molecular screening process, which facilitates extending and integrating the functions of each module.
- Fig. 1 is a schematic structural diagram of the virtual molecular screening system shown in the embodiment of the present application.
- Fig. 2 is a sequence diagram of a single iteration process shown in the embodiment of the present application.
- Fig. 3 is a schematic flow chart of the virtual molecular screening method shown in the embodiment of the present application.
- Fig. 4 is a schematic flow chart of the molecular evaluation method shown in the embodiment of the present application.
- Fig. 5 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
- Although the terms first, second, third and so on may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another.
- For example, first information may also be called second information, and similarly, second information may also be called first information.
- A feature defined as “first” or “second” may explicitly or implicitly include one or more of these features.
- “Plurality” means two or more, unless otherwise specifically defined.
- The embodiment of the present application provides a virtual molecular screening system, namely an active-learning-based system for accelerated virtual screening of large-scale molecule libraries. It facilitates integrating different molecular evaluation algorithms and adding different artificial-intelligence-based molecular evaluation surrogate models, which enables rapid development of combined strategies in large-scale molecular screening scenarios, saves computing power, and speeds up molecular screening.
- Fig. 1 is a schematic structural diagram of a virtual molecular screening system shown in an embodiment of the present application.
- The virtual molecular screening system includes an active learning scheduler 110, a molecular evaluation module 120, a molecular evaluation surrogate model 130, a molecule selection module 140, and a molecule recommendation module 150, wherein:
- The active learning scheduler 110 is configured to schedule the molecular evaluation surrogate model, the molecule selection module, and the molecular evaluation module to perform their respective functions in an iterative scheduling order according to preset loop operation logic.
- The active learning scheduler is a scheduling module based on an active learning algorithm. It schedules the molecular evaluation module 120, the molecular evaluation surrogate model 130, and the molecule selection module 140 to perform their respective functions in the iterative scheduling order, according to the pre-designed loop logic.
- The active learning scheduler schedules the functions of every module in the virtual molecular screening system. For example, it schedules the molecular evaluation module to evaluate virtual molecules and obtain their scores; it schedules the molecular evaluation surrogate model to be trained on the data set and, after the training, to evaluate the virtual molecules in the virtual molecule library and obtain a predicted score value for each virtual molecule; and it schedules the molecule selection module to select the molecules to be evaluated.
- When the active learning scheduler schedules the modules, it follows the preset iterative scheduling order to ensure the normal operation of the entire virtual molecular screening system.
- The molecular evaluation module 120 is configured to evaluate the first virtual molecules satisfying the first evaluation condition in the preset virtual molecule library, obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of the first virtual molecules.
- The molecular evaluation module 120 is composed of one or more molecular docking algorithms and molecular evaluation algorithms. It evaluates virtual molecules in the virtual molecule library and generates a score-molecule pair for every evaluated virtual molecule. The molecule in a score-molecule pair is the evaluated virtual molecule, and the score is the evaluation score of that virtual molecule.
- The first evaluation condition depends on how many times the molecular evaluation module has evaluated virtual molecules, and its specific restrictions differ accordingly. Optionally, two cases can be distinguished.
- If the molecular evaluation module is evaluating virtual molecules for the first time, the first number of virtual molecules randomly selected from the preset virtual molecule library are the virtual molecules that satisfy the first evaluation condition.
- If the molecular evaluation module is not evaluating virtual molecules for the first time, the second virtual molecules received from the molecule selection module are the virtual molecules that satisfy the first evaluation condition.
- Taking a specific embodiment as an example, a large-scale virtual molecular library, ZINC, is provided, and a certain number of virtual molecules are randomly selected from it as the first virtual molecules.
- For example, 100 virtual molecules can be randomly selected and evaluated by the molecular evaluation module to obtain the score of each virtual molecule. Each score is then paired with its virtual molecule to construct a score-molecule pair, for example 76-molecule 1, 78-molecule 2, 25-molecule 3, 98-molecule 4, and so on.
- The above evaluation method and pairing method are only one possible implementation of the embodiment of the present application and do not limit the protection scope of the present application.
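The pairing described above can be pictured as a small record type. The sketch below is only an illustration: the class name `ScoreMoleculePair`, the helper `build_pairs`, and the lookup table standing in for a real evaluation algorithm are hypothetical, and the scores are the example values quoted in the text.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ScoreMoleculePair:
    """One evaluated molecule together with its evaluation score."""
    score: float
    molecule: str


def build_pairs(molecules: Iterable[str],
                evaluate: Callable[[str], float]) -> List[ScoreMoleculePair]:
    """Evaluate each molecule and pair the resulting score with it."""
    return [ScoreMoleculePair(score=evaluate(m), molecule=m) for m in molecules]


# Stand-in for a real evaluator: a lookup of the example scores from the text.
example_scores = {"molecule 1": 76, "molecule 2": 78,
                  "molecule 3": 25, "molecule 4": 98}
pairs = build_pairs(example_scores, example_scores.__getitem__)
```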
- The molecular evaluation surrogate model 130 is configured to receive at least one score-molecule pair sent by the molecular evaluation module, train itself on the at least one score-molecule pair, and, after the training is completed, evaluate the virtual molecules in the preset virtual molecule library to obtain predicted score values for all virtual molecules.
- The molecular evaluation surrogate model 130 is a series of interface abstractions over machine-learning-based molecular evaluation methods. Through the surrogate model, multiple molecular evaluation methods can be used to evaluate virtual molecules. For a given virtual molecule, the evaluation result may come from one molecular evaluation method or from a combination of multiple molecular evaluation methods.
- The molecular evaluation surrogate model is first trained on the score-molecule pairs. After the training is completed, it is used to evaluate all the molecules in the virtual molecule library, and the predicted score values of all virtual molecules are obtained.
- The molecule selection module 140 is configured to select a second virtual molecule from the preset virtual molecule library according to the score-molecule pair and/or the predicted score value, use the second virtual molecule as the first virtual molecule, and return it to the molecular evaluation module. The evaluation processes of the molecular evaluation module and the molecular evaluation surrogate model are then executed cyclically until the total number of virtual molecules evaluated by the molecular evaluation module meets the loop termination condition.
- The molecule selection module 140 includes a variety of optimization strategies. Combined with active learning methods, these strategies select suitable molecules from the virtual molecule library based on the evaluation results of the molecular evaluation surrogate model and recommend them to the molecular evaluation module for evaluation.
- The loop termination condition is the condition for ending the cyclic evaluation of molecules. It can be that the total number of virtual molecules evaluated by the molecular evaluation module is greater than a set value, such as 10000, or that the proportion of evaluated virtual molecules in the virtual molecule library is greater than a preset value, for example that the virtual molecules evaluated by the molecular evaluation module account for more than 2% of the total number of virtual molecules in the preset virtual molecule library.
- The process of evaluating and selecting virtual molecules is cyclic: the second virtual molecule selected by the molecule selection module becomes the new first virtual molecule, and the work of the molecular evaluation module, the molecular evaluation surrogate model, and the molecule selection module is performed in a loop until the total number of virtual molecules evaluated by the molecular evaluation module satisfies the loop termination condition.
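The two kinds of loop termination condition described above (an absolute count or a share of the library) could be checked with a helper like the following. The function name `should_stop` and its parameters are assumptions made for illustration; the strict "greater than" comparison follows the wording of the text.

```python
from typing import Optional


def should_stop(n_evaluated: int,
                library_size: int,
                max_count: Optional[int] = None,
                max_fraction: Optional[float] = None) -> bool:
    """Return True once either configured termination condition is met."""
    # Condition 1: more molecules evaluated than a set value (e.g. 10000).
    if max_count is not None and n_evaluated > max_count:
        return True
    # Condition 2: evaluated share of the library exceeds a preset value.
    if max_fraction is not None and n_evaluated / library_size > max_fraction:
        return True
    return False
```

For instance, `should_stop(10_001, 1_000_000_000, max_count=10_000)` is true, while exactly 10000 evaluated molecules would not yet stop the loop.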
- The molecule recommendation module 150 is configured to determine a target virtual molecule from all the virtual molecules evaluated by the molecular evaluation module.
- The molecule recommendation module 150 summarizes the molecules evaluated by the molecular evaluation module during the entire iterative evaluation process and determines the target virtual molecule according to a preset recommendation condition.
- For example, the molecule recommendation module summarizes the evaluated virtual molecules, sorts them by score from high to low, and determines the target virtual molecules according to the preset recommendation condition. If the total number of virtual molecules evaluated by the molecular evaluation module is greater than 10000, all evaluated virtual molecules are sorted by score from high to low, and the 1000 top-ranked virtual molecules (the preset recommendation condition) are used as the target virtual molecules.
- Sorting the virtual molecules by score is not a necessary step for determining the target virtual molecules; the 1000 virtual molecules with the highest scores (the preset recommendation condition) can also be selected directly as the target virtual molecules.
- If the loop termination condition is that the virtual molecules evaluated by the molecular evaluation module account for more than 2/10,000 of the total number of virtual molecules in the preset virtual molecule library, the total number of virtual molecules to be evaluated is determined from the size of the preset virtual molecule library. If the preset virtual molecule library contains 1 billion virtual molecules, the molecular evaluation module evaluates a total of 200,000 virtual molecules (1,000,000,000 × 2/10,000), after which the loop is terminated.
- In that case, the target virtual molecules can likewise be the 1000 virtual molecules with the highest scores, selected according to the preset recommendation condition.
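Because a full sort is described as optional, the recommendation step can be sketched with a partial selection. The helper name `recommend` is hypothetical, and the sketch assumes, as the text does, that a higher score is better; `heapq.nlargest` from the Python standard library returns the `top_k` best entries in descending order without sorting the whole collection.

```python
import heapq
from typing import Dict, List


def recommend(scored: Dict[str, float], top_k: int) -> List[str]:
    """Return the top_k evaluated molecules, highest score first."""
    # Iterating a dict yields its keys (the molecules); the key function
    # looks up each molecule's evaluation score.
    return heapq.nlargest(top_k, scored, key=scored.__getitem__)
```

With the example pairs from earlier in the text (76, 78, 25, 98), `recommend(scored, 2)` would return molecule 4 followed by molecule 2.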
- When evaluating the first virtual molecules that satisfy the first evaluation condition in the preset virtual molecule library, obtaining the score of each of the first virtual molecules, and generating a score-molecule pair for each of the first virtual molecules, the molecular evaluation module may be used to:
- when the molecular evaluation module evaluates virtual molecules for the first time, randomly select a first number of virtual molecules from the preset virtual molecule library as the first virtual molecules, evaluate the first virtual molecules to obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of the first virtual molecules.
- When the virtual molecular screening system is executed for the first time, the virtual molecules that enter the molecular evaluation module for the first evaluation must be determined. A certain number of virtual molecules can be randomly selected from the virtual molecule library as the first virtual molecules; for example, 1000 virtual molecules are randomly selected as the first virtual molecules, and each first virtual molecule is then evaluated by the molecular evaluation module to obtain its score and generate a score-molecule pair.
- when the molecular evaluation module is not evaluating virtual molecules for the first time, receive the second virtual molecule sent by the molecule selection module, use the second virtual molecule as the first virtual molecule, evaluate the first virtual molecule to obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of the first virtual molecules.
- In that case, the second virtual molecule selected by the molecule selection module is used as the new first virtual molecule, and each first virtual molecule is then evaluated by the molecular evaluation module to obtain its score and generate its score-molecule pair.
- The molecular evaluation surrogate model can also be used to:
- update itself according to the predicted score value and the score that the molecular evaluation module obtained for the first virtual molecule, so that the absolute value of the difference between the predicted score value that the surrogate model obtains for the first virtual molecule and the score that the molecular evaluation module obtains for the first virtual molecule is reduced.
- After the molecular evaluation surrogate model evaluates all the virtual molecules in the virtual molecule library, every virtual molecule has a predicted score value, and some virtual molecules also have a score obtained from the molecular evaluation module. The surrogate model can be updated using these virtual molecules that have two scores.
- The scores that the molecular evaluation module obtains for the first virtual molecules are taken as the true values, and the predicted scores that the surrogate model obtains for the same first virtual molecules are taken as the predicted values. The parameters of the surrogate model are adjusted according to the difference between the predicted values and the true values, and the surrogate model is updated so that its evaluation results for virtual molecules are closer to those of the molecular evaluation module.
- Updating the molecular evaluation surrogate model based on its own performance improves its evaluation accuracy, so that virtual molecules can be screened more quickly and accurately.
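The update described above can be illustrated with a deliberately tiny stand-in model. The application does not specify the update rule, so everything here is an assumption: the surrogate is a one-feature linear model `w * x + b`, `x` is a hypothetical numeric descriptor of each molecule, and the parameters are adjusted by plain gradient descent on the squared error. The point is only to show the mean absolute difference between predicted and true scores shrinking after the update.

```python
from typing import List, Tuple


def mean_abs_diff(w: float, b: float, xs: List[float], ys: List[float]) -> float:
    """Mean absolute difference between surrogate predictions and true scores."""
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys)) / len(xs)


def update(w: float, b: float, xs: List[float], ys: List[float],
           lr: float = 0.05, steps: int = 200) -> Tuple[float, float]:
    """Adjust (w, b) by gradient descent on the mean squared error."""
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data: descriptor values and the "true" scores from the evaluation module.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
before = mean_abs_diff(0.0, 0.0, xs, ys)
w, b = update(0.0, 0.0, xs, ys)
after = mean_abs_diff(w, b, xs, ys)
```

A real surrogate such as the random forest mentioned in the embodiment would be retrained on the enlarged set of score-molecule pairs rather than updated by gradient steps.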
- When selecting the second virtual molecule from the preset virtual molecule library according to the score-molecule pair and/or the predicted score value, the molecule selection module can be used to:
- randomly select, according to the score-molecule pairs, a second number of virtual molecules that have not been evaluated by the molecular evaluation module from the preset virtual molecule library as the second virtual molecules.
- The molecule selection module may adopt different strategies when selecting the second virtual molecule; optionally, a greedy strategy or an uncertainty strategy may be adopted.
- Under the uncertainty strategy, the molecule selection module always preferentially selects virtual molecules in the virtual molecule library that have not been evaluated by the molecular evaluation module. Under the greedy strategy, it always selects the molecules with the higher evaluation scores as the second virtual molecules; for example, the 1000 virtual molecules ranked highest by the scores from the molecular evaluation surrogate model are used as the second virtual molecules.
- By adopting different strategies when selecting the second virtual molecule, the molecule selection module keeps the selection both purposeful and general, and speeds up molecular screening.
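The two selection strategies can be sketched side by side. Function names and parameters are hypothetical. The second function follows the text's own description of the uncertainty strategy (random choice among molecules not yet evaluated) rather than the more common variance-based definition, and excluding already evaluated molecules from the greedy choice is an added assumption to avoid re-evaluating them.

```python
import random
from typing import Dict, List, Set


def select_greedy(predicted: Dict[str, float],
                  evaluated: Set[str], n: int) -> List[str]:
    """Pick the n unevaluated molecules with the highest predicted scores."""
    candidates = [m for m in predicted if m not in evaluated]
    return sorted(candidates, key=predicted.__getitem__, reverse=True)[:n]


def select_unevaluated(library: List[str],
                       evaluated: Set[str], n: int, seed: int = 0) -> List[str]:
    """Pick up to n molecules at random among those not yet evaluated."""
    candidates = [m for m in library if m not in evaluated]
    return random.Random(seed).sample(candidates, min(n, len(candidates)))
```

Greedy selection concentrates the expensive evaluations on molecules the surrogate already rates highly, while random selection among unevaluated molecules keeps broader coverage of the library.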
- The virtual molecular screening system further includes:
- a policy information configuration module, used to store the scheduling parameters of the active learning scheduler, the feature extraction parameters and training parameters of the molecular evaluation surrogate model, the selection strategy of the molecule selection module, the evaluation parameters of the molecular evaluation module, and the loop termination condition.
- That is, the virtual molecular screening system further includes a policy information configuration module 150, which stores the scheduling parameters of the active learning scheduler, the feature extraction parameters and training parameters of the molecular evaluation surrogate model, the selection strategy of the molecule selection module, the evaluation parameters of the molecular evaluation module, and the loop termination condition. When the active learning scheduler schedules the work of each module, it also retrieves the corresponding parameters from the policy information configuration module to complete the screening of virtual molecules.
- Storing the configuration parameters of each module in the policy information configuration module provides support for completing molecular screening.
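One way to picture the stored policy information is a single configuration record with one field per item listed above. The class name `PolicyConfig`, the field names, and the default values are all hypothetical; the application does not describe a concrete storage format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PolicyConfig:
    """Hypothetical container for the per-module policy information."""
    scheduler_params: Dict[str, Any] = field(default_factory=dict)
    feature_extraction_params: Dict[str, Any] = field(default_factory=dict)
    training_params: Dict[str, Any] = field(default_factory=dict)
    selection_strategy: str = "greedy"
    evaluation_params: Dict[str, Any] = field(default_factory=dict)
    # Loop termination condition, expressed here as a cap on evaluations.
    max_evaluated: int = 10_000


# Illustrative values only; the keys inside the dicts are made up.
config = PolicyConfig(
    selection_strategy="greedy",
    training_params={"n_trees": 100},
    max_evaluated=10_000,
)
```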
- The active learning scheduler may also be used to:
- expand and replace the molecular evaluation module, the molecular evaluation surrogate model, and the molecule selection module through a preset virtual interface.
- The active learning scheduler is a scheduling module based on an active learning algorithm and is itself an interface abstraction.
- Algorithms and strategies can be implemented directly through the active learning scheduler. Different molecular evaluation algorithms can be integrated easily, and different artificial-intelligence-based molecular evaluation surrogate models can be added, which enables rapid development of combined strategies in large-scale molecular screening scenarios.
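In Python, a "preset virtual interface" of this kind is commonly expressed with abstract base classes. The sketch below is an assumption about how such interfaces might look, not the design disclosed in the application: the class and method names are invented, and `LengthEvaluator` is a toy implementation that exists only to show a module being plugged in.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MoleculeEvaluator(ABC):
    """Interface for any docking or scoring algorithm."""
    @abstractmethod
    def evaluate(self, molecule: str) -> float: ...


class SurrogateModel(ABC):
    """Interface for any machine-learning surrogate."""
    @abstractmethod
    def fit(self, pairs: Dict[str, float]) -> None: ...

    @abstractmethod
    def predict(self, molecule: str) -> float: ...


class MoleculeSelector(ABC):
    """Interface for any selection strategy."""
    @abstractmethod
    def select(self, predicted: Dict[str, float], n: int) -> List[str]: ...


class Scheduler:
    """Holds one implementation per interface and allows swapping them."""
    _slots = ("evaluator", "surrogate", "selector")

    def __init__(self, evaluator, surrogate, selector):
        self.evaluator = evaluator
        self.surrogate = surrogate
        self.selector = selector

    def replace(self, **modules) -> None:
        # Swap any module by name without touching the scheduling logic.
        for name, module in modules.items():
            if name not in self._slots:
                raise ValueError(f"unknown module slot: {name}")
            setattr(self, name, module)


class LengthEvaluator(MoleculeEvaluator):
    """Toy evaluator: scores a SMILES string by its length."""
    def evaluate(self, molecule: str) -> float:
        return float(len(molecule))
```

Because the scheduler depends only on the abstract methods, a different docking program or surrogate model can be dropped in by subclassing the matching interface and calling `replace`.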
- a specific embodiment can be used as an example, and a Python software architecture embodiment is taken as an example, and 100 million virtual molecules in the ZINC15 virtual molecule library are selected as virtual molecules for initial large-scale screening library, and store these virtual molecules in the .csv file in the form of SMILES, and configure parameters for each module according to the policy configuration information, including the initialization parameters of the active learning scheduler, the source of the virtual molecular library, and the correlation of molecular evaluation alternative models.
- the molecular selection module uses a greedy strategy
- the molecular evaluation module uses AutoDock-GPU for molecular evaluation
- the molecular evaluation surrogate model uses a random forest model based on molecular fingerprints.
- Figure 2 shows the sequence diagram of a single screening round performed by the virtual molecular screening system provided in the embodiment of the present application. The active learning scheduler initiates the first molecular selection, which can be a random selection of, for example, 0.05% of the total number of virtual molecules in the virtual molecular library, or of a fixed number of molecules. The molecular selection module selects virtual molecules and sends them to the molecular evaluation module for evaluation. The molecular evaluation module sends the resulting score-molecule pairs to the molecular evaluation surrogate model, which is trained on these pairs, and the trained molecular evaluation surrogate model is returned to the active learning scheduler. The scheduler then has the surrogate model evaluate all molecules in the virtual molecule library to obtain a predicted score value for each virtual molecule. The surrogate model is updated using the predicted score values and the true values from the molecular evaluation module, and the predicted score values are sent to the molecular selection module.
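The first step of this embodiment, reading SMILES strings from a .csv file and drawing the initial random batch, might look like the following sketch. The column name `smiles`, the function names, and the four-molecule in-memory file are assumptions made for illustration only.

```python
import csv
import io
import random


def load_smiles(csv_file, column="smiles"):
    """Read SMILES strings from a CSV file object that has a header row."""
    return [row[column] for row in csv.DictReader(csv_file)]


def initial_selection(library, fraction=0.0005, fixed_count=None, seed=0):
    """Randomly pick the first batch: a fraction of the library or a fixed count."""
    if fixed_count is not None:
        count = fixed_count
    else:
        count = max(1, round(len(library) * fraction))  # 0.0005 corresponds to 0.05%
    return random.Random(seed).sample(library, min(count, len(library)))


# Tiny in-memory stand-in for the .csv file of SMILES strings.
demo_csv = io.StringIO("smiles\nCCO\nc1ccccc1\nCC(=O)O\nCCN\n")
library = load_smiles(demo_csv)
first_batch = initial_selection(library, fixed_count=2)
```

For a library of 100 million molecules, a fraction of 0.05% would give an initial batch of 50,000 molecules.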
- The virtual molecular screening system provided by the embodiment of this application uses a modular overall framework design to separate molecular evaluation, molecular selection, and the molecular evaluation surrogate model, and uses a unified active learning scheduler to connect the processes and unify the interfaces of the modules. As a result, the choice of molecular evaluation, molecular selection, and molecular evaluation surrogate model can be decoupled from the overall molecular screening process, which facilitates the expansion and integration of the functions of each module.
- the embodiment of the present application provides a virtual molecular screening method, as shown in Figure 3, the method includes:
- Step S1 Evaluate the first virtual molecules satisfying the first evaluation condition in the preset virtual molecule library, obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of the first virtual molecules.
- When virtual molecules are evaluated, the molecular evaluation module can be used to perform the evaluation.
- The molecular evaluation module includes one or more molecular docking algorithms and molecular evaluation algorithms for evaluating the received virtual molecules, and it generates a score-molecule pair for every evaluated virtual molecule. The molecule in a score-molecule pair is the evaluated virtual molecule, and the score is the evaluation score of that virtual molecule.
- The first evaluation condition depends on whether the virtual molecules are being evaluated for the first time, and it can optionally be divided into two cases.
- In the first case, virtual molecules are evaluated for the first time, and it suffices to randomly select a first number of virtual molecules from the preset virtual molecule library as the virtual molecules satisfying the first evaluation condition. In the second case, virtual molecules are not being evaluated for the first time, and the received second virtual molecules are taken as the virtual molecules satisfying the first evaluation condition.
- Taking a specific embodiment as an example, a large-scale virtual molecular library, ZINC, is provided, from which a certain number of virtual molecules are randomly selected as the first virtual molecules. For example, 100 virtual molecules can be picked at random and evaluated by the molecular evaluation module to obtain the score of each virtual molecule; each score is then paired with its virtual molecule to construct score-molecule pairs, for example 76-molecule 1, 78-molecule 2, 25-molecule 3, 98-molecule 4, and so on.
- The above evaluation method and pairing method are only one possible implementation of the embodiment of the application and do not limit the protection scope of the application.
- Step S2 using the score-molecule pairs to train a preset molecular evaluation surrogate model, and after the training, using the model to evaluate the virtual molecules in the preset virtual molecule library to obtain the predicted score values of all virtual molecules.
- The molecular evaluation surrogate model is a series of interface abstractions over machine-learning-based molecular evaluation methods. Through the molecular evaluation surrogate model, multiple molecular evaluation methods can be used to evaluate virtual molecules. The evaluation result for a virtual molecule may come from a single molecular evaluation method or from a combination of multiple molecular evaluation methods.
- Before the molecular evaluation surrogate model is used to evaluate virtual molecules, it must be trained with the score-molecule pairs sent by the molecular evaluation module, which serve as the training set. After the training is completed, the molecular evaluation surrogate model is used to evaluate all the molecules in the virtual molecule library, and the predicted score values of all virtual molecules are obtained.
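Training on score-molecule pairs and then predicting over the whole library can be sketched with a deliberately simple stand-in. The embodiment described earlier uses a random forest over molecular fingerprints; the sketch below swaps in a nearest-neighbour lookup over hashed character n-grams of the SMILES string so that it runs with the standard library alone. The names `fingerprint`, `tanimoto`, and `NearestNeighbourSurrogate` are hypothetical, and the character n-gram hash is not a chemically meaningful fingerprint.

```python
import zlib


def fingerprint(smiles, n_bits=2048, n=2):
    """Hash character n-grams of a SMILES string into a set of bit positions."""
    grams = (smiles[i:i + n] for i in range(max(1, len(smiles) - n + 1)))
    return frozenset(zlib.crc32(g.encode()) % n_bits for g in grams)


def tanimoto(a, b):
    """Tanimoto similarity between two sets of bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


class NearestNeighbourSurrogate:
    """Stand-in surrogate: predicts the score of the most similar training molecule."""

    def train(self, pairs):
        self._known = [(fingerprint(molecule), score) for score, molecule in pairs]

    def predict(self, molecules):
        predictions = []
        for molecule in molecules:
            fp = fingerprint(molecule)
            _, score = max(self._known, key=lambda item: tanimoto(fp, item[0]))
            predictions.append(score)
        return predictions


# Score-molecule pairs as they would arrive from the molecular evaluation module.
pairs = [(76.0, "CCO"), (25.0, "c1ccccc1")]
library = ["CCO", "CCN", "c1ccccc1", "c1ccncc1"]

surrogate = NearestNeighbourSurrogate()
surrogate.train(pairs)
predicted = surrogate.predict(library)  # one predicted score per library molecule
```

The point of the surrogate is cost: predicting over the whole library is cheap, whereas running the molecular evaluation module on every molecule is not.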
- Step S3 select a second virtual molecule from the preset virtual molecule library according to the score-molecule pair and/or the predicted score value, and use the second virtual molecule as the first virtual molecule to cyclically perform step S1, step S2, and step S3 until the total number of evaluated virtual molecules meets the loop termination condition.
- The loop termination condition is the condition for ending the cyclic evaluation of molecules. It can be that the total number of virtual molecules evaluated by the molecular evaluation module is greater than a set value, such as 10,000, or that the proportion of evaluated virtual molecules in the virtual molecular library is greater than a preset value, for example that the ratio of the total number of virtual molecules evaluated by the molecular evaluation module to the total number of virtual molecules in the preset virtual molecular library is greater than 2%.
- The process of evaluating and selecting virtual molecules is repeated: the selected second virtual molecules are used as the new first virtual molecules, and step S1, step S2, and step S3 are executed in a loop until the total number of evaluated virtual molecules satisfies the loop termination condition.
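Steps S1 to S3 can be sketched as a single loop. Everything concrete here is a stand-in chosen so the example runs on its own: molecules are integer IDs, the "true" score is an arbitrary function of the ID, the surrogate copies the score of the nearest evaluated ID, and selection is greedy over unevaluated molecules. The function names and parameters are hypothetical.

```python
import random


def run_screening(library, evaluate, train_and_predict, batch_size, max_evaluated, seed=0):
    """Repeat S1 (evaluate), S2 (train and predict), S3 (select) until termination."""
    rng = random.Random(seed)
    evaluated = {}  # molecule -> score from the (expensive) evaluation
    batch = rng.sample(library, min(batch_size, len(library)))  # first pass is random
    while batch:
        # S1: evaluate the current first virtual molecules.
        for molecule in batch:
            evaluated[molecule] = evaluate(molecule)
        # Loop termination: total number of evaluated molecules exceeds the set value.
        if len(evaluated) > max_evaluated:
            break
        # S2: train the surrogate and predict a score for every library molecule.
        predictions = train_and_predict(evaluated, library)
        # S3: greedy selection of unevaluated molecules as the next batch.
        candidates = [m for m in library if m not in evaluated]
        candidates.sort(key=lambda m: predictions[m], reverse=True)
        batch = candidates[:batch_size]
    return evaluated


def true_score(molecule_id):
    """Stand-in for the molecular evaluation module; peaks at ID 150."""
    return -(molecule_id - 150) ** 2


def nearest_neighbour_predict(evaluated, library):
    """Stand-in surrogate: copy the score of the closest evaluated ID."""
    known = sorted(evaluated)
    return {m: evaluated[min(known, key=lambda k: abs(k - m))] for m in library}


library = list(range(200))
results = run_screening(library, true_score, nearest_neighbour_predict,
                        batch_size=10, max_evaluated=40)
```

With these settings the loop stops after the batch that pushes the evaluated count past 40, so only a quarter of the 200 stand-in molecules are ever scored by the expensive evaluation.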
- Step S4 determining a target virtual molecule from all the virtual molecules evaluated in step S1.
- After the loop terminates, the evaluated virtual molecules can be sorted by score, and the top 1000 virtual molecules with the highest scores (the preset recommendation condition) are taken as the target virtual molecules. Sorting the virtual molecules by score is not a necessary step for determining the target virtual molecules in this application.
- For example, the loop termination condition is that the total number of evaluated virtual molecules exceeds 2/10,000 of the total number of virtual molecules in the preset virtual molecule library. The threshold is determined from the total number of virtual molecules in the preset virtual molecule library: if the library contains 1 billion virtual molecules, the threshold is 200,000 evaluated virtual molecules, and the loop is terminated when the total number of evaluated virtual molecules is greater than 200,000.
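The two kinds of loop termination condition, an absolute count and a share of the library, reduce to a short check. The function name and parameter names are hypothetical; `Fraction` is used so that 2/10,000 of 1 billion is exactly 200,000 rather than a rounded floating-point value.

```python
from fractions import Fraction


def should_terminate(n_evaluated, library_size, max_count=None, max_fraction=None):
    """Return True once either configured threshold has been exceeded."""
    if max_count is not None and n_evaluated > max_count:
        return True
    if max_fraction is not None and n_evaluated > max_fraction * library_size:
        return True
    return False


library_size = 1_000_000_000
limit = Fraction(2, 10_000)
threshold = limit * library_size  # Fraction(200000, 1)

at_threshold = should_terminate(200_000, library_size, max_fraction=limit)
past_threshold = should_terminate(200_001, library_size, max_fraction=limit)
```

The comparison is strict, matching the wording that the loop ends when the total is greater than the threshold.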
- The target virtual molecules can also be the top 1000 virtual molecules with the highest selection scores (the preset recommendation condition), ranked according to a preset selection score.
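Picking the highest-scoring molecules does not require sorting the whole result set. A minimal sketch, assuming higher scores are better as in the text, with a hypothetical function name and the example pairs used earlier in the description:

```python
import heapq


def recommend_targets(pairs, top_k=1000):
    """Return the top_k (score, molecule) pairs with the highest scores."""
    return heapq.nlargest(top_k, pairs, key=lambda pair: pair[0])


pairs = [(76, "molecule 1"), (78, "molecule 2"), (25, "molecule 3"), (98, "molecule 4")]
top_two = recommend_targets(pairs, top_k=2)
# top_two == [(98, "molecule 4"), (78, "molecule 2")]
```

If the evaluation score were a docking energy, where lower values indicate stronger predicted binding, the key would need to be negated or `heapq.nsmallest` used instead.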
- Compared with traversal screening of all molecules in the entire virtual molecular library, the virtual molecular screening method provided in the embodiment of the present application can select most of the virtual molecules that meet the requirements faster and more accurately, which speeds up molecular screening, effectively saves computing power (more than 90% of computing power consumption), and reduces the cost of molecular screening.
- Evaluating the first virtual molecules satisfying the first evaluation condition in the preset virtual molecule library, obtaining the score of each of the first virtual molecules, and generating a score-molecule pair for each of the first virtual molecules comprises:
- Step S401 when evaluating the virtual molecules for the first time, randomly select a first number of virtual molecules from the preset virtual molecule library as the first virtual molecules, evaluate the first virtual molecules, obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of them.
- When virtual molecules enter the molecular evaluation module for evaluation for the first time, a certain number of virtual molecules can be randomly selected from the virtual molecule library as the first virtual molecules, for example 1000 virtual molecules. Each first virtual molecule is then evaluated by the molecular evaluation module to obtain its score and generate a score-molecule pair.
- Step S402 when not evaluating the virtual molecules for the first time, receive the second virtual molecules, use the second virtual molecules as the first virtual molecules, evaluate the first virtual molecules, obtain the score of each of the first virtual molecules, and generate a score-molecule pair for each of them.
- The second virtual molecules selected by the molecule selection module are used as the new first virtual molecules. Each first virtual molecule is then evaluated by the molecular evaluation module to obtain its score and generate a score-molecule pair.
- the step S2 further includes:
- The molecular evaluation surrogate model is updated according to the predicted score values and the scores that the molecular evaluation module obtained for the first virtual molecules, so that the absolute value of the difference between the predicted score value given by the surrogate model for a virtual molecule and the score given by the molecular evaluation module for that molecule is reduced.
- The scores obtained by the molecular evaluation module for the first virtual molecules are taken as the true values, and the predicted scores obtained by the molecular evaluation surrogate model for the same molecules are taken as the predicted values. The parameters of the molecular evaluation surrogate model are adjusted according to the difference between the predicted values and the true values, and the model is updated so that its evaluation results for virtual molecules are closer to those of the molecular evaluation module.
- Updating the molecular evaluation surrogate model based on its own performance improves its evaluation accuracy, so that virtual molecules can be screened more quickly and accurately.
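The idea of adjusting surrogate parameters so that predictions move closer to the true scores can be shown on a toy model. The sketch below uses a one-parameter linear surrogate and a single gradient step on the squared error; this is an illustrative assumption, not the update rule of the application, whose embodiment names a random forest. All names and numbers are hypothetical.

```python
def mean_abs_error(weight, features, true_scores):
    """Mean absolute difference between predicted and true scores."""
    return sum(abs(weight * x - y) for x, y in zip(features, true_scores)) / len(features)


def update_weight(weight, features, true_scores, learning_rate=0.05):
    """One gradient-descent step on the mean squared error of a linear surrogate."""
    gradient = sum(2 * (weight * x - y) * x
                   for x, y in zip(features, true_scores)) / len(features)
    return weight - learning_rate * gradient


features = [1.0, 2.0, 3.0]      # stand-in molecular descriptors
true_scores = [2.0, 4.0, 6.0]   # scores from the molecular evaluation module

weight_before = 0.0
weight_after = update_weight(weight_before, features, true_scores)

error_before = mean_abs_error(weight_before, features, true_scores)
error_after = mean_abs_error(weight_after, features, true_scores)
```

After the step, the mean absolute difference between predicted and true scores is smaller than before, which is the property the text requires of the update.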
- selecting a second virtual molecule from a preset virtual molecule library according to the score-molecule pair and/or the predicted score value includes:
- according to the score-molecule pairs, a second number of virtual molecules that have not been evaluated by the molecular evaluation module are randomly selected from the preset virtual molecule library as second virtual molecules.
- When selecting the second virtual molecules, the molecule selection module may adopt different strategies, optionally a greedy strategy or an uncertainty strategy.
- With the uncertainty strategy, the molecule selection module always preferentially selects virtual molecules in the virtual molecule library that have not been evaluated by the molecular evaluation module. With the greedy strategy, it always selects the virtual molecules with the highest evaluation scores as the second virtual molecules, for example the top 1000 virtual molecules ranked by the evaluation scores given by the molecular evaluation surrogate model.
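The selection strategies can be compared side by side in a short sketch. The greedy function ranks by surrogate prediction; the second function draws at random from molecules that have not yet been evaluated, following the random selection described above. Excluding already evaluated molecules from the greedy ranking is an assumption of this sketch, and the function names are hypothetical.

```python
import random


def greedy_select(predictions, evaluated, count):
    """Pick the unevaluated molecules with the highest predicted scores."""
    candidates = [m for m in predictions if m not in evaluated]
    return sorted(candidates, key=predictions.get, reverse=True)[:count]


def random_unevaluated_select(library, evaluated, count, seed=0):
    """Pick unevaluated molecules uniformly at random."""
    candidates = [m for m in library if m not in evaluated]
    return random.Random(seed).sample(candidates, min(count, len(candidates)))


predictions = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}
evaluated = {"A"}

greedy_batch = greedy_select(predictions, evaluated, count=2)
# greedy_batch == ["B", "C"]
random_batch = random_unevaluated_select(list(predictions), evaluated, count=2)
```

Greedy selection concentrates the expensive evaluations on molecules the surrogate already rates highly, while selection among unevaluated molecules keeps exploring parts of the library the surrogate has seen little of.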
- the method further includes:
- The scheduling parameters of the active learning scheduler, the feature extraction parameters and training parameters of the molecular evaluation surrogate model, the selection strategy of the molecular selection module, the evaluation parameters of the molecular evaluation module, and the loop termination condition are received and stored, and the corresponding parameters are dispatched when the active learning scheduler schedules the work of each module, so as to complete the screening of virtual molecules.
- the embodiment of the present application provides support for the completion of molecular screening by storing the configuration parameters of each module.
- the method further includes:
- Through the preset virtual interface, the molecular evaluation module, the molecular evaluation surrogate model, and the molecular selection module are expanded and replaced.
- An interface abstraction is provided. When the algorithms and strategies in the molecular evaluation surrogate model, the molecular evaluation module, and the molecular selection module of the virtual molecular screening system need to be expanded or updated, this can be done directly through the active learning scheduler. Different molecular evaluation algorithms can be integrated easily, and different artificial-intelligence-based molecular evaluation surrogate models can be added, which enables rapid development of combined strategies in large-scale molecular screening scenarios.
- The virtual molecular screening method separates molecular evaluation, molecular selection, and the molecular evaluation surrogate model, and uses a unified active learning scheduler to connect the processes and unify the interfaces of the modules, so that the choice of molecular evaluation, molecular selection, and molecular evaluation surrogate model can be decoupled from the overall molecular screening process, which facilitates the expansion and integration of the functions of each module.
- Using an active learning screening algorithm, suitable molecules are iteratively selected from the virtual molecular library to train the molecular evaluation surrogate model. Compared with traversal screening of all molecules in the entire virtual molecular library, this selects most of the virtual molecules that meet the requirements faster and more accurately, speeds up molecular screening, effectively saves computing power (more than 90% of computing power consumption), and reduces the cost of molecular screening.
- FIG. 5 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
- the electronic device 1000 includes a memory 1010 and a processor 1020 .
- The processor 1020 can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the memory 1010 may include various types of storage units such as system memory, read only memory (ROM), and persistent storage.
- the ROM may store static data or instructions required by the processor 1020 or other modules of the computer.
- the persistent storage device may be a readable and writable storage device.
- Persistent storage may be a non-volatile storage device that does not lose stored instructions and data even if the computer is powered off.
- The permanent storage device may be a mass storage device (such as a magnetic disk, an optical disk, or flash memory).
- The permanent storage device may also be a removable storage device (such as a floppy disk or an optical drive).
- The system memory can be a readable and writable storage device, or a volatile readable and writable storage device such as dynamic random access memory.
- System memory can store some or all of the instructions and data that the processor needs at runtime.
- the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (such as DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), and magnetic disks and/or optical disks may also be used.
- The memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, a super density disc, a flash memory card (such as an SD card, mini SD card, or Micro-SD card), or a magnetic floppy disk.
- Computer-readable storage media do not contain carrier waves and transient electronic signals transmitted by wireless or wire.
- Executable codes are stored in the memory 1010 , and when the executable codes are processed by the processor 1020 , the processor 1020 may execute part or all of the methods mentioned above.
- the method according to the present application can also be implemented as a computer program or computer program product, the computer program or computer program product including computer program code instructions for executing some or all of the steps in the above method of the present application.
- The present application may also be implemented as a computer-readable storage medium (or a non-transitory machine-readable storage medium or a machine-readable storage medium) on which executable code (or a computer program or computer instruction code) is stored. When the executable code (or computer program or computer instruction code) is executed by the processor of an electronic device (or a server, etc.), the processor is made to perform part or all of the steps of the above-mentioned method according to the present application.
Landscapes
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Crystallography & Structural Chemistry (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medicinal Chemistry (AREA)
- Library & Information Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present application relates to a virtual molecule screening system and method, an electronic device, and a computer-readable storage medium. The system comprises an active learning scheduler, a molecular evaluation surrogate model, a molecular selection module, a molecular evaluation module, and a molecule recommendation module. Through a modular overall framework design, molecular evaluation, molecular selection, and the molecular evaluation surrogate model are separated, and the process connection and interface unification of the modules are implemented by an active learning scheduler, so that the choice of molecular evaluation, molecular selection, and molecular evaluation surrogate model can be decoupled from the overall molecular screening process, the functions of the modules can be conveniently expanded and integrated, and suitable molecules are iteratively selected from a virtual molecule library to train the molecular evaluation surrogate model. Compared with traversal screening of all molecules in the entire virtual molecule library, most of the virtual molecules that meet the requirements can be selected faster and more accurately, the speed of molecular screening is increased, and computing power is effectively saved.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2021/142815 WO2023123149A1 (fr) | 2021-12-30 | 2021-12-30 | Virtual molecule screening system and method, electronic device and computer-readable storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2021/142815 WO2023123149A1 (fr) | 2021-12-30 | 2021-12-30 | Virtual molecule screening system and method, electronic device and computer-readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023123149A1 (fr) | 2023-07-06 |
Family
ID=86997073
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/142815 Ceased WO2023123149A1 (fr) | Virtual molecule screening system and method, electronic device and computer-readable storage medium | 2021-12-30 | 2021-12-30 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023123149A1 (fr) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140171332A1 (en) * | 2012-12-05 | 2014-06-19 | Hudson Robotics, Inc. | System for the efficient discovery of new therapeutic drugs |
| CN110459274A (zh) * | 2019-08-01 | 2019-11-15 | 南京邮电大学 | 一种基于深度迁移学习的小分子药物虚拟筛选方法及其应用 |
| CN110534165A (zh) * | 2019-09-02 | 2019-12-03 | 广州费米子科技有限责任公司 | 一种药物分子活性的虚拟筛选系统及其方法 |
| CN111863120A (zh) * | 2020-06-28 | 2020-10-30 | 深圳晶泰科技有限公司 | 晶体复合物的药物虚拟筛选系统及方法 |
| CN112151127A (zh) * | 2020-09-04 | 2020-12-29 | 牛张明 | 基于分子语义向量的无监督学习药物虚拟筛选方法和系统 |
| US11127488B1 (en) * | 2020-09-25 | 2021-09-21 | Accenture Global Solutions Limited | Machine learning systems for automated pharmaceutical molecule screening and scoring |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21969516; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21969516; Country of ref document: EP; Kind code of ref document: A1 |