
WO2023102923A1 - Method and apparatus for determining a molecular design scheme, device, and storage medium - Google Patents

Method and apparatus for determining a molecular design scheme, device, and storage medium

Info

Publication number
WO2023102923A1
Authority
WO
WIPO (PCT)
Prior art keywords
molecule
preset
molecules
design scheme
molecular
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/137223
Other languages
English (en)
Chinese (zh)
Inventor
袁久闯
曾群
金颖滴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jingtai Technology Co Ltd
Original Assignee
Shenzhen Jingtai Technology Co Ltd
Application filed by Shenzhen Jingtai Technology Co Ltd
Priority to PCT/CN2021/137223
Publication of WO2023102923A1
Anticipated expiration
Ceased (current legal status)



Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Definitions

  • The invention relates to the technical field of computational chemistry, and in particular to a method, apparatus, device, and storage medium for determining a molecular design scheme.
  • Existing approaches to crystallization experiments mainly include:
  • Large-scale exhaustive experimental screening: for any drug molecule, a uniformly preset range of experimental methods and experimental parameters is used to run large-scale crystallization experiments, from which crystallization conditions are identified.
  • Its disadvantages are that the experimental protocol is not optimized for the specific drug molecule, too many experiments are required, and the material, labor, and time costs are too high.
  • Embodiments of the present invention provide a method, apparatus, device, and storage medium for determining a molecular design scheme, so as to at least solve the technical problem of how to screen out reasonable experimental schemes and parameters according to actual needs efficiently, conveniently, and at low cost.
  • A method for determining a molecular design scheme includes: acquiring a target molecule to be designed, and calculating a molecular descriptor of the target molecule; determining, according to the molecular descriptor of the target molecule, the distance between the target molecule and each preset molecule in a pre-built target chemical space; determining candidate molecules from the preset molecules according to the distance between the target molecule and each preset molecule; acquiring the design schemes of the candidate molecules; and determining the design scheme of the target molecule according to the design schemes of the candidate molecules.
  • The method further includes: acquiring a plurality of preset molecules; determining the molecular descriptors of each preset molecule, wherein the molecular descriptors of one preset molecule occupy one position in a multidimensional space; and constructing the target chemical space based on the positions occupied by the molecular descriptors of the preset molecules in the multidimensional space.
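The construction step above can be pictured as placing each preset molecule at a point whose coordinates are its descriptor values. The sketch below assumes the descriptors have already been computed and are stored as plain dictionaries. The function name `build_chemical_space`, the descriptor names `d1`/`d2`/`d3`, and the toy values are all hypothetical. Restricting the axes to descriptors that every molecule has is a simplifying choice of this sketch; the source does not specify it.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: preset molecule id -> {descriptor name: value}.
DescriptorTable = Dict[str, Dict[str, float]]


def build_chemical_space(
    table: DescriptorTable,
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Map every preset molecule to a point in a multidimensional space.

    The axes are the descriptor names present for every molecule, sorted so
    that all points share one fixed coordinate order.
    """
    if not table:
        raise ValueError("at least one preset molecule is required")
    shared = set.intersection(*(set(desc) for desc in table.values()))
    axes = sorted(shared)
    points = {
        mol_id: [float(desc[axis]) for axis in axes]
        for mol_id, desc in table.items()
    }
    return axes, points


if __name__ == "__main__":
    axes, points = build_chemical_space({
        "mol_a": {"d1": 1.0, "d2": 4.0},
        "mol_b": {"d1": 2.5, "d2": 0.5, "d3": 7.0},  # d3 is missing for mol_a
    })
    print(axes)             # ['d1', 'd2']
    print(points["mol_b"])  # [2.5, 0.5]
```

The set of points returned here is one concrete stand-in for the "target chemical space". A real system would more likely hold the same data in an array.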
  • Acquiring the plurality of preset molecules includes: acquiring design schemes of a plurality of initial molecules; analyzing the design scheme of each initial molecule to determine its completeness; and using the initial molecules whose design schemes are complete as the preset molecules.
  • Analyzing the design scheme of each initial molecule to determine its completeness includes: analyzing the design scheme of each initial molecule to obtain first initial molecules, whose design schemes are complete, and second initial molecules, whose design schemes are incomplete. Using the initial molecules with complete design schemes as the preset molecules includes: using the first initial molecules as the preset molecules.
  • The method further includes: completing the design schemes of the second initial molecules, and using the second initial molecules with completed design schemes as preset molecules.
  • Determining the molecular descriptors of each preset molecule includes: calculating a plurality of molecular descriptors of each preset molecule with a preset algorithm; and screening these molecular descriptors to obtain a set of molecular descriptors used to characterize the preset molecules.
  • Screening the molecular descriptors of each preset molecule includes: for any molecular descriptor, deleting the descriptor when more than a first threshold number of preset molecules have the same value for it.
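One possible reading of this first screening rule is sketched below. It assumes every molecule in the table has the same descriptor names and that "the same value" means exact equality. The function name and the toy values are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def drop_dominant_value_descriptors(
    table: Dict[str, Dict[str, float]], first_threshold: int
) -> List[str]:
    """Return the descriptor names that survive the first screening rule.

    A descriptor is deleted when more than `first_threshold` preset molecules
    share one identical value for it, since it then barely separates molecules.
    Assumes every molecule in `table` has the same descriptor names.
    """
    names = sorted(next(iter(table.values())))
    kept = []
    for name in names:
        counts = Counter(desc[name] for desc in table.values())
        # Size of the largest group of molecules sharing a single value.
        largest_group = counts.most_common(1)[0][1]
        if largest_group <= first_threshold:
            kept.append(name)
    return kept


if __name__ == "__main__":
    table = {
        "m1": {"d1": 0, "d2": 1.2},
        "m2": {"d1": 0, "d2": 3.4},
        "m3": {"d1": 0, "d2": 5.6},
    }
    print(drop_dominant_value_descriptors(table, first_threshold=2))  # ['d2']
```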
  • Screening the molecular descriptors of each preset molecule includes: for any molecular descriptor, deleting the descriptor when the difference between its value for a preset molecule and its values for the remaining preset molecules is below a second threshold.
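The source lets "difference" be measured in several ways, such as variance or standard deviation. The sketch below assumes the population variance over all preset molecules and drops any descriptor whose variance falls below the second threshold. This amounts to a plain low-variance filter. The function name and the toy values are hypothetical.

```python
import statistics
from typing import Dict, List


def drop_low_variance_descriptors(
    table: Dict[str, Dict[str, float]], second_threshold: float
) -> List[str]:
    """Return the descriptor names that survive the second screening rule.

    The spread of a descriptor is measured here as its population variance
    over all preset molecules (an assumption); descriptors whose variance is
    below `second_threshold` are deleted.
    """
    names = sorted(next(iter(table.values())))
    return [
        name
        for name in names
        if statistics.pvariance([desc[name] for desc in table.values()])
        >= second_threshold
    ]


if __name__ == "__main__":
    table = {
        "m1": {"d1": 1.0, "d2": 1.0},
        "m2": {"d1": 1.0, "d2": 5.0},
        "m3": {"d1": 1.1, "d2": 9.0},
    }
    print(drop_low_variance_descriptors(table, second_threshold=0.01))  # ['d2']
```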
  • Screening the molecular descriptors of each preset molecule includes: for any molecular descriptor, deleting the descriptor when its value for a preset molecule is abnormal.
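The source does not define an "abnormal" value. The sketch below assumes it means a value that is missing, non-numeric, NaN, or infinite; a single such value removes the whole descriptor. The function name and the toy values are hypothetical.

```python
import math
from typing import Dict, List, Optional


def drop_abnormal_descriptors(
    table: Dict[str, Dict[str, Optional[float]]]
) -> List[str]:
    """Return descriptor names whose values are normal for every molecule.

    "Abnormal" is interpreted here (an assumption) as a value that is missing,
    not numeric, NaN, or infinite; one such value deletes the whole descriptor.
    """
    names = sorted(set().union(*(set(desc) for desc in table.values())))
    kept = []
    for name in names:
        values = [desc.get(name) for desc in table.values()]
        if all(isinstance(v, (int, float)) and math.isfinite(v) for v in values):
            kept.append(name)
    return kept


if __name__ == "__main__":
    table = {
        "m1": {"d1": 1.0, "d2": float("nan"), "d3": 2.0},
        "m2": {"d1": 3.0, "d3": float("inf")},  # d2 missing, d3 infinite
    }
    print(drop_abnormal_descriptors(table))  # ['d1']
```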
  • Screening the molecular descriptors of each preset molecule includes: for any two molecular descriptors, deleting one of them when the correlation coefficient of the two, calculated from their values for all preset molecules, is higher than a third threshold.
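A minimal sketch of the correlation rule using the Pearson coefficient. The source does not say which descriptor of a correlated pair to delete, or whether strong negative correlation also counts. This sketch makes two assumptions: it compares the absolute value of r against the threshold, and it keeps whichever descriptor comes first alphabetically. Function names and toy values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated_descriptors(
    table: Dict[str, Dict[str, float]], third_threshold: float
) -> List[str]:
    """Greedy filter: keep a descriptor only if |r| with every descriptor
    already kept is at most `third_threshold`; otherwise delete it."""
    names = sorted(next(iter(table.values())))
    columns = {name: [desc[name] for desc in table.values()] for name in names}
    kept: List[str] = []
    for name in names:
        if all(abs(pearson(columns[name], columns[k])) <= third_threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    table = {
        "m1": {"d1": 1, "d2": 2, "d3": 3},
        "m2": {"d1": 2, "d2": 4, "d3": 1},
        "m3": {"d1": 3, "d2": 6, "d3": 2},
    }
    # d2 = 2 * d1 (r = 1), so d2 is deleted; d3 has r = -0.5 with d1.
    print(drop_correlated_descriptors(table, third_threshold=0.9))  # ['d1', 'd3']
```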
  • Screening the molecular descriptors of each preset molecule includes: performing dimensionality reduction on all molecular descriptors.
  • Determining, according to the molecular descriptor of the target molecule, the distance between the target molecule and each preset molecule in the pre-built target chemical space includes: determining the position of the target molecule in the target chemical space according to its molecular descriptor; and calculating the distance between the target molecule and each preset molecule according to the position of the target molecule and the position of each preset molecule in the target chemical space.
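The two sub-steps can be sketched as follows. First, the target's descriptors are projected onto the same fixed axis order as the preset molecules. Second, a distance is computed to each preset point. Euclidean distance is used because the document names it as one option. Both function names are hypothetical. The sketch applies no descriptor scaling, and none is described in the source.

```python
import math
from typing import Dict, List


def position_in_space(descriptors: Dict[str, float], axes: List[str]) -> List[float]:
    """Coordinates of a molecule along the chemical-space axes (fixed order)."""
    return [float(descriptors[axis]) for axis in axes]


def distances_to_presets(
    target: List[float], preset_points: Dict[str, List[float]]
) -> Dict[str, float]:
    """Euclidean distance from the target point to every preset molecule."""
    # math.dist raises ValueError if the two points differ in dimension.
    return {mol_id: math.dist(target, point) for mol_id, point in preset_points.items()}


if __name__ == "__main__":
    axes = ["d1", "d2"]
    # d9 is not an axis of the chemical space, so it is ignored.
    target = position_in_space({"d1": 0.0, "d2": 0.0, "d9": 42.0}, axes)
    print(distances_to_presets(target, {"a": [3.0, 4.0], "b": [1.0, 0.0]}))
    # {'a': 5.0, 'b': 1.0}
```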
  • Determining candidate molecules from the preset molecules according to the distance between the target molecule and each preset molecule includes: selecting, as candidate molecules, a preset number or a preset proportion of the preset molecules closest to the target molecule; or selecting the preset molecule with the smallest distance as the candidate molecule.
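The three selection options (a preset number, a preset proportion, or only the nearest molecule) could be sketched like this. The function name is hypothetical. Rounding the proportion up to at least one molecule and breaking distance ties by molecule id are choices of this sketch, not of the source.

```python
import math
from typing import Dict, List, Optional


def select_candidates(
    distances: Dict[str, float],
    k: Optional[int] = None,
    ratio: Optional[float] = None,
) -> List[str]:
    """Pick candidate molecules from preset-molecule distances.

    - k given: the k closest preset molecules (preset number).
    - ratio given: the closest ceil(ratio * N) molecules, at least one
      (preset proportion).
    - neither: only the single closest molecule.
    Ties are broken by molecule id so the result is deterministic.
    """
    if k is not None and ratio is not None:
        raise ValueError("give either k or ratio, not both")
    ranked = sorted(distances, key=lambda mol_id: (distances[mol_id], mol_id))
    if ratio is not None:
        if not 0 < ratio <= 1:
            raise ValueError("ratio must be in (0, 1]")
        k = max(1, math.ceil(ratio * len(ranked)))
    if k is None:
        k = 1
    return ranked[:k]


if __name__ == "__main__":
    d = {"a": 5.0, "b": 1.0, "c": 2.5}
    print(select_candidates(d))             # ['b']
    print(select_candidates(d, k=2))        # ['b', 'c']
    print(select_candidates(d, ratio=0.5))  # ['b', 'c']
```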
  • When there are multiple candidate molecules, determining the design scheme of the target molecule according to the design schemes of the candidate molecules includes: scoring the design scheme of each candidate molecule according to a preset scoring strategy to obtain a score for each; and using the design scheme of the candidate molecule with the highest score as the design scheme of the target molecule.
  • When there is one candidate molecule with multiple sets of design schemes, determining the design scheme of the target molecule according to the design schemes of the candidate molecule includes: scoring each set of design schemes of the candidate molecule according to a preset scoring strategy to obtain a score for each set; and using the design scheme with the highest score as the design scheme of the target molecule.
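Both cases reduce to choosing the highest-scoring scheme among all schemes of the candidates. The sketch below treats the preset scoring strategy as an arbitrary callable because the source leaves it open. The function name, the `Scheme` dictionary layout, and the example fields are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Scheme = Dict[str, Any]


def best_design_scheme(
    schemes_by_candidate: Dict[str, List[Scheme]],
    score: Callable[[Scheme], float],
) -> Tuple[str, Scheme]:
    """Return (candidate id, scheme) for the highest-scoring design scheme.

    Covers both cases: several candidates with one scheme each, or a single
    candidate with several schemes. The first scheme seen wins ties.
    """
    best = None
    for candidate, schemes in schemes_by_candidate.items():
        for scheme in schemes:
            value = score(scheme)
            if best is None or value > best[0]:
                best = (value, candidate, scheme)
    if best is None:
        raise ValueError("no design schemes to score")
    return best[1], best[2]


if __name__ == "__main__":
    schemes = {
        "b": [{"id": 1, "crystallinity": 0.4}],
        "c": [{"id": 2, "crystallinity": 0.9}, {"id": 3, "crystallinity": 0.7}],
    }
    print(best_design_scheme(schemes, score=lambda s: s["crystallinity"]))
    # ('c', {'id': 2, 'crystallinity': 0.9})
```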
  • A device for determining a molecular design scheme includes: a first acquisition unit, configured to acquire the target molecule to be designed and calculate its molecular descriptor; a first determination unit, configured to determine, according to the molecular descriptor of the target molecule, the distance between the target molecule and each preset molecule in the pre-built target chemical space; a second determination unit, configured to determine candidate molecules from the preset molecules according to the distance between the target molecule and each preset molecule; a second acquisition unit, configured to acquire the design schemes of the candidate molecules; and a third determination unit, configured to determine the design scheme of the target molecule according to the design schemes of the candidate molecules.
  • The determination device further includes: a third acquisition unit, configured to acquire a plurality of preset molecules before the first acquisition unit acquires the target molecule to be designed; a fourth determination unit, configured to determine the molecular descriptors of each preset molecule, wherein the molecular descriptors of one preset molecule occupy one position in a multidimensional space; and a construction unit, configured to construct the target chemical space based on the positions occupied by the molecular descriptors of the preset molecules in the multidimensional space.
  • The third acquisition unit includes: a first acquisition subunit, configured to acquire design schemes of a plurality of initial molecules; a first determination subunit, configured to analyze the design scheme of each initial molecule and determine its completeness; and a second determination subunit, configured to use the initial molecules with complete design schemes as the preset molecules.
  • The first determination subunit includes: a first acquisition module, configured to analyze the design scheme of each initial molecule to obtain first initial molecules with complete design schemes and second initial molecules with incomplete design schemes.
  • The second determination subunit includes: a first determination module, configured to use the first initial molecules with complete design schemes as preset molecules.
  • The determination device further includes: a completion unit, configured to complete the design schemes of the second initial molecules and to use the second initial molecules with completed design schemes as preset molecules.
  • The fourth determination unit includes: a first calculation subunit, configured to calculate a plurality of molecular descriptors of each preset molecule with a preset algorithm; and a screening subunit, configured to screen these molecular descriptors to obtain a set of molecular descriptors for characterizing the preset molecules.
  • The screening subunit includes at least one of the following: a first processing module, configured, for any molecular descriptor, to delete the descriptor when more than a first threshold number of preset molecules have the same value for it; a second processing module, configured, for any molecular descriptor, to delete the descriptor when the difference between its value for a preset molecule and its values for the remaining preset molecules is below a second threshold; a third processing module, configured, for any molecular descriptor, to delete the descriptor when its value for a preset molecule is abnormal; and a fourth processing module, configured, for any two molecular descriptors, to delete one of them when the correlation coefficient of the two, calculated from their values for all preset molecules, is higher than a third threshold.
  • The first determination unit includes: a third determination subunit, configured to determine the position of the target molecule in the pre-built target chemical space according to its molecular descriptor; and a second calculation subunit, configured to calculate the distance between the target molecule and each preset molecule according to the position of the target molecule and the position of each preset molecule in the target chemical space.
  • The second determination unit includes at least one of the following: a first selection subunit, configured to select, according to the distance between the target molecule and each preset molecule, a preset number or a preset proportion of the closest preset molecules as candidate molecules; and a second selection subunit, configured to select the preset molecule with the smallest distance as the candidate molecule.
  • The third determination unit includes: a first scoring subunit, configured, when there are multiple candidate molecules, to score the design scheme of each candidate molecule according to a preset scoring strategy to obtain a score for each; and a fourth determination subunit, configured to use the design scheme of the candidate molecule with the highest score as the design scheme of the target molecule.
  • The third determination unit includes: a second scoring subunit, configured, when there is one candidate molecule with multiple sets of design schemes, to score each set according to a preset scoring strategy to obtain a score for each set; and a fifth determination subunit, configured to use the design scheme with the highest score as the design scheme of the target molecule.
  • A computer-readable storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the method for determining a molecular design scheme described above.
  • An electronic device includes a memory and a processor; the memory stores a program, and the processor is configured to run the program to perform the method for determining a molecular design scheme described above.
  • In the embodiments, the target molecule to be designed is acquired and its molecular descriptor is calculated; the distance between the target molecule and each preset molecule in the pre-built target chemical space is determined from that descriptor; candidate molecules are determined from the preset molecules according to these distances; the design schemes of the candidate molecules are acquired; and the design scheme of the target molecule is determined according to the design schemes of the candidate molecules.
  • That is, the molecular descriptor of the target molecule is calculated and the position it occupies in the target chemical space is determined. Based on the Euclidean distances between the positions of the preset molecules and that of the target molecule, the candidate molecules whose design schemes should be recommended are determined among the preset molecules, and their design schemes are pushed, so that the target molecule can be designed with reference to them. This achieves the technical effect of recommending a design scheme suited to the characteristics of the target molecule efficiently, quickly, and at low cost.
  • Specifically, design schemes of drug molecules are collected, the molecular descriptors of those drug molecules are calculated, and the corresponding chemical space is constructed from the important molecular descriptors.
  • For a new molecule, its molecular descriptor is calculated, and the distance between it and the existing molecular descriptors in the chemical space is computed; the corresponding design schemes are then recommended in order of increasing distance.
  • Fig. 1 is a flowchart of an optional method for determining a molecular design scheme according to an embodiment of the present invention.
  • Fig. 2 is a schematic diagram of an optional method for determining a molecular design scheme according to an embodiment of the present invention.
  • Fig. 3 is a schematic diagram of an optional target molecule corresponding to bardoxolone methyl according to an embodiment of the present invention.
  • Fig. 4 is a schematic diagram of an optional preset molecule corresponding to bardoxolone methyl according to an embodiment of the present invention.
  • Fig. 5 is a schematic diagram of an optional device for determining a molecular design scheme according to an embodiment of the present invention.
  • An embodiment of a method for determining a molecular design scheme is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions; and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
  • Fig. 1 is a flowchart of a method for determining a molecular design scheme according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
  • Step S102: acquire the target molecule to be designed, and calculate its molecular descriptor.
  • Step S104: according to the molecular descriptor of the target molecule, determine the distance between the target molecule and each preset molecule in the pre-built target chemical space.
  • Step S106: determine candidate molecules from the preset molecules according to the distance between the target molecule and each preset molecule.
  • Step S108: acquire the design schemes of the candidate molecules.
  • Step S110: determine the design scheme of the target molecule according to the design schemes of the candidate molecules.
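Steps S102 to S110 can be strung together as in the sketch below, assuming the chemical space and the preset molecules' design schemes already exist. Descriptor calculation in S102 is treated as done upstream. Several choices here are assumptions rather than anything the source fixes: k-nearest selection, Euclidean distance, and a caller-supplied scoring function. The function name, parameters, and toy data are hypothetical.

```python
import math
from typing import Any, Callable, Dict, List

Scheme = Dict[str, Any]


def recommend_design_scheme(
    target_descriptors: Dict[str, float],
    axes: List[str],
    preset_points: Dict[str, List[float]],
    preset_schemes: Dict[str, List[Scheme]],
    score: Callable[[Scheme], float],
    k: int = 3,
) -> Scheme:
    """Sketch of steps S102-S110 for an already-built chemical space."""
    # S102: the target's descriptors are assumed to be computed already.
    target = [float(target_descriptors[axis]) for axis in axes]
    # S104: Euclidean distance to every preset molecule.
    distances = {m: math.dist(target, p) for m, p in preset_points.items()}
    # S106: the k nearest preset molecules become the candidates.
    candidates = sorted(distances, key=lambda m: (distances[m], m))[:k]
    # S108: collect the stored design schemes of the candidates.
    schemes = [s for m in candidates for s in preset_schemes.get(m, [])]
    if not schemes:
        raise ValueError("the candidate molecules have no stored design schemes")
    # S110: recommend the highest-scoring scheme.
    return max(schemes, key=score)


if __name__ == "__main__":
    scheme = recommend_design_scheme(
        target_descriptors={"d1": 0.5, "d2": 0.5},
        axes=["d1", "d2"],
        preset_points={"a": [0.0, 0.0], "b": [10.0, 10.0], "c": [1.0, 1.0]},
        preset_schemes={
            "a": [{"method": "volatilization", "score": 0.5}],
            "b": [{"method": "volatilization", "score": 0.99}],
            "c": [{"method": "dissolution", "score": 0.8}],
        },
        score=lambda s: s["score"],
        k=2,
    )
    print(scheme)  # {'method': 'dissolution', 'score': 0.8}
```

In the example, molecule `b` has the best-scoring scheme but is far from the target. It is therefore never a candidate. This illustrates the point of the method: schemes are recommended only from structurally nearby molecules.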
  • Before the target molecule to be designed is acquired, the method further includes: acquiring a plurality of preset molecules; determining the molecular descriptors of each preset molecule, wherein the molecular descriptors of one preset molecule occupy one position in a multidimensional space; and constructing the target chemical space based on the positions occupied by the molecular descriptors of the preset molecules in the multidimensional space.
  • In this way, the molecular descriptor of the target molecule is calculated and the position it occupies in the target chemical space is determined. Based on the Euclidean distances between the positions of the preset molecules and that of the target molecule, the candidate molecules whose design schemes should be recommended are determined among the preset molecules, and their design schemes are pushed, where a design scheme includes at least the crystallization experiment data of the preset molecule. This achieves the technical effect of recommending a crystallization experiment scheme suited to the characteristics of the target molecule efficiently, quickly, and at low cost.
  • The embodiments of the present application collect crystallization experiment data of drug molecules, calculate the molecular descriptors of these drug molecules, and construct the corresponding chemical space from the important molecular descriptors.
  • For a new molecule, its molecular descriptor is calculated, and the distance between it and the existing molecular descriptors in the chemical space is computed; the corresponding experimental schemes are then recommended in order of increasing distance.
  • The molecular design scheme can be expressed by the experimental data of the molecule. The experimental data can include, but are not limited to, experimental conditions (the experimental method, such as the volatilization method or the dissolution method, and related experimental parameters, such as temperature and pressure) and the crystal form type (such as single crystal, hydrate, solvate, and cocrystal). Additional experiments can also be designed from the perspective of molecular diversity to supplement the data and ensure high coverage of the experimental data.
  • The embodiments of the present application are applicable not only to recommending crystallization experiment schemes but also to recommending other design schemes, such as the design of molecular synthesis experiments.
  • Acquiring the plurality of preset molecules includes: acquiring design schemes of a plurality of initial molecules; analyzing the design scheme of each initial molecule to determine its completeness; and using the initial molecules with complete design schemes as the preset molecules.
  • The design schemes of the initial molecules may be obtained from multiple data sources, such as literature databases, experimental databases, and specific experimental data.
  • Analyzing the design scheme of each initial molecule to determine its completeness includes: analyzing the design scheme of each initial molecule to obtain first initial molecules with complete design schemes and second initial molecules with incomplete design schemes. Using the initial molecules with complete design schemes as the preset molecules includes: using the first initial molecules as the preset molecules. In addition, the method further includes: completing the design schemes of the second initial molecules, and using the second initial molecules with completed design schemes as preset molecules.
  • The design scheme includes at least: 1. experimental conditions, such as the method (e.g., the volatilization method or the dissolution method) and related parameters (e.g., temperature and pressure); and 2. the crystal form type, such as single crystal, hydrate, solvate, or cocrystal.
  • Designed experiments can be used to complete the design schemes of the initial molecules, so as to ensure that the database established in the embodiments of the present application has high coverage.
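The completeness analysis could be sketched as a check that every required part of a design scheme is present and non-empty. The required parts here are the method, the parameters, and the crystal form type listed above. The field names `method`, `parameters`, and `crystal_type`, the function names, and the example records are hypothetical. The source does not define a data format.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names for the items a complete design scheme must contain.
REQUIRED_FIELDS = ("method", "parameters", "crystal_type")


def is_complete(scheme: Dict[str, Any]) -> bool:
    """True if every required field is present and non-empty."""
    return all(scheme.get(field) not in (None, "", {}) for field in REQUIRED_FIELDS)


def split_by_completeness(
    initial: Dict[str, Dict[str, Any]]
) -> Tuple[List[str], List[str]]:
    """Split initial molecules into (complete, incomplete) design schemes."""
    complete, incomplete = [], []
    for mol_id, scheme in initial.items():
        (complete if is_complete(scheme) else incomplete).append(mol_id)
    return complete, incomplete


if __name__ == "__main__":
    initial = {
        "m1": {"method": "volatilization", "parameters": {"temperature_C": 25},
               "crystal_type": "single crystal"},
        "m2": {"method": "dissolution", "parameters": {}, "crystal_type": "hydrate"},
    }
    print(split_by_completeness(initial))  # (['m1'], ['m2'])
```

Here `m1` would become a preset molecule directly. `m2`, whose parameters are empty, would first go through the completion step.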
  • Determining the molecular descriptors of each preset molecule includes: calculating a plurality of molecular descriptors of each preset molecule with a preset algorithm; and screening these molecular descriptors to obtain a set of molecular descriptors used to characterize the preset molecule.
  • The preset algorithm may be implemented with cheminformatics software such as RDKit.
  • Each preset molecule thus corresponds to multiple molecular descriptors.
  • The molecular descriptors of each preset molecule are screened using unsupervised and/or supervised feature engineering methods to obtain a set of molecular descriptors for characterizing the preset molecule.
  • The molecular descriptors characterizing a preset molecule occupy one data point in the multidimensional space; the collection of these data points and the region they cover constitute the target chemical space.
  • Screening the molecular descriptors of each preset molecule includes at least the following processing methods:
  • The difference can be expressed in various forms, such as variance, standard deviation, mean square deviation, root-mean-square deviation, and so on.
  • Dimensionality reduction of the molecular descriptors aims to eliminate irrelevant and redundant information and to reduce the number of variables. Irrelevant information refers to molecular descriptors unrelated to the design scheme, such as counts of atom types that are not present in all molecules. Redundant information means that two or more molecular descriptors have similar meanings, such as molecular mass and heavy-atom mass, in which case only one of them needs to be retained.
  • Determining, according to the molecular descriptor of the target molecule, the distance between the target molecule and each preset molecule in the pre-built target chemical space includes: determining the position of the target molecule in the target chemical space according to its molecular descriptor; and calculating the distance between the target molecule and each preset molecule according to the position of the target molecule and the position of each preset molecule in the target chemical space.
  • Determining candidate molecules from the preset molecules according to the distance between the target molecule and each preset molecule includes: selecting, as candidate molecules, a preset number or a preset proportion of the preset molecules closest to the target molecule; or selecting the preset molecule with the smallest distance as the candidate molecule.
  • The distance between the target molecule and each preset molecule may be a Euclidean distance.
  • When there are multiple candidate molecules, determining the design scheme of the target molecule according to the design schemes of the candidate molecules includes: scoring the design scheme of each candidate molecule according to a preset scoring strategy to obtain a score for each; and using the design scheme of the candidate molecule with the highest score as the design scheme of the target molecule.
  • When there is one candidate molecule with multiple sets of design schemes, determining the design scheme of the target molecule according to the design schemes of the candidate molecule includes: scoring each set of design schemes of the candidate molecule according to a preset scoring strategy to obtain a score for each set; and taking the design scheme with the highest score as the design scheme of the target molecule.
  • The preset scoring strategy for the design schemes of candidate molecules can be characterized by at least one kind of information, such as crystallinity, crystallization duration, and crystallization rate. For example, a design scheme with high crystallinity scores higher than one with low crystallinity, and a design scheme with a fast crystallization rate scores higher than one with a slow crystallization rate.
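The source names the scoring criteria but not how they are combined. The sketch below is therefore only one hypothetical strategy: a weighted sum. It assumes crystallinity and rate are already normalized to [0, 1], and that a shorter crystallization duration is better; the source does not state that direction. The weights, the 72-hour cap, the field names, and the function name are all made up for illustration.

```python
from typing import Dict

# Hypothetical weights: the source names the criteria but not how to combine them.
WEIGHTS = {"crystallinity": 0.6, "rate": 0.3, "duration": 0.1}


def score_scheme(scheme: Dict[str, float], max_duration_h: float = 72.0) -> float:
    """Weighted score of a crystallization design scheme.

    Assumes `crystallinity` and `rate` are already normalized to [0, 1].
    Higher crystallinity and a faster rate raise the score; a shorter
    crystallization duration is also rewarded (an assumption).
    """
    duration = min(max(scheme["duration_h"], 0.0), max_duration_h)
    duration_term = 1.0 - duration / max_duration_h
    return (
        WEIGHTS["crystallinity"] * scheme["crystallinity"]
        + WEIGHTS["rate"] * scheme["rate"]
        + WEIGHTS["duration"] * duration_term
    )


if __name__ == "__main__":
    high = {"crystallinity": 0.9, "rate": 0.5, "duration_h": 24.0}
    low = {"crystallinity": 0.3, "rate": 0.5, "duration_h": 24.0}
    print(score_scheme(high) > score_scheme(low))  # True
```

A linear combination keeps each criterion's influence easy to read and tune. Any monotone combination would satisfy the orderings described in the text equally well.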
  • The embodiments of the present application achieve the following technical effects:
  • 1. Reasonable crystallization experiment schemes are recommended quickly, enabling high throughput;
  • 2. The number of experiments is reduced, saving costs;
  • 3. …
  • An embodiment of a device for determining a molecular design scheme is also provided. It should be noted that this device can be used to perform the method for determining a molecular design scheme in the embodiments of the present invention; that is, the method can be executed in the device.
  • Fig. 5 is a schematic diagram of a device for determining a molecular design scheme according to an embodiment of the present invention.
  • The device may include: a first acquisition unit 51, configured to acquire the target molecule to be designed and calculate its molecular descriptor; a first determination unit 53, configured to determine, according to the molecular descriptor of the target molecule, the distance between the target molecule and each preset molecule in the pre-built target chemical space; a second determination unit 55, configured to determine candidate molecules from the preset molecules according to the distance between the target molecule and each preset molecule; a second acquisition unit 57, configured to acquire the design schemes of the candidate molecules; and a third determination unit 59, configured to determine the design scheme of the target molecule according to the design schemes of the candidate molecules.
  • The determination device further includes: a third acquisition unit, configured to acquire a plurality of preset molecules before the first acquisition unit 51 acquires the target molecule to be designed; a fourth determination unit, configured to determine the molecular descriptors of each preset molecule, wherein the molecular descriptors of one preset molecule occupy one position in a multidimensional space; and a construction unit, configured to construct the target chemical space based on the positions occupied by the molecular descriptors of the preset molecules in the multidimensional space.
  • the third acquisition unit includes: a first acquisition subunit, configured to acquire a plurality of initial molecular design schemes; a first determination subunit, configured to analyze the design schemes of each initial molecule, and determine the The completeness of the design scheme of an initial molecule; the second determination subunit is used to use the initial molecule with a complete design scheme as the preset molecule.
  • the first determination subunit includes: a first acquisition module, configured to analyze the design scheme of each initial molecule, and obtain the first initial molecule with a complete design scheme and the second initial molecule with an incomplete design scheme .
  • the second determining subunit includes: a first determining module, configured to use the first initial molecule with a complete design scheme as a preset molecule.
  • the determination device further includes: a completion unit, configured to complete the design scheme of each second initial molecule and to use the second initial molecules whose design schemes have been completed as preset molecules.
  • the fourth determination unit includes: a first calculation subunit, configured to calculate multiple molecular descriptors of each preset molecule by using a preset algorithm; and a screening subunit, configured to screen the multiple molecular descriptors of the preset molecules to obtain a set of molecular descriptors that characterizes the preset molecules.
  • the screening subunit includes at least one of the following: a first processing module, configured to delete any molecular descriptor for which more than a first threshold number of preset molecules share the same value; a second processing module, configured to delete any molecular descriptor for which the difference between its value for a preset molecule and its values for the remaining preset molecules is lower than a second threshold; a third processing module, configured to delete any molecular descriptor whose value for a preset molecule is abnormal; and a fourth processing module, configured to, for any two molecular descriptors, when the correlation coefficient of the two molecular descriptors, calculated from their values for all preset molecules, is
  • the first determination unit 53 includes: a third determination subunit, configured to determine the position of the target molecule in the pre-built target chemical space according to the molecular descriptor of the target molecule; and a calculation subunit, configured to calculate the distance between the target molecule and each preset molecule according to the position of the target molecule and the position of each preset molecule in the target chemical space.
  • the second determination unit 55 includes at least one of the following: a first selection subunit, configured to select, according to the distance between the target molecule and each preset molecule, a preset number or a preset proportion of the preset molecules with the smallest distances as candidate molecules; and a second selection subunit, configured to select, according to the distance between the target molecule and each preset molecule, the preset molecule with the smallest distance as the candidate molecule.
  • the third determination unit 59 includes: a first scoring subunit, configured to, when there are multiple candidate molecules, score the design scheme of each candidate molecule according to a preset scoring strategy to obtain a score for each design scheme; and a fourth determination subunit, configured to use the design scheme of the candidate molecule with the highest score as the design scheme of the target molecule.
  • the third determination unit 59 includes: a second scoring subunit, configured to, when there is one candidate molecule that has multiple sets of design schemes, score each set of design schemes of the candidate molecule according to a preset scoring strategy to obtain a score for each set; and a fifth determination subunit, configured to use the design scheme with the highest score as the design scheme of the target molecule.
  • the disclosed technical content can be realized in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units may be a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between units or modules may be electrical or take other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
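The following is a rough Python sketch of how the preset molecules and the target chemical space might be prepared. It follows the third acquisition unit, the screening subunit, and the construction unit described above. It keeps initial molecules with complete design schemes, screens the descriptors, and builds one point per preset molecule from the surviving descriptors.

Several details are assumptions, not the patent's method:
- the field names in `REQUIRED_FIELDS` are illustrative;
- "abnormal" is read as NaN or infinity;
- the "difference" rule uses the population standard deviation;
- the fourth rule is truncated in the source, so it is assumed to drop one descriptor of a highly correlated pair.

All function names and the toy values are hypothetical.

```python
import math
from collections import Counter
from statistics import fmean, pstdev

# Hypothetical fields a "complete" design scheme must contain; the source
# does not define completeness, so these names are illustrative only.
REQUIRED_FIELDS = ("route", "conditions")


def is_complete(scheme: dict) -> bool:
    """Treat a scheme as complete when every required field is non-empty."""
    return all(scheme.get(field) for field in REQUIRED_FIELDS)


def select_preset_molecules(initial: dict[str, dict]) -> dict[str, dict]:
    """Keep initial molecules whose design schemes are complete.

    The completion step for incomplete schemes is not modelled here.
    """
    return {mol: s for mol, s in initial.items() if is_complete(s)}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_descriptors(table: dict[str, list[float]], max_same: int,
                       min_std: float, max_corr: float) -> list[str]:
    """Return names of descriptors that survive all screening rules.

    `table` maps a descriptor name to its values, one per preset molecule.
    """
    kept: list[str] = []
    for name, values in table.items():
        # Abnormal values, interpreted here as NaN or infinity.
        if not all(math.isfinite(v) for v in values):
            continue
        # More than `max_same` preset molecules share one value.
        if Counter(values).most_common(1)[0][1] > max_same:
            continue
        # Values barely differ across molecules (population std. deviation).
        if pstdev(values) < min_std:
            continue
        # Assumed reading of the truncated fourth rule: drop a descriptor
        # that is highly correlated with one already kept.
        if any(abs(pearson(values, table[k])) > max_corr for k in kept):
            continue
        kept.append(name)
    return kept


def build_space(table: dict[str, list[float]],
                kept: list[str]) -> list[list[float]]:
    """One point per preset molecule, using only the kept descriptors."""
    n_molecules = len(next(iter(table.values())))
    return [[table[name][i] for name in kept] for i in range(n_molecules)]


if __name__ == "__main__":
    initial = {
        "M1": {"route": "route A", "conditions": "80 C, 2 h"},
        "M2": {"route": "route B"},  # no conditions, so incomplete
    }
    print(sorted(select_preset_molecules(initial)))  # ['M1']

    # Toy descriptor values for four preset molecules.
    table = {
        "mw": [180.2, 46.1, 151.2, 300.4],
        "mw_copy": [360.4, 92.2, 302.4, 600.8],  # exactly 2 * mw
        "const": [1.0, 1.0, 1.0, 1.0],           # one shared value
        "logp": [2.5, -0.3, 3.0, 0.4],
        "bad": [0.1, float("nan"), 0.3, 0.4],    # abnormal value
    }
    kept = screen_descriptors(table, max_same=2, min_std=0.01, max_corr=0.95)
    print(kept)  # ['mw', 'logp']
    print(build_space(table, kept))
```

The rule order in the loop puts the abnormal-value check first, so later statistics never see NaN. The thresholds `max_same`, `min_std`, and `max_corr` stand in for the source's unspecified first and second thresholds and the truncated correlation condition.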

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a method and apparatus for determining a molecular design scheme, a device, and a storage medium. The determination method comprises:
- obtaining a target molecule to be designed;
- calculating a molecular descriptor of the target molecule;
- determining, according to the molecular descriptor of the target molecule, the distances between the target molecule and preset molecules in a pre-built target chemical space;
- determining a candidate molecule from among the preset molecules according to these distances;
- obtaining a design scheme of the candidate molecule;
- determining a design scheme of the target molecule according to the design scheme of the candidate molecule.

The present invention solves a technical problem in the prior art: reasonable experimental parameters and schemes that meet actual requirements cannot be screened efficiently, conveniently, and at low cost.
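The steps listed in the abstract can be sketched in Python as follows. The sketch assumes the target molecule's descriptor vector has already been computed over the same screened descriptors as the preset molecules. The Euclidean metric, the `yield`-based scoring strategy, the toy data, and all function names are illustrative assumptions. The source specifies only "a distance" and "a preset scoring strategy".

```python
import math
from typing import Callable, Optional


def rank_presets(target: list[float], space: dict[str, list[float]]) -> list[str]:
    """Preset molecule IDs ordered from nearest to farthest (Euclidean)."""
    return sorted(space, key=lambda mol: math.dist(target, space[mol]))


def select_candidates(target: list[float], space: dict[str, list[float]],
                      k: Optional[int] = None,
                      fraction: Optional[float] = None) -> list[str]:
    """Pick the k nearest presets, the nearest `fraction` of them, or only
    the single nearest one when neither is given."""
    ranked = rank_presets(target, space)
    if k is not None:
        n = k
    elif fraction is not None:
        n = max(1, math.ceil(fraction * len(ranked)))
    else:
        n = 1  # only the closest preset molecule
    return ranked[:n]


def choose_design_scheme(candidates: list[str],
                         schemes: dict[str, list[dict]],
                         score: Callable[[dict], float]) -> dict:
    """Score every design scheme of every candidate and return the best.

    This covers both cases in the source: several candidates, or one
    candidate with several sets of design schemes.
    """
    pool = [scheme for mol in candidates for scheme in schemes[mol]]
    return max(pool, key=score)


if __name__ == "__main__":
    # Toy two-descriptor chemical space with three preset molecules.
    space = {"P1": [0.0, 0.0], "P2": [1.0, 1.0], "P3": [5.0, 5.0]}
    target = [0.9, 1.2]
    schemes = {
        "P1": [{"id": "P1-a", "yield": 0.62}],
        "P2": [{"id": "P2-a", "yield": 0.55}, {"id": "P2-b", "yield": 0.71}],
        "P3": [{"id": "P3-a", "yield": 0.90}],
    }
    candidates = select_candidates(target, space, k=2)
    print(candidates)  # ['P2', 'P1']
    # Hypothetical scoring strategy: prefer the highest recorded yield.
    best = choose_design_scheme(candidates, schemes, score=lambda s: s["yield"])
    print(best["id"])  # P2-b
```

Passing `k` or `fraction` corresponds to choosing a preset number or proportion of the nearest molecules. Leaving both unset corresponds to choosing only the nearest one. Descriptors on very different scales are usually standardized before distances are compared, which this sketch does not do.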
PCT/CN2021/137223 2021-12-10 2021-12-10 Method and apparatus for determining a molecular design scheme, device, and storage medium Ceased WO2023102923A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/137223 WO2023102923A1 (fr) 2021-12-10 2021-12-10 Method and apparatus for determining a molecular design scheme, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/137223 WO2023102923A1 (fr) 2021-12-10 2021-12-10 Method and apparatus for determining a molecular design scheme, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023102923A1 (fr) 2023-06-15

Family

ID=86729528

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137223 Ceased WO2023102923A1 (fr) 2021-12-10 2021-12-10 Method and apparatus for determining a molecular design scheme, device, and storage medium

Country Status (1)

Country Link
WO (1) WO2023102923A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003019183A1 (fr) * 2001-08-23 2003-03-06 Deltagen Research Laboratories, L.L.C. Method for the informative and iterative design of a combinatorial library of gene families
WO2018234718A1 (fr) * 2017-06-22 2018-12-27 Arianegroup Sas Method and device for selecting a subset of molecules to be used for predicting at least one property of a molecular structure
CN112201313A (zh) * 2020-09-15 2021-01-08 北京晶派科技有限公司 Automated small-molecule drug screening method and computing device
CN112309510A (zh) * 2020-10-31 2021-02-02 平安科技(深圳)有限公司 Drug molecule generation method and apparatus, terminal device, and storage medium
CN113764054A (zh) * 2021-08-30 2021-12-07 深圳晶泰科技有限公司 Method for designing functional organic crystal materials

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21966841

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21966841

Country of ref document: EP

Kind code of ref document: A1