CN116646028A - Training method and device of molecular docking model, electronic equipment and storage medium - Google Patents
Training method and device of molecular docking model, electronic equipment and storage medium
- Publication number
- CN116646028A CN202310589399.0A
- Authority
- CN
- China
- Prior art keywords
- protein
- drug molecule
- data
- combination
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Artificial Intelligence (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Bioethics (AREA)
- Epidemiology (AREA)
- Public Health (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, specifically to biological computing and deep learning, and in particular to a training method and apparatus for a molecular docking model, an electronic device, and a storage medium.
Background
In drug development, thousands of compounds usually need to be tested to find a safe and effective drug, so predicting the affinity between a drug and its target is a crucial step in drug screening. However, existing affinity prediction methods have limited ability to explore proteins and drug molecules, incur a high time cost, have a low success rate, and are unsuitable for large-scale virtual screening. A better method for predicting drug-target affinity is therefore urgently needed.
Summary
The present disclosure provides a training method for a molecular docking model, a method for acquiring drug target information, corresponding apparatuses, an electronic device, and a storage medium.
According to one aspect of the present disclosure, a training method for a molecular docking model is provided, including: acquiring a protein and drug molecule combination for training, and simulating the protein and drug molecule combination to obtain labeled data of the protein and drug molecule combination; inputting the protein and drug molecule combination into an initial molecular docking model, and performing multi-task training on the initial molecular docking model to obtain prediction data of the protein and drug molecule combination; and, based on the difference between the labeled data and the prediction data, correcting the initial molecular docking model and returning to continue training until a trained molecular docking model is obtained.
According to another aspect of the present disclosure, a method for acquiring drug target information is provided, including: acquiring a combination of a target protein and a candidate drug molecule, inputting the combination into a pre-trained target molecular docking model, and obtaining binding conformation data of the combination and affinity data between the target protein and the candidate drug molecule; wherein the target molecular docking model is a model trained by the training method of the present disclosure.
According to another aspect of the present disclosure, a training apparatus for a molecular docking model is provided, including: an acquisition module, configured to acquire a protein and drug molecule combination for training and to simulate the protein and drug molecule combination to obtain labeled data of the combination; a training module, configured to input the protein and drug molecule combination into an initial molecular docking model and to perform multi-task training on the initial molecular docking model to obtain prediction data of the combination; and a correction module, configured to correct the initial molecular docking model based on the difference between the labeled data and the prediction data and to return to continue training until a trained molecular docking model is obtained.
According to another aspect of the present disclosure, an apparatus for acquiring drug target information is provided, including: an acquisition module, configured to acquire a combination of a target protein and a candidate drug molecule, input the combination into a pre-trained target molecular docking model, and obtain binding conformation data of the combination and affinity data between the target protein and the candidate drug molecule; wherein the target molecular docking model is a model trained by the training method of the present disclosure.
According to another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the training method for a molecular docking model described in the embodiments of the above aspect.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer programs/instructions is provided, wherein the computer instructions cause a computer to perform the training method for a molecular docking model described in the embodiments of the above aspect.
According to another aspect of the present disclosure, a computer program product is provided, including computer programs/instructions which, when executed by a processor, implement the training method for a molecular docking model described in the embodiments of the above aspect.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:
FIG. 1 is a schematic flowchart of a training method for a molecular docking model provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of another training method for a molecular docking model provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another training method for a molecular docking model provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of another training method for a molecular docking model provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the pre-training process of the molecular docking model provided by an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of another training method for a molecular docking model provided by an embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of a method for acquiring drug target information provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of predicting protein-drug molecule affinity data and binding conformation provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a training apparatus for a molecular docking model provided by an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an apparatus for acquiring drug target information provided by an embodiment of the present disclosure;
FIG. 11 is a block diagram of an electronic device used to implement the training method for a molecular docking model of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
The training method and apparatus for a molecular docking model and the electronic device of the embodiments of the present disclosure are described below with reference to the accompanying drawings.
Artificial intelligence (AI) is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning). It involves both hardware-level and software-level technologies. AI software technologies generally include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. NLP is a science that integrates linguistics, computer science, and mathematics. It is mainly applied to machine translation, public opinion monitoring, automatic summarization, opinion extraction, text classification, question answering, text semantic comparison, speech recognition, and similar tasks.
Deep learning (DL) is a newer research direction within machine learning (ML). It was introduced into machine learning to bring the field closer to its original goal, artificial intelligence. Deep learning learns the inherent patterns and levels of representation of sample data, and the information obtained during learning greatly helps in interpreting data such as text, images, and sound. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize data such as text, images, and sound. Deep learning is a complex class of machine learning algorithms whose results in speech and image recognition far exceed those of earlier related technologies.
Intelligent search is a new generation of search engine that incorporates artificial intelligence. In addition to traditional functions such as fast retrieval and relevance ranking, it provides functions such as user role registration, automatic identification of user interests, semantic understanding of content, and intelligent information filtering and pushing.
Computer vision is the science of making machines "see". More specifically, it uses cameras and computers in place of human eyes to identify, track, and measure targets, and further processes the resulting images so that they are better suited for human observation or for transmission to instruments for inspection. As a scientific discipline, computer vision studies the related theories and techniques and attempts to build artificial intelligence systems that can obtain "information" from images or multidimensional data, where the information can be used to help make a "decision". Because perception can be regarded as extracting information from sensory signals, computer vision can also be regarded as the science of making artificial systems "perceive" from images or multidimensional data.
Image processing is the technique of analyzing images with a computer to achieve a desired result. It generally refers to digital image processing. A digital image is a large two-dimensional array captured by devices such as industrial cameras, video cameras, and scanners; the elements of the array are called pixels, and their values are called grayscale values. Image processing generally comprises three parts: image compression; enhancement and restoration; and matching, description, and recognition.
Machine translation, also called automatic translation, is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). It is a branch of computational linguistics and one of the goals of artificial intelligence.
Biological computing is a field that draws on the principles and mechanisms of biological systems to solve computational problems. It applies certain properties and processes of biology to computing systems to improve computational efficiency and performance. Its goal is to take inspiration from biological systems and turn it into new computational methods and techniques for solving complex problems. It is widely applied in optimization, pattern recognition, data analysis, and simulation, and continues to develop and expand.
FIG. 1 is a schematic flowchart of a training method for a molecular docking model provided by an embodiment of the present disclosure.
As shown in FIG. 1, the training method for the molecular docking model may include:
S101: acquire a protein and drug molecule combination for training, and simulate the protein and drug molecule combination to obtain labeled data of the combination.
It should be noted that the training method of the embodiments of the present disclosure may be executed by a hardware device with data processing capability and/or the software required to drive that hardware device. Optionally, the executing entity may include a server, a computer, a user terminal, or another intelligent device. Optionally, the user terminal includes, but is not limited to, a mobile phone, a computer, or an intelligent voice interaction device. Optionally, the server includes, but is not limited to, a web server or an application server, and may also be a server of a distributed system or a server combined with a blockchain.
In some implementations, a pre-built protein structure database (Protein DataBank bind, PDBbind) and a drug molecule database may be downloaded from a source database. In the present disclosure, protein files may also be downloaded from the source database in advance, and the protein structure database may be constructed from those files. The protein structure database and the drug molecule database contain large-scale, highly diverse data, which enriches the training data for the molecular docking model and helps improve its generalization ability.
Further, the protein structure database and the drug molecule database are randomly sampled, and each sampled protein is combined with a sampled drug molecule to obtain a protein and drug molecule combination for training. Random sampling avoids the bias that a selection preference would introduce into the sampling results and thus improves their reliability. For large-scale databases, random sampling also simplifies the sampling process and reduces computational cost.
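The random pairing step described above can be sketched as follows. This is a minimal illustration only: the function name `sample_combinations`, the identifier strings, and the choice of uniform sampling with replacement are assumptions, not details given in the disclosure.

```python
import random


def sample_combinations(proteins, molecules, n, seed=None):
    """Draw n (protein, molecule) pairs uniformly at random, with replacement.

    Assumption: uniform, independent sampling from both databases; the
    disclosure only says the databases are "randomly sampled".
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    return [(rng.choice(proteins), rng.choice(molecules)) for _ in range(n)]


# Hypothetical identifiers standing in for database entries.
proteins = ["protein_A", "protein_B", "protein_C"]
molecules = ["mol_1", "mol_2", "mol_3", "mol_4"]

pairs = sample_combinations(proteins, molecules, n=5, seed=0)
```

Each element of `pairs` is one protein and drug molecule combination that could then be passed to the simulation step.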
In some implementations, a molecular docking algorithm may be used to simulate the protein and drug molecule combination to obtain its labeled data. The labeled data may include, but is not limited to: binding conformation data of the protein and drug molecule combination, affinity data between the protein and the drug molecule, and internal conformation data of the drug molecule. A molecular docking algorithm can predict the binding conformation and affinity between a protein and a drug molecule. By simulating the interactions of different proteins with drug molecules, drug candidates with potential therapeutic effect can be screened out, which helps accelerate drug discovery and development and improves screening efficiency.
It can be understood that molecular docking is a technique in computational biology and drug design used to predict and simulate the interaction between a drug molecule (the ligand) and a protein (the target); it can determine the binding conformation and affinity of the drug molecule with the protein.
A drug molecule is a compound that interacts with a target protein; it may be a known drug, a natural product, or a computer-generated molecule. The chemical structure and properties of the ligand determine how it interacts with the target and with what affinity.
The protein here refers to the part of the target that lies within a certain range of the drug molecule after the drug has bound to the target.
The binding conformation is the spatial arrangement adopted when the drug molecule binds to the protein atoms. Affinity is the binding strength between the drug molecule and the protein atoms; it characterizes the strength of the interaction between the protein and the ligand, that is, how strongly the drug molecule acts on the protein. A greater affinity means a greater binding strength, and the drug is then more likely to be active when it acts on the target. The internal conformation of a drug molecule is the spatial arrangement of the drug molecule itself.
In the embodiments of the present disclosure, proteins and drug molecules are repeatedly sampled at random from the protein structure database and the drug molecule database and combined into protein and drug molecule combinations. These combinations are then simulated continuously to obtain a large amount of labeled data for training the molecular docking model.
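One possible way to organize the labeled data produced by the simulation is sketched below. The `DockingLabel` container, the `build_labeled_set` helper, and the `toy_dock` placeholder are all hypothetical names introduced for illustration; `toy_dock` returns made-up values and is not a docking algorithm. In practice, the `dock` argument would wrap whatever docking program is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Coords = List[Tuple[float, float, float]]  # one (x, y, z) triple per atom


@dataclass
class DockingLabel:
    """The three kinds of labeled data named in the text."""
    binding_conformation: Coords  # ligand coordinates in the bound pose
    affinity: float               # protein-ligand affinity score
    ligand_conformation: Coords   # internal conformation of the ligand


def build_labeled_set(pairs, dock: Callable[[str, str], DockingLabel]):
    """Run the supplied docking routine on every pair and keep the labels."""
    return [(protein, molecule, dock(protein, molecule))
            for protein, molecule in pairs]


def toy_dock(protein: str, molecule: str) -> DockingLabel:
    # Placeholder only: fabricates deterministic numbers from string lengths
    # so the sketch runs without a real docking program.
    pose = [(float(len(protein)), 0.0, 0.0)]
    return DockingLabel(pose, -float(len(molecule)), [(0.0, 0.0, 0.0)])


labeled = build_labeled_set([("protein_A", "mol_1")], toy_dock)
```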
S102: input the protein and drug molecule combination into the initial molecular docking model, and perform multi-task training on the initial molecular docking model to obtain prediction data of the combination.
In some implementations, features may be extracted from the protein and drug molecule combination, the extracted feature information may be input into the initial molecular docking model, and deep learning may be used to perform multi-task training on the initial molecular docking model. Multi-task training helps the molecular docking model output better prediction data. The tasks may include, but are not limited to, a first task of predicting the internal conformation of the drug molecule, a second task of predicting the binding conformation of the protein and drug molecule combination, and a third task of predicting the affinity between the protein and the drug molecule. The embodiments of the present disclosure place no limitation on this.
Optionally, the feature information of the protein and drug molecule combination may be input to the first task for training, to obtain predicted internal conformation data of the drug molecule. Optionally, the feature information may be input to the second task for training, to obtain predicted binding conformation data of the protein and drug molecule combination. Optionally, the feature information may be input to the third task for training, to obtain predicted affinity data between the protein and the drug molecule.
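The three tasks can be pictured as separate output heads that read the same feature vector. The sketch below uses plain linear heads purely for illustration; the disclosure does not fix a network structure, so the class `MultiTaskHeads`, the head names, and all weights are assumptions.

```python
def linear(weights, bias, x):
    """Apply y = W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class MultiTaskHeads:
    """One shared feature vector in, one prediction per task out."""

    def __init__(self, heads):
        self.heads = heads  # task name -> (weights, bias)

    def predict(self, features):
        return {name: linear(w, b, features)
                for name, (w, b) in self.heads.items()}


# Hypothetical weights; a real model would learn them during training.
model = MultiTaskHeads({
    "ligand_conformation": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # task 1
    "binding_conformation": ([[0.5, 0.5]], [0.0]),                  # task 2
    "affinity": ([[-1.0, 0.0]], [0.5]),                             # task 3
})

predictions = model.predict([1.0, 2.0])
```

Sharing the input features across heads is one common way to let the tasks inform each other, which is the motivation the text gives for multi-task training.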
In some implementations, the feature information of the protein and drug molecule combination may be grouped, and a different training task may be selected for the feature information in each group to train the initial molecular docking model.
S103: based on the difference between the labeled data and the prediction data, correct the initial molecular docking model and return to continue training until a trained molecular docking model is obtained.
The initial molecular docking model may be a network model of any structure, which is not limited in the present disclosure.
In some implementations, the prediction data may include, but is not limited to, predicted internal conformation data of the drug molecule, predicted binding conformation data of the protein and drug molecule combination, and predicted affinity data between the protein and the drug molecule. During training, the difference between the labeled data and the prediction data is obtained and a correction gradient is determined; the initial molecular docking model is then corrected backward based on the correction gradient until the final trained docking model is obtained.
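The cycle of "measure the difference, compute a gradient, correct the model" can be illustrated with a one-parameter model. This is a toy sketch under stated assumptions (a model y = w * x, mean squared error, plain gradient descent, hypothetical data); the disclosure does not specify the optimizer or the model.

```python
def train_step(w, xs, ys, lr):
    """One gradient-descent correction of w for the model y = w * x.

    The gradient is that of the mean squared error between the
    predictions w * x and the labels y.
    """
    n = len(xs)
    grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return w - lr * grad


# Hypothetical labeled data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0
for _ in range(200):  # "return and continue training"
    w = train_step(w, xs, ys, lr=0.05)
```

After the loop, `w` has moved from its initial value toward the value that makes predictions match the labels.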
In some implementations, a loss function of the molecular docking model is calculated from this difference. The model parameters of the molecular docking model are corrected based on the loss function, and the adjusted model continues to be trained on subsequent protein and drug molecule combinations until a training end condition is met, yielding the trained molecular docking model.
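A loss for the three tasks could, for example, be a weighted sum of per-task errors. The sketch below assumes mean squared error per task and hand-picked weights; neither the error measure nor the weighting is stated in the disclosure, and the numbers are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of numbers."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(preds, labels, weights):
    """Weighted sum of per-task MSE values (the weighting is an assumption)."""
    return sum(weights[task] * mse(preds[task], labels[task])
               for task in weights)


# Hypothetical flattened predictions and labels for one training pair.
preds = {"ligand_conformation": [1.0, 2.0],
         "binding_conformation": [1.5],
         "affinity": [-0.5]}
labels = {"ligand_conformation": [1.0, 1.0],
          "binding_conformation": [2.5],
          "affinity": [-1.5]}
weights = {"ligand_conformation": 1.0,
           "binding_conformation": 1.0,
           "affinity": 0.5}

loss = multitask_loss(preds, labels, weights)  # 0.5 + 1.0 + 0.5 = 2.0
```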
It can be understood that the training end condition may be that the training duration of the molecular docking model reaches a set value. The training end condition may also be that the accuracy of the prediction data output by the molecular docking model reaches a set value.
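The two end conditions can be combined in a small check, sketched below. The function name `should_stop`, its parameters, and the decision to stop when either condition holds are assumptions made for illustration.

```python
import time


def should_stop(start_time, max_seconds, accuracy, target_accuracy, now=None):
    """Return True when either training end condition is met.

    start_time and now are readings of a monotonic clock in seconds;
    now defaults to the current reading when omitted.
    """
    now = time.monotonic() if now is None else now
    time_up = (now - start_time) >= max_seconds
    accurate_enough = accuracy >= target_accuracy
    return time_up or accurate_enough


# Hypothetical values: a 10-second budget and a 0.9 accuracy target.
still_training = should_stop(0.0, 10.0, accuracy=0.5, target_accuracy=0.9, now=5.0)
out_of_time = should_stop(0.0, 10.0, accuracy=0.5, target_accuracy=0.9, now=10.0)
good_enough = should_stop(0.0, 10.0, accuracy=0.95, target_accuracy=0.9, now=5.0)
```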
根据本公开实施例的分子对接模型的训练方法,基于蛋白结构数据库和药物分子数据库中的数据,对蛋白和药物分子进行随机采样,得到蛋白和药物分子组合,并不断蛋白和药物分子组合进行仿真,得到标注数据。将蛋白和药物分子组合输入到初始分子对接模型进行多任务训练,以获得泛化能力强的分子对接模型。基于标注数据和预测数据的差异对模型进行修正,通过修正分子对接模型中的缺陷和偏差,可以改进对结合构象和亲和力的预测,提高预测结果的可靠性和准确性。进一步地,可以提高模型对蛋白与药物分子探索能力,减少探索时间,提高分子对接模型预测数据准确率。According to the training method of the molecular docking model in the embodiment of the present disclosure, based on the data in the protein structure database and the drug molecule database, randomly sample the protein and the drug molecule, obtain the combination of the protein and the drug molecule, and continuously simulate the combination of the protein and the drug molecule , to get the labeled data. Input the combination of protein and drug molecules into the initial molecular docking model for multi-task training to obtain a molecular docking model with strong generalization ability. The model is corrected based on the difference between the labeled data and the predicted data. By correcting the defects and deviations in the molecular docking model, the prediction of the binding conformation and affinity can be improved, and the reliability and accuracy of the prediction results can be improved. Further, it can improve the model's ability to explore proteins and drug molecules, reduce the exploration time, and improve the accuracy of molecular docking model prediction data.
FIG. 2 is a schematic flowchart of a training method of a molecular docking model provided by an embodiment of the present disclosure.
As shown in FIG. 2, the training method of the molecular docking model may include:
S201, acquiring a protein and drug molecule combination for training, and simulating the combination to obtain labeled data of the protein and drug molecule combination.
For the content of step S201, refer to the foregoing embodiments; details are not repeated here.
S202, performing feature extraction on the protein and drug molecule combination.
In some implementations, molecular descriptors may be used to extract features from the protein and drug molecule combination, yielding feature information of the combination. It can be understood that the feature information includes, but is not limited to: molecular weight, number of hydrogen bond acceptors, number of hydrogen bond donors, number of rotatable bonds, lipid-water partition coefficient, number of specific functional groups, atomic structure, atomic electrical properties, the types and numbers of bonds between atoms, and the covalent bonds, non-covalent bonds, and distances between the protein and the drug molecule. A covalent bond is formed when two or more atoms share their outer electrons and, in the ideal case, reach electron saturation, producing a relatively stable and strong chemical structure. That is, when a covalent bond exists between a protein atom and the drug molecule, the binding strength between the two can be determined to be relatively high. In the present disclosure, the covalent bond information between protein atoms and drug molecules can be obtained from the dataset.
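As a rough illustration of descriptor-based feature extraction, the sketch below computes three of the listed features (molecular weight, number of rotatable bonds, and a protein-to-drug distance) from a toy in-memory representation. The data layout, the `rotatable` flag, and the function name `extract_features` are assumptions made for this example; a practical system would more likely rely on a cheminformatics toolkit.

```python
import math

# Standard atomic masses for a few common elements (unified atomic mass units).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def extract_features(ligand_atoms, ligand_bonds, protein_coords):
    """Compute a small descriptor dictionary for one combination.

    ligand_atoms:   list of (element, (x, y, z)) tuples (hypothetical layout)
    ligand_bonds:   list of dicts carrying a precomputed 'rotatable' flag
    protein_coords: list of (x, y, z) tuples for the protein atoms
    """
    weight = sum(ATOMIC_MASS[element] for element, _ in ligand_atoms)
    rotatable = sum(1 for bond in ligand_bonds if bond["rotatable"])
    # Shortest distance between any drug atom and any protein atom.
    min_distance = min(
        math.dist(xyz, p) for _, xyz in ligand_atoms for p in protein_coords
    )
    return {
        "molecular_weight": weight,
        "rotatable_bonds": rotatable,
        "min_protein_distance": min_distance,
    }
```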
S203, encoding the extracted feature information to obtain a feature vector representation corresponding to the protein and drug molecule combination.
In some implementations, the feature information of the protein and drug molecule combination may be encoded by a pre-trained encoder to obtain the corresponding encoded information. For example, the encoder may be a recurrent neural network (RNN), a variational autoencoder (VAE), or a self-attention encoder.
For example, a self-attention encoder may be used to encode the feature information of the protein and drug molecule combination. The self-attention mechanism in the encoder relates the input features to one another, so that the output feature vector representation captures the associations among the feature information across multiple input dimensions. The embodiments of the present disclosure do not limit this.
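To make the self-attention step concrete, here is a minimal single-head scaled dot-product attention written with the standard library only. It is a sketch under simplifying assumptions: queries, keys, and values are the input vectors themselves (no learned projections), and the pooled output is a plain mean. The names `self_attention` and `encode` are illustrative and not taken from the disclosure.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    tokens is a list of equal-length feature vectors. Each output row is a
    weighted mix of all input rows, so it reflects relations between features.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, tokens))
             for j in range(dim)]
        )
    return attended


def encode(tokens):
    """Mean-pool the attended rows into one feature vector representation."""
    attended = self_attention(tokens)
    count = len(attended)
    return [sum(row[j] for row in attended) / count
            for j in range(len(attended[0]))]
```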
S204, performing multi-prediction-head training on the feature vector representation to obtain prediction data of the protein and drug molecule combination.
It should be noted that multi-prediction-head training is a method in which a neural network uses multiple prediction heads to make predictions in parallel. Each prediction head corresponds to one prediction task and outputs the prediction data for that task. In the present disclosure, performing multi-prediction-head training on the feature vector representation corresponding to the protein and drug molecule combination gives the molecular docking model better generalization ability. Each prediction head outputs a different prediction result.
In some implementations, the feature vector representation may be input into the molecular docking model for multi-prediction-head training to obtain the output prediction data. Optionally, the feature vector representation is input into the first prediction head, and the first prediction head obtains the internal conformation data of the drug molecule based on the feature vector representation. The internal conformation data of the drug molecule may include information such as bonding between atoms, chemical bond angles, and interactions between electron shells. This information allows the model to learn fine spatial structure and atomic interactions, and provides important guidance for the validity of the predicted geometric conformation.
Optionally, the feature vector representation is input into the second prediction head, and the second prediction head obtains the binding conformation data of the protein and drug molecule combination based on the feature vector representation. The binding conformation between the protein and the drug molecule may include information such as the distances between drug molecule atoms and protein atoms, the change in torsion angles after the drug molecule binds, and the magnitude of translation and rotation. This information allows the model to learn the interaction between the drug molecule and the protein and helps it predict the binding conformation of the bound protein and drug molecule more accurately. Further, an accurate binding conformation helps in predicting the binding strength between the protein and the drug molecule.
Optionally, the feature vector representation is input into the third prediction head for training, and the affinity data between the protein and the drug molecule output by the third prediction head is obtained.
In the embodiments of the present disclosure, the prediction data may be the internal conformation data of the drug molecule, the binding conformation data of the protein and drug molecule combination, and the affinity data between the protein and the drug molecule. The affinity data between the protein and the drug molecule reflects the interaction between them. For example, it may include information such as whether the protein and the drug molecule bind or repel and the magnitude of the binding affinity. Affinity data helps screen out drug molecules that interact well with the selected protein.
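The three-head arrangement described above can be sketched as one shared feature vector passed to three independent output layers. The single linear layer per head, the weight values, and the output sizes are placeholders chosen for illustration; the disclosure does not specify the head architecture.

```python
def linear(weights, bias, x):
    """Apply one linear layer; weights is a list of rows, bias a list."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical heads: name -> (weights, bias). The values are placeholders.
HEADS = {
    "internal_conformation": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # head 1
    "binding_conformation": ([[0.5, 0.5]], [0.0]),                    # head 2
    "affinity": ([[1.0, -1.0]], [0.1]),                               # head 3
}


def predict_all(feature_vector, heads=HEADS):
    """Feed the same feature vector to every head and collect the outputs."""
    return {
        name: linear(weights, bias, feature_vector)
        for name, (weights, bias) in heads.items()
    }
```

Because every head reads the same representation, an error signal from any one task also shapes the features used by the other tasks, which is the usual motivation for this kind of multi-task setup.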
S205, based on the difference between the labeled data and the prediction data, correcting the initial molecular docking model and returning to continue training until the trained molecular docking model is obtained.
For the content of step S205, refer to the foregoing embodiments; details are not repeated here.
According to the training method of the molecular docking model of the embodiments of the present disclosure, proteins and drug molecules are randomly sampled from the data in a protein structure database and a drug molecule database to obtain protein and drug molecule combinations, and the combinations are simulated to obtain labeled data. In addition, features are extracted from the protein and drug molecule combinations, and the extracted feature information is input into the initial molecular docking model for multi-task training, so as to obtain a molecular docking model with strong generalization ability. The model is corrected based on the difference between the labeled data and the prediction data; correcting defects and biases in the molecular docking model improves the prediction of binding conformation and affinity and makes the prediction results more reliable and accurate. Further, the model's ability to explore proteins and drug molecules can be improved, the exploration time can be reduced, and the accuracy of the prediction data of the molecular docking model can be increased.
FIG. 3 is a schematic flowchart of a training method of a molecular docking model provided by an embodiment of the present disclosure.
As shown in FIG. 3, the training method of the molecular docking model may include:
S301, acquiring a protein and drug molecule combination for training, and simulating the combination to obtain labeled data of the protein and drug molecule combination.
In some implementations, protein files may be downloaded from a source database and used to build the protein structure database, and drug molecule files may be downloaded from the source database and used to build the drug molecule database. A source database usually contains a large amount of protein structure data and drug molecule data. Downloading the protein files and drug molecule files and building databases from them provides rich protein and drug information, including protein structures, protein sequences, drug names, structures, physicochemical properties, and biological activity. This helps in obtaining comprehensive protein and drug data and knowledge and advances research on proteins and drug molecules.
It can be understood that protein files downloaded from the source database may be missing protein-related information, which can cause construction of the protein structure database to fail. Optionally, anomaly detection is performed on the downloaded protein files to identify abnormal protein files, and the abnormal protein files are repaired. Further, the anomaly type of an abnormal protein file is obtained, a repair program is selected based on the anomaly type, and the protein file is modified using that repair program. Repairing the missing information in protein files ensures the integrity and correctness of the protein sequences and yields more stable and reasonable protein structures, so that complete and accurate protein structures can be used for subsequent molecular docking and drug screening.
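The detect-then-repair flow described above, where the repair routine is chosen by anomaly type, might look like the following. The record layout, the two anomaly types, and both repair routines are hypothetical; the disclosure does not enumerate concrete anomaly types or repair programs.

```python
def detect_anomaly(record):
    """Return an anomaly type string, or None if the record looks complete."""
    if not record.get("sequence"):
        return "missing_sequence"
    if any(atom.get("coords") is None for atom in record.get("atoms", [])):
        return "missing_coordinates"
    return None


def repair_missing_sequence(record):
    """Rebuild the sequence from per-residue one-letter codes (assumed present)."""
    fixed = dict(record)
    fixed["sequence"] = "".join(record.get("residues", []))
    return fixed


def drop_incomplete_atoms(record):
    """Remove atoms that have no coordinates."""
    fixed = dict(record)
    fixed["atoms"] = [a for a in record["atoms"] if a.get("coords") is not None]
    return fixed


# Anomaly type -> repair routine (both routines are illustrative).
REPAIRERS = {
    "missing_sequence": repair_missing_sequence,
    "missing_coordinates": drop_incomplete_atoms,
}


def repair(record):
    """Apply the repair routine matching each detected anomaly type."""
    for _ in range(len(REPAIRERS)):
        anomaly = detect_anomaly(record)
        if anomaly is None:
            return record
        record = REPAIRERS[anomaly](record)
    if detect_anomaly(record) is not None:
        raise ValueError("record could not be repaired")
    return record
```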
In some implementations, the protein structure database is built from the repaired protein files. Proteins are sampled from the protein structure database, and drug molecules are sampled from the drug molecule database. The sampled protein and the sampled drug molecule are then combined to obtain a protein and drug molecule combination.
In some implementations, the sampled protein and drug molecule may be combined using the binding pocket of the protein. They may also be combined using the structural information of the protein. It can be understood that when a protein and a drug molecule are combined, there is a region where binding is highly probable; the binding pocket identifies this region on the protein and guides the drug molecule to combine with the protein within it.
Optionally, the sampled protein may be mined for binding pockets to obtain its binding pocket. Based on the binding pocket of the protein, the sampled protein and drug molecule are combined to obtain the protein and drug molecule combination. The binding pocket of a protein is the key region where the drug molecule and the protein interact. Combining proteins and drug molecules by binding pocket makes it possible to screen out drug molecules with high binding activity and selectivity, which helps guide drug optimization and development.
Optionally, the structural information of the sampled protein is determined, and the sampled protein and drug molecule are combined based on that structural information to obtain the protein and drug molecule combination. The structural information of a protein can reveal its function and clarify the mechanism of interaction between drug and protein. Combining proteins and drug molecules according to protein structural information allows drug molecules to be screened, providing targets and direction for drug design.
Further, based on a molecular docking algorithm, a molecular docking simulation is performed on the protein and drug molecule combination to obtain the binding conformation data of the combination, the affinity data between the protein and the drug molecule, and the internal conformation data of the drug molecule. The binding conformation data, affinity data, and internal conformation data of the drug molecule obtained from the simulation are used as the labeled data.
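The sampling and labeling steps above can be sketched as follows. The docking simulation is passed in as a callable because the disclosure does not name a specific docking algorithm; `toy_simulate` is a stand-in that returns made-up numbers and is not a real docking method. All names here are assumptions for illustration.

```python
import random


def build_labeled_set(proteins, molecules, simulate, n_samples, seed=0):
    """Randomly pair proteins with drug molecules and label each pair.

    simulate(protein, molecule) is expected to return a dict containing the
    three kinds of labeled data produced by the docking simulation.
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        protein = rng.choice(proteins)
        molecule = rng.choice(molecules)
        labels = simulate(protein, molecule)
        dataset.append(
            {"protein": protein, "molecule": molecule, "labels": labels}
        )
    return dataset


def toy_simulate(protein, molecule):
    """Placeholder for a docking simulation; the values are NOT physical."""
    return {
        "internal_conformation": [float(len(molecule))],
        "binding_conformation": [float(len(protein)), float(len(molecule))],
        "affinity": -0.1 * (len(protein) + len(molecule)),
    }
```

In practice `simulate` would wrap whichever molecular docking program is used to generate the labels.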
S302, inputting the protein and drug molecule combination into the initial molecular docking model, and performing multi-task training on the initial molecular docking model to obtain prediction data of the protein and drug molecule combination.
For the content of steps S301-S302, refer to the foregoing embodiments; details are not repeated here.
S303, determining the labeled data corresponding to each prediction head.
In some implementations, the labeled data of the protein and drug molecule combination may be matched against each prediction head to determine the labeled data corresponding to each prediction head. Optionally, the internal conformation data of the drug molecule may be used as the labeled data of the first prediction head; the binding conformation data of the protein and drug molecule combination may be used as the labeled data of the second prediction head; and the affinity data between the protein and the drug molecule may be used as the labeled data of the third prediction head.
S304, obtaining the difference between the prediction data of each prediction head and the corresponding labeled data, and obtaining the first loss function of the initial molecular docking model.
In some implementations, a loss function may be calculated for each prediction head from the difference between its prediction data and the corresponding labeled data. Optionally, a corresponding second loss function may be defined for each prediction head. For example, mean squared error or cross-entropy may be used as the second loss function.
In some implementations, the second loss function of each prediction head is determined from the difference for that prediction head. Optionally, mean squared error is used as the second loss function, and the second loss function of each prediction head is calculated as the mean squared error between the prediction data output by that head and the labeled data corresponding to it.
Further, the second loss functions of the prediction heads are weighted to obtain the first loss function. Optionally, uniform weighting may be used, that is, every second loss function has the same weight.
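The relationship between the per-head second loss functions and the combined first loss function can be written out as below, using mean squared error for each head. Normalizing the uniform weights so that they sum to one is an assumption; the text only requires that the weights be equal.

```python
def mse(prediction, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)


def first_loss(predictions, labels, weights=None):
    """Combine per-head second losses into one weighted first loss.

    predictions and labels map a head name to a list of floats. When no
    weights are given, every head receives the same weight (uniform weighting).
    """
    names = list(predictions)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    second_losses = {
        name: mse(predictions[name], labels[name]) for name in names
    }
    total = sum(weights[name] * second_losses[name] for name in names)
    return total, second_losses
```

Non-uniform weights can be passed explicitly when one task should dominate the correction.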
S305, correcting the initial molecular docking model based on the first loss function.
In some implementations, the model parameters of the initial molecular docking model may be corrected based on the first loss function. Optionally, a correction gradient for the model parameters may be determined from the first loss function, and the initial molecular docking model is then corrected in the backward direction based on that gradient, so as to improve the performance and stability of the molecular docking model.
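A gradient-based correction of the kind described above can be shown on the smallest possible case: one linear output with a squared-error loss, where the gradient is written out by hand. A real model would compute gradients through the whole network with an automatic differentiation framework; the function name `sgd_step` and the learning rate are illustrative assumptions.

```python
def sgd_step(weights, bias, x, target, lr=0.1):
    """One gradient-descent correction for a single linear output.

    With loss = (prediction - target) ** 2:
      d loss / d w_i = 2 * error * x_i
      d loss / d b   = 2 * error
    Returns the corrected weights, the corrected bias, and the loss measured
    before the correction.
    """
    prediction = sum(w * xi for w, xi in zip(weights, x)) + bias
    error = prediction - target
    new_weights = [w - lr * 2.0 * error * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * 2.0 * error
    return new_weights, new_bias, error ** 2
```

Calling `sgd_step` repeatedly on the same example drives the loss toward zero, provided the learning rate is small enough for the input scale.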
S306, continuing training until the trained molecular docking model is obtained.
For the content of step S306, refer to the foregoing embodiments; details are not repeated here.
According to the training method of the molecular docking model of the embodiments of the present disclosure, proteins and drug molecules are randomly sampled from the data in a protein structure database and a drug molecule database to obtain protein and drug molecule combinations, and the combinations are continuously simulated to obtain labeled data. The protein and drug molecule combinations are input into the initial molecular docking model for multi-task training, so as to obtain a molecular docking model with strong generalization ability. The model is corrected based on the difference between the labeled data and the prediction data; correcting defects and biases in the molecular docking model improves the prediction of binding conformation and affinity and makes the prediction results more reliable and accurate. Further, the model's ability to explore proteins and drug molecules can be improved, the exploration time can be reduced, and the accuracy of the prediction data of the molecular docking model can be increased.
FIG. 4 is a schematic flowchart of a training method of a molecular docking model provided by an embodiment of the present disclosure.
As shown in FIG. 4, the training method of the molecular docking model may include:
S401, acquiring a protein and drug molecule combination for training, and simulating the combination to obtain labeled data of the protein and drug molecule combination.
S402, performing feature extraction on the protein and drug molecule combination.
S403, encoding the extracted feature information to obtain a feature vector representation corresponding to the protein and drug molecule combination.
For the content of steps S401-S403, refer to the foregoing embodiments; details are not repeated here.
S404, inputting the feature vector representation into each prediction head separately, each prediction head learning from the feature vector representation to produce its prediction data.
It can be understood that different prediction heads output different prediction data. The prediction data may be the internal conformation data of the drug molecule, the binding conformation data of the protein and drug molecule combination, and the affinity data between the protein and the drug molecule.
Optionally, the feature vector representation is input into the first prediction head, and the first prediction head obtains the internal conformation data of the drug molecule based on the feature vector representation.
Optionally, the feature vector representation is input into the second prediction head, and the second prediction head obtains the binding conformation data of the protein and drug molecule combination based on the feature vector representation.
Optionally, the feature vector representation is input into the third prediction head for training, and the affinity data between the protein and the drug molecule output by the third prediction head is obtained.
S405, obtaining the prediction data of the protein and drug molecule combination based on the prediction data of the prediction heads.
It can be understood that the outputs of the individual prediction heads together form the prediction data of the protein and drug molecule combination.
In some implementations, the internal conformation of the drug molecule includes information such as bonding between atoms, chemical bond angles, and interactions between electron shells. This information enables the molecular docking model to learn fine spatial structure and atomic interactions. Therefore the geometric conformation of the protein and drug molecule combination can be predicted based on the internal conformation data of the drug molecule.
In some implementations, the binding conformation of the protein and drug molecule combination includes information such as the distances between drug molecule atoms and protein atoms, the change in torsion angles after the drug molecule binds, and the magnitude of translation and rotation. This information allows the molecular docking model to learn the interaction between the drug molecule and the protein. Therefore the binding strength of the protein and drug molecule combination can be predicted based on its binding conformation data.
In some implementations, the affinity data between the protein and the drug molecule includes information such as whether the protein and the drug molecule bind or repel and the magnitude of the binding affinity. This information helps the molecular docking model screen out drug molecules that are more likely to interact with the selected protein. Therefore whether the protein and drug molecule combination binds or repels, and with what affinity, can be predicted based on the affinity data between the protein and the drug molecule.
S406, based on the difference between the labeled data and the prediction data, correcting the initial molecular docking model and returning to continue training until the trained molecular docking model is obtained.
For the content of step S406, refer to the foregoing embodiments; details are not repeated here.
In the pre-training process of the molecular docking model shown in FIG. 5, proteins and drug molecules sampled from the protein structure database and the drug molecule database are combined to obtain protein and drug molecule combinations. Each combination is simulated to obtain its labeled data. Features are extracted from the protein and drug molecule combination and input into the neural network of the molecular docking model for multi-prediction-head training, producing the output prediction data. A loss function is calculated between the prediction data of each prediction head and the corresponding labeled data. Based on the loss function, the initial molecular docking model is corrected, and the corrected molecular docking model continues to be trained on subsequent protein and drug molecule combinations until the training end condition is met, yielding the trained molecular docking model.
According to the training method of the molecular docking model of the embodiments of the present disclosure, proteins and drug molecules are randomly sampled from the data in a protein structure database and a drug molecule database to obtain protein and drug molecule combinations, and the combinations are simulated to obtain labeled data. In addition, features are extracted from the protein and drug molecule combinations, and the extracted feature information is input into the initial molecular docking model for multi-task training, so as to obtain a molecular docking model with strong generalization ability. The model is corrected based on the difference between the labeled data and the prediction data; correcting defects and biases in the molecular docking model improves the prediction of binding conformation and affinity and makes the prediction results more reliable and accurate. Further, the model's ability to explore proteins and drug molecules can be improved, the exploration time can be reduced, and the accuracy of the prediction data of the molecular docking model can be increased.
On the basis of the above embodiments, the embodiments of the present disclosure further explain the process of obtaining the target molecular docking model after the training method of the molecular docking model has been run.
FIG. 6 is a schematic flowchart of obtaining a target molecular docking model in a training method of a molecular docking model provided by an embodiment of the present disclosure.
As shown in FIG. 6, the method for obtaining the target molecular docking model may include:
S601, performing multi-task training on the initial molecular docking model based on the sampled protein and drug molecule combinations to obtain the trained molecular docking model.
For the content of step S601, refer to the foregoing embodiments; details are not repeated here.
S602, based on prediction requirement information, determining the target prediction head to be retained among the multiple prediction heads of the trained molecular docking model, and decoupling the remaining prediction heads to obtain the target molecular docking model.
In some implementations, the target prediction head to be retained among the multiple prediction heads of the trained molecular docking model may be determined according to the prediction requirement information; the target prediction head is the one that produces the required prediction result. Optionally, one or more prediction heads may be determined as target prediction heads. Optionally, the first prediction head may be retained to keep the prediction data of the internal conformation of the drug molecule. Optionally, the second prediction head may be retained to keep the prediction data of the binding conformation of the protein and drug molecule combination. Optionally, the third prediction head may be retained to keep the prediction data of the affinity between the protein and the drug molecule.
Further, the remaining prediction heads are separated from the target prediction head and decoupled from the trained molecular docking model. Optionally, the prediction heads may be stored. When a specified prediction task needs to be performed, the target prediction head corresponding to that task can be called, that is, attached after the backbone network of the molecular docking model, and the prediction data corresponding to the task is obtained through the backbone network and the target prediction head.
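Retaining a target head, storing the rest, and re-attaching a stored head behind the backbone on demand can be sketched with plain callables. The class `DockingModel` and the functions `decouple` and `attach` are hypothetical names introduced for this example only.

```python
class DockingModel:
    """A shared backbone followed by any number of named prediction heads."""

    def __init__(self, backbone, heads):
        self.backbone = backbone      # callable: inputs -> features
        self.heads = dict(heads)      # head name -> callable: features -> output

    def predict(self, inputs):
        features = self.backbone(inputs)
        return {name: head(features) for name, head in self.heads.items()}


def decouple(model, keep):
    """Split a trained model into a slim model and a store of removed heads."""
    missing = set(keep) - set(model.heads)
    if missing:
        raise KeyError(f"unknown prediction heads: {sorted(missing)}")
    kept = {n: h for n, h in model.heads.items() if n in keep}
    stored = {n: h for n, h in model.heads.items() if n not in keep}
    return DockingModel(model.backbone, kept), stored


def attach(model, name, stored):
    """Return a model with one stored head re-attached behind the backbone."""
    heads = dict(model.heads)
    heads[name] = stored[name]
    return DockingModel(model.backbone, heads)
```

The slim model evaluates only the retained heads, which is where the reduction in computation for single-task use comes from.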
According to the training method of the molecular docking model of the embodiments of the present disclosure, proteins and drug molecules are randomly sampled from the data in a protein structure database and a drug molecule database to obtain protein and drug molecule combinations, and the combinations are simulated to obtain labeled data. The protein and drug molecule combinations are input into the initial molecular docking model for multi-task training, so as to obtain a molecular docking model with strong generalization ability. The model is corrected based on the difference between the labeled data and the prediction data; correcting defects and biases in the molecular docking model improves the prediction of binding conformation and affinity and makes the prediction results more reliable and accurate. Further, the model's ability to explore proteins and drug molecules can be improved, the exploration time can be reduced, and the accuracy of the prediction data of the molecular docking model can be increased. Further, the target prediction head is retained according to the requirement and the remaining prediction heads are decoupled, so that the molecular docking model can either process multiple tasks in parallel or execute a single task. In single-task mode, the complexity of the molecular docking model and its use of computing resources are reduced.
FIG. 7 is a schematic flowchart of a method for acquiring drug target information provided by an embodiment of the present disclosure.
As shown in FIG. 7, the method for acquiring drug target information may include:
S701: acquire a combination of a target protein and a candidate drug molecule.
It can be understood that the target protein is the protein in the target that lies within a certain range of the drug molecule after the drug binds to the target. Common target proteins include enzymes, receptors, and other regulatory proteins. The candidate drug molecule may be a compound.
S702: input the combination of the target protein and the candidate drug molecule into a pre-trained target molecular docking model to obtain binding conformation data of the combination and affinity data between the target protein and the candidate drug molecule.
In some implementations, the protein structure database and the drug molecule database can be downloaded from a source database, and, according to the goals and needs of the drug research, proteins in the protein structure database that are relevant to the research are designated as target proteins. Optionally, the target protein can be designated according to protein structure information. Drug molecules that may interact with the target protein are screened from the drug molecule database as candidate drug molecules. The target protein and the candidate drug molecules are then combined to obtain the combinations of target protein and candidate drug molecule.
It should be noted that the target molecular docking model can be obtained with the training method of the molecular docking model shown in FIG. 1 to FIG. 6, which is not repeated here.
In the embodiments of the present disclosure, features are extracted from the combination of the target protein and the candidate drug molecule, and the extracted feature information is input into the target molecular docking model. The model then predicts, for that combination, the binding conformation data of the combination and the affinity data between the target protein and the candidate drug molecule.
According to the method for acquiring drug target information of the embodiments of the present disclosure, the pre-trained target molecular docking model can predict the conformation data and affinity data of the combination of the target protein and the candidate drug molecule. Because the target molecular docking model is obtained with the training method of the present disclosure, it has strong generalization ability and can predict the conformation data and affinity data of the combination faster and with higher accuracy.
FIG. 8 is a schematic diagram of predicting protein and drug molecule affinity data and binding conformation. As shown in FIG. 8, the target protein and the candidate drug molecules are obtained from the databases and combined to obtain the combinations of target protein and candidate drug molecule. Further, each combination is input into the pre-trained molecular docking model, which predicts and outputs the binding conformation data of the combination and the affinity data between the target protein and the candidate drug molecule.
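The flow of FIG. 8 can be pictured with the following minimal sketch. Everything here is an assumption made for illustration: `toy_docking_model` is a hypothetical stand-in that fabricates its outputs from string lengths, and the final ranking step (lower affinity value first, as with a binding energy) is an addition that the source does not describe.

```python
from typing import Dict, List, Tuple


def toy_docking_model(protein: str, molecule: str) -> Dict[str, object]:
    """Hypothetical stand-in for the pre-trained target molecular docking model."""
    # Fake outputs: a placeholder pose and a score derived from string lengths.
    pose = [(0.0, 0.0, float(len(molecule)))]
    affinity = -float(len(protein) + len(molecule))
    return {"binding_conformation": pose, "affinity": affinity}


def screen(target: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Pair the target with each candidate, predict, and sort by affinity value."""
    results = []
    for molecule in candidates:
        prediction = toy_docking_model(target, molecule)
        results.append((molecule, prediction["affinity"]))
    # Assumption: a lower (more negative) value means stronger binding.
    return sorted(results, key=lambda item: item[1])
```

A real pipeline would replace the stand-in with feature extraction followed by a call to the trained model.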
Corresponding to the training methods of the molecular docking model provided in the embodiments above, an embodiment of the present disclosure further provides a training device for a molecular docking model. Because the training device corresponds to those training methods, the implementations of the training methods also apply to the training device and are not described in detail in the following embodiments.
FIG. 9 is a schematic structural diagram of a training device for a molecular docking model provided by an embodiment of the present disclosure.
As shown in FIG. 9, the training device 900 for a molecular docking model of the embodiments of the present disclosure includes an acquisition module 901, a training module 902, and a correction module 903.
The acquisition module 901 is configured to acquire protein and drug molecule combinations for training, and to simulate the combinations to obtain their labeled data;
The training module 902 is configured to input the protein and drug molecule combinations into an initial molecular docking model and perform multi-task training on the initial molecular docking model to obtain prediction data for the combinations;
The correction module 903 is configured to correct the initial molecular docking model based on the difference between the labeled data and the prediction data, and to return to training until a trained molecular docking model is obtained.
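The interplay of the three modules (acquire labels, predict, correct, repeat) can be illustrated with a deliberately tiny example. It is a sketch under strong assumptions: the "model" is a single weight `w`, `simulate_label` is a hypothetical stand-in for docking simulation, and plain gradient descent on a squared error is an assumed choice of correction rule.

```python
from typing import List, Tuple


def simulate_label(x: float) -> float:
    """Stand-in for docking simulation that produces labeled data (hypothetical)."""
    return 3.0 * x


def train(samples: List[float], lr: float = 0.05, tol: float = 1e-8,
          max_steps: int = 10_000) -> Tuple[float, float]:
    """Fit y = w * x by repeating predict -> compare -> correct until the loss is small."""
    labels = [simulate_label(x) for x in samples]      # acquisition module
    w = 0.0                                            # initial model
    loss = float("inf")
    for _ in range(max_steps):
        preds = [w * x for x in samples]               # training module
        errors = [p - y for p, y in zip(preds, labels)]
        loss = sum(e * e for e in errors) / len(samples)
        if loss < tol:                                 # stopping condition
            break
        grad = 2.0 * sum(e * x for e, x in zip(errors, samples)) / len(samples)
        w -= lr * grad                                 # correction module
    return w, loss
```

The loop returns to prediction after every correction, mirroring the "correct and return to training" wording of the correction module.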
In an embodiment of the present disclosure, the training module 902 is further configured to: extract features from the protein and drug molecule combination; encode the extracted feature information to obtain a feature vector representation of the combination; and perform multi-prediction-head training on the feature vector representation to obtain the prediction data of the combination.
In an embodiment of the present disclosure, the correction module 903 is further configured to: determine the labeled data corresponding to each prediction head; obtain the difference between the prediction data of each prediction head and its corresponding labeled data, and from it obtain a first loss function of the initial molecular docking model; and correct the initial molecular docking model based on the first loss function.
In an embodiment of the present disclosure, the correction module 903 is further configured to: determine a second loss function for each prediction head according to that head's difference; and weight the second loss functions of the prediction heads to obtain the first loss function.
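One way to read the weighting above is L1 = sum over heads k of w_k * L2(k), where L2(k) is the second loss function of head k and L1 is the first loss function. The sketch below assumes mean squared error for each head and uses hypothetical head names and weights; the source does not specify either.

```python
from typing import Dict, List


def head_loss(pred: List[float], label: List[float]) -> float:
    """Second loss function of one head; mean squared error is an assumed choice."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)


def total_loss(preds: Dict[str, List[float]], labels: Dict[str, List[float]],
               weights: Dict[str, float]) -> float:
    """First loss function: weighted sum of the per-head losses."""
    return sum(weights[name] * head_loss(preds[name], labels[name]) for name in preds)
```

For example, with an affinity head loss of 4.0 weighted by 0.5 and a binding head loss of 2.0 weighted by 2.0, the first loss is 0.5 * 4.0 + 2.0 * 2.0 = 6.0.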
In an embodiment of the present disclosure, the training module 902 is further configured to: input the feature vector representation into each prediction head, with each prediction head learning from the feature vector representation to produce its own prediction data; and obtain the prediction data of the protein and drug molecule combination based on the prediction data of the prediction heads.
In an embodiment of the present disclosure, the training module 902 is further configured to: input the feature vector representation into a first prediction head, which obtains internal conformation data of the drug molecule based on the feature vector representation; input the feature vector representation into a second prediction head, which obtains binding conformation data of the protein and drug molecule combination based on the feature vector representation; and input the feature vector representation into a third prediction head for training, to obtain the affinity data between the protein and the drug molecule output by the third prediction head.
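The three heads sharing one feature vector can be sketched as below. The heads are toy linear maps with made-up weights, and `linear_head`, `HEADS`, and `predict_all` are hypothetical names; in practice each head would be a learned network layer.

```python
from typing import Callable, Dict, List

Vector = List[float]


def linear_head(weights: List[Vector]) -> Callable[[Vector], Vector]:
    """Build a toy head that maps the shared feature vector to one value per weight row."""
    def head(features: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, features)) for row in weights]
    return head


# Hypothetical heads; the weights are arbitrary illustration values.
HEADS: Dict[str, Callable[[Vector], Vector]] = {
    "internal_conformation": linear_head([[1.0, 0.0], [0.0, 1.0]]),
    "binding_conformation": linear_head([[1.0, 1.0]]),
    "affinity": linear_head([[0.5, -0.5]]),
}


def predict_all(features: Vector) -> Dict[str, Vector]:
    """Feed the same feature vector to every head and collect the outputs."""
    return {name: head(features) for name, head in HEADS.items()}
```

Each head sees the same encoded input, so the three tasks share the backbone's representation while keeping separate outputs.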
In an embodiment of the present disclosure, the training module 902 is further configured to: determine, based on prediction requirement information, the target prediction head to be retained from among the multiple prediction heads of the trained molecular docking model, and decouple the remaining prediction heads to obtain the target molecular docking model.
In an embodiment of the present disclosure, the acquisition module 901 is further configured to: perform molecular docking simulation on the protein and drug molecule combination based on a molecular docking algorithm to obtain the binding conformation data of the combination, the affinity data between the protein and the drug molecule, and the internal conformation data of the drug molecule; and use the simulated binding conformation data, affinity data, and internal conformation data of the drug molecule as the labeled data.
In an embodiment of the present disclosure, the acquisition module 901 is further configured to: sample proteins from the protein structure database and sample drug molecules from the drug molecule database; and combine the sampled proteins and the sampled drug molecules to obtain the protein and drug molecule combinations.
In an embodiment of the present disclosure, the acquisition module 901 is further configured to: mine the sampled protein for binding pockets to obtain the binding pockets of the sampled protein; and combine the sampled protein and drug molecule based on the binding pockets of the protein to obtain the protein and drug molecule combination.
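The sampling and pocket-based combination can be sketched as follows. The function name, the data layout (a mapping from protein ID to pocket names), and the rule of pairing every sampled molecule with every pocket are all assumptions for illustration; pocket mining itself is assumed to have been done by a separate step.

```python
import random
from typing import Dict, List, Tuple


def sample_pairs(proteins: Dict[str, List[str]], molecules: List[str],
                 n_proteins: int, n_molecules: int,
                 seed: int = 0) -> List[Tuple[str, str, str]]:
    """Randomly sample proteins and molecules, then pair each molecule with each pocket.

    `proteins` maps a protein ID to its binding pockets, which are assumed to have
    been found beforehand by a separate pocket-detection step.
    """
    rng = random.Random(seed)  # seeded so that sampling is reproducible
    chosen_proteins = rng.sample(sorted(proteins), n_proteins)
    chosen_molecules = rng.sample(molecules, n_molecules)
    return [
        (protein, pocket, molecule)
        for protein in chosen_proteins
        for pocket in proteins[protein]
        for molecule in chosen_molecules
    ]
```

Each resulting triple names the protein, the pocket to dock into, and the drug molecule, which is the kind of combination the simulation step would then label.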
In an embodiment of the present disclosure, the acquisition module 901 is further configured to: determine structure information of the sampled protein, and combine the sampled protein and drug molecule based on the structure information of the protein to obtain the protein and drug molecule combination.
In an embodiment of the present disclosure, the acquisition module 901 is further configured to: download protein files from a source database, and build the protein structure database from the downloaded protein files.
In an embodiment of the present disclosure, the acquisition module 901 is further configured to: perform anomaly detection on the downloaded protein files to identify abnormal protein files, and repair the abnormal protein files.
In an embodiment of the present disclosure, the acquisition module 901 is further configured to: obtain the anomaly type of the abnormal protein file, obtain a repair program based on the anomaly type, and modify the protein file based on the repair program.
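Selecting a repair program by anomaly type amounts to a dispatch table, sketched below. The two anomaly types, the record prefixes (loosely modelled on PDB-style `ATOM` and `END` records), and the choice to drop files without a matching repair routine are all assumptions; the source does not list anomaly types or repair programs.

```python
from typing import Callable, Dict, List, Optional


def detect_anomaly(lines: List[str]) -> Optional[str]:
    """Return an anomaly type for a protein file, or None if it looks fine.

    The two checks below are illustrative; the source does not list anomaly types.
    """
    if not any(line.startswith("ATOM") for line in lines):
        return "no_atoms"
    if not lines[-1].startswith("END"):
        return "missing_end"
    return None


def repair_missing_end(lines: List[str]) -> List[str]:
    """Append the missing terminator record."""
    return lines + ["END"]


# Map each anomaly type to its repair routine (hypothetical registry).
REPAIRS: Dict[str, Callable[[List[str]], List[str]]] = {
    "missing_end": repair_missing_end,
}


def check_and_repair(lines: List[str]) -> Optional[List[str]]:
    """Repair the file if a routine exists for its anomaly type; drop it otherwise."""
    anomaly = detect_anomaly(lines)
    if anomaly is None:
        return lines
    repair = REPAIRS.get(anomaly)
    return repair(lines) if repair else None
```

New anomaly types can be supported by adding a detector branch and registering a routine, without touching the dispatch logic.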
According to the training method of the molecular docking model of the embodiments of the present disclosure, proteins and drug molecules are randomly sampled based on the data in the protein structure database and the drug molecule database to obtain protein and drug molecule combinations, and the combinations are simulated to obtain labeled data. In addition, features are extracted from the combinations, and the extracted feature information is input into the initial molecular docking model for multi-task training to obtain a molecular docking model with strong generalization ability. The model is corrected based on the difference between the labeled data and the prediction data; correcting the defects and biases of the molecular docking model improves the prediction of binding conformation and affinity and makes the prediction results more reliable and accurate. Further, the model's ability to explore proteins and drug molecules is improved, exploration time is reduced, and the accuracy of the prediction data is increased.
According to an embodiment of the present disclosure, the present disclosure further provides a device for acquiring drug target information, which is used to implement the method for acquiring drug target information described above.
FIG. 10 is a schematic structural diagram of a device for acquiring drug target information according to a first embodiment of the present disclosure.
As shown in FIG. 10, the device 1000 for acquiring drug target information of the embodiments of the present disclosure includes an acquisition module 1001.
The acquisition module 1001 is configured to acquire a combination of a target protein and a candidate drug molecule, and to input the combination into a pre-trained target molecular docking model to obtain binding conformation data of the combination and affinity data between the target protein and the candidate drug molecule.
According to the method for acquiring drug target information of the embodiments of the present disclosure, the pre-trained target molecular docking model can predict the conformation data and affinity data of the combination of the target protein and the candidate drug molecule. Because the target molecular docking model is obtained with the training method of the present disclosure, it has strong generalization ability and can predict the conformation data and affinity data of the combination faster and with higher accuracy.
In the technical solutions of the present disclosure, the acquisition, storage, and use of any user personal information involved comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the disclosure described and/or claimed herein.
As shown in FIG. 11, the device 1100 includes a computing unit 1101, which can perform various appropriate actions and processing according to computer programs/instructions stored in a read-only memory (ROM) 1102 or loaded from a storage unit 1108 into a random access memory (RAM) 1103. The RAM 1103 can also store the various programs and data required for the operation of the device 1100. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to one another through a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
Multiple components of the device 1100 are connected to the I/O interface 1105, including: an input unit 1106, such as a keyboard or a mouse; an output unit 1107, such as various types of displays and speakers; a storage unit 1108, such as a magnetic disk or an optical disc; and a communication unit 1109, such as a network card, a modem, or a wireless communication transceiver. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1101 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the computing unit 1101 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1101 executes the methods and processing described above, for example the training method of the molecular docking model. For example, in some embodiments, the training method of the molecular docking model can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program/instructions can be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program/instructions are loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the training method of the molecular docking model described above can be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to execute the training method of the molecular docking model in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs/instructions that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that, when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in combination with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer that has a display device for showing information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor), and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, speech, or tactile input).
The systems and techniques described here can be implemented in a computing system that includes a back-end component (for example, a data server), or a middleware component (for example, an application server), or a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, and blockchain networks.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs/instructions that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions disclosed herein can be achieved; no limitation is imposed here.
The specific implementations above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.
Claims (33)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310589399.0A CN116646028A (en) | 2023-05-23 | 2023-05-23 | Training method and device of molecular docking model, electronic equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310589399.0A CN116646028A (en) | 2023-05-23 | 2023-05-23 | Training method and device of molecular docking model, electronic equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116646028A true CN116646028A (en) | 2023-08-25 |
Family
ID=87618193
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310589399.0A Pending CN116646028A (en) | 2023-05-23 | 2023-05-23 | Training method and device of molecular docking model, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116646028A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117497041A (en) * | 2023-11-01 | 2024-02-02 | 北京百度网讯科技有限公司 | Protein docking methods, devices, electronic equipment and storage media |
| CN119314591A (en) * | 2024-09-30 | 2025-01-14 | 北京百度网讯科技有限公司 | Drug molecule generation and model training methods, devices and equipment |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106518970A (en) * | 2016-11-15 | 2017-03-22 | 郑州大学第附属医院 | Peptide sequence capable of being specifically bound with alpha fetoprotein and application of peptide sequence |
| CN108197429A (en) * | 2018-01-03 | 2018-06-22 | 中国科学院亚热带农业生态研究所 | A kind of metabolin peptide aptamer rapid screening method based on molecular docking technology |
| CN111402967A (en) * | 2020-03-12 | 2020-07-10 | 中南大学 | Method for improving virtual screening capability of docking software based on machine learning algorithm |
| US20200392178A1 (en) * | 2019-05-15 | 2020-12-17 | International Business Machines Corporation | Protein-targeted drug compound identification |
| CN113409884A (en) * | 2021-06-30 | 2021-09-17 | 北京百度网讯科技有限公司 | Training method of sequencing learning model, sequencing method, device, equipment and medium |
| CN113409883A (en) * | 2021-06-30 | 2021-09-17 | 北京百度网讯科技有限公司 | Information prediction and information prediction model training method, device, equipment and medium |
-
2023
- 2023-05-23 CN CN202310589399.0A patent/CN116646028A/en active Pending
Non-Patent Citations (3)
| Title |
|---|
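| 傅钦翠: "Research on the Application of Deep Learning to Transient Identification and Fault Location in Traction Power Supply Systems", 30 November 2022, 西南交通大学出版社, page 65 * |
| 尚振伟; 李晋; 姜永帅; 张明明; 吕洪超; 张瑞杰: "SVM-Based Drug Target Prediction Method and Its Application", 现代生物医学进展, no. 20, 20 July 2012 (2012-07-20) * |
| 陈路飞 et al.: "FP-Net: Frontal Pose Estimation from a Single Human Image at an Arbitrary Angle", 计算机辅助设计与图形学学报, vol. 34, no. 10, 31 October 2022 (2022-10-31), pages 1604-1612 * |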
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN112560496B (en) | Training method and device of semantic analysis model, electronic equipment and storage medium | |
| JP6889270B2 (en) | Neural network architecture optimization | |
| Hu et al. | An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences | |
| US20240055071A1 (en) | Artificial intelligence-based compound processing method and apparatus, device, storage medium, and computer program product | |
| US20200342953A1 (en) | Target molecule-ligand binding mode prediction combining deep learning-based informatics with molecular docking | |
| US20190050538A1 (en) | Prediction and generation of hypotheses on relevant drug targets and mechanisms for adverse drug reactions | |
| CN113868519B (en) | Information search methods, devices, electronic equipment and storage media | |
| CN112837466B (en) | Bill recognition method, device, equipment and storage medium | |
| CN114611532B (en) | Language model training method and device, and target translation error detection method and device | |
| WO2024125466A1 (en) | Neural network training method and protein structure prediction method | |
| CN113255770B (en) | Compound attribute prediction model training method and compound attribute prediction method | |
| US12191003B2 (en) | Real-time prediction of chemical properties through combining calculated, structured and unstructured data at large scale | |
| CN116646028A (en) | Training method and device of molecular docking model, electronic equipment and storage medium | |
| US12437847B1 (en) | Drug design method based on autoregressive model | |
| Chen et al. | The application of artificial intelligence accelerates G protein-coupled receptor ligand discovery | |
| CN115455171A (en) | Method, device, equipment and medium for mutual retrieval and model training of text videos | |
| US20230253076A1 (en) | Local steps in latent space and descriptors-based molecules filtering for conditional molecular generation | |
| US20230420070A1 (en) | Protein Structure Prediction | |
| CN115206421B (en) | Drug repositioning method, and repositioning model training method and device | |
| US20240013004A1 (en) | Automatic data card generation | |
| WO2023216065A1 (en) | Differentiable drug design | |
| US12288600B2 (en) | Generative machine learning on textual queries relating to molecules | |
| US20250391515A1 (en) | Determining phenomic relationships between compounds and cell perturbations utilizing machine learning models | |
| US20250054572A1 (en) | Protein docking method, electronic device, and storage medium | |
| US20250174305A1 (en) | Utilizing biological machine learning representations and a language machine learning model for initiating compound exploration programs | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |