CN116403639A - New antigen sequence generation method and system based on deep learning - Google Patents
- Publication number
- CN116403639A, CN202310331805.3A
- Authority
- CN
- China
- Prior art keywords
- neoantigen
- sequence
- original
- data
- hla
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Description
Technical Field
The present invention relates to the technical field of neoantigen sequence generation, and in particular to a deep-learning-based method and system for generating neoantigen sequences.
Background Art
A neoantigen is a new protein formed on tumor cells (including malignant tumor cells, i.e., cancer cells) when certain mutations occur in the DNA of those cells. Neoantigens are selective: they can elicit a T-cell response against the tumor and thereby eliminate it, which makes them a key element in cancer vaccine design. Neoantigens play an important role in helping the body mount an immune response against cancer cells, and neoantigens for use in vaccines and other immunotherapies are being studied for the treatment of many types of cancer.
Existing technical solutions typically recognize and identify immunogenic tumor neoantigens, with algorithms designed on the basis of known neoantigens. These known neoantigens occupy only a small part of protein space, and the vast unknown protein space has not yet been explored. That is, current neoantigen research focuses on neoantigen identification and neoantigen vaccine design, and research on generating neoantigen sequences is lacking.
Summary of the Invention
To address the above deficiencies of the prior art, the present invention provides a deep-learning-based method and system for generating neoantigen sequences. The method uses deep learning to generate entirely new neoantigens that have better affinity for human leukocyte antigen (HLA) than the original neoantigens. By screening the generated neoantigen sequences, neoantigens that can likewise be recognized and remembered by T cells are obtained; combined with subsequent biological validation, these can achieve better immune capability than existing neoantigens and provide guidance for subsequent personalized vaccine development.
In a first aspect, the present disclosure provides a deep-learning-based method for generating neoantigen sequences.
A deep-learning-based method for generating neoantigen sequences, comprising:
obtaining original neoantigen data;
preprocessing the original neoantigen data to obtain original neoantigen sequences and their corresponding HLA sequences;
mutating the original neoantigen sequence according to the original neoantigen sequence and its corresponding HLA sequence, and performing a Monte Carlo search in sequence space starting from the original neoantigen sequence to determine mutated neoantigen sequences that meet the design objective;
screening the mutated neoantigen sequences by molecular docking, presentation prediction, and immunogenicity prediction to obtain the final neoantigen sequence.
In a further technical solution, the preprocessing comprises:
screening the original neoantigen data to select human neoantigen data, which comprises neoantigen data and potential neoantigen data;
filtering the potential neoantigen data to select potential neoantigen data with high confidence scores;
from the screened human neoantigen data, selecting the HLA class I data, and obtaining the original neoantigen sequence and its corresponding HLA sequence from each record.
In a further technical solution, mutating the original neoantigen sequence according to the original neoantigen sequence and its corresponding HLA sequence comprises:
calculating, from the original neoantigen sequence and its corresponding HLA sequence, the attention weight of each amino acid in the original neoantigen sequence with respect to the HLA sequence, ranking the amino acids by attention weight, and selecting amino acid positions for mutation.
In a further technical solution, performing the Monte Carlo search in sequence space to determine the mutated neoantigen sequences that meet the design objective comprises:
inputting the mutated neoantigen sequence into a protein structure prediction model, which outputs a predicted protein structure;
inputting the mutated neoantigen sequence and its corresponding HLA sequence into a peptide-HLA binding affinity prediction model, which outputs a predicted affinity;
using the structure confidence and the affinity score as the loss function, performing a Monte Carlo search in sequence space starting from the original neoantigen sequence to determine the final protein structure, and then outputting the corresponding amino acid sequence through a protein sequence design model.
In a further technical solution, screening the mutated neoantigen sequences by molecular docking, presentation prediction, and immunogenicity prediction to obtain the final neoantigen sequence comprises:
assessing, by molecular docking, the binding affinity between each generated neoantigen sequence and its corresponding HLA sequence, and screening by binding affinity to obtain neoantigen sequences with higher affinity.
A further technical solution also comprises:
using a deep-learning-based presentation model to capture the binding information of HLA-binding peptides, predicting whether each HLA-binding peptide can be presented on the cell surface, and selecting the neoantigen sequences whose HLA-binding peptides are predicted to be presented on the cell surface.
A further technical solution also comprises:
using neoantigen features as filtering criteria and incorporating them into an immunogenicity score, thereby calculating an immunogenicity score; the neoantigen features comprise the dissociation constant and binding stability of the neoantigen-HLA complex and the expression of the mutated gene in the tumor;
taking the neoantigen with the highest score as the output to obtain the final neoantigen sequence.
In a second aspect, the present disclosure provides a deep-learning-based system for generating neoantigen sequences.
A deep-learning-based system for generating neoantigen sequences, comprising:
a data acquisition module, configured to obtain original neoantigen data;
a data preprocessing module, configured to preprocess the original neoantigen data to obtain original neoantigen sequences and their corresponding HLA sequences;
a neoantigen sequence design module, configured to mutate the original neoantigen sequence according to the original neoantigen sequence and its corresponding HLA sequence, and to perform a Monte Carlo search in sequence space starting from the original neoantigen sequence to determine mutated neoantigen sequences that meet the design objective;
a neoantigen sequence screening module, configured to screen the mutated neoantigen sequences by molecular docking, presentation prediction, and immunogenicity prediction to obtain the final neoantigen sequence.
In a third aspect, the present disclosure further provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of the method of the first aspect are carried out.
In a fourth aspect, the present disclosure further provides a computer-readable storage medium for storing computer instructions; when the computer instructions are executed by a processor, the steps of the method of the first aspect are carried out.
One or more of the above technical solutions have the following beneficial effects:
1. The present invention provides a deep-learning-based method and system for generating neoantigen sequences. The method uses deep learning to generate entirely new neoantigens that have better affinity for human leukocyte antigen (HLA) than the original neoantigens. By screening the generated neoantigen sequences, neoantigens that can likewise be recognized and remembered by T cells, that is, neoantigens with immune capability, are obtained.
2. The present invention uses deep learning to design entirely new neoantigens, which greatly expands the exploration of protein space compared with conventional schemes that address only known neoantigens.
3. Combined with subsequent biological validation, the present invention can achieve better immune capability than existing neoantigens and provides guidance for subsequent personalized vaccine development.
Brief Description of the Drawings
The accompanying drawings, which form a part of the present invention, are provided for further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it.
FIG. 1 is an overall flowchart of the deep-learning-based neoantigen sequence generation method according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of the neoantigen design algorithm in Embodiment 1 of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present invention. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs.
It should also be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Embodiment 1
This embodiment provides a deep-learning-based method for generating neoantigen sequences. The data collected from the original neoantigen databases are first preprocessed. The input neoantigen sequences are then mutated by a deep learning method to obtain plausible mutated neoantigen sequences with better affinity for their corresponding HLA. Finally, the generated neoantigens are screened in several rounds using a series of methods and, combined with biological validation, neoantigens with stronger immune capability are obtained as the final result. As shown in FIG. 1, the method of this embodiment comprises the following steps:
Step S1: obtain original neoantigen data;
Step S2: preprocess the original neoantigen data to obtain original neoantigen sequences and their corresponding HLA sequences;
Step S3: mutate the original neoantigen sequence according to the original neoantigen sequence and its corresponding HLA sequence, and perform a Monte Carlo search in sequence space starting from the original neoantigen sequence to determine mutated neoantigen sequences that meet the design objective;
Step S4: screen the mutated neoantigen sequences by molecular docking, presentation prediction, and immunogenicity prediction to obtain the final neoantigen sequence.
First, in step S1, original neoantigen data are obtained and an original neoantigen dataset is built. Because the accuracy and completeness of a dataset correlate strongly with the effectiveness of deep learning, collecting a high-quality dataset is an important starting point for generating neoantigen sequences. In this embodiment, data are downloaded from databases such as NeoPeptide, TSNAdb, and TransLnc, which contain about 170,000 experimental neoantigens, 1.3 million potential neoantigens, and 400,000 potential neoantigens, respectively, covering melanoma, breast cancer, lung cancer, squamous cell carcinoma, and other cancers.
Next, in step S2, the original neoantigen data are preprocessed to obtain the original neoantigen sequences and their corresponding HLA sequences. The data must be preprocessed before they are analyzed. First, the above databases contain data from multiple species, such as human and mouse, whereas this embodiment studies human neoantigens. Neoantigens from other species are therefore filtered out and only human neoantigens are kept. The filter works as follows: for each record in the database, determine whether the HLA in the record is a human HLA (such as HLA-A, HLA-B, or HLA-C), and thereby determine whether the record is of human origin. Second, the data contain both experimental neoantigens and potential neoantigens, and the potential neoantigens need further filtering to keep only those with high confidence scores. The confidence score is computed with a neoantigen identification tool (such as netMHCpan). netMHCpan reports a %rank value; if %rank < 0.5, the neoantigen is considered to have high affinity for its corresponding HLA, and the potential neoantigen is kept and used as input. Finally, the HLA class I data are selected and only the required fields are extracted from each record, so that each record contains only the neoantigen sequence and its corresponding HLA sequence. Here, "HLA sequence" means the amino acid sequence of the HLA, and the term is used in this sense throughout.
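The three filters of step S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Record` layout, the field names, and the assumption that %rank values have already been computed by a tool such as netMHCpan are all hypothetical, and the species and class I checks are folded into a single test on the allele name. Only the %rank < 0.5 threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout; the real database schemas differ per source.
@dataclass
class Record:
    peptide: str                      # neoantigen amino acid sequence
    hla: str                          # allele name, e.g. "HLA-A*02:01"
    hla_sequence: str                 # amino acid sequence of the HLA
    experimental: bool                # True for experimentally verified neoantigens
    pct_rank: Optional[float] = None  # precomputed %rank for potential neoantigens

# Assumption: human class I records are recognized by these allele prefixes.
HLA_CLASS_I_PREFIXES = ("HLA-A", "HLA-B", "HLA-C")
RANK_THRESHOLD = 0.5  # from the text: keep potential neoantigens with %rank < 0.5


def preprocess(records):
    """Return (peptide, hla_sequence) pairs that pass the filters of step S2."""
    kept = []
    for r in records:
        # Species and class filters: keep human HLA class I alleles only.
        if not r.hla.startswith(HLA_CLASS_I_PREFIXES):
            continue
        # Confidence filter: applies only to potential (non-experimental) neoantigens.
        if not r.experimental and (r.pct_rank is None or r.pct_rank >= RANK_THRESHOLD):
            continue
        # Keep only the two fields the later steps need.
        kept.append((r.peptide, r.hla_sequence))
    return kept
```

Experimental neoantigens bypass the %rank check here because the text applies the confidence filter only to the potential neoantigens.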
Then, in step S3, the neoantigen sequence is designed: the original neoantigen sequence is mutated according to the original neoantigen sequence and its corresponding HLA sequence, and a Monte Carlo search is performed in sequence space starting from the original neoantigen sequence to determine mutated neoantigen sequences that meet the design objective. The design algorithm is shown in FIG. 2. The method of this embodiment is based on deep learning and explores the protein space of neoantigens broadly by mutating the input original neoantigen sequence; it places no restriction on the structure of the original sequence.
Step S3.1: from the original neoantigen sequence and its corresponding HLA sequence, calculate the attention weight of each amino acid in the original neoantigen sequence with respect to the HLA sequence, rank the amino acids by attention weight, and select amino acid positions for mutation.
Attention is computed between the original neoantigen sequence and its corresponding HLA sequence using the attention formula,
where K and V denote the neoantigen sequence and Q denotes the corresponding HLA sequence. In both the neoantigen sequence and the HLA sequence, each character represents one amino acid, so these parameters are in effect amino acid sequences. After the softmax operation in the formula, each amino acid of the neoantigen sequence has its own attention weight.
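The symbols Q, K, and V and the softmax step match standard scaled dot-product attention. Assuming that form (the scaling factor $\sqrt{d_k}$, with $d_k$ the key dimension, and the embedding of residues into vectors are assumptions, not stated in the source), the computation can be written as:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Under this reading, $Q$ has one row per HLA residue and $K$, $V$ have one row per neoantigen residue, so each row of the softmax output distributes one HLA residue's attention over the positions of the neoantigen.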
Next, the intermediate result of the computation is used: the attention weights that each amino acid of the neoantigen sequence receives are averaged over the HLA, and the resulting value is taken as the importance score of that amino acid. A higher score suggests that the amino acid plays a more important role in binding the HLA. The scores are sorted in ascending order, and only the amino acid positions that rank first in this ordering are mutated.
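The scoring and selection just described can be sketched in plain Python. This is an illustration under assumptions: residues are taken to be already embedded as numeric vectors (the source does not say how), scaled dot-product attention is assumed, and `importance_scores` and `positions_to_mutate` are hypothetical names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def importance_scores(hla_emb, pep_emb):
    """Average, over HLA residues (queries), the attention weight that
    each neoantigen residue (key) receives."""
    d_k = len(pep_emb[0])
    rows = []
    for q in hla_emb:
        # Scaled dot product of one HLA residue with every peptide residue.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in pep_emb]
        rows.append(softmax(logits))
    n = len(pep_emb)
    return [sum(row[j] for row in rows) / len(rows) for j in range(n)]


def positions_to_mutate(scores, n_mut):
    """Sort positions by ascending score and return the first n_mut of them."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    return order[:n_mut]
```

With the ascending sort described in the text, the selected positions are those with the lowest importance scores, which leaves the residues that appear most involved in HLA binding unchanged.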
Step S3.2: input the mutated neoantigen sequence into the protein structure prediction model, which outputs a predicted protein structure; input the mutated neoantigen sequence and its corresponding HLA sequence into the peptide-HLA binding affinity prediction model, which outputs a predicted affinity; using the structure confidence and the affinity score as the loss function, perform a Monte Carlo search in sequence space starting from the original neoantigen sequence to determine the final protein structure; and then output the corresponding amino acid sequence through the protein sequence design model. The Monte Carlo search here means Markov chain Monte Carlo optimization. The structure confidence means the two values plddt and ptm output by the model, which indicate how far the predicted structure is expected to differ from the true structure. plddt is computed from the intrinsic distances of the predicted structure and, in the absence of a native structure, assesses the reliability of each residue in the predicted structure; ptm is the model's predicted template modeling score, which evaluates the structure as a whole.
A Monte Carlo search is performed in sequence space starting from the original neoantigen sequence, using the protein structure prediction model together with the peptide-HLA binding affinity prediction model. The protein structure prediction model takes an amino acid sequence as input and outputs a predicted protein structure: it extracts features from the sequence, encodes them, and decodes them into the 3D positions of the protein structure. The peptide-HLA binding affinity prediction model takes a peptide and its corresponding HLA as input and outputs their binding affinity. Its training data are enlarged by using both binding affinity data and eluted ligand data, and the model uses the NNAlign_MA architecture so that it can handle both data types. In this embodiment, the protein structure prediction model is trained on the protein structure dataset PDB and the protein sequence dataset Uniclust30, and the peptide-HLA binding affinity prediction model is trained on a binding affinity dataset and an eluted ligand dataset.
The outputs of the two models serve as the loss function for the Monte Carlo search. Sequences are searched iteratively by combining Monte Carlo sampling with the models until the resulting structure meets the design objective, and the sequence is then redesigned with the protein sequence design model. This model is the inverse of the protein structure prediction model: its input is a protein structure and its output is the corresponding amino acid sequence. It is a message-passing neural network with an encoder-decoder architecture and a flexible decoding order, which improve its performance. The protein sequence design model is trained on the protein structure dataset PDB. The structure of each sequence obtained by the search is predicted with the protein structure prediction model, and each predicted structure is then fed to the protein sequence design model to generate the final corresponding sequence, which reduces overfitting during the search.
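The search loop can be sketched as a Metropolis-style Markov chain over sequences. The acceptance rule, the fixed temperature, the single-residue proposal, and all names below are assumptions for illustration; the source states only that Markov chain Monte Carlo optimization is used with a loss built from structure confidence and affinity. `toy_loss` is a stand-in for that loss, since the real models cannot be called in a short example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mcmc_search(seq, positions, loss_fn, steps=1000, temperature=0.1, seed=0):
    """Metropolis search over sequences, mutating only the allowed positions.

    Returns the lowest-loss sequence seen and its loss.
    """
    rng = random.Random(seed)
    current, current_loss = seq, loss_fn(seq)
    best, best_loss = current, current_loss
    for _ in range(steps):
        # Propose a single-residue substitution at one allowed position.
        pos = rng.choice(positions)
        new_aa = rng.choice(AMINO_ACIDS)
        candidate = current[:pos] + new_aa + current[pos + 1:]
        cand_loss = loss_fn(candidate)
        delta = cand_loss - current_loss
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_loss = candidate, cand_loss
            if current_loss < best_loss:
                best, best_loss = current, current_loss
    return best, best_loss


def toy_loss(seq):
    """Stand-in loss: number of mismatches to a fixed target peptide.

    In the described method the loss would instead combine structure
    confidence (plddt, ptm) with predicted peptide-HLA affinity.
    """
    target = "GILGFVFTL"
    return sum(a != b for a, b in zip(seq, target))
```

A call such as `mcmc_search("GILAAVFTL", [3, 4], toy_loss)` explores substitutions only at positions 3 and 4. In the described method the best sequence would then be passed through structure prediction and the sequence design model rather than returned directly.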
Although the neoantigens obtained by the above design method are expected to bind HLA better, only a small fraction of HLA-binding peptides are actually presented on the cell surface, so affinity alone is not sufficient to obtain neoantigens that better stimulate an immune response. This embodiment therefore applies the following methods for subsequent screening.
That is, in step S4, the mutated neoantigen sequences are screened by molecular docking, presentation prediction, and immunogenicity prediction to obtain the final neoantigen sequence.
After the amino acid sequences of the neoantigens are obtained in step S3, they still need further screening. If structural data are needed during this screening, the structure prediction model can be used to obtain the corresponding structures, and the required neoantigen sequences are finally determined after screening.
Molecular docking is widely used to study protein-ligand interactions and in drug discovery and development. Typically, given a target of known structure, molecular docking is used to predict the binding conformation and binding free energy of a small molecule with the target and to determine how the two bind. In this embodiment, molecular docking is used to assess the binding affinity between each designed neoantigen sequence and its corresponding HLA sequence, and the sequences are screened by binding affinity to obtain neoantigen sequences with higher affinity.
A neoantigen must not only bind HLA but also be presented on the cell surface before it can be recognized by T cells and trigger an immune response. In this embodiment, a deep-learning-based presentation model is therefore used to predict whether an HLA-binding peptide can be presented on the cell surface; the model makes its prediction by capturing the binding information and recurrent patterns of HLA-binding peptides. The presentation model is trained on an eluted ligand dataset that contains both mono-allelic and multi-allelic HLA data. This deep learning model filters out neoantigens that cannot be presented on the cell surface, which yields more convincing results.
To further select neoantigens with immune capability, this embodiment uses neoantigen features as filtering criteria, including the dissociation constant and binding stability of the neoantigen-HLA complex and the expression of the mutated gene in the tumor. These features are incorporated into an immunogenicity score; the immunogenicity score is computed for each neoantigen sequence, and the neoantigen with the highest score is taken as the output to obtain the final neoantigen sequence.
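The final selection step can be sketched as below. The source names the three features but does not give the scoring formula, so the product form in `immunogenicity_score`, the units, and the `Candidate` fields are purely illustrative assumptions; only the rule of taking the highest-scoring neoantigen comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    kd_nm: float           # dissociation constant of the neoantigen-HLA complex (assumed nM; lower binds tighter)
    stability_h: float     # binding stability (assumed: complex half-life in hours)
    expression_tpm: float  # expression of the mutated gene in the tumor (assumed TPM)


def immunogenicity_score(c):
    """Illustrative score only: rewards expression and stability, penalizes a high Kd."""
    return c.expression_tpm * c.stability_h / c.kd_nm


def select_best(candidates):
    """Return the candidate with the highest immunogenicity score."""
    return max(candidates, key=immunogenicity_score)
```

Any monotone combination of the three features would fit the description equally well; the point of the sketch is the ranking and the selection of the top candidate.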
With the above scheme, this embodiment uses deep learning to design entirely new neoantigens. Compared with conventional approaches that only consider existing neoantigens, this greatly expands the exploration of protein space. In addition, after the neoantigens are designed by deep learning, the design results are further screened to obtain immunogenic neoantigens, which makes the results more practically meaningful and can guide the subsequent development of personalized vaccines.
Embodiment 2
This embodiment provides a deep-learning-based neoantigen sequence generation system, including:
a data acquisition module, configured to acquire original neoantigen data;
a data preprocessing module, configured to preprocess the original neoantigen data to obtain the original neoantigen sequence and its corresponding HLA sequence;
a neoantigen sequence design module, configured to mutate the original neoantigen sequence based on the original neoantigen sequence and its corresponding HLA sequence, and to perform a Monte Carlo search over sequence space starting from the original neoantigen sequence, so as to determine a mutated neoantigen sequence that meets the design objective;
a neoantigen sequence screening module, configured to screen the mutated neoantigen sequences through molecular docking, presentation prediction, and immunogenicity prediction to obtain the final neoantigen sequence.
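The Monte Carlo search performed by the design module can be sketched as a Metropolis-style loop over single-residue mutations. This is one common way to run such a search, not necessarily the one used in the patent. The objective `score` is a placeholder for the deep learning model that evaluates a peptide against its HLA sequence, and `mutate`, `monte_carlo_search`, the temperature, and the step count are hypothetical choices.

```python
import math
import random
from typing import Callable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a different amino acid."""
    pos = rng.randrange(len(seq))
    choices = [aa for aa in AMINO_ACIDS if aa != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def monte_carlo_search(
    start: str,
    score: Callable[[str], float],
    steps: int = 1000,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[str, float]:
    """Metropolis search that maximizes `score` and returns the best sequence seen."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        proposal = mutate(current, rng)
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


def toy_score(seq: str) -> float:
    """Toy objective (count of leucines), standing in for a model-based design score."""
    return float(seq.count("L"))


best_seq, best_val = monte_carlo_search("AAAAAAAAA", toy_score, steps=500)
```

The sequences returned by such a search would then be passed to the screening module (docking, presentation prediction, immunogenicity scoring) described above.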
Embodiment 3
This embodiment provides an electronic device, including a memory, a processor, and computer instructions stored in the memory and run on the processor. When the computer instructions are run by the processor, the steps of the deep-learning-based neoantigen sequence generation method described above are performed.
Embodiment 4
This embodiment also provides a computer-readable storage medium for storing computer instructions. When the computer instructions are executed by a processor, the steps of the deep-learning-based neoantigen sequence generation method described above are performed.
The steps involved in Embodiments 2 to 4 correspond to method Embodiment 1; for specific implementations, refer to the relevant description of Embodiment 1. The term "computer-readable storage medium" should be understood to include a single medium or multiple media comprising one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding, or carrying a set of instructions for execution by a processor that cause the processor to perform any of the methods of the present invention.
Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be fabricated as separate integrated circuit modules, or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the present invention without creative effort still fall within the scope of protection of the present invention.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310331805.3A CN116403639A (en) | 2023-03-30 | 2023-03-30 | New antigen sequence generation method and system based on deep learning |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116403639A (en) | 2023-07-07 |
Family
ID=87011799
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310331805.3A Pending CN116403639A (en) | 2023-03-30 | 2023-03-30 | New antigen sequence generation method and system based on deep learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116403639A (en) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110322925A (en) * | 2019-07-18 | 2019-10-11 | 杭州纽安津生物科技有限公司 | A method of prediction fusion generates neoantigen |
| CN111798919A (en) * | 2020-06-24 | 2020-10-20 | 上海交通大学 | A kind of tumor neoantigen prediction method, prediction device and storage medium |
| CN113160887A (en) * | 2021-04-23 | 2021-07-23 | 哈尔滨工业大学 | Screening method of tumor neoantigen fused with single cell TCR sequencing data |
| US20220028487A1 (en) * | 2020-07-27 | 2022-01-27 | Shenzhen Neocura Biotechnology Corporation | Deep learning-based method for predicting binding affinity between human leukocyte antigens and peptides |
| CN114333998A (en) * | 2020-10-10 | 2022-04-12 | 格源致善(上海)生物科技有限公司 | Tumor neoantigen prediction method and system based on deep learning model |
| US20230032934A1 (en) * | 2019-11-27 | 2023-02-02 | Myst Therapeutics, Llc | Method of producing tumor-reactive t cell composition using modulatory agents |
| US20230047716A1 (en) * | 2020-01-07 | 2023-02-16 | Korea Advanced Institute Of Science And Technology | Method and system for screening neoantigens, and uses thereof |
Non-Patent Citations (2)
| Title |
|---|
| JIAN LIU等: "Cancer vaccines as promising immuno-therapeutics: platforms and current progress", 《JOURNAL OF HEMATOLOGY & ONCOLOGY》, 31 December 2022 (2022-12-31) * |
| 王广志: "基于个性化肿瘤HLA-Ⅰ类新生抗原肽预测模型的改进研究", 《中国优秀硕士学位论文全文数据库 工程科技Ⅰ辑》, 15 February 2021 (2021-02-15), pages 016 - 2748 * |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117037903A (en) * | 2023-07-24 | 2023-11-10 | 深圳新锐基因科技有限公司 | Method and device for reducing immunogenicity of protein drugs to set population |
| CN117912559A (en) * | 2024-01-17 | 2024-04-19 | 北京百度网讯科技有限公司 | Method, device, electronic device and storage medium for determining antibody sequence |
| CN120452555A (en) * | 2025-07-11 | 2025-08-08 | 南昌大学第一附属医院 | MHC-presented peptide prediction method and system based on multimodal deep learning |
| CN120452555B (en) * | 2025-07-11 | 2025-09-19 | 南昌大学第一附属医院 | MHC-presented peptide prediction method and system based on multimodal deep learning |
| CN120600110A (en) * | 2025-08-06 | 2025-09-05 | 上海交通大学医学院附属仁济医院 | Artificial intelligence-based precise screening method for variable pathogen T cell epitopes |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |