
WO2025175065A1 - Systems and methods for immune response assessment and applications thereof - Google Patents

Systems and methods for immune response assessment and applications thereof

Info

Publication number
WO2025175065A1
Authority
WO
WIPO (PCT)
Prior art keywords
immune
gene
sequences
cell receptor
status
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/015875
Other languages
English (en)
Inventor
Maxim Zaslavsky
Erin CRAIG
Jackson Michuda
Nidhi Sehgal
Robert Tibshirani
Anshul Bharat KUNDAJE
Scott D. Boyd
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leland Stanford Junior University
Original Assignee
Leland Stanford Junior University
Application filed by Leland Stanford Junior University
Publication of WO2025175065A1
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/705 Receptors; Cell surface antigens; Cell surface determinants
    • C07K14/70503 Immunoglobulin superfamily
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/705 Receptors; Cell surface antigens; Cell surface determinants
    • C07K14/70503 Immunoglobulin superfamily
    • C07K14/7051 T-cell receptor (TcR)-CD3 complex
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • The disclosure is generally directed to systems and methods for assessing immune response by evaluating sequences of B-cell receptors and T-cell receptors, and further to biomedical applications based on an immune response assessment.
  • B cells and T cells are immunological cells that provide an adaptive immune response to pathogens and vaccines.
  • B cells provide humoral immunity: when mature, B cells produce antibodies that detect pathogens and other foreign bodies and mark them for removal.
  • T cells provide cellular immunity: when mature, T cells can detect when a cell of the body is infected or growing abnormally and act on that cell to clear the infection or growth. To carry out these responses, B cells and T cells utilize receptors that are complementary to pathogen antigens, allowing the pathogen to be detected.
  • Fig. 2 provides a flow diagram of a computational method to train one or more computational models for assessment of immune receptors to infer an immune status.
  • Fig. 3 provides a flow diagram of a computational method to predict immune status of an individual based on V(D)J-gene segment usage.
  • Fig. 4 provides a flow diagram of a computational method to predict immune status based on clustering of immune receptor sequences.
  • Fig. 5 provides a flow diagram of a computational method to predict immune status based on class associations of embedded immune receptor sequences.
  • Fig. 6 provides a flow diagram of a computational method to predict a total immune status by aggregating immune status predictions of two or more classifiers.
  • Figs. 7A to 7E provide alternative examples of computational frameworks to associate receptor sequences with a class.
  • Fig. 8 provides an example of a computational system for performing computational methods that predict immune status.
  • Figs. 9-34H provide exemplary data in support of the description and claims.
  • The various embodiments of the disclosure are generally directed toward systems and methods for computationally evaluating immune receptor sequences to predict the immune status those receptors reflect.
  • A predicted immune status can be utilized in several medical applications, including (but not limited to) serving as a diagnostic test surrogate, diagnosing a medical condition, informing therapy strategy, and monitoring patients for progression and/or amelioration of medical conditions.
  • One or more trained machine-learning computational models are utilized to predict immune status.
  • Immune status is predicted with a multi-stage classifier that assesses embedded immune receptor sequences.
  • The multi-stage classifier's framework favors associating immune receptor sequences with a class over merely associating them with their selected V(D)J-gene segments.
  • A class can be a medical condition, in which case the multi-stage classifier predicts the association of an immune receptor sequence with that medical condition.
  • A computational language model is utilized to interpret the semantics of immunological peptide sequences by extracting latent properties from each sequence.
  • The language model is trained on protein and peptide sequences to produce embeddings.
  • Other computational logistic regression models can be utilized to predict immune status based on V(D)J-gene-segment usage or on immune receptor sequence clustering.
  • Two or more predictions of immune status are aggregated with a computational logistic regression model to yield a total immune status.
  • A predicted immune status can be computed for an individual from immune cell receptor sequences derived from a sample of their B cells and/or T cells.
  • Collecting a sample of immune cells, performing high-throughput sequencing to yield immune cell receptor sequences, and predicting an immune status together constitute a surrogate diagnostic test.
  • An individual’s predicted immune status informs further clinical action, such as performing a diagnostic assessment, administering a therapeutic treatment, altering a treatment regimen, or administering a vaccination.
  • A patient is monitored by collecting multiple samples over a period of time and assessing immune status (or the change in immune status relative to a prior sample).
  • A monitoring regimen can be utilized to track the progression of a medical condition, the immunological response to a treatment, and/or relapse and remission of a medical condition. Monitoring can inform whether to initiate or reinitiate, end or pause, and/or alter a treatment regimen. Monitoring can also be utilized to determine whether other clinical actions, such as diagnostic assessments, should be performed.
  • Several embodiments are also directed to synthesis of antigen-specific protein polymers and/or to engineering immune cells.
  • Various computational models of the disclosure can determine an association between an immune receptor sequence and a biological phenotype.
  • Computational models that utilize embedded immune receptor sequences also learn the strength of association of specific immune receptor sequences with a biological phenotype. These associations can be utilized to select immune receptor sequences (or to generate de novo immune receptor sequences in silico) that would be useful for generating antigen-binding domains of antibodies and/or antigen-recognizing domains of T-cell receptors.
  • Antigen-complementary protein polymers can be synthesized for further research assessment, diagnostic assessment, and/or biologic treatments.
  • Antigen-complementary protein polymers include (but are not limited to) an immunoglobulin (Ig), a monoclonal antibody, a nanobody, a B-cell receptor, a T-cell receptor, a chimeric antigen receptor (CAR), a CDR peptide, and any partial peptide thereof with antigen complementation.
  • Antigen-complementary cells can be engineered to express a selected B-cell receptor or a selected T-cell receptor, which can be utilized for further research assessment and/or biologic treatments.
  • Antigen-complementary cells include (but are not limited to) a B cell, a T cell, a CAR T cell, and a hybridoma cell.
  • Passing amino-acid sequences through language models yields highly informative embeddings that can be high-dimensional or can be reduced in dimensionality to yield compact, meaningful, and optimized representations. These representations can then be utilized within a classifier to learn associations between the amino-acid sequences and a class (e.g., a biological phenotype), associations that clustering methods and other classical machine-learning methods are incapable of capturing.
  • The computational methods of the disclosure take advantage of language-model embedding of immune receptor sequences to learn intricate associations between those sequences and biological phenotypes.
  • The computational models of the disclosure improve on earlier computational models that yield immune-receptor-related predictions. It was discovered that prior methods using a language model to embed immune receptor sequences had difficulty robustly associating embedded immune receptor sequences with biological phenotypes. The problem arises because each individual’s immune receptor repertoire is complex and expansive, and only a small portion of it is dedicated to the biological phenotype of interest.
  • When using embedded immune receptor sequences from a cohort of individuals sharing a biological phenotype (e.g., influenza vaccination), a basic classifier model would prioritize learning genetic associations (e.g., influence of particular V(D)J-gene segments on antigenic specificity) over biological phenotype associations (e.g., immune receptor sequence patterns that confer complementarity to antigens).
  • The various embodiments of the systems and methods of the current disclosure address these issues by isolating and/or penalizing the genetic associations so that a classifier can robustly learn associations of embedded immune receptor sequences with the biological phenotype of interest.
  • High-throughput sequencing will generally refer to sequencing of nucleic acids (DNA or RNA), but could refer to amino acid sequencing when a mass-spectrometry-based method is utilized.
  • Immune receptor sequences used as input to computational models will generally be amino acid sequences, which relate directly to antigen recognition; however, nucleic acid sequences can be utilized instead.

Clinical and medical context of immune receptor evaluation

  • A method can comprise processing a biological sample and computational steps that utilize the immune receptor sequences yielded by that processing.
  • A method can optionally further comprise clinical steps, such as collecting the biological sample from the individual for processing and/or performing a clinical action on the individual upon predicting an immune status.
  • The method can be useful for various clinical evaluations and diagnostics to better understand the immune status of the individual.
  • An immune status generally refers to an adaptive immunological response (or lack of immunological response) involving B cells and/or T cells.
  • An individual can be evaluated for an autoimmune disease, and the method can predict whether the individual has an immunological response that signifies the presence and/or severity of the autoimmune disease.
  • The method can be applied in many clinical contexts, including (but not limited to) assessment of pathogen exposure, autoimmune disorders, inflammation, vaccination, cancer, treatment response, allergies, and transplant rejection.
  • The assessment can be particular or wide-ranging: the method can yield a particular immune status, such as the presence of a medical condition, or it can concurrently perform multiple assessments that together yield a wide-ranging immune status covering multiple clinical and medical contexts.
  • The method can concurrently predict the presence of a medical condition and whether the condition is of a type that responds to a particular treatment regimen.
  • Method 100 begins with obtaining (101) high-throughput sequencing data of an individual’s immune receptors.
  • Sequencing data can be obtained by any appropriate method.
  • Nucleic acid molecules and/or proteinaceous species are extracted from a biological sample of the individual and prepared for sequencing. Any high-throughput method of sequencing can be utilized.
  • High-throughput sequencing is performed utilizing a sequencer, such as those manufactured by Illumina (San Diego, CA).
  • Alternatively, high-throughput sequencing is performed utilizing mass spectrometry.
  • Immune receptors can include B-cell receptors and/or T-cell receptors.
  • The one or more models comprise the classifier that utilizes embedded immune receptor sequences.
  • A trained computational language model is utilized to embed each immune receptor sequence. Any language model capable of extracting latent embeddings can be utilized.
  • Various types of language models can be utilized, such as (for example) neural networks, k-mer embeddings, unigram models, n-gram models, and exponential models.
  • The language model is a neural network trained to reconstruct protein sequences that have been masked or corrupted.
  • Various neural network architectures can be utilized, such as (for example) long short-term memory (LSTM) networks, transformers, and variational autoencoders.
  • The language model is capable of extracting a latent embedding of each peptide sequence regardless of its amino acid length.
  • One language model that can be utilized is ESM-2, which is trained on protein sequences across the spectrum of biological life.
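Of the language-model types listed above, k-mer embeddings are the simplest to show in a few lines. The sketch below is not the disclosure's model and is far less expressive than a neural model such as ESM-2; it only illustrates the length-independence property noted above, since every sequence maps to a vector of 20^k normalized k-mer frequencies. The 20-letter alphabet, the default `k = 2`, and the function name `kmer_embedding` are assumptions chosen for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def kmer_embedding(sequence: str, k: int = 2) -> list[float]:
    """Embed a peptide as normalized k-mer frequencies.

    The output always has 20**k entries, whatever len(sequence) is, so
    CDR3s of different lengths share one feature space without trimming.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    total = 0
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip k-mers containing non-standard residues
            counts[index[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


short_cdr3 = kmer_embedding("CASSLGQ")
long_cdr3 = kmer_embedding("CARDRSTGYYFDYW")
print(len(short_cdr3), len(long_cdr3))  # 400 400
```

This representation only captures local composition within k residues, which is one reason contextual neural language models are attractive when intricate sequence patterns matter.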
  • Method 100 optionally uses (105) the predicted immune status of the individual within a clinical or medical context.
  • The method for predicting an individual’s immune response status is a diagnostic test surrogate.
  • Several immune-related disorders, such as several autoimmune disorders, are difficult to diagnose.
  • For some of these disorders, no positive-affirmative diagnostic exists (e.g., seronegative rheumatoid arthritis).
  • Other disorders require unpleasant procedures; the surrogate diagnostic can be utilized in lieu of those procedures or as a preliminary diagnostic assessment to confirm the need to perform the unpleasant procedure.
  • The method for predicting an individual’s immune response status is utilized to monitor an immunological response or a treatment response. Because the method can be performed on easily collected patient samples (e.g., a blood draw), it can be repeated over a period of time to monitor the patient. Examples of monitoring include (but are not limited to) monitoring the progression of a pathogen infection, monitoring the remission and relapse of an autoimmune disorder, monitoring for relapse of cancer after completion of treatment, monitoring the effect of a treatment that modulates the immune system (e.g., immune suppressors), or monitoring a treatment for undesired effects on the immune system.
  • The method for predicting an individual’s immune response status is utilized to identify antigen-binding sequences. Because the model learns robust associations between immune receptor sequences and a biological phenotype, the immune receptor sequences that yield a strong association can be further assessed and/or utilized. For example, these immune receptor sequences can provide designs for high-affinity biologics, such as antibody treatments and T-cell therapies. Because the method identifies immune receptor sequences that are highly associated with disease conditions, these sequences are expected to yield biologics with robust activity in vivo.
  • The immune receptor sequences are sequences of B-cell receptors and/or T-cell receptors.
  • Receptor sequences associated with a biological phenotype or genetic characteristic are utilized to train the one or more models so that the models can predict the biological phenotype or genetic characteristic when assessing immune receptor sequences.
  • The one or more models receive the immune receptor sequences and generate features for training and/or assessment. The outputs of the one or more models can further be aggregated to yield a total immune status.
  • Provided in Fig. 2 is a computational method to train one or more models to predict an immune status, which can be performed on a computational processing system.
  • Computational method 200 receives (201) high-throughput sequencing data of immune receptors associated with a biological phenotype or genetic characteristic.
  • The high-throughput sequencing data that is received is derived from at least two cohorts of individuals, each cohort sharing a biological phenotype or genetic characteristic.
  • The B-cell receptors can include isotypes IgM, IgD, IgA, IgG, and/or IgE and subclasses thereof (e.g., IgA1, IgA2, IgG1, IgG2, IgG3, and IgG4).
  • Examples of pathogenic infections include a viral infection, a bacterial infection, a fungal infection, or a parasitic infection.
  • Examples of viral infections include coronavirus, influenza, dengue, rhinovirus, human immunodeficiency virus (HIV), herpes simplex virus (HSV), human papillomavirus (HPV), viral hepatitis, rabies virus, measles, and Ebola.
  • Examples of bacterial infections include staphylococcus, streptococcus, enterococcus, Clostridium, and meningococcus.
  • Examples of fungal infection include a yeast infection.
  • Examples of parasitic infections include malaria, giardiasis, toxoplasmosis, trypanosomiasis, and cysticercosis.
  • The one or more models are trained to provide an indication of an immune response to an organ transplant or to a blood transfusion.
  • Method 200 can train (203) a classifier to predict immune status based on V(D)J-gene-segment usage.
  • Each immune receptor chain comprises one V-gene segment and one J-gene segment; B-cell receptor heavy chains and T-cell receptor β and δ chains also include a D-gene segment.
  • Gene usage correlates with the antigens encountered: immune responses to certain antigens bias the repertoire toward particular V genes and particular J genes. Because of this bias, V(D)J-gene usage can be utilized as features in a machine-learning model to learn an association with an immunogenic response.
  • The level of somatic hypermutation of B-cell receptors is also biased by the type of immune response. Accordingly, somatic hypermutation of B-cell receptors can also be utilized as features in a machine-learning model to learn an association with an immunogenic response.
  • A computational processor can receive the sequencing data of immune receptors and identify the V gene, the J gene, and/or the presence of somatic hypermutation within each immune receptor.
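Somatic hypermutation is commonly quantified as the fraction of nucleotide positions in the V region that differ from the inferred germline V-gene allele. The sketch below assumes the observed and germline sequences have already been aligned to equal length without gaps (real pipelines perform that alignment with dedicated tools); `shm_fraction`, `mean_shm`, and the toy sequences are hypothetical.

```python
def shm_fraction(observed: str, germline: str) -> float:
    """Fraction of aligned nucleotide positions that differ from the germline.

    Assumes both sequences are already aligned over the V region with equal
    length and no gaps; real pipelines perform that alignment first.
    """
    if len(observed) != len(germline):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not observed:
        return 0.0
    mismatches = sum(a != b for a, b in zip(observed.upper(), germline.upper()))
    return mismatches / len(observed)


def mean_shm(pairs: list[tuple[str, str]]) -> float:
    """Average SHM over (observed, germline) pairs, usable as one sample-level feature."""
    values = [shm_fraction(obs, germ) for obs, germ in pairs]
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy sequences, not real germline alleles.
germline = "GAGGTGCAGCTGGTG"
observed = "GAGGTCCAGCTGGTA"
print(shm_fraction(observed, germline))  # 2 mismatches over 15 positions
```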
  • The gene usage can further be labeled with the immune status of the individuals in each cohort.
  • The model can learn to identify and differentiate immune responses based on their association with V(D)J-gene-segment usage and/or somatic hypermutation (for B-cell receptors).
  • The classifier to predict immune status based on V(D)J-gene-segment usage is a logistic regression model.
  • Alternatively, the model can be any type of model for associating variables, such as (for example) LASSO, gradient-boosted trees, a neural network, nearest neighbors, decision trees, or a support vector machine.
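A minimal sketch of the V(D)J-usage classifier described above: each individual's receptors are summarized as per-gene usage fractions, and a logistic regression is fit to the cohort labels. The gene panel, the toy cohorts, the unregularized gradient-descent fit, and the helper names are all assumptions; a LASSO variant would add an L1 penalty to the weights.

```python
import math
from collections import Counter

VOCAB = ["IGHV1-69", "IGHV3-23", "IGHV4-34"]  # hypothetical gene panel


def usage_features(v_genes: list[str], vocab: list[str]) -> list[float]:
    """Fraction of an individual's receptors assigned to each gene in vocab."""
    counts = Counter(v_genes)
    total = len(v_genes) or 1
    return [counts[g] / total for g in vocab]


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Logistic model: probability that x belongs to the case cohort."""
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Unregularized logistic regression fit by full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss wrt the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Toy cohorts: each inner list is one individual's V-gene calls (hypothetical).
cases = [
    ["IGHV1-69"] * 6 + ["IGHV3-23"] * 2 + ["IGHV4-34"] * 2,
    ["IGHV1-69"] * 5 + ["IGHV3-23"] * 3 + ["IGHV4-34"] * 2,
]
controls = [
    ["IGHV1-69"] * 1 + ["IGHV3-23"] * 6 + ["IGHV4-34"] * 3,
    ["IGHV1-69"] * 2 + ["IGHV3-23"] * 5 + ["IGHV4-34"] * 3,
]
X = [usage_features(ind, VOCAB) for ind in cases + controls]
y = [1] * len(cases) + [0] * len(controls)
w, b = train_logreg(X, y)
for features in X:
    print(round(predict_proba(w, b, features), 3))
```

J-gene usage fractions or a mean somatic-hypermutation value could be appended to each feature vector in the same way.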
  • Method 200 can train (205) a classifier to predict immune status based on clustering of immune receptor sequences.
  • The CDR3 of each immune receptor comprises the hypervariable region, which confers high specificity for recognizing an antigen.
  • Clusters of similar CDR3 sequences are formed. Similarity clusters that are highly represented in, or populated by, a cohort of individuals having a particular immune response are then associated with that immune response.
  • The association of immune receptors with clusters can be utilized as features within a logistic regression model to yield a prediction of immune status.
  • A computational model can receive the sequencing data of immune receptors and identify the hypervariable region sequence.
  • The sequence can be utilized to compute pairwise amino acid distances, which are used to cluster the sequences.
  • Each cluster can be assigned a label based on the frequency of its sequences that are derived from one of the cohorts and on the degree to which that cohort uniquely contributes sequences to the cluster.
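The clustering and labeling steps above can be sketched as follows, under explicitly assumed choices: CDR3s are compared only when they have equal length, Hamming distance stands in for the amino acid distance, clusters are formed by single linkage at a mismatch fraction of at most 0.2, and a cluster is labeled with a cohort when that cohort contributes at least 80% of its members. None of these thresholds come from the disclosure, and the helper names are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_cdr3s(seqs: list[str], max_mismatch_frac: float = 0.2) -> list[int]:
    """Single-linkage clustering; only equal-length CDR3s can be linked.

    Returns one cluster id per input sequence, numbered in order of first appearance.
    """
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            a, b = seqs[i], seqs[j]
            if len(a) == len(b) and hamming(a, b) <= max_mismatch_frac * len(a):
                parent[find(i)] = find(j)  # union the two clusters
    roots = [find(i) for i in range(len(seqs))]
    relabel = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def label_clusters(cluster_ids: list[int], cohorts: list[str], min_purity: float = 0.8) -> dict:
    """Label each cluster with its dominant cohort, or "mixed" below min_purity."""
    members: dict[int, list[str]] = {}
    for cid, cohort in zip(cluster_ids, cohorts):
        members.setdefault(cid, []).append(cohort)
    labels = {}
    for cid, cs in members.items():
        top, n = Counter(cs).most_common(1)[0]
        labels[cid] = top if n / len(cs) >= min_purity else "mixed"
    return labels


seqs = ["CARDYW", "CARDFW", "CAKDFW", "CTTSGW", "CTTSGY", "CARDYWW"]
cohorts = ["flu", "flu", "flu", "control", "flu", "control"]
ids = cluster_cdr3s(seqs)
print(ids)  # [0, 0, 0, 1, 1, 2]
print(label_clusters(ids, cohorts))  # {0: 'flu', 1: 'mixed', 2: 'control'}
```

The all-pairs comparison is quadratic in the number of sequences; full repertoires would typically be bucketed first (for example by CDR3 length and V/J gene) before pairwise comparison.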
  • For each cluster, a centroid sequence can further be computed as one of: a consensus sequence created by taking the most common residue at each amino acid position across all sequences in the cluster, a sequence created to minimize distances to all other sequences, the medoid sequence of the cluster, the mode sequence of the cluster, or a sequence created or selected with other criteria based on sequences in the cluster.
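Two of the centroid definitions listed above, the per-position consensus and the medoid, are sketched below for a cluster of equal-length sequences, with Hamming distance assumed as the metric. The function names are hypothetical; ties in the consensus go to the residue encountered first, because `Counter.most_common` orders equal counts by first occurrence.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def consensus_sequence(cluster: list[str]) -> str:
    """Most common residue at each position; ties go to the residue seen first."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def medoid_sequence(cluster: list[str]) -> str:
    """Cluster member with the smallest total Hamming distance to all members."""
    return min(cluster, key=lambda s: sum(hamming(s, t) for t in cluster))


cluster = ["CARDYW", "CARDFW", "CAKDFW"]
print(consensus_sequence(cluster))  # CARDFW
print(medoid_sequence(cluster))  # CARDFW
```

The two can differ: a consensus need not be an observed sequence (for `["CAKDFW", "CARDYW", "CARDFY"]` it is `CARDFW`, which is not a member), whereas a medoid always is.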
  • The classifier to predict immune status based on clustering of immune receptor sequences is a logistic regression model.
  • Alternatively, the model can be any type of model for associating variables, such as (for example) LASSO, gradient-boosted trees, a neural network, nearest neighbors, decision trees, or a support vector machine.
  • Method 200 can train (207) a multi-stage classifier to predict an immune status.
  • The multi-stage classifier can be utilized with a language model that generates embedded representations of receptor sequences, which can be utilized as features to learn an association of receptor sequences with an immune status.
  • Using a language model to produce embedded representations of immune receptor sequences has several benefits.
  • The language model can handle sequences of varying lengths, removing any need to trim a portion of a hypervariable sequence.
  • Embedding via a language model retains much of the sequence information, allowing more complex associations among sequences, and between sequences and immune status, to be learned.
  • A multi-stage classifier can comprise two or more stages in which a first stage (or an early stage) learns associations between embedded immune receptor sequences and immune status while accounting for V-gene selection and/or J-gene selection and/or isotype.
  • Various model architectures can be utilized to complete this task (see Figs. 7A-7E and associated description).
  • The association between embedded immune receptor sequences and a biological class is learned in isolation from V-gene selection and/or J-gene selection and/or isotype; or the V-gene selection and/or J-gene selection and/or isotype is penalized or otherwise accounted for so that it does not overly influence the associations learned between embedded immune receptor sequences and the biological class.
  • One method to isolate V-gene selection and/or J-gene selection and/or isotype is to separate embedded immune receptor sequences by V-gene selection and/or J-gene selection and/or isotype and then train a classifier for each group, yielding a series of classifiers in the first stage (or early stage).
  • The biological class can be any biological phenotype or genetic characteristic, but especially a phenotype or characteristic that would affect immune status.
  • The series of classifiers can be split by a variety of metrics with regard to V-gene selection, J-gene selection, and isotype.
  • V-gene family selection and/or J-gene family selection is utilized to split the series of classifiers.
  • The specific V-gene selection and/or the specific J-gene selection is utilized to split the series of classifiers.
  • The isotype is utilized to split the series of classifiers.
  • The series of classifiers each include only a subclass of immune receptor sequences, and the subclass is defined by one of: isotype of a B-cell receptor; V gene of a B-cell receptor; V gene of a T-cell receptor; V gene and isotype of a B-cell receptor; V-gene family of a B-cell receptor; V-gene family of a T-cell receptor; V-gene family and isotype of a B-cell receptor; J gene of a B-cell receptor; J gene of a T-cell receptor; J gene and isotype of a B-cell receptor; J-gene family of a B-cell receptor; J-gene family of a T-cell receptor; J-gene family and isotype of a B-cell receptor; V gene and J gene of a B-cell receptor; V gene and J gene of a T-cell receptor; V gene, J gene and isotype of a B-cell receptor; V-gene family and J gene of a B-cell receptor; V gene, J gene
  • Each classifier in the series of classifiers yields a prediction of biological class.
  • A subsequent stage comprises a classifier that aggregates the associations of the first stage to yield a sample-level immune status.
  • The prediction of biological class can be reweighted or otherwise adjusted prior to combination, based on (for example) the number of sequences within each classifier or a learned weight reflecting each classifier's contribution to the overall prediction of immune status.
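The first-stage split and the subsequent aggregation can be sketched as below. This is one simplified instance under stated assumptions, not the architecture of Figs. 7A-7E: toy two-dimensional vectors stand in for language-model embeddings, each V gene gets its own logistic regression trained with the donor's class as every sequence's label, and the second stage is a weighted mean of per-group mean probabilities rather than a trained classifier. All names and data are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(X, y, lr=0.5, epochs=500):
    """Minimal full-batch gradient-descent logistic regression (stand-in classifier)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def train_stage_one(records):
    """Fit one classifier per V gene.

    records: iterable of (v_gene, embedding, label), where label is the class of
    the donor the sequence came from. Within a group every sequence shares the
    same V gene, so V-gene usage itself cannot separate the classes there.
    """
    groups = defaultdict(lambda: ([], []))
    for v_gene, embedding, label in records:
        groups[v_gene][0].append(embedding)
        groups[v_gene][1].append(label)
    return {v: fit_logreg(X, y) for v, (X, y) in groups.items()}


def sample_score(models, sample, group_weights=None):
    """Stage two (simplified): weighted mean of per-group mean probabilities.

    sample: list of (v_gene, embedding) for one individual. Groups default to
    equal weight; sequences whose V gene has no stage-one model are skipped.
    """
    per_group = defaultdict(list)
    for v_gene, embedding in sample:
        if v_gene in models:
            w, b = models[v_gene]
            per_group[v_gene].append(sigmoid(sum(wj * xj for wj, xj in zip(w, embedding)) + b))
    weights = {v: (group_weights or {}).get(v, 1.0) for v in per_group}
    total = sum(weights.values())
    if total == 0:
        return 0.5  # nothing informative: return an uninformative score
    return sum(weights[v] * sum(p) / len(p) for v, p in per_group.items()) / total


records = [
    ("IGHV1-69", [0.9, 0.1], 1), ("IGHV1-69", [0.8, 0.2], 1),
    ("IGHV1-69", [0.1, 0.9], 0), ("IGHV1-69", [0.2, 0.8], 0),
    ("IGHV3-23", [0.7, 0.3], 1), ("IGHV3-23", [0.3, 0.7], 0),
]
models = train_stage_one(records)
case = [("IGHV1-69", [0.85, 0.15]), ("IGHV3-23", [0.75, 0.25])]
control = [("IGHV1-69", [0.15, 0.85]), ("IGHV3-23", [0.25, 0.75])]
print(round(sample_score(models, case), 3), round(sample_score(models, control), 3))
```

The `group_weights` argument is where count-based or learned reweighting of the per-group predictions would enter.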
  • A language model extracts a latent embedding of each immune receptor sequence.
  • A language model embeds the sequence with context and knowledge of the peptide sequence: for each amino acid, the context of the surrounding amino acids is embedded.
  • An immune receptor sequence comprises a sequence within a CDR that is embedded with context of the other amino acids included in the embedding, with context of the amino acids of the CDR, with context of the amino acids of all CDRs, with context of the amino acids of the variable region, with context of the entire receptor sequence, or with context of any portion or portions of the entire receptor sequence. Any language model capable of extracting latent embeddings can be utilized.
  • Various types of language models can be utilized, such as (for example) neural networks, k-mer embeddings, unigram models, n-gram models, and exponential models.
  • The language model is a neural network trained to reconstruct protein sequences that have been masked or corrupted.
  • Various neural network architectures can be utilized, such as (for example) long short-term memory (LSTM) networks, transformers, and variational autoencoders.
  • The language model extracts features and transforms the features into a vector.
  • The language model compresses each peptide sequence into an internal, low-dimensional embedding that captures important traits, which are chosen through optimization.
  • Each iteration of model training refines the set of transformations used first to compress a masked sequence, then to restore an unmasked sequence from its low-dimensional version.
  • The transformation weights that deliver better reconstruction accuracy are accepted. If the final model can successfully unmask protein sequences, the internal compression has extracted fundamental features that summarize the input sequence. Accordingly, in several embodiments, the language model is improved with each sequence utilized for training and/or assessment.
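The masked-reconstruction objective described above rests on two bookkeeping pieces: hiding a random subset of residues, and scoring how many hidden residues a model restores. The sketch assumes a 15% masking rate (a common convention in masked language modeling) and a `#` mask token, and substitutes a trivial majority-residue baseline for the neural network; it does not perform the weight updates themselves. All names are hypothetical.

```python
import random
from collections import Counter

MASK = "#"  # hypothetical mask token for this sketch


def mask_sequence(seq: str, mask_frac: float = 0.15, seed: int = 0) -> tuple[str, list[int]]:
    """Hide a random subset of residues; return the corrupted sequence and masked positions."""
    if not seq:
        return seq, []
    rng = random.Random(seed)
    n_mask = max(1, round(mask_frac * len(seq)))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    chars = list(seq)
    for i in positions:
        chars[i] = MASK
    return "".join(chars), positions


def majority_fill(masked: str) -> str:
    """Toy stand-in for a model: fill every mask with the most common visible residue."""
    visible = [c for c in masked if c != MASK]
    guess = Counter(visible).most_common(1)[0][0] if visible else "A"
    return masked.replace(MASK, guess)


def reconstruction_accuracy(original: str, predicted: str, positions: list[int]) -> float:
    """Fraction of masked positions whose original residue was restored."""
    return sum(original[i] == predicted[i] for i in positions) / len(positions)


cdr3 = "CASSLGQETQYF"
masked, positions = mask_sequence(cdr3, seed=1)
restored = majority_fill(masked)
print(masked, restored, reconstruction_accuracy(cdr3, restored, positions))
```

In actual training, this accuracy (or the corresponding cross-entropy loss) is what guides which transformation weights are kept.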
  • Any peptide sequences can be utilized to train the language model.
  • a diverse set of proteins from various biological kingdoms are utilized. Proteins of a particular species (e.g., homo sapiens) or of a specific class of proteins can be utilized.
  • immune receptor sequences can be utilized to yield an immunological language model.
  • a language model can be fine-tuned with further information, such as antibody structural information.
  • a trained language model can be further fine-tuned to reduce error for predicting amino acid contact maps.
  • a language model is initially trained on general proteins and peptides and then further trained on a particular class of sequences such that the model learns general rules first and then more specific rules of the particular class.
  • Training can be performed with supervision, which can include reconstruction error and/or knowledge of class labels of the sequences.
  • a model can be trained with a mixture of unsupervised and supervised learning.
  • the language model can be trained in an unsupervised fashion on unlabeled protein sequences from a variety of sources, then fine-tuned in a supervised manner on labeled immune protein sequences.
  • Method 200 optionally combines (209) two or more classifiers to yield an ensemble model to predict total immune status.
  • the two or more classifiers can be any of the classifiers as described in reference to steps 203, 205, and 207, or any other classifier that provides a prediction of immune status.
  • each of the classifiers as described in reference to steps 203, 205, and 207 can be individually trained for B-cell receptors or T-cell receptors, yielding B-cell-receptor-specific and T-cell-receptor-specific classifiers.
  • each of the classifiers as described in reference to steps 203, 205, and 207 can be individually trained for subclasses of B-cell receptors or subclasses of T-cell receptors. Regardless of the total number of classifiers, a final classifier can be utilized to aggregate the immune status predictions of all the classifiers utilized.
  • Trained classifiers can be utilized to assess one or more immune receptor sequences to predict an immune status.
  • the collection of immune receptor sequences can be derived from a biological sample of an individual, which can be utilized to assess the repertoire of immune receptors of the individual.
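A final classifier that aggregates several component classifiers can be as simple as a weighted average of their class-probability outputs. The sketch below takes that approach and assumes each component (e.g., hypothetical BCR-specific and TCR-specific models) returns a dictionary of class probabilities for one sample. The weights are illustrative. A trained stacking model, such as a logistic regression fit on held-out component predictions, could replace the fixed average.

```python
from typing import Dict, List

def ensemble_probabilities(
    component_probs: List[Dict[str, float]],
    weights: List[float],
) -> Dict[str, float]:
    """Weighted average of per-classifier class probabilities for one sample."""
    if not component_probs or len(component_probs) != len(weights):
        raise ValueError("need one weight per component classifier")
    total_w = sum(weights)
    classes = set().union(*component_probs)  # union of class names (dict keys)
    combined = {
        c: sum(w * p.get(c, 0.0) for p, w in zip(component_probs, weights)) / total_w
        for c in classes
    }
    # Renormalise in case a component omitted some classes.
    z = sum(combined.values())
    return {c: v / z for c, v in combined.items()}

def predict_status(probs: Dict[str, float]) -> str:
    """Immune status with the highest combined probability."""
    return max(probs, key=probs.get)

bcr = {"Covid19": 0.6, "HIV": 0.1, "Healthy": 0.3}  # hypothetical outputs
tcr = {"Covid19": 0.4, "HIV": 0.2, "Healthy": 0.4}
print(predict_status(ensemble_probabilities([bcr, tcr], [0.5, 0.5])))  # Covid19
```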
  • Fig. 3 illustrates a computational method to predict immune status of an individual based on V(D)J-gene-segment usage.
  • Computational method 300 receives (301) high-throughput sequencing data of immune receptors derived from an individual.
  • the sequencing data comprises at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, at least 100,000,000,000, or at least 1,000,000,000,000 unique receptor sequences per individual.
  • Immune receptors can include B-cell receptors and/or T-cell receptors.
  • the B-cell receptors can include isotypes IgM, IgD, IgA, IgG, and/or IgE and subclasses thereof (e.g., IgA1, IgA2, IgG1, IgG2, IgG3, and IgG4).
  • a computational processor can receive the sequencing data of immune receptors and identify the V-gene, the J-gene, the isotype, and/or the presence of somatic hypermutation within each immune receptor.
  • the usage of V-genes, J-genes, V-gene-J-gene pairs, isotypes (for B-cell receptor analysis), and/or somatic hypermutation (for B-cell-receptor analysis) can be entered into a trained model.
  • Computational method 300 predicts (303) immune status of the individual based on V(D)J-gene-segment usage.
  • the model can be trained by any appropriate method, such as (for example) as described within Fig. 2 step 203.
  • Fig. 4 illustrates a computational method to predict immune status of an individual based on cluster membership.
  • Computational method 400 receives (401) high-throughput sequencing data of immune receptors derived from an individual.
  • the sequencing data comprises at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, at least 100,000,000,000, or at least 1,000,000,000,000 unique receptor sequences per individual.
  • Immune receptors can include B-cell receptors and/or T-cell receptors.
  • the B-cell receptors can include isotypes IgM, IgD, IgA, IgG, and/or IgE and subclasses thereof (e.g., IgA1, IgA2, IgG1, IgG2, IgG3, and IgG4).
  • computational method 500 aggregates (507) the class probability predictions within each classifier to yield immune receptor sequence subclass-level aggregate scores. Computational method 500 then predicts (509) immune status based on a classifier that aggregates immune receptor sequence-level scores, aggregated within each subclass if the first (or early) stage classifier was a series of classifiers divided by sequence subclass, to yield a sample-level immune status.
  • the language model and multi-stage classifier can be trained by any appropriate method, such as (for example) as described within Fig. 2 step 207.
  • the two or more classifiers can be classifiers for B-cell receptor sequences, for T-cell receptor sequences, or for both B-cell and T-cell receptor sequences.
  • Provided in Figs. 7A-7E are examples of model architectures that are trained to learn how immune receptor sequences contribute to associations with biological phenotypes (shown as disease logits in the figures), such as various medical conditions.
  • Fig. 7A is an architecture utilizing one classification head that is trained with embedded immune receptor sequence features, or alternatively, a series of classification heads that are each trained utilizing a subclass of embedded immune receptor sequence features.
  • the subclass of immune receptor sequences restricted to each model is determined by one or more of: isotype, V-gene, and/or J-gene.
  • Provided in Fig. 7B is an alternative architecture that incorporates knowledge of the V-gene and J-gene segments into embeddings that are concatenated to the immune receptor sequence embeddings from a language model.
  • the V-gene/J-gene segment information is represented either as a one-hot encoded categorical representation, or as an embedded representation that is learned by the classifier model such that, as the model determines values of the embeddings through training and backpropagation, the embeddings from V-gene/J-gene segment categories with related biological phenotype associations may develop similar latent embedding vector representations.
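The one-hot option for Fig. 7B is sketched below. Categorical V-gene and J-gene labels are one-hot encoded against fixed vocabularies and concatenated to the language-model embedding of the sequence. The gene vocabularies, the 4-dimensional embedding, and the function names are hypothetical placeholders. In the learned-embedding alternative, a trainable lookup table would replace `one_hot`.

```python
from typing import List, Sequence

def one_hot(label: str, vocabulary: Sequence[str]) -> List[float]:
    """One-hot vector for a gene label; raises ValueError for unknown genes."""
    vec = [0.0] * len(vocabulary)
    vec[list(vocabulary).index(label)] = 1.0
    return vec

def build_classifier_input(seq_embedding, v_gene, j_gene, v_vocab, j_vocab) -> List[float]:
    """Concatenate [sequence embedding | one-hot V gene | one-hot J gene]."""
    return list(seq_embedding) + one_hot(v_gene, v_vocab) + one_hot(j_gene, j_vocab)

V_GENES = ["IGHV1-24", "IGHV3-53", "IGHV4-34"]  # hypothetical vocabulary
J_GENES = ["IGHJ4", "IGHJ6"]                     # hypothetical vocabulary
x = build_classifier_input([0.12, -0.40, 0.05, 0.88], "IGHV3-53", "IGHJ6", V_GENES, J_GENES)
# x == [0.12, -0.4, 0.05, 0.88, 0.0, 1.0, 0.0, 0.0, 1.0]
```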
  • Fig. 7D provides an architecture that embeds immune receptor sequences while accounting for the influence of V-gene/J-gene segments.
  • two classification heads are utilized: a first classification head uses the sequence embedding to learn association with biological phenotype, producing one set of predicted logits, and a second head uses the same sequence embedding to learn association with V-gene/J-gene categories, producing another set of predicted logits.
  • the model can be trained with a loss function such that the model is penalized when accurately predicting V-gene/J-gene categories but not accurately predicting biological phenotype.
  • the penalization term can be utilized during training to discourage overreliance on the V-gene/J-gene segment information and reward the model for extracting additional biological phenotype-associated information from the hypervariable sequence region.
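The source does not give the Fig. 7D loss formula, so the penalty below is an assumption. One hypothetical form adds a hinge term, λ·max(0, CE_phenotype − CE_VJ), which is non-zero only when the V/J head fits its categories better (lower cross-entropy) than the phenotype head fits the phenotype. The V/J head keeps its own cross-entropy term, and the hinge offsets that reward whenever V/J accuracy outpaces phenotype accuracy. The logits, labels, and λ are illustrative.

```python
import math
from typing import Sequence, Tuple

def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, via a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def two_head_loss(phen_logits, phen_label, vj_logits, vj_label, lam=1.0) -> Tuple[float, float]:
    """Hypothetical two-head objective: returns (total loss, penalty term)."""
    ce_phen = cross_entropy(phen_logits, phen_label)
    ce_vj = cross_entropy(vj_logits, vj_label)
    # Penalise the case "V/J predicted well, phenotype predicted poorly".
    penalty = max(0.0, ce_phen - ce_vj)
    return ce_phen + ce_vj + lam * penalty, penalty
```

With λ = 1 and an active penalty, the total reduces to 2·CE_phenotype. The model then gains nothing from sharpening V/J predictions alone and must improve phenotype prediction from the hypervariable region.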
  • a computational processing system to evaluate immune receptor sequences in accordance with various embodiments of the disclosure utilizes a processing system including one or more of a CPU, GPU and/or other processing engine.
  • the computational processing system is housed within a computing device.
  • the computational processing system is implemented as a software application on a computing device such as (but not limited to) mobile phone, a tablet computer, and/or portable computer.
  • the computational processing system 800 includes a processor system 802, an I/O interface 804, and a memory system 806.
  • the processor system 802, I/O interface 804, and memory system 806 can be implemented using any of a variety of components appropriate to the requirements of specific applications including (but not limited to) CPUs, GPUs, ISPs, DSPs, wireless modems (e.g., WiFi, Bluetooth modems), serial interfaces, depth sensors, IMUs, pressure sensors, ultrasonic sensors, volatile memory (e.g., DRAM) and/or nonvolatile memory (e.g., SRAM, and/or NAND Flash).
  • the memory system is capable of storing receptor sequence data 808, language models 810, and classifier models 812.
  • the various model applications can be downloaded and/or stored in non-volatile memory. When executed, the various model applications are each capable of configuring the processing system to implement computational processes including (but not limited to) the computational processes described above and/or combinations and/or modified versions of the computational methods described above.
  • the language models 810 and classifier models 812 can utilize receptor sequence data 808 to perform the various tasks of the models.
  • Fig. 9A provides a confusion matrix showing that the trained machine learning model, using immune receptor sequences, was able to distinguish several medical conditions.
  • the immune profiles of patients with Covid-19 were distinguished from those of patients with HIV, from those of healthy individuals, and from those of influenza vaccine recipients.
  • the confusion matrix further shows that the autoimmune disorders systemic lupus erythematosus and type-1 diabetes were distinguished from one another and from the viral infections, vaccination recipients, and healthy individuals.
  • the model yielded 95% sensitivity and 69% specificity. Notably, this high sensitivity means that the test correctly identified 95% of true cases, so a negative result is unlikely in a true case; this suggests that the test may be used to make an autoimmune disorder less likely as the cause of symptoms. More models trained to distinguish autoimmune disorders yielded the following results:
  • the method is able to distinguish, from peripheral blood immune receptor sequences, similar disorders that are often difficult to diagnose accurately.
  • the computational method was also tested to delineate subtypes and severity of autoimmune disorders, specifically in the context of lupus.
  • a computational model was trained to distinguish treatment-naive pediatric lupus patients with nephritis from treatment-naive pediatric lupus patients without nephritis. The model yielded a 99.8% positive predictive value and a 79.6% negative predictive value. These data indicate that a positive test result is highly reliable for confirming nephritis: 99.8% of patients who tested positive truly had the disease (positive predictive value).
  • nephritis is often difficult to identify, and diagnosis requires a kidney biopsy. Collecting a blood sample, sequencing its immune receptors, and assessing the sequences with a model would be far preferable to the biopsy procedure.
  • the clinical method can be used as a diagnostic surrogate for the standard diagnostic test.
  • a confirmatory biopsy can be performed if the test comes back positive, and/or treatment for nephritis can be administered (e.g., administration of mycophenolate mofetil, Benlysta, voclosporin, omalizumab, CAR-T, ofatumumab).
  • a similar diagnostic surrogate could be used for pregnant women, distinguishing nephritis from preeclampsia.
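The sensitivity, specificity, and predictive values reported in these results follow the standard confusion-matrix definitions. The short sketch below computes all four from counts. It also uses Bayes' rule to show why predictive values depend on disease prevalence while sensitivity and specificity do not. The counts and the 10% prevalence in the example are hypothetical, not study data.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Confusion:
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    tn: int  # not diseased, test negative
    fp: int  # not diseased, test positive

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """PPV and NPV implied by sensitivity, specificity and prevalence (Bayes' rule)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

# Hypothetical counts: 19 of 20 cases detected, 69 of 100 controls correctly negative.
c = Confusion(tp=19, fn=1, tn=69, fp=31)
print(c.sensitivity(), c.specificity())  # 0.95 0.69
print(predictive_values(0.95, 0.69, prevalence=0.10))
```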
  • Model 1: Overall repertoire composition.
  • the first machine learning model uses an individual’s IgH or TRB repertoire composition to predict disease status.
  • Prior studies have reported immune status classification using deviations in B cell or T cell V(D)J recombination gene segment usage from healthy individuals (16, 60).
  • Certain V gene segments may be more prevalent among antigen-responding V(D)J rearrangements than in the population of immune receptors in naive lymphocytes, and these gene segments increase in frequency as antigen-specific cells become clonally expanded (47, 61), which can be seen in our data (fig. S7A).
  • Model 2: Convergent clustering of antigen-specific sequences by edit distance.
  • the second classifier detects highly similar CDR3 amino acid sequences shared between individuals with the same diagnosis, an approach we and others have previously reported (12-15).
  • the CDR3s are the highly variable regions of IgH and TRB that often determine antigen binding specificity. For each locus, we clustered CDR3 sequences with the same V gene, J gene, and CDR3 length that had high sequence identity, allowing for some variability created by somatic hypermutation in B cell receptors. A new sample’s sequences can then be assigned to nearby clusters with the same constraints.
  • clusters enriched for sequences from subjects with a particular disease were identified using Fisher's exact test, with a significance threshold set by cross-validation on data derived from different individuals. The same significance threshold was used for all immune conditions tested. These clusters represent candidate sequences predictive of a specific disease across individuals. To score a new sample, we assigned its sequences to the identified predictive clusters. For each sample, we counted how many clusters associated with each disease were matched, and used these counts as features in a logistic regression model to predict immune status.
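The enrichment and scoring steps above are sketched below without external dependencies. A one-sided Fisher's exact test (the hypergeometric upper tail) is computed with `math.comb`. Clusters at or below a p-value threshold are kept as disease-associated, and each new sample is scored by counting its matched clusters per disease. The data layout (sets of subject IDs per cluster), the threshold value, and all identifiers are hypothetical. In the described method the threshold comes from cross-validation and the counts feed a logistic regression.

```python
import math
from typing import Dict, Set

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null.

    Rows: subjects with / without the disease; columns: has / lacks the cluster.
    """
    n_disease, n_with = a + b, a + c
    total = a + b + c + d
    upper = min(n_disease, n_with)
    tail = sum(
        math.comb(n_with, k) * math.comb(total - n_with, n_disease - k)
        for k in range(a, upper + 1)
    )
    return tail / math.comb(total, n_disease)

def enriched_clusters(cluster_subjects: Dict[str, Set[str]], disease_subjects: Set[str],
                      all_subjects: Set[str], p_threshold: float) -> Set[str]:
    """Clusters whose membership is enriched in subjects with the disease."""
    hits = set()
    for cluster_id, members in cluster_subjects.items():
        a = len(members & disease_subjects)   # diseased subjects in the cluster
        c = len(members - disease_subjects)   # other subjects in the cluster
        b = len(disease_subjects) - a
        d = len(all_subjects) - len(disease_subjects) - c
        if fisher_one_sided(a, b, c, d) <= p_threshold:
            hits.add(cluster_id)
    return hits

def sample_features(matched: Set[str], disease_clusters: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of each disease's predictive clusters matched by a new sample."""
    return {dz: len(matched & clusters) for dz, clusters in disease_clusters.items()}
```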
  • Model 3: Immune receptor sequence features extracted from a large language model. Small changes to immune receptor amino acid sequences can alter receptor structure and function, while different structures with divergent primary amino acid sequences can bind the same target epitope (62).
  • a protein language model, which transforms BCR and TCR amino acid sequences into a lower-dimensional representation, is used to estimate functional similarities between sequences that extend beyond sequence alignment.
  • ESM-2 is a self-supervised model trained to predict masked amino acids from the remaining sequence context of a protein, learning complex statistical relationships between residues in each sequence and encoding functional and evolutionary relationships across sequences (31).
  • Prior autoencoder models, which also convert immune receptor sequences to a latent representation, have enabled classification and clustering of functionally related sequences (26, 28).
  • ESM-2 is a large language model with substantially more parameters that is trained on a much larger compendium of over 65 million proteins across the tree of life, which allows it to learn richer latent representations that encode properties of a broad diversity of protein structures and functions (31).
  • Each model is specialized to one IGHV gene and isotype combination in the BCR case, or to one TRBV gene in the TCR case. Somatic hypermutation rate was used as an additional feature in the BCR case (hypermutation does not occur in TCRs). Then we trained a second-stage model that aggregates predicted probabilities of disease state of all sequences in a patient sample, again grouped by IGHV gene and isotype or by TRBV gene, to predict disease state at the patient level.
  • T cell receptor beta chains and each immunoglobulin heavy chain isotype were sampled, PCR amplified with immunoglobulin and T cell receptor gene primers, and sequenced as previously described (13, 58). Briefly, we amplified T cell receptor beta chains and each immunoglobulin heavy chain isotype in separate PCR reactions using random hexamer-primed cDNA templates, and performed paired-end Illumina MiSeq sequencing. To reduce the potential for batch effects, data collection followed a consistent protocol. Only IgH sequencing was performed for some older cohorts processed before the study was extended to include TRB sequencing.
  • Paired-end reads were merged with FLASH (Fast Length Adjustment of SHort reads) v1.2.11.
  • Samples were demultiplexed by matching barcodes to the sample reads, and the barcodes and primers were trimmed.
  • This process iteratively merged sequence clusters from the same individual with matching IGHV/TRBV genes, IGHJ/TRBJ genes, and CDR-H3/CDR3β lengths, and with any cross-cluster pairs having at least 95% CDR3β sequence identity by string substitution distance, or at least 90% CDR-H3 identity, which allows for BCR somatic hypermutation (13).
  • Performance metrics that take predicted class probabilities as input, including AUROC and AUPRC, were computed separately for each fold, because probabilities may be on different scales in each fold and should not be combined into a global AUROC or AUPRC score.
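The grouping rule above requires the same V gene, J gene, and CDR3 length, plus a minimum CDR3 identity by substitution distance. The sketch below implements it as single-linkage grouping with union-find. The 90% (CDR-H3) and 95% (CDR3β) thresholds come from the text. The union-find approach, the tuple layout, and the function names are assumptions for illustration, and the source's iterative merging procedure may differ in detail.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Seq = Tuple[str, str, str]  # (V gene, J gene, CDR3 amino acid sequence)

BCR_MIN_IDENTITY = 0.90  # CDR-H3; tolerates somatic hypermutation
TCR_MIN_IDENTITY = 0.95  # CDR3-beta

def identity(a: str, b: str) -> float:
    """1 - (substitution distance / length) for equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_clones(seqs: List[Seq], min_identity: float) -> List[List[int]]:
    """Single-linkage clusters of sequence indices within each (V, J, CDR3 length) bucket."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    buckets: Dict[Tuple[str, str, int], List[int]] = defaultdict(list)
    for idx, (v, j, cdr3) in enumerate(seqs):
        buckets[(v, j, len(cdr3))].append(idx)

    for members in buckets.values():
        for i, k in combinations(members, 2):
            if identity(seqs[i][2], seqs[k][2]) >= min_identity:
                parent[find(i)] = find(k)  # merge the two clusters

    groups: Dict[int, List[int]] = defaultdict(list)
    for idx in range(len(seqs)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())
```

Two sequences one substitution apart in a 10-residue CDR3 have 90% identity. They merge under the BCR threshold but not under the TCR threshold.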
  • For overall performance we report multi-class AUROC and AUPRC calculated in a one-versus-one fashion, taking the class size-weighted average of the binary AUROCs/AUPRCs calculated for each pair of classes, allowing each class a turn to be the positive class in the pair.
  • For each disease class’s individual performance we report multi-class AUROC calculated in a one-versus-rest fashion.
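The per-fold metrics described above are sketched below in pure Python. Binary AUROC uses its Mann-Whitney formulation, the probability that a positive outscores a negative, with ties counted as one half. The one-versus-one average scores each pair of classes twice, once with each class as the positive. Pairs are weighted by the fraction of samples they contain, which is our assumption of what "class size-weighted" means. Class labels are assumed to be integers 0..k-1 that index the probability vectors.

```python
from itertools import combinations
from typing import List, Sequence

def binary_auroc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUROC as P(score_pos > score_neg), ties counted as 1/2 (O(n_pos * n_neg))."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ovr_auroc(y: Sequence[int], probs: Sequence[Sequence[float]], cls: int) -> float:
    """One-versus-rest AUROC for a single disease class."""
    return binary_auroc([t == cls for t in y], [p[cls] for p in probs])

def ovo_weighted_auroc(y: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """One-versus-one AUROC: each pair scored with both classes as positive,
    averaged with weights proportional to the number of samples in the pair."""
    total, weight_sum = 0.0, 0.0
    for a, b in combinations(sorted(set(y)), 2):
        idx = [i for i, t in enumerate(y) if t in (a, b)]
        y_ab = [y[i] for i in idx]
        auc_a = binary_auroc([t == a for t in y_ab], [probs[i][a] for i in idx])
        auc_b = binary_auroc([t == b for t in y_ab], [probs[i][b] for i in idx])
        w = len(idx) / len(y)
        total += w * (auc_a + auc_b) / 2
        weight_sum += w
    return total / weight_sum

def mean_over_folds(fold_scores: List[float]) -> float:
    """Scores are computed within each fold, then summarised across folds."""
    return sum(fold_scores) / len(fold_scores)
```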
  • Model 1: Disease classifier using overall BCR or TCR repertoire composition features
  • IgG, IgA, IgM/D, and TRB summary feature vectors were built by tallying IGHV/TRBV gene and IGHJ/TRBJ gene usage, counting each clone once.
  • IGHV or TRBV genes were ranked by training-set prevalence and the bottom half were excluded, to avoid overfitting to minute differences in rare V gene proportions between cohorts.
  • To account for different total clone counts across samples we normalized total counts to sum to one per sample. Then we log-transformed and Z-scored (i.e. subtracted the mean and divided by the standard deviation, to achieve zero mean and unit variance) the matrix representing how counts are distributed across V-J gene pairs. Finally, we performed a PCA to reduce the count matrix to fifteen dimensions.
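The Model 1 featurization steps above are sketched below with NumPy. Several details are assumptions not stated in the text:
- each clone is a (V gene, J gene) pair already deduplicated so that it is counted once;
- "training set prevalence" means total clone count across training samples;
- a pseudocount of 1e-5 avoids log(0), since the source does not say how zeros were handled;
- PCA is computed by SVD of the z-scored training matrix.

```python
from collections import Counter
from typing import List, Tuple

import numpy as np

Clone = Tuple[str, str]  # (V gene, J gene); each clone appears once per sample

def kept_v_genes(train_samples: List[List[Clone]]) -> List[str]:
    """Rank V genes by total training-set clone count and keep the top half."""
    counts = Counter(v for sample in train_samples for v, _ in sample)
    ranked = sorted(counts, key=lambda g: (-counts[g], g))  # ties broken by name
    return ranked[: max(1, len(ranked) // 2)]

def vj_proportions(samples: List[List[Clone]], v_genes: List[str], j_genes: List[str]) -> np.ndarray:
    """Per-sample V-J pair counts normalised so each row sums to one."""
    pairs = [(v, j) for v in v_genes for j in j_genes]
    col = {pair: k for k, pair in enumerate(pairs)}
    mat = np.zeros((len(samples), len(pairs)))
    for r, sample in enumerate(samples):
        for pair in sample:
            if pair in col:  # clones using filtered-out V genes are ignored
                mat[r, col[pair]] += 1
    totals = mat.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty samples as all-zero rows
    return mat / totals

def fit_features(train: np.ndarray, n_components: int = 15, pseudocount: float = 1e-5):
    """Log-transform, z-score each column, then project onto the top principal axes."""
    logged = np.log(train + pseudocount)
    mu, sd = logged.mean(axis=0), logged.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns would otherwise divide by zero
    z = (logged - mu) / sd
    _, _, vt = np.linalg.svd(z, full_matrices=False)  # z is already column-centred
    components = vt[:n_components]  # fewer rows if min(n, p) < n_components
    return z @ components.T, (mu, sd, components, pseudocount)

def apply_features(mat: np.ndarray, params) -> np.ndarray:
    """Transform new samples with statistics learned on the training set only."""
    mu, sd, components, pseudocount = params
    return ((np.log(mat + pseudocount) - mu) / sd) @ components.T
```

Fitting the filter, z-score statistics, and principal axes on training data only, then reusing them via `apply_features`, keeps validation samples from leaking into the features.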
  • Fit and evaluate model for each locus: Features were standardized, then used to fit separate BCR and TCR logistic regression models mapping from cluster counts to patient diagnosis. The models were fit on each train-2 set and evaluated on the corresponding validation set. The best-performing models, according to average validation-set AUROC across three cross-validation folds for the disease classification task on our primary dataset, were ridge logistic regression for BCR and lasso logistic regression for TCR.
  • Comparison to exact-matches approach: Briefly, Emerson et al. classified cytomegalovirus (CMV) exposure by counting the number of TRB sequences that were exact matches to a CMV-associated list derived from a training set of CMV+ and CMV- individuals (12). CMV-associated sequences were determined with a Fisher's exact test using a two-by-two contingency table denoting how many unique people are CMV+ and have a particular sequence; the threshold on Fisher's exact test p values was selected by cross-validation.
  • Model 3: Disease classifier using language model embeddings
  • the analysis pipeline for classifying disease with language model embeddings of sequences is complex, but necessarily so because it aggregates individual sequence data to generate patient-level predictions.
  • sequence-level disease classifier for each sequence category: First, we trained classification models to map sequences to disease labels — one model per fold and per sequence category, defined as an IGHV gene and isotype pair for BCR sequences or a TRBV gene for TCR sequences. As input data, we used ESM-2 embeddings (standardized to zero mean and unit variance), along with somatic hypermutation rate in the BCR case. To train the individual-sequence-level model, we labeled each sequence with the patient's immune status or disease category. These labels should be considered noisy: we do not know which of a patient's sequences are truly associated with their disease. Since we have no true sequence labels, we also cannot evaluate classification performance for the sequence-level classifier directly. These sequence-level classifiers were trained on the train-1 set of each cross-validation fold.
  • This procedure gives the final k-dimensional predicted disease class probabilities vector for each sequence category in each sample. For example, it computes P(Covid19) among IGHV1-24/IgG sequences, P(HIV) among IGHV1-24/IgG sequences, and so on; then similarly P(Covid19) among IGHV3-53/IgA sequences, P(HIV) among IGHV3-53/IgA sequences, and so forth.
  • Evaluate classifier: We evaluated the pipeline by computing sample-level classification performance on the validation set using AUROC scores. (The one-versus-rest model predicted probabilities are not necessarily calibrated against each other, so we did not evaluate accuracy or other metrics that compare predicted class probabilities to select a winning label.)
  • the highest validation set performance on our primary dataset was achieved by a pipeline consisting of random forest sequence-level models, followed by a random forest second-stage model using mean aggregation.
  • the best pipeline used one-versus-rest ridge logistic regression sequence-level models, with a random forest second-stage model using mean aggregation after an entropy cutoff at 20% below the maximal entropy value (table S8).
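The second-stage input described above is sketched below. Sequence-level class-probability vectors are grouped by sequence category (an IGHV gene/isotype pair, or a TRBV gene), filtered by an entropy cutoff, and mean-aggregated into one fixed-length feature vector per sample. We read "an entropy cutoff at 20% below the maximal entropy value" as discarding sequences whose prediction entropy exceeds 0.8·ln(k), where ln(k) is the maximum entropy for k classes. That reading, the zero-filling of categories with no surviving sequences, and all names are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def aggregate_sample(
    seq_predictions: List[Tuple[str, Sequence[float]]],
    categories: List[str],
    n_classes: int,
    entropy_fraction: float = 0.8,
) -> List[float]:
    """Mean class probabilities per category, concatenated in a fixed category order.

    seq_predictions holds (category, probability vector) for each sequence in one
    sample, e.g. ("IGHV3-53/IgA", [p_covid, p_hiv, ...]).
    """
    cutoff = entropy_fraction * math.log(n_classes)
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for category, probs in seq_predictions:
        if entropy(probs) <= cutoff:  # drop near-uniform, uninformative predictions
            grouped[category].append(probs)
    features: List[float] = []
    for category in categories:
        rows = grouped.get(category, [])
        if rows:
            features.extend(sum(col) / len(rows) for col in zip(*rows))
        else:
            features.extend([0.0] * n_classes)  # category absent after filtering
    return features
```

The resulting vector, one block of k probabilities per category, is the input for a second-stage model such as the random forest named in the text.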
  • Tree SHAP was run on each one-class-versus-rest random forest aggregation model, and the SHAP feature importance values were averaged across positive-class instances from the train-2 data used to train the aggregation model.
  • SHAP values were rescaled from 0 to 1.
  • Louvain clustering (resolution 1.0) was performed on the full SHAP value matrix, in which rows represent positive-class examples and columns represent features, and average SHAP values were then calculated within each cluster.
  • the permutation test ensured that all sequences originating from each healthy donor individual retained their grouping (i.e. had consistent binder/non-binder labels) throughout the process of performing 1000 label permutations. Since the known binders have low prevalence and since permutation affects the prevalence, we computed the AUPRC fold change over baseline prevalence in each permutation, then calculated the p-value as the proportion of permutations whose AUPRC fold change was greater than the observed AUPRC fold change in the original data.
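The group-preserving permutation test described above is sketched below. AUPRC is estimated as average precision: the mean precision at the rank of each positive, with ties broken by input order. Fold change is AUPRC divided by positive prevalence. Labels are shuffled at the group level, for example one group per healthy donor, so every sequence in a group keeps a common label. Because groups differ in size, each permutation also changes the prevalence, which is why fold change rather than raw AUPRC is compared. Treating each known binder as its own group, the seed, and the function names are assumptions.

```python
import random
from typing import Dict, Sequence, Tuple

def average_precision(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Average precision over positives, ranking by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def fold_change(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUPRC (average precision) relative to the positive-class prevalence."""
    prevalence = sum(labels) / len(labels)
    return average_precision(labels, scores) / prevalence

def group_permutation_pvalue(groups: Sequence[str], group_label: Dict[str, bool],
                             scores: Sequence[float], n_perm: int = 1000,
                             seed: int = 0) -> Tuple[float, float]:
    """groups[i] is the group of sequence i; group_label maps group -> binder (bool)."""
    rng = random.Random(seed)
    observed = fold_change([group_label[g] for g in groups], scores)
    names = sorted(group_label)
    values = [group_label[g] for g in names]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # permute labels among groups, not among sequences
        perm = dict(zip(names, values))
        labels = [perm[g] for g in groups]
        if any(labels) and fold_change(labels, scores) > observed:
            exceed += 1
    return observed, exceed / n_perm
```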
  • IGHV genes prioritized in influenza class prediction, including IGHV1-18 (87), IGHV2-70 (88), IGHV3-7 (89-91), IGHV3-23 (87, 89), IGHV3-30 (89), IGHV3-48 (89), IGHV3-66 (92), IGHV4-39 (90), IGHV4-59 (89, 93), and IGHV5-51 (91), have been found in antibodies reactive to influenza virus.
  • IGHV4-34 has been described in HIV-specific B cell responses with unusually high somatic hypermutation frequencies in individuals producing broadly neutralizing antibodies (13). It was ranked highly for HIV classification by the model (Fig. 3B).
  • TRBV10-2, TRBV24-1, and TRBV25-1, all gene segments enriched in African healthy controls, were among the most highly ranked TRBV gene groups for classifying our predominantly African HIV cohort (fig. S16B).
  • TRBV5-1, TRBV6-1, TRBV7-2, and TRBV30 were also prioritized for HIV classification but were not enriched in African healthy controls.
  • TRBV2, TRBV6-6, TRBV12-3, and TRBV18 were prioritized for T1D prediction (fig.
  • TRBV12-3 and TRBV18 also had potentially age-associated differences in strength of contribution, which was seen when we decomposed the T1D Shapley feature importances into two clusters, one that is 71% composed of pediatric patients, and a second that is half pediatric and half adult (fig. S13, C and D).
  • the clusters have distinct V gene prioritizations, indicating that different sequence signals identified the patients as positive for T1 D.
  • the lupus TRBV gene contributions can also be divided into two clusters, one of which is predominantly (88%) adult, while the other consists entirely of pediatric patients (fig. S12, C and D).
  • Model 3 AUROCs averaged 0.60 +/- 0.07, ranging up to 0.78, meaning known binders with certain IGHV usage were often ranked higher than healthy sequences by Covid-19 predicted probability (Fig. 4, E and G), despite no knowledge of these binding relationships during training. Since known binders have 2.7% prevalence when paired with healthy donor sequences, we also evaluated AUPRC, an alternative score for distinguishing the sequence classes within each IGHV gene that may be better suited to class-imbalanced settings (98). AUPRCs averaged a 1.88 +/- 1.03-fold change over baseline prevalence, ranging up to 6.9-fold over baseline (Fig. 4F). IGHV1-24, IGHV2-70, and IGHV3-7 had high AUROCs and normalized AUPRCs, are reported in the Covid-19 literature, and also had high SHAP feature importance scores as described above (Fig. 3).
  • That Model 2 misses most known binders reflects its focus on finding shared public clones, whereas binding sequences may be private to an individual. Therefore, we evaluated Model 2's ability to identify potential Covid-19 binders, not how well it rules them out. We called positives if query sequences matched any Covid-19 cluster's clonal lineage parameters, evaluating each IGHV gene individually (consistent with Model 2's division of sequences before clustering).
  • Model 3 can score sequences that Model 2 cannot evaluate. For the 79% of known binders whose clonal lineage parameters do not match public clone clusters, Model 3 AUROCs averaged 0.59 +/- 0.06 across IGHV genes, ranging up to 0.75 (Fig. 4H), with AUPRCs averaging a 1.63 +/- 0.46-fold change over baseline prevalence (maximum 2.80-fold). In this manner, Models 2 and 3 are complementary: Model 2 can confidently identify a subset of sequences as very similar to convergent clusters found in Covid-19 training set patients, and Model 3 can evaluate the remaining sequences.
  • To test whether Models 2 and 3 learned sequence patterns truly associated with disease, we also compared scores between influenza known binders and healthy donor sequences (53).
  • Influenza predictions were generated by Models 2 and 3 after they were trained on samples from seasonal flu vaccine recipients. The influenza known-binder dataset was smaller (0.62% prevalence when combined with healthy sequences).
  • Model 3 AUROCs were 0.55 +/- 0.07 across IGHV genes, ranging up to 0.65 (fig. S17, E and G).
  • Normalized AUPRCs were 1.95-fold +/- 1.00 change over baseline prevalence, up to a maximum of 4.00-fold change (fig. S17F).
  • Model 2, when calling positives based on shared clonal lineage parameters with any influenza cluster, achieved low precision (maximum of 2.6%) and moderate recall (30.8% on average, ranging up to 51.5%; fig. S17, A and B). Comparing the models, recall was not consistently higher for Model 2 or Model 3 when evaluated at equivalent precision (fig. S17I).

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Immunology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biochemistry (AREA)
  • Biotechnology (AREA)
  • Zoology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Medicinal Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Toxicology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Cell Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Peptides Or Proteins (AREA)

Abstract

Systems and methods for evaluating immune receptor sequences can incorporate a language model to produce embedded representations and immune status classification. Classification systems and methods can comprise evaluating embedded immune receptor sequence representations that account for isotype or V(D)J gene segment selection. Systems and methods for evaluating immune status can incorporate one or more classifiers to predict immune status.
PCT/US2025/015875 2024-02-13 2025-02-13 Systems and methods for immune response evaluation and applications thereof Pending WO2025175065A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463553078P 2024-02-13 2024-02-13
US63/553,078 2024-02-13

Publications (1)

Publication Number Publication Date
WO2025175065A1 (fr) 2025-08-21

Family

ID=96773499

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/015875 Pending WO2025175065A1 (fr) 2024-02-13 2025-02-13 Systems and methods for immune response evaluation and applications thereof

Country Status (1)

Country Link
WO (1) WO2025175065A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120823883A (zh) * 2025-09-15 2025-10-21 北京溯本源和生物科技有限公司 Sequence data analysis method, device, and storage medium for modeling the state of a biological system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050064421A1 (en) * 2001-11-23 2005-03-24 Bayer Healthcare Ag Profiling of the immune gene repertoire
US20190390273A1 (en) * 2016-04-15 2019-12-26 University Health Network Hybrid-capture sequencing for determining immune cell clonality
US20210265008A1 (en) * 2018-05-10 2021-08-26 Iogenetics, Llc Immune repertoire patterns
US20210381050A1 (en) * 2015-02-24 2021-12-09 Adaptive Biotechnologies Corporation Methods for diagnosing infectious disease and determining hla status using immune repertoire sequencing
US20230121729A1 (en) * 2020-04-21 2023-04-20 Tempus Labs, Inc. TCR/BCR Profiling
WO2023086999A1 (fr) * 2021-11-11 2023-05-19 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for evaluating immunological peptide sequences


Similar Documents

Publication Publication Date Title
Roskin et al. Aberrant B cell repertoire selection associated with HIV neutralizing antibody breadth
US20200357487A1 (en) Computer-implemented method and system for determining a disease status of a subject from immune-receptor sequencing data
US20250329410A1 (en) Systems and Methods for Evaluating Immunological Peptide Sequences
AU2019380342A1 (en) Machine learning disease prediction and treatment prioritization
Mhanna et al. Adaptive immune receptor repertoire analysis
Zaslavsky et al. Disease diagnostics using machine learning of B cell and T cell receptor sequences
Zaslavsky et al. Disease diagnostics using machine learning of immune receptors
CN104271759B (zh) Detection of isotype profiles as signals of disease
Katayama et al. Machine learning approaches to TCR repertoire analysis
WO2025175065A1 (fr) Systems and methods for immune response evaluation and applications thereof
Smith et al. Identification of antigen-specific TCR sequences based on biological and statistical enrichment in unselected individuals
Chen et al. A deep learning model for accurate diagnosis of infection using antibody repertoires
US20220364170A1 (en) Biomarker for myalgic encephalomyelitis/chronic fatigue syndrome (me/cfs)
Canderan et al. Distinct type 1 immune networks underlie the severity of restrictive lung disease after COVID-19
Yohannes et al. Clustering based approach for population level identification of condition-associated T-cell receptor β-chain CDR3 sequences
US20250174366A1 (en) Methods and Compositions for Assessing and Treating Lupus
CN114512244A (zh) Non-invasive diagnosis method for infectious diseases based on deep learning
WO2020009822A1 (fr) Machine learning method for finding motifs in sets of biological sequences based on biophysical properties
Ghraichy et al. Maturation of the human B-cell receptor repertoire with age
CN117981011A (zh) Methods and systems for personalized therapy
KR20240044417A (ko) Methods and systems for personalized therapy
Ghraichy et al. Maturation of naïve and antigen-experienced B-cell receptor repertoires with age
WO2022205775A1 (fr) Method and device for determining an individual's immunity index, electronic device, and machine-readable storage medium
Nagafuchi et al. T cell plasticity in systemic lupus erythematosus revealed by large-scale T cell receptor repertoire and transcriptome studies
Xu et al. Generalizable features for the diagnosis of infectious disease, autoimmunity and cancer from adaptive immune receptor repertoires

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 25755626

Country of ref document: EP

Kind code of ref document: A1