WO2025049921A1 - Methods for single molecule resolution of near full-length immunoglobulin heavy and light chain repertoires - Google Patents
Methods for single molecule resolution of near full-length immunoglobulin heavy and light chain repertoires
- Publication number
- WO2025049921A1 (PCT/US2024/044692)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- seq
- disease
- flairr
- rna
- subisotype
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6881—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
Definitions
- the present disclosure relates to long-read sequencing methods to generate highly accurate immunoglobulin heavy and light chain transcripts.
- Current short-read Adaptive Immune Receptor Repertoire sequencing (sr-AIRR-seq) methods aim to characterize the expressed antibody (Ab) repertoire through the sequencing of immunoglobulin-encoding transcripts; available technologies and methods meet this objective to differing extents.
- Profiling of the variable region, even in part, defines variable (V), diversity (D), and joining (J) gene usage while also characterizing complementarity determining regions (CDR) 1, 2, and 3, which are hypervariable and directly interact with the target antigen.
- CDR3-targeted profiling approaches allow for V, D, and J gene assignments but do not provide complete resolution of the entire antibody transcript.
- FLAIRR-Seq near full-length long read AIRR-seq
- RACE rapid amplification of cDNA ends
- SMRT single molecule, real-time sequencing
- RNA ribonucleic acid
- Ig immunoglobulin
- Ab an antibody
- the method comprising targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts are targeted for amplification, integrating the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts into an array of eight transcripts to generate a nucleic acid comprising at least 5600 base pairs (bps), analyzing the cDNA transcripts using a single molecule, real-time sequencing (SMRT) method, and detecting at least 700 base pairs
- RACE 5’-Rapid Amplification of cDNA Ends
- the method detects about 700 bps to about 900 bps of the variable region and the constant region, or fragments thereof, of the Ig light chain cDNA transcript. In some embodiments, the method detects about 1500 bps to about 2100 bps of the variable region and the constant region, or fragments thereof, of the Ig heavy chain cDNA transcript.
- the method identifies a gene or an allele encoding one or more segments of the variable region comprising a variability (V) peptide, a diversity (D) peptide, a joining (J) peptide, or combinations thereof.
- V variability
- D diversity
- J joining
- the method identifies a gene or an allele encoding one or more segments of the constant region of a heavy chain (CH) or light chain (CL) comprising CH1, CH2, CH3, CH4, CL1 or combinations thereof.
- CH heavy chain
- CL light chain
- the method identifies a gene or allele encoding an Ig isotype or an Ig subisotype. In some embodiments, the method identifies an expansion of the Ig isotype or Ig subisotype.
- the method identifies a class switch recombination (CSR) event of the Ig subisotype.
- the Ig isotype comprises an IgM, IgD, IgG, IgA or an IgE isotype.
- the IgG comprises an IgG1, IgG2, IgG3, or IgG4 subisotype.
- the IgA comprises an IgA1 or IgA2 subisotype.
- the RNA is isolated from peripheral blood mononuclear cells (PBMCs), purified B cells, solid tumors, healthy tissue, or whole blood.
- PBMCs peripheral blood mononuclear cells
- the method develops an Ig profile of a subject.
- RNA sample from the subject
- sequencing the RNA using the method of any preceding aspect, developing an Ig profile from the RNA, and administering a therapeutic agent to the subject, wherein the Ig profile indicates the disease.
- the disease comprises an infectious disease, an autoimmune disease, a neoproliferative disease, a neurodegenerative disease, a respiratory disease, a congenital disease, a gastrointestinal (GI) disease, a metabolic disease, or a cardiovascular disease.
- GI gastrointestinal
- FIGS. 1A, 1B, 1C, and 1D show an overview of Ab structure, comparative performance of alternate methods, and the FLAIRR-seq molecular method.
- Figure 1A shows the schematic representation of the IGH locus, heavy chain transcript structure, and functional immunoglobulin protein.
- Figure 1B shows the comparative schematic of coverage across the heavy chain transcript for commonly used sr-AIRR-seq and Iso-Seq methods compared with FLAIRR-seq.
- Figure 1C shows the FLAIRR-seq molecular pipeline overview: RNA (green) was converted to first-strand cDNA (red) using the 5’ RACE method, incorporating a 5’ TSO-UMI (pink) via template switch.
- Second strand amplification specifically targeted immunoglobulin molecules through priming of the 5’ TSO-UMI and the 3’ constant region IGH exon 3 (CH3) for IgG/A/E, or CH4 for IgM.
- A 16 bp barcode was incorporated into the 3’ CH3/CH4 primers to enable sample multiplexing post-amplification.
- Figure 1D shows the Integrative Genomics Viewer alignment showing the near full-length single molecule structure of IGHG4 (IgG4)-specific FLAIRR-seq resolved transcripts.
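The sample multiplexing enabled by the 16 bp barcode in the 3’ CH3/CH4 primers can be illustrated with a minimal demultiplexing sketch. It assumes each read is already oriented so that the barcode forms its last 16 bases and that barcodes match exactly; the barcode sequences, sample names, and the function `demultiplex` are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

BARCODE_LENGTH = 16  # barcode length stated in the text

# Hypothetical barcode-to-sample table; these sequences are invented for illustration.
SAMPLE_BARCODES = {
    "ACGTACGTACGTACGT": "sample_A",
    "TTGGCCAATTGGCCAA": "sample_B",
}


def demultiplex(reads, barcodes=SAMPLE_BARCODES):
    """Group reads by the 16-base barcode assumed to sit at the 3' end of each read."""
    bins = defaultdict(list)
    for read in reads:
        tag = read[-BARCODE_LENGTH:]
        sample = barcodes.get(tag, "unassigned")
        # Strip the barcode from assigned reads; keep unassigned reads whole.
        insert = read[:-BARCODE_LENGTH] if sample != "unassigned" else read
        bins[sample].append(insert)
    return dict(bins)
```

Exact matching keeps the sketch short; a production pipeline would typically tolerate a small number of mismatches and handle both read orientations.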
- FIGS. 2A, 2B, 2C, 2D, and 2E show robust characterization of V, D, and J genes and additional variable region features using FLAIRR-seq compared to sr-AIRR-seq.
- Figure 2A shows the Spearman ranked correlations and p-values of V, D, and J gene usage frequencies identified by FLAIRR-seq performed on matched total PBMC and purified B cells.
- Figure 2B shows the heatmap of Spearman ranked correlations and p-values of V, D, and J gene usage frequencies between FLAIRR-seq and sr-AIRR-seq processed samples.
- Figure 2C shows the violin plots of somatic hypermutation (SHM) frequencies defined by FLAIRR-seq or sr-AIRR-seq in IgM- and IgG-specific repertoires.
- Figure 2D shows the CDR3 length defined by FLAIRR-seq or sr-AIRR-seq analysis in both IgM- and IgG-specific repertoires. Significant differences in CDR3 length and SHM between FLAIRR-seq and sr-AIRR-seq data, measured by unpaired t-test with Bonferroni correction, are indicated by * (p < 0.05) or ** (p < 0.01).
- Figure 2E shows the Spearman ranked correlation of V, D, and J gene usage frequencies between FLAIRR-seq-based and Iso-Seq-based repertoire profiling.
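The Spearman ranked correlation of gene usage frequencies used in these comparisons can be sketched as follows. This is a minimal pure-Python illustration, not the analysis code behind the figures: the dictionary inputs and the function names are assumptions, genes missing from one sample are treated as having frequency 0, and p-values are not computed.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_gene_usage(usage_a, usage_b):
    """Spearman correlation between two {gene: frequency} dictionaries."""
    genes = sorted(set(usage_a) | set(usage_b))
    a = [usage_a.get(g, 0.0) for g in genes]
    b = [usage_b.get(g, 0.0) for g in genes]
    # Spearman's rho is the Pearson correlation of the rank vectors.
    return pearson(average_ranks(a), average_ranks(b))
```

Two samples that order their genes identically by usage give a coefficient of 1, and a fully reversed ordering gives -1.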
- FIGS. 3A, 3B, 3C, 3D, and 3E show that FLAIRR-seq provides novel IGHC resolution for allelic discovery and allows variable gene haplotyping.
- Figure 3A shows the overview of experimental design and analysis pipelines of genotyping by IGenotyper (gDNA) and FLAIRR-seq (RNA) methods.
- Figure 3B shows the schematic depicting IGHC alleles identified by IGenotyper, partitioned by identification as (i) exact matches to documented IMGT alleles, (ii) novel alleles that are not in IMGT, or (iii) extended alleles.
- Figure 3C shows the pie chart and stacked bar graph representing the total number of alleles and fraction of each category identified per IGHC gene as identified by either IGenotyper or FLAIRR-seq. Bar charts showing number of IGHC alleles from FLAIRR-seq that were resolved, ambiguous or unresolved when compared to IGenotyper alleles. * Indicates additional allele found due to IGHG4 duplication.
- Figure 3D shows the table visualizing novel and extended alleles resolved by IGenotyper data. Extended alleles are denoted by *(allele number)-FL and novel alleles are denoted by FL_(number); alleles resolved by FLAIRR-seq are marked with a dot (•).
- Figure 3E shows use of the haplotyping tool RAbHIT with FLAIRR-seq data to enable inference of germline haplotypes.
- Venn diagrams show number of IGHV haplotype allele/deletion calls when using IGHJ6 (standard input) or IGHM (novel input, enabled by FLAIRR-seq method) anchors for each haplotype.
- DEL deletion
- NDA non-reliable allele annotation
- Unk unknown allele
- FIGS. 4A, 4B, and 4C show that FLAIRR-seq resolves subisotype-specific repertoire diversity.
- Figure 4A shows the bar plots showing highly resolved distribution of unique VDJ sequences across isotypes, subisotypes, and subisotype alleles in one representative sample “1013” characterized by FLAIRR-seq; standard sr-AIRR-seq methods resolve sequences only to the isotype level (IGHG, red bar).
- Figure 4B shows the circos plots showing V family gene usage frequency within IgG- and IgM-isotypes and subisotype-specific alleles for “1013”.
- Figure 4C shows, at left, boxplots of V gene family usage frequencies within IGHG1, IGHG2, IGHG3, and IGHG4 repertoires across all ten individuals.
- Top right: principal component analysis of V gene family usage by subisotype; the plot includes the first two principal components, and individual repertoires are colored by IGHG subisotype.
- Bottom right: boxplots showing sequence frequency of IGHV1 and IGHV3 family genes by subisotype across all ten samples.
- FIGS. 5A, 5B, 5C, 5D, 5E, 5F, and 5G show that FLAIRR-seq resolves subisotype-specific clonal expansion and facilitates haplotype analysis of CSR in a patient hospitalized for COVID-19.
- Figure 5A shows the overview of experimental design: whole blood-derived RNA was collected on days 1, 4, 8 and 13 post-hospitalization and used for FLAIRR-seq profiling.
- Figure 5B shows the bar plot showing the percentage of clones represented by each subisotype across timepoints.
- Figure 5D shows the polarity, or the fraction of clones needed to comprise 80% of the repertoire, reported as fraction of total subisotype-specific repertoires across time.
- Figure 5E shows the distribution of a single clone “9900” across isotypes and subisotypes over time, suggesting ongoing CSR of this clone, as identified by variable region usage across multiple isotype- and subisotype-specific constant regions.
- Figure 5F shows the phylogenetic tree constructed from sequences associated with the clone 9900 lineage, with the inferred germline sequence as the outgroup (star). Shapes and colors of tips (sequences) indicate time point and isotype/subisotype.
- The scale bar represents the number of mutations between each node in the tree.
- Figure 5G shows the tile plot showing the assignment of IGHC alleles to their respective haplotypes, based on the frequency of observations in which each IGHC allele was linked to each respective allele of heterozygous V genes, IGHV3-7 and IGHV3-48; light gray denotes IGHC alleles for which haplotype assignment was not possible.
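The polarity metric described for Figure 5D, the fraction of clones needed to comprise 80% of the repertoire, can be sketched as below. This is an illustrative reading of that one-sentence definition: it assumes clones are taken from largest to smallest and that clone sizes are sequence counts; the function name and its parameters are hypothetical.

```python
def polarity(clone_sizes, coverage=0.80):
    """Fraction of clones, taken largest first, needed to reach `coverage` of all sequences."""
    sizes = sorted((s for s in clone_sizes if s > 0), reverse=True)
    if not sizes:
        raise ValueError("at least one clone with a positive size is required")
    target = coverage * sum(sizes)
    running = 0
    for count, size in enumerate(sizes, start=1):
        running += size
        if running >= target:
            return count / len(sizes)
    return 1.0  # only reachable if coverage is set above 1.0
```

Under this reading, a low value indicates a repertoire dominated by a few expanded clones, and a value near 1 indicates an evenly distributed repertoire.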
- FIG. 6 shows the agreement in calling IGHV gene usage in sample “1013” when analyzed with either the Immcantation or MiXCR analysis pipelines, demonstrating applicability of FLAIRR-seq data within multiple available analysis tools.
- FIG. 7 shows ongoing optimization and modification to the FLAIRR-seq molecular method to enhance overall yield and accuracy given the advancing throughput of sequencing technology.
- FIGS. 8A, 8B, 8C, 8D, 8E, 8F, and 8G show successful targeted enrichment of all seven near full-length antibody or immunoglobulin heavy and light chain transcripts, including IgM (Figure 8A), IgG (Figure 8B), IgD (Figure 8C), IgA (Figure 8D), IgE (Figure 8E), IGL (Figure 8F), and IGK (Figure 8G).
- Figure 9A shows that as a function of the total, more novel alleles were discovered in IgG4 than any other subisotype.
- Figure 9B shows that mapping of the novel allele sequences identified single nucleotide variation (represented by colored, vertical lines where each color used represents a different nucleotide base, A, C, T or G) and structural variation (represented by present or absent grey boxes, which represent coding exons).
- FLAIRR-seq enables allele-specific identification of both these variant types, including deletions of whole exons of the hinge regions (H1, H2, H3 or H4), which are known to contribute to antibody effector functions, such
- FIGS. 10A and 10B show additional examples of Ig or Ab structural variation that can be seen only with FLAIRR-seq:
- Figure 10A demonstrates duplication in the IgG4 antibody subisotype gene, which results in the expression of three distinct antibody alleles in the circulating repertoire. Single nucleotide variants are represented by colored, vertical lines and are used to distinguish one transcript allele from the other two in the gene alignment. Three distinct alleles can be visualized when mapping the data with the Integrative Genomics Viewer (IGV).
- Figure 10B shows an alignment of both subisotypes of IgA, IgA1 and IgA2.
- IgA1 is known to play a regulatory role in maintaining immune balance (homeostasis) and contains an extended hinge, compared to IgA2, in which the extended hinge region is not present.
- FIGS. 11A and 11B show a map of the IGLC-J gene locus on human chromosome 22.
- Figure 11A highlights a tandem array of duplicated IGLC-J gene cassettes, IGLC-J1 through IGLC-J7.
- Initial genomic characterization of the IGLC-J locus identified only seven IGLC and IGLJ genes; however, through our recent genomic studies, we have identified 3 additional copies of the IGLC-J3 cassette, totaling 4 copies (green boxes).
- Figure 11B (left panel) shows the bar plot showing the frequency of IGLC genes in the IGL repertoire of a single healthy donor, determined using FLAIRR-seq.
- IgG3, in particular, is known to mediate acetylcholine-specific MG.
- Figure 12B schematizes the varied IgG3 hinge-specific structural variations identified in the total cohort, showing the various deletion patterns observed, including the 4 hinge-containing major (“M”) allele, the 3 hinge-containing “small” (“S”) allele, and the novel one hinge-containing “E” allele.
- Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed.
- the terms “may,” “optionally,” and “may optionally” are used interchangeably and are meant to include cases in which the condition occurs as well as cases in which the condition does not occur.
- the statement that a formulation "may include an excipient” is meant to include cases in which the formulation includes an excipient as well as cases in which the formulation does not include an excipient.
- An “increase” can refer to any change that results in a greater amount of a symptom, disease, composition, condition, or activity.
- An increase can be any individual, median, or average increase in a condition, symptom, activity, composition in a statistically significant amount.
- the increase can be a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% increase so long as the increase is statistically significant.
- a “decrease” can refer to any change that results in a smaller amount of a symptom, disease, composition, condition, or activity.
- a substance is also understood to decrease the genetic output of a gene when the genetic output of the gene product with the substance is less relative to the output of the gene product without the substance.
- a decrease can be a change in the symptoms of a disorder such that the symptoms are less than previously observed.
- a decrease can be any individual, median, or average decrease in a condition, symptom, activity, composition in a statistically significant amount.
- the decrease can be a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% decrease so long as the decrease is statistically significant.
- prevent or other forms of the word, such as “preventing” or “prevention,” is meant to stop a particular event or characteristic, to stabilize or delay the development or progression of a particular event or characteristic, or to minimize the chances that a particular event or characteristic will occur. Prevent does not require comparison to a control as it is typically more absolute than, for example, reduce. As used herein, something could be reduced but not prevented, but something that is reduced could also be prevented. Likewise, something could be prevented but not reduced, but something that is prevented could also be reduced. It is understood that where reduce or prevent are used, unless specifically indicated otherwise, the use of the other word is also expressly disclosed.
- the term “subject” refers to any individual who is the target of administration or treatment.
- the subject can be a vertebrate, for example, a mammal.
- the subject can be human, non-human primate, bovine, equine, porcine, canine, or feline.
- the subject can also be a guinea pig, rat, hamster, rabbit, mouse, or mole.
- the subject can be a human or veterinary patient.
- patient refers to a subject under the treatment of a clinician, e.g., physician.
- treatment refers to the medical management of a patient with the intent to cure, ameliorate, stabilize, or prevent a disease, pathological condition, or disorder.
- This term includes active treatment, that is, treatment directed specifically toward the improvement of a disease, pathological condition, or disorder, and also includes causal treatment, that is, treatment directed toward removal of the cause of the associated disease, pathological condition, or disorder.
- this term includes palliative treatment, that is, treatment designed for the relief of symptoms rather than the curing of the disease, pathological condition, or disorder; preventative treatment, that is, treatment directed to minimizing or partially or completely inhibiting the development of the associated disease, pathological condition, or disorder; and supportive treatment, that is, treatment employed to supplement another specific therapy directed toward the improvement of the associated disease, pathological condition, or disorder.
- diagnosis refers to the act or process of identifying the nature of an illness, disease, disorder, or condition in a subject by examination or monitoring of symptoms.
- whole blood refers to the composition of blood as it flows through the circulatory system, to include the red blood cells, white blood cells, and platelets suspended in plasma.
- administer refers to delivering a composition, substance, inhibitor, or medication to a subject or object by one or more of the following routes: oral, topical, intravenous, subcutaneous, transcutaneous, transdermal, intramuscular, intra-joint, parenteral, intra-arteriole, intradermal, intraventricular, intracranial, intraperitoneal, intralesional, intranasal, rectal, vaginal, by inhalation or via an implanted reservoir.
- parenteral includes subcutaneous, intravenous, intramuscular, intraarticular, intra-synovial, intrasternal, intrathecal, intrahepatic, intralesional, and intracranial injections or infusion techniques.
- Antibodies (Abs) and immunoglobulins (Igs) are glycoproteins having the same structural characteristics.
- the term "antibody” is used in the broadest sense, and specifically covers monoclonal antibodies (including full length monoclonal antibodies), polyclonal antibodies, and multispecific antibodies (e.g., bispecific antibodies). While antibodies exhibit binding specificity to a specific target, immunoglobulins include both antibodies and other antibody-like molecules which lack target specificity.
- Native antibodies and immunoglobulins are usually heterotetrameric glycoproteins of about 150,000 daltons, composed of two identical light (L) chains and two identical heavy (H) chains. Each heavy chain has at one end a variable domain (VH) followed by a number of constant domains. Each light chain has a variable domain at one end (VL) and a constant domain at its other end.
- immunoglobulin fragment refers to a portion of a full-length antibody or immunoglobulin, generally the target binding or variable region.
- antibody or immunoglobulin fragments include Fab, Fab', F(ab')2 and Fv fragments.
- the phrase "functional fragment or analog" of an antibody or immunoglobulin is a compound having qualitative biological activity in common with a full-length antibody or immunoglobulin.
- a functional fragment or analog of an anti-IgE is one which can bind to an IgE immunoglobulin in such a manner so as to prevent or substantially reduce the ability of such molecule to bind to the high affinity receptor, FcεRI.
- With respect to antibodies, “functional fragment” refers to Fv, F(ab), and F(ab')2 fragments.
- An "Fv” fragment is the minimum antibody fragment which contains a complete target recognition and binding site. This region consists of a dimer of one heavy and one light chain variable domain in a tight, non-covalent association (VH-VL dimer). It is in this configuration that the three CDRs of each variable domain interact to define a target binding site on the surface of the VH-VL dimer. Collectively, the six CDRs confer target binding specificity to the antibody.
- variable domain or half of an Fv comprising only three CDRs specific for a target
- sFv single-chain Fv
- Single-chain Fv or sFv antibody fragments comprise the VH and VL domains of an antibody, wherein these domains are present in a single polypeptide chain.
- the Fv polypeptide further comprises a polypeptide linker between the VH and VL domains which enables the sFv to form the desired structure for target binding.
- the term “monoclonal antibody” as used herein refers to an antibody obtained from a substantially homogeneous population of antibodies, i.e., the individual antibodies within the population are identical except for possible naturally occurring mutations that may be present in a small subset of the antibody molecules.
- “immunotherapy” and “immunotherapeutic” refer to the treatment of disease by activating or suppressing the immune system.
- the most effective immunotherapies are cell-based immunotherapies that utilize lymphocytes, macrophages, dendritic cells, natural killer cells, cytotoxic T lymphocytes, etc. to defend the body against cancer by targeting abnormal antigens expressed on the surface of tumor cells.
- treat include partially or completely delaying, alleviating, mitigating, or reducing the intensity of one or more attendant symptoms of a disorder or condition and/or alleviating, mitigating, or impeding one or more causes of a disorder or condition.
- Treatments according to the disclosure may be applied preventively, prophylactically, palliatively, or remedially. Treatments are administered to a subject prior to onset (e.g., before obvious signs of disease), during early onset (e.g., upon initial signs and symptoms of disease), or after an established development of disease.
- nucleic acid is a chemical compound that serves as the primary information-carrying molecule in cells and makes up the cellular genetic material. Nucleic acids comprise nucleotides, which are the monomers made of a 5-carbon sugar (usually ribose or deoxyribose), a phosphate group, and a nitrogenous base. A nucleic acid can be a deoxyribonucleic acid (DNA) or a ribonucleic acid (RNA). It should be noted that the preferred nucleic acid used herein comprises an RNA.
- a “nucleotide” is a compound consisting of a nucleoside, which consists of a nitrogenous base and a 5-carbon sugar, linked to a phosphate group forming the basic structural unit of nucleic acids, such as DNA or RNA.
- the four types of nucleotides are adenine (A), cytosine (C), guanine (G), and thymine (T), which are joined by phosphodiester bonds to form a nucleic acid molecule.
- the method comprising targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts are targeted for selective amplification, integrating the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts into an array of eight (8) transcripts to generate a nucleic acid comprising at least 5600 base pairs (bps), analyzing the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts using a single molecule,
- RACE 5’-Rapid Amplification of cDNA Ends
- the method comprising targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the cDNA transcripts are targeted for amplification, analyzing the cDNA transcripts using a single molecule, real-time sequencing (SMRT) method, and detecting at least 700 base pairs (bps) within a variable region and a constant region of the Ig.
- RACE 5’-Rapid Amplification of cDNA Ends
- the method comprising targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and/or an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the cDNA transcripts are targeted for selective amplification, and the cDNA transcripts are concatenated and then analyzed using a single molecule, real-time sequencing (SMRT) method.
- RACE 5’-Rapid Amplification of cDNA Ends
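The array figures quoted above are arithmetically consistent: eight transcripts of at least 700 bp each give at least 8 × 700 = 5600 bp. The sketch below illustrates that bookkeeping only. It models concatenation as plain string joining and ignores any adapters or linker sequences a real concatenation chemistry would add; `build_arrays` and its arguments are hypothetical names.

```python
ARRAY_SIZE = 8           # transcripts per array, as stated in the text
MIN_TRANSCRIPT_BP = 700  # minimum detected length stated in the text


def build_arrays(amplicons, array_size=ARRAY_SIZE):
    """Join consecutive amplicons into arrays of `array_size`; drop an incomplete final group."""
    arrays = []
    for start in range(0, len(amplicons) - array_size + 1, array_size):
        arrays.append("".join(amplicons[start:start + array_size]))
    return arrays
```

Any complete array built from amplicons of at least 700 bp is therefore at least 5600 bp long, matching the stated lower bound.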
- the method detects about 700 bps to about 900 bps of the variable region and the constant region, or fragments thereof, of the Ig light chain cDNA transcript. In some embodiments, the method detects about 1500 bps to about 2100 bps of the variable region and the constant region, or fragments thereof, of the Ig heavy chain cDNA transcript.
- the method comprises identifying at least 1500, 1600, 1700, 1800, 1900, 2000, 2100, or more base pairs (bps) within the variable region and a constant region of the heavy chain transcripts, and 700, 800, 900, or more bps within the variable region and constant region of light chain transcripts.
- the method detects a portion or nearly complete constant region of the heavy chain transcript. In some embodiments, the method detects a portion or nearly complete constant regions of the light chain transcript. It should be understood that the terms “a portion”, “nearly complete”, or “near full-length” refer to detecting partial or fragments of an Ig. The method can detect a non-limiting percentage ranging from about 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% of an Ig. It should also be understood that the terms “a portion”, “nearly complete”, or “near full length” can be used interchangeably throughout the disclosure.
- the method comprises identifying at least 1500 bps of an IgD, IgG, IgA, or IgE isotypes. In some embodiments, the method comprises identifying at least 2000 bps of an IgM isotype. In some embodiments, the method comprises identifying at least 700 bps of an IgL or IgK light chain isotype.
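The per-isotype minimum lengths listed above can be expressed as a simple lookup, for example when filtering reads after isotype assignment. This is a sketch under the assumption that each read already carries an isotype or light chain label; the table name `MIN_BP` and the function `meets_length_threshold` are hypothetical.

```python
# Minimum identified lengths in base pairs, taken from the embodiments listed above.
MIN_BP = {
    "IgM": 2000,
    "IgD": 1500,
    "IgG": 1500,
    "IgA": 1500,
    "IgE": 1500,
    "IgL": 700,
    "IgK": 700,
}


def meets_length_threshold(chain, read_length_bp):
    """Return True if a read labelled `chain` is at least the stated minimum length."""
    return read_length_bp >= MIN_BP[chain]
```

An unknown label raises `KeyError`, which makes unexpected chain names visible instead of silently passing them.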
- the method identifies a gene or an allele encoding one or more segments of the variable region comprising a variability (V) peptide, a diversity (D) peptide, a joining (J) peptide, or combinations thereof.
- the method can be used to directly identify the V, D, and J genes and alleles encoding the variable regions of heavy and light chain transcripts.
- the method identifies a gene or an allele encoding one or more segments of the constant region of a heavy chain (CH) or light chain (CL/CK) comprising CH1, CH2, CH3, CH4, CL1, CK1, or combinations thereof.
- the method can be used to directly identify the constant genes and alleles encoding the constant regions of heavy and light chain transcripts.
- the method identifies a gene or allele encoding an Ig isotype or an Ig subisotype.
- the method identifies an expansion of the Ig isotype or Ig subisotype compared to current knowledge.
- the method allows for resolution of the isotype and subisotype of the expressed immunoglobulin heavy and light chain transcripts, including detailing relative frequency of specific immunoglobulin isotypes or immunoglobulin subisotypes in a total antibody repertoire.
- the method identifies a class switch recombination (CSR) event of the Ig subisotype.
- the Ig isotype comprises an IgD, IgG, IgA, IgE, or an IgM isotype.
- the IgG comprises an IgG1, IgG2, IgG3, or IgG4 subisotype.
- the method identifies a class switch recombination (CSR) event within an immunoglobulin isotype or subisotype.
- IgM isotype constant region genes to constant region genes associated with the IgD, IgG, IgA, or IgE isotypes, including their subisotypes (e.g., IgG1, IgG2, IgG3, IgG4, IgA1, or IgA2 subisotypes).
- the Ig isotype comprises an IgM, IgD, IgA, IgG or an IgE isotype.
- the IgA comprises an IgA1 or IgA2 subisotype.
- the IgG comprises an IgG1, IgG2, IgG3, or IgG4 subisotype.
- this method can identify select genomic variation, including but not limited to single nucleotide variants and structural variations, associated with known impact on antibody function.
- the RNA is isolated from peripheral blood mononuclear cells (PBMCs), purified B cells, solid tumors, healthy tissue, or whole blood.
- the method develops an Ig profile of a subject. In some embodiments, the method characterizes the expressed immunoglobulin transcript profile of a subject.
- RNA sample from the subject
- sequencing the RNA using the method of any preceding aspect, developing an Ig profile and/or antibody repertoire profile from the RNA, and administering a therapeutic agent to the subject, wherein the Ig profile and/or antibody repertoire profile indicates the disease.
- RNA sample from the subject
- sequencing the RNA using the method of any preceding aspect, developing an Ig profile and/or antibody repertoire profile from the RNA, and using said profile to predict the responsiveness to vaccination, therapeutic treatment, and/or natural resolution(s) of disease.
- RNA samples can be isolated longitudinally over one or more days prior to sequencing and developing the Ig profile. In some embodiments, RNA samples are isolated for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or more days prior to sequencing and developing the Ig profile.
- the disease comprises an infectious disease, an autoimmune disease, a neoproliferative disease, a neurodegenerative disease, a respiratory disease, a congenital disease, a gastrointestinal (GI) disease, a metabolic disease, or a cardiovascular disease.
- the cancer includes, but is not limited to acoustic neuroma, adenocarcinoma, adrenal gland cancer, anal cancer, angiosarcoma (e.g., lymphangiosarcoma, lymphangioendotheliosarcoma, hemangiosarcoma), appendix cancer, benign monoclonal gammopathy, biliary cancer (e.g., cholangiocarcinoma), bladder cancer, breast cancer (e.g., adenocarcinoma of the breast, papillary carcinoma of the breast, mammary cancer, medullary carcinoma of the breast), brain cancer (e.g., meningioma; glioma, e.g., astrocytoma, oligodendroglioma; medulloblastoma), bronchus cancer, carcinoid tumor, cervical cancer (e.g., cervical adenocarcinoma), choriocarcinoma, chord
- HCC hepatocellular cancer
- lung cancer e.g., bronchogenic carcinoma, small cell lung cancer (SCLC), non-small cell lung cancer (NSCLC), adenocarcinoma of the lung
- myelofibrosis MF
- chronic idiopathic myelofibrosis chronic myelocytic leukemia (CML), chronic neutrophilic leukemia (CNL), hypereosinophilic syndrome (HES)
- neuroblastoma e.g., neurofibromatosis (NF) type 1 or type 2, schwannomatosis
- neuroendocrine cancer e.g., gastroenteropancreatic neuroendocrine tumor (GEP-NET), carcinoid tumor
- osteosarcoma ovarian cancer (e.g., cystadenocarcinoma, ovarian embryonal carcinoma, ovarian adenocarcinoma), papillary adenocarcinoma, pancreatic cancer (e.g., pancreatic adenocarcinoma, intraductal papillary mucinous neoplasm (IPMN), Islet cell tumors), penile cancer (e.g., Paget's disease of the
- the autoimmune disease includes, but is not limited to Myasthenia Gravis, diabetes mellitus Type 1, Multiple Sclerosis, Rheumatic Heart Fever, and Rheumatoid Arthritis.
- the neurodegenerative disease includes, but is not limited to Alzheimer’s disease, ataxia, Huntington’s disease, Parkinson’s disease, amyotrophic lateral sclerosis (ALS), Friedreich ataxia, Lewy body disease, spinal muscular atrophy, Alpers’ disease, Batten disease, Cerebro-oculo-facio-skeletal syndrome, Leigh syndrome, Prion diseases, monomelic amyotrophy, multiple system atrophy, striatonigral degeneration, motor neuron disease, multiple sclerosis (MS), Creutzfeldt-Jakob disease, Parkinsonism, spinocerebellar ataxia, dementia, and other related diseases.
- the infectious disease includes, but is not limited to common cold, influenza (including, but not limited to human, bovine, avian, porcine, and simian strains of influenza), measles, acquired immune deficiency syndrome/human immunodeficiency virus (AIDS/HIV), anthrax, botulism, cholera, Campylobacter infections, chickenpox, chlamydia infections, cryptosporidiosis, dengue fever, diphtheria, hemorrhagic fevers, Escherichia coli (E. coli) infections, ehrlichiosis, gonorrhea, hand-foot-mouth disease, hepatitis A, hepatitis B, hepatitis C, legionellosis, leprosy, leptospirosis, listeriosis, malaria, meningitis, meningococcal disease, mumps, pertussis, polio, pneumococcal disease, paralytic shellfish poisoning, rabies, Rocky Mountain spotted fever, rubella, salmonella, shigellosis, smallpox, syphilis, tetanus, trichinosis (trichinellosis), tuberculosis (TB), typhoid fever, typhus, West Nile virus, yellow fever, yersiniosis, and Zika.
- the cardiovascular disease includes, but is not limited to coronary artery disease, high/low blood pressure, cardiac arrest/heart failure, congestive heart failure, congenital heart defects/diseases (including, but not limited to atrial septal defects, atrioventricular septal defects, coarctation of the aorta, double-outlet right ventricle, d-transposition of the great arteries, Ebstein anomaly, hypoplastic left heart syndrome, and interrupted aortic arch), arrhythmia, peripheral artery disease, stroke, cerebrovascular disease, renal artery stenosis, aortic aneurysm, cardiomyopathies, hypertensive heart disease, pulmonary heart disease, cardiac dysrhythmias, endocarditis, inflammatory cardiomegaly, myocarditis, eosinophilic myocarditis, valvular heart diseases, rheumatic heart diseases, and other related cardiovascular diseases.
- the respiratory disease includes, but is not limited to asthma, chronic obstructive pulmonary disease (COPD), pulmonary fibrosis, pneumonia, bronchitis (chronic or acute bronchitis), emphysema, cystic fibrosis/bronchiectasis, pleural effusion, acute chest syndrome, acute respiratory distress syndrome, asbestosis, aspergillosis, severe acute respiratory syndrome (including, but not limited to SARS-CoV-1 and SARS-CoV-2), respiratory syncytial virus (RSV), Middle East respiratory syndrome (MERS), mesothelioma, pneumothorax, pulmonary arterial hypertension, pulmonary hypertension, pulmonary embolism, sarcoidosis, sleep apnea, and other respiratory diseases.
- the congenital disease includes, but is not limited to albinism, amniotic band syndrome, anencephaly, Angelman syndrome, Barth syndrome, chromosomal abnormalities (including, but not limited to abnormalities to chromosome 9, 10, 16, 18, 20, 21, 22, X chromosome, and Y chromosome), cleft lip/palate, club foot, congenital adrenal hyperplasia, congenital hyperinsulinism, congenital sucrase-isomaltase deficiency (CSID), cystic fibrosis, De Lange syndrome, fetal alcohol syndrome, first arch syndrome, gestational diabetes, Haemophilia, heterochromia, Jacobsen syndrome, Katz syndrome, Klinefelter syndrome, Kabuki syndrome, Kyphosis, Larsen syndrome, Laurence-Moon syndrome, macrocephaly, Marfan syndrome, microcephaly, Nager’s syndrome, neonatal jaundice, neurofibromatosis, Noonan syndrome, Pallister-Killian syndrome, Pierre
- the gastrointestinal disease includes, but is not limited to heartburn, irritable bowel syndrome, lactose intolerance, gallstones, cholecystitis, cholangitis, anal fissure, hemorrhoids, proctitis, colon polyps, infective colitis, ulcerative colitis, ischemic colitis, Crohn’s disease, radiation colitis, celiac disease, diarrhea (chronic or acute), constipation (chronic or acute), diverticulosis, diverticulitis, acid reflux (gastroesophageal reflux (GER) or gastroesophageal reflux disease (GERD)), Hirschsprung disease, abdominal adhesions, achalasia, acute hepatic porphyria (AHP), anal fistulas, bowel incontinence, centrally mediated abdominal pain syndrome (CAPS), clostridioides difficile infection, cyclic vomiting syndrome (CVS), dyspepsia, eosinophilic gastroente
- the metabolic disease includes, but is not limited to diabetes mellitus Type I, diabetes mellitus Type II, familial hypercholesterolemia, Gaucher disease, Hunter syndrome, Krabbe syndrome, metachromatic leukodystrophy, Niemann-Pick syndrome, phenylketonuria (PKU), Tay-Sachs disease, Wilson’s disease, hemachromatosis, mitochondrial disorders or diseases (including, but not limited to Alpers Disease; Barth syndrome; beta.
- cytochrome c oxidase (COX) deficiency LHON Leber Hereditary Optic Neuropathy; MM Mitochondrial Myopathy: LIMM Lethal Infantile Mitochondrial Myopathy; MMC Maternal Myopathy and Cardiomyopathy; NARP Neurogenic muscle weakness, Ataxia, and Retinitis Pigmentosa; Leigh Disease: FICP — Fatal Infantile Cardiomyopathy Plus, a MELAS-associated cardiomyopathy: MELAS Mitochondrial Encephalomyopathy with Lactic Acidosis and Strokelike episodes; LDYT Leber's hereditary optic neuropathy and Dystonia; MERRF Myoclonic Epilepsy and Ra
- 5’ RACE sr-AIRR-seq methods capture the full-length VDJ transcript without variable region-targeted multiplexed primer pools, thereby limiting the impact of primer bias and enabling discovery of novel V, D, and J genes and alleles, but do not include resolution of the complete constant region.
- 5’ RACE methods often prime from the first immunoglobulin heavy chain locus constant (IGHC) gene exon (CH1), allowing for determination of heavy chain isotype only.
- the Iso-Seq method utilizes long-read sequencing to capture full-length transcripts expressing a poly (A) tail through oligo dT-based priming.
- This approach can be used to characterize full-length immunoglobulin transcripts; however, it provides lower throughput and sequencing depth, at increased cost, compared to all targeted AIRR-seq methods. This is because Iso-Seq generates a complete transcriptome per sample without any enrichment of immunoglobulin heavy or light chain transcript sequences, so non-immunoglobulin sequences make up the majority of the preparation and data. If the data are to be used only for immunoglobulin profiling, these excess data are then filtered out and discarded (Figure 1B).
- FLAIRR-seq was benchmarked by comparing IG heavy chain variable (IGHV), diversity (IGHD), and joining (IGHJ) gene usage, complementarity-determining region 3 (CDR3) length, and somatic hypermutation to matched datasets generated with standard 5’ RACE sr-AIRR-seq and full-length isoform sequencing. Together these data demonstrate robust FLAIRR-seq performance using RNA samples derived from peripheral blood mononuclear cells, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving novel IG heavy chain constant (IGHC) gene features.
- FLAIRR-seq data provide, for the first time, simultaneous, single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class-switch recombination within a clonal lineage.
- FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized.
- these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk expressed Ab repertoires to date.
- Antibodies (Abs) or immunoglobulins (IGs) are the primary effectors of humoral immunity and are found as both membrane-bound receptors on B cells and circulating, secreted proteins. Both membrane-bound B cell receptors (BCRs) and secreted Abs act to recognize and bind antigen. All Abs and BCRs are composed of two identical heavy chains and two identical light chains that are post-translationally associated.
- the heavy chain is comprised of two distinct domains: (i) the variable domain (Fab), which allows for antigen binding, and (ii) the constant domain (Fc), which modulates downstream effector functions.
- the light chain also includes a variable domain that, once post-translationally associated with the heavy chain variable domain, interacts with cognate antigen.
- Abs are grouped into discrete isotypes and subisotypes (i.e., IgM, IgD, IgG1, IgG2, IgG3, IgG4, IgA1, IgA2, and IgE), based on the expression of specific constant (C) genes within the IG heavy chain locus (IGH).
- Each isotype and subisotype has unique effector properties that together represent the wide diversity of Ab-mediated functions, including binding of Fc receptors (FCR), activation of complement, opsonization, antibody-dependent cellular cytotoxicity (ADCC) and antibody-dependent cellular phagocytosis (ADCP).
- the IG genomic loci are highly polymorphic and harbor diverse and complex sets of genes that recombine in each B cell to encode up to 10^13 unique specificities.
- B cells create this expansive catalog of specificities through somatic recombination of the variable (V), diversity (D), and joining (J) genes in IGH, and V and J genes from the corresponding light chain loci, lambda (IGL) and kappa (IGK).
- During VDJ recombination in IGH, a single D and J gene are first recombined, while the intervening and unselected D and J gene sequences are removed by RAG recombinase.
- Naive B cells, which develop in the bone marrow from hematopoietic stem cell progenitors, have undergone VDJ recombination but have not yet encountered antigen, and solely express IgM and IgD. These naive B cells then migrate to B cell zones in secondary lymphoid tissues where they encounter antigen, driving further maturation and class switch recombination (CSR) to enable the most effective humoral responses.
- CSR mediates the excision of IGHC genes at the DNA level, which leads to the utilization and linkage of different IGHC genes to the same VDJ, ultimately resulting in class switching to alternate isotypes (IgG, IgA, or IgE) and their respective subisotypes (IgG1, IgG2, IgG3, IgG4, IgA1, or IgA2).
- the IgG isotype class is represented by four subisotypes: IgG1, IgG2, IgG3, and IgG4.
- IgG1 is typically the most abundant circulating IgG and mediates proinflammatory responses;
- IgG2 targets bacterial polysaccharides, providing protection from bacterial pathogens;
- IgG3 confers protection against intracellular bacterial infections and enables clearing of parasites; and
- IgG4 contains exclusive structural and functional characteristics often resulting in anti-inflammatory and tolerance-inducing effects.
- In the case of IgA, which is most often associated with conferring mucosal immunity, subisotypes IgA1 and IgA2 differ only in the extent of hinge regions found in the constant regions. Serum IgA1 is thought to be necessary to regulate immune homeostasis, while IgA2 modulates inflammatory effects. Multiple studies have identified Ab-mediated subisotype-specific pathogenicity in the context of autoimmune diseases and cancer, highlighting the need for further investigation of subisotype-specific repertoires.
- FLAIRR-seq, a targeted 5’ RACE-based amplification of near full-length IgG and IgM heavy chain transcripts paired with single molecule real time (SMRT) sequencing, is presented, resulting in highly accurate (mean read accuracy ~Q60, 99.9999%), near full-length Ab sequences from RNA derived from whole blood, isolated PBMC, and purified B cells.
- FLAIRR-seq performs comparably to standard 5’ RACE sr-AIRR-seq methods and single-molecule isoform sequencing (Iso-Seq) strategies for characterizing the expressed Ab variable region-based repertoire.
- FLAIRR-seq data, including phased identification of IGHV, IGHD, IGHJ, and IGHC genes, facilitating the profiling of subisotype- and IGHC allele-specific repertoires and CSR characterization, are further highlighted.
- B cells from healthy donors, or whole blood collected from hospitalized COVID-19 patients (Supplementary Table 1).
- Commercially available healthy donor PBMC (STEMCELL Technologies) and a subset of matched purified B cells were utilized to generate sr-AIRR-seq and FLAIRR-seq validation datasets.
- Full-length isoform sequencing (Iso-Seq) was performed using B cells isolated from the PBMC of a healthy, consented 57-year-old male donor at the University of Louisville (UofL) School of Medicine.
- UofL Institutional Review Board approved sample collection (IRB 14.0661).
- Frozen healthy donor PBMCs were purchased, thawed, and aliquoted for use in downstream experiments (STEMCELL Technologies). For Iso-Seq analyses, 175 mL of venous blood was collected in a final concentration of 6 mM K3EDTA using standard phlebotomy. PBMCs were isolated using SepMate PBMC Isolation Tubes (STEMCELL Technologies), with an additional granulocyte depletion step using the RosetteSep Human Granulocyte Depletion Cocktail (STEMCELL Technologies) as directed by the manufacturer. B cells from the freshly collected and frozen healthy donor PBMC were isolated using the EasySep Human Pan-B Cell Enrichment Kit, as described by the manufacturer (STEMCELL Technologies).
- B cells, including plasma cells, were isolated by negative selection using coated magnetic particles.
- the B cell enrichment cocktail was added to the sample and mixed for a 5-minute incubation at room temperature, followed by addition of magnetic particles and further incubation for 5 minutes on the benchtop.
- the sample tube was then placed on an EasySep magnet (STEMCELL Technologies), and purified B cells were carefully eluted from the magnetic particles and immediately used for RNA extraction.
- Genomic DNA (gDNA) and RNA were co-extracted using the AllPrep DNA/RNA Mini Kit (Qiagen) according to the manufacturer’s instructions.
- purified Pan-B cells were lysed in Buffer RLT Plus, and RNA was extracted with the RNeasy Plus Mini Kit (Qiagen) per the manufacturer’s protocol; no gDNA was collected from this sample.
- COVID-19 whole blood-derived RNA was extracted from samples collected in Tempus Blood RNA tubes using the Tempus Spin RNA Isolation Kit (ThermoFisher) as described by the manufacturer.
- RNA and gDNA were assessed using the Qubit 4.0 fluorometer, with the RNA HS Assay Kit and Qubit DNA HS Assay Kit, respectively (ThermoFisher Scientific). RNA and gDNA integrity were evaluated using the Bioanalyzer RNA Nano Kit and DNA 1200 Kit, respectively (Agilent Technologies). Extracted RNA and gDNA were stored at -80°C and -20°C, respectively, until used.
- FLAIRR-seq targeted amplification of heavy chain transcripts
- reaction conditions were used: (i) a primary master mix was prepared with 4.0 µL 5X First-Strand Buffer, 0.5 µL DTT (100 mM), and 1.0 µL dNTPs (20 mM) per reaction and set aside until needed; (ii) in a separate 0.2-mL PCR tube, 10 µL of sample RNA and 1 µL 5’-CDS Primer A were combined and incubated in a thermal cycler at 72°C (lid temperature: 105°C) for 3 minutes, followed by cooling to 42°C for 2 minutes; (iii) after cooling, tubes were spun briefly to collect contents and 1 µL (12 µM) of the 5’ TSO-UMI was added to the RNA; (iv) 0.5 µL of RNase inhibitor and 2.0 µL of SMARTScribe Reverse Transcriptase were added to the primary master mix tube per sample, and 8 µL of the combined master mix was then added to each RNA-containing sample tube.
- First-strand cDNA synthesis reactions were incubated in a thermal cycler at 42°C (lid temperature: 105°C) for 90 minutes, followed by heat inactivation at 70°C for 10 minutes.
- Total first-strand cDNA generated in this reaction was diluted 1:2 with Tricine-EDTA Buffer before moving on to targeted heavy chain transcript amplification.
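The first-strand volumes above can be checked with a short calculation, taking the listed volumes as microlitres. This is a bookkeeping sketch only: the dictionary names, the `scaled_mix` helper, and its `overage` parameter are hypothetical conveniences, not part of the described protocol.

```python
# Per-reaction volumes in microlitres, taken from steps (i)-(iv) above.
PRIMARY_MASTER_MIX = {
    "5X First-Strand Buffer": 4.0,
    "DTT (100 mM)": 0.5,
    "dNTPs (20 mM)": 1.0,
}
ENZYME_ADDITIONS = {
    "RNase inhibitor": 0.5,
    "SMARTScribe Reverse Transcriptase": 2.0,
}
SAMPLE_TUBE = {
    "sample RNA": 10.0,
    "5'-CDS Primer A": 1.0,
    "5' TSO-UMI (12 uM)": 1.0,
}


def scaled_mix(components, n_reactions, overage=0.0):
    """Scale per-reaction volumes to n reactions.

    `overage` is a hypothetical pipetting excess (0.1 means 10% extra).
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in components.items()}


# Combined master mix per reaction: buffer, DTT, dNTPs plus the two enzymes.
combined_mix_per_reaction = sum(PRIMARY_MASTER_MIX.values()) + sum(ENZYME_ADDITIONS.values())
# Final reaction volume: RNA, primer and TSO-UMI plus the combined master mix.
final_reaction_volume = sum(SAMPLE_TUBE.values()) + combined_mix_per_reaction
```

The combined master mix sums to the 8 µL dispensed per tube, and the final reaction volume comes to 20 µL.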
- targeted IgG and IgM transcript amplification reactions were performed using barcoded IgG (3’ primer binding in the constant region exon 3, CH3) or IgM (3’ primer binding in the constant region exon 4, CH4)-specific primers and the following conditions: (i) 5 µL of diluted first-strand cDNA was added to 0.2-mL PCR tubes; (ii) a master mix was generated using 10 µL 5X PrimeSTAR GXL Buffer, 4 µL GXL dNTP mixture, 28 µL PCR-grade water, 1 µL PrimeSTAR GXL Polymerase and 1 µL 10X UPM from the SMARTer RACE 5’/3’ Kit per reaction; (iii) 44 µL master mix was added to each reaction tube followed by 1 µL of the appropriate barcoded IgG (CH3) or IgM (CH4) primer.
- Different temperatures were used for annealing of IgG- (63.5°C) and IgM-specific primers (60°C) to account for primer-specific melting temperatures and to enhance targeted amplification specificity.
- Amplification conditions for full-length IgG were: 1 minute at 95°C, followed by 35 amplification cycles of 95°C for 30 sec, 63.5°C for 20 sec, and 2 minutes at 68°C, followed by a final extension for 3 minutes at 68°C and hold at 4°C.
- Amplification conditions for full-length IgM were: 1 minute at 95°C, followed by 35 amplification cycles of 95°C for 30 sec, 60°C for 20 sec, and 2 minutes at 68°C, followed by a final extension for 3 minutes at 68°C and hold at 4°C.
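The two amplification programs and the master mix can be captured as data, which makes the arithmetic easy to verify. The sketch below takes the listed volumes as microlitres; the class and variable names are hypothetical, and ramp times and the final 4°C hold are deliberately left out of the time estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    initial_denature_s: int
    cycles: int
    denature_s: int
    anneal_s: int
    anneal_temp_c: float
    extend_s: int
    final_extend_s: int

    def programmed_seconds(self) -> int:
        """Sum of programmed hold times; ramping and the 4 C hold are excluded."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extend_s


# Programs as described in the text; only the annealing temperature differs.
IGG = CyclingProgram(60, 35, 30, 20, 63.5, 120, 180)
IGM = CyclingProgram(60, 35, 30, 20, 60.0, 120, 180)

# Master mix per reaction, in microlitres, as listed in the text.
PCR_MASTER_MIX_UL = {
    "5X PrimeSTAR GXL Buffer": 10,
    "GXL dNTP mixture": 4,
    "PCR-grade water": 28,
    "PrimeSTAR GXL Polymerase": 1,
    "10X UPM": 1,
}
CDNA_UL = 5
BARCODED_PRIMER_UL = 1

master_mix_total_ul = sum(PCR_MASTER_MIX_UL.values())
reaction_total_ul = master_mix_total_ul + CDNA_UL + BARCODED_PRIMER_UL
```

Each program holds for 6190 seconds in total (a little over 103 minutes), the master mix sums to the 44 µL dispensed per tube, and the full reaction is 50 µL.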
- Final amplification reactions were purified using a 1.1X (vol:vol) cleanup with ProNex magnetic beads (Promega).
- Successfully amplified products were quantified with Qubit dsDNA HS assay (ThermoFisher Scientific) and length was evaluated with the Fragment Analyzer Genomic DNA HS assay (Agilent). Samples were equimolar pooled in 8-plexes for SMRTbell library preparation and sequencing.
- Eight-plex pools of targeted IgG or IgM amplicons were prepared into SMRTbell sequencing templates according to the “Procedure and Checklist for Iso-Seq Express Template for Sequel and Sequel II systems” protocol starting at the “DNA Damage Repair” step and using the SMRTbell Express Template Prep Kit 2.0, with some modifications (Pacific Biosciences). Briefly, targeted IgG and IgM amplicons underwent enzymatic DNA damage and end repair, followed by ligation with overhang SMRTbell adapters as specified in the protocol.
- SMRTbell libraries were treated with a nuclease cocktail to remove unligated amplified products, using the SMRTbell Enzyme Cleanup Kit, as recommended by the manufacturer (Pacific Biosciences). Briefly, after heat-killing the ligase with an incubation at 65°C, samples were treated with a nuclease cocktail at 37°C for 1 hour, and then purified with a 1.1X ProNex cleanup. Final SMRTbell libraries were evaluated for quantity and quality using the Qubit dsDNA HS assay and Fragment Analyzer Genomic DNA assay, respectively.
- Matched healthy donor RNA was used to generate targeted IgG and IgM sr-AIRR-seq libraries using the SMARTer Human BCR IgG IgM H/K/L Profiling Kit (Takara Bio USA) according to the manufacturer’s instructions with no modifications. Briefly, for each sample, proprietary IgG and IgM primers were used to amplify heavy chain transcripts following a 5’RACE reaction. sr-AIRR-seq libraries were then quality controlled using the 2100 Bioanalyzer High Sensitivity DNA Assay Kit (Agilent) and the Qubit 3.0 Fluorometer dsDNA High Sensitivity Assay Kit.
- RNA extracted from healthy sorted B cells was used to generate Iso-Seq SMRTbell libraries following the “Procedure & Checklist Iso-Seq Express Template Preparation for the Sequel II System” with minor adaptations compared to the manufacturer’s instructions. Briefly, Iso-Seq libraries were generated using 500 ng high-quality (RIN > 8) RNA as input into oligo-dT primed cDNA synthesis (NEB). Barcoded primers were incorporated into the cDNA during second strand synthesis. Following double-stranded cDNA amplification, transcripts from two samples sourced from purified B cells and NK cells were equimolar pooled.
- SMRTbells were generated from the pooled cDNA as described above for the FLAIRR-seq amplification products, including the addition of a nuclease digestion step. Quantity and quality of the final Iso-Seq libraries were assessed with the Qubit dsDNA High Sensitivity Assay Kit and Agilent Fragment Analyzer Genomic DNA assay, respectively. This 2-plex Iso-Seq pool was sequenced using primer v4 and polymerase v2.1 on the Sequel IIe system with a 30-hour movie. HiFi reads were generated on instrument before analyses.
- FLNC full-length non-concatemer
- IgG and IgM CH3 or CH4 primer sequences were identified with an error rate of 0.2, and primers identified were then noted in FASTQ headers using “MaskPrimers align”.
- 12 basepair (bp) UMIs were located and extracted using “MaskPrimers extract”. Sequences found to have the same UMIs were grouped and aligned using “AlignSets muscle,” with a consensus sequence generated for each UMI using “BuildConsensus”. Mate pairing of sr-AIRR-seq reads was conducted using a reference-guided alignment requiring a minimum of a 5 bp overlap via “AssemblePairs sequential”.
- Consensus reads were then generated as described above, including removal of sequences with <2 supporting reads. Read counts following each step of data filtration for AIRR-seq and FLAIRR-seq are represented in Supplementary Table 3. pRESTO-filtered reads for both sr-AIRR-seq and FLAIRR-seq data were then input into the Change-O tool (Table 1).
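The idea behind UMI grouping and consensus building can be sketched in a few lines. This is a simplified illustration and not the pRESTO implementation: it assumes the 12 bp UMI sits at the start of each read, skips the multiple-alignment step by assuming equal-length reads within a group, and uses a plain per-column majority vote. All function names are hypothetical.

```python
from collections import Counter, defaultdict

UMI_LENGTH = 12         # bp, as stated in the text
MIN_READS_PER_UMI = 2   # groups with fewer supporting reads are dropped


def split_umi(read: str):
    """Assumption for illustration: the UMI is the first 12 bases of the read."""
    return read[:UMI_LENGTH], read[UMI_LENGTH:]


def majority_consensus(seqs):
    """Column-wise majority vote over sequences assumed to have equal length.

    No alignment is performed; columns beyond the shortest sequence are ignored.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def umi_consensus(reads):
    """Group reads by UMI and return one consensus per sufficiently supported UMI."""
    groups = defaultdict(list)
    for read in reads:
        umi, insert = split_umi(read)
        groups[umi].append(insert)
    return {
        umi: majority_consensus(seqs)
        for umi, seqs in groups.items()
        if len(seqs) >= MIN_READS_PER_UMI
    }
```

A UMI seen in only one read yields no consensus, which mirrors the removal of sequences with fewer than two supporting reads.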
- Iso-Seq required no initial processing from pRESTO and was input into Change-O for IG gene reference alignment along with sr-AIRR-seq and FLAIRR-seq data using “igblastn”, clonal clustering using “DefineClones”, and germline reconstruction and conversion using “CreateGermlines.py” and the GRCh38 chromosome 14 germline reference. Fully processed and annotated data were then converted into a TSV format for use in downstream analyses. The Alakazam Immcantation tool suite was then used to quantify gene usage, calculate CDR3 length, and assess somatic hypermutation (SHM) frequencies, which were compared by unpaired t-tests with Bonferroni multiple testing correction.
- Targeted IG gDNA capture, long-read sequencing, and IGenotyper analyses.
- FLAIRR-seq validation samples also underwent IGHC targeted enrichment and long-read sequencing as previously described. Briefly, gDNA was mechanically sheared and size selected to include 5-9 kb fragments using the BluePippin (Sage Science). Samples were then end repaired and A-tailed using the standard KAPA library preparation protocol (Roche). Universal priming sequences and barcodes were then ligated onto samples for multiplexing (Pacific Biosciences). Barcoded gDNA libraries were captured using IGH-specific probes following a SeqCap protocol (Roche). 26 IGH-enriched samples were purified and pooled together for SMRTbell library prep as described above, including the final nuclease digestion step.
- IGenotyper was used to detect single nucleotide variants and assemble sequences into haplotype-specific assemblies for downstream IGHC gene genotyping. Alleles were then extracted from assemblies using a BED file containing coordinates for each IGHC gene exon.
- reads were then aligned to the IMGT database (downloaded on 2/21/22) and assigned as an exact match to IMGT, “novel” if there was no match to the IMGT database or “novel, extended” if a match was detected to a partial allele found in IMGT, but the IMGT allele was a substring of the IGenotyper identified allele. This set of alleles was then used as a ground truth dataset.
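The three-way assignment described above can be sketched as a small classifier. This is a simplification under stated assumptions: it uses exact string comparison instead of alignment, it treats any database allele that is a substring of the candidate as a partial allele, and the function name and inputs are hypothetical.

```python
def classify_allele(candidate: str, imgt_alleles) -> str:
    """Assign a candidate allele sequence to one of three categories.

    - "exact match": the sequence is identical to a database allele
    - "novel, extended": a database allele is a substring of the candidate
    - "novel": no database allele matches
    """
    imgt = set(imgt_alleles)
    if candidate in imgt:
        return "exact match"
    if any(ref and ref in candidate for ref in imgt):
        return "novel, extended"
    return "novel"
```

For instance, with a toy database `{"ACGTACGT", "GGCC"}`, the candidate `"TTGGCCAA"` is classified as "novel, extended" because it contains `"GGCC"`.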
- IGHC length 900bp-1100bp
- SNV phase single nucleotide variants
- IGHJ6 is standardly used for sr-AIRR-seq haplotype inference
- TIgGER was employed to infer novel IGHV alleles and generate sample-level IGHV genotypes. Rearranged sequences within the Change-O table were then reannotated taking into account sample genotype and detected novel alleles. Updated annotations were then used to infer haplotypes using RAbHIT version 0.2.4. Both IGHJ6 and IGHM were used as anchor points for haplotyping, and the resulting haplotypes were compared.
- FLAIRR-seq and sr-AIRR-seq analyses were performed on ten healthy donor PBMC samples to compare both library preparation methods.
- FLAIRR-seq data were filtered from the initial HiFi reads (>Q20) to include only >Q40 reads.
- the average read quality of these filtered reads was >Q60 (99.9999%), with a pass filter rate ranging from 88%-93% of total reads.
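The quality thresholds quoted here follow the standard Phred relationship, accuracy = 1 - 10^(-Q/10). A short helper (the function name is hypothetical) makes the correspondence explicit.

```python
def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score Q to the implied base-call accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)
```

Under this relationship Q20 corresponds to 99% accuracy, Q40 to 99.99%, and Q60 to 99.9999%, which matches the percentages given in the text.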
- sr-AIRR-seq FASTQ bases were trimmed to retain sequences with an average quality of Q20 (99%).
- FLAIRR-seq resulted in comparable or, in many cases, increased number of unique VDJ sequences, clones, and CDR3 sequences identified compared to the matched sr-AIRR-seq-derived samples in both the IgG and IgM repertoires (schematized in Figure 2A).
- These basic sequencing and initial analysis metrics demonstrated that FLAIRR-seq produced high-quality variable region data for detailed Ab repertoire analyses and is amenable to analysis using existing sr-AIRR-seq analysis tools. Comparative costs were calculated based on the cost per “actionable read”, defined as the read number per sample and per method after quality filtration and assembly, but prior to cluster consensus.
- This calculation represented the total unique single molecule or assembled templates captured by either method that passed all necessary quality control irrespective of biologic repertoire diversity or clonality, removing technology/platform biases.
- this “per actionable read” price was used to calculate the cost for 15,000 “actionable reads” as a standard sample. sr-AIRR-seq cost $25.50 per sample, whereas the FLAIRR-seq cost was $33.57 per sample. These costs reflect reagents and consumables only; instrumentation and labor are not included.
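The per-sample figures can be converted back into a per-read cost with simple division. The sketch below uses only the numbers given in the text; the variable and function names are hypothetical.

```python
STANDARD_SAMPLE_READS = 15_000  # "actionable reads" per standard sample
COST_PER_SAMPLE_USD = {"sr-AIRR-seq": 25.50, "FLAIRR-seq": 33.57}


def cost_per_actionable_read(cost_per_sample: float, reads: int = STANDARD_SAMPLE_READS) -> float:
    """Reagent and consumable cost per actionable read, in US dollars."""
    return cost_per_sample / reads


per_read = {
    method: cost_per_actionable_read(cost)
    for method, cost in COST_PER_SAMPLE_USD.items()
}
# Fractional increase of FLAIRR-seq over sr-AIRR-seq per standard sample.
relative_increase = COST_PER_SAMPLE_USD["FLAIRR-seq"] / COST_PER_SAMPLE_USD["sr-AIRR-seq"] - 1
```

This works out to roughly $0.0017 per read for sr-AIRR-seq and $0.0022 per read for FLAIRR-seq, an increase of about 32% per standard sample.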
- RNA extraction (i) RNA derived from bulk PBMC, and (ii) RNA isolated from purified pan B-cells, followed by FLAIRR-seq preparation, SMRT sequencing, and Immcantation analysis of both groups.
- IGHV, IGHD, and IGHJ gene usage correlations between groups are shown in Figure 2A and demonstrate a significant association (p-values ranging from 0.033 to 4.1e-16), strongly supporting the conclusion that B cell isolation before RNA extraction was not necessary to achieve comparable gene usage metrics. The limited differences that were observed could be explained by template sampling differences between the two experiments. Due to the strong associations observed and the ease of processing PBMC in bulk, RNA derived directly from PBMC aliquots was used for the remainder of the analyses.
- FLAIRR-seq has very limited to no primer-driven bias compared to whole transcriptome data.
- This benchmarking dataset confirmed that FLAIRR-seq is comparable to other state-of-the-art methods, providing robust characterization of commonly used repertoire metrics, with limited increases in per sample cost.
- IGenotyper and FLAIRR-seq provide constant region gene allele identification and allow for haplotyping of variable genes
- the novel value added by FLAIRR-seq is improved resolution of IGHC, including estimation of IGHC gene and allele usage, subisotype identification, and phasing of variable and constant regions for comprehensive repertoire analysis.
- IGHC gene and allele identification was first established by targeted sequencing of the germline IGH locus (Figure 3A) using IGenotyper.
- IGHG1, IGHG2, IGHG3, IGHG4 and IGHM alleles called by IGenotyper (see Methods) were assigned to one of three categories, schematized in Figure 3B: (i) “exact match” - alleles documented in the IMGT database; (ii) “novel not in IMGT” - alleles not documented in the IMGT database; or (iii) “extended” - alleles that matched partial alleles in the IMGT database (i.e., those only spanning a subset of exons), but were extended by sequences in our dataset.
- the IGenotyper-derived IGHC gene database was used as the ground-truth for evaluating the capability of FLAIRR-seq to identify and resolve IGHC gene alleles.
- 19/32 (59%) IGenotyper alleles were resolved at 100% identity; no additional false-positive alleles were identified.
- 8 had allele defining single nucleotide variants (SNVs) 3’ of the FLAIRR-seq primers.
- haplotype 1 represented by IGHJ6*03 and IGHM_FL_4, the IGHJ6*03-derived haplotype had 39 IGHV genes for which either an allele or deletion call was made.
- IGHM_FL_4 the same allele/deletion calls were made for 39 of these genes; in addition, using IGHM as the anchor gene, assignments were made for an additional 5 IGHV genes that had “unknown” designations using IGHJ6.
- FLAIRR-seq enables isotype-, subisotype-, and allele-specific repertoire analyses
- IGHG and IGHM alleles identified in each sample were used to annotate reads in each respective repertoire. These assignments allowed for partitioning of the repertoire by isotype, subisotype and IGHC allele (Figure 4). To demonstrate this, the same representative sample (“1013”) that was heterozygous for all IGHC genes was utilized. As shown in Figure 4A, IGHC gene assignments allow for subisotype and allele level frequencies to be estimated as a proportion of the overall IgG and IgM repertoires. In addition, detailed analyses of the repertoire can be conducted within each of these compartments. For example, Figure 4B shows the frequencies of IGHV gene subfamilies for each IGHG and IGHM allele identified in these two samples.
- IGHV1 and IGHV3 were statistically different between subisotypes (P < 0.01, ANOVA); IGHV1 usage was elevated in IGHG1 and IGHG4, whereas IGHV3 was elevated in IGHG2 and IGHG3.
- FLAIRR-seq identifies subisotype-specific clonal expansion and CSR in longitudinal samples.
- The utility of FLAIRR-seq in clinical samples was investigated, particularly to observe changes in immune repertoires over time. Ab responses are highly dynamic, with specific Ab clones expanding upon activation by antigen. It was contemplated whether class switch recombination could be captured by FLAIRR-seq, given the capability to identify clones with the subisotype resolved. FLAIRR-seq resolved repertoires were evaluated over time in four samples collected from one individual over their >13-day hospitalization for severe COVID-19 disease. Blood draws were taken on days 1, 4, 8, and 13 post-hospitalization (Figure 5A) and analyzed with FLAIRR-seq across all time points.
- CSR mediates the switching of Abs from one sub/isotype to another. This occurs through the somatic recombination of IGHC genes, which brings the switched/selected IGHC genes adjacent to the recombined IGHV, IGHD, and IGHJ segments, facilitating transcription.
- The switching of isotypes and subisotypes can result in changes to associated effector functions of the Ab while maintaining antigen-specific variable regions.
- IGHC alleles were also resolved from this individual, with the exception of IGHG1 alleles which were ambiguous.
- Both IGHG1 and IGHG2 were heterozygous (Figure 5G).
- FLAIRR-seq provides the novel ability to use IGHC gene usage to identify subisotypes and genotype heavy chain transcripts, linking these data back to evaluate subisotype-specific repertoires, clonal expansion, and CSR.
- Underscoring the underappreciated extent of IGHC variation, our profiling of a restricted cohort of only 10 individuals from relatively homogenous backgrounds identified 4 and 7 completely novel IGHC alleles in IgM and IgG, respectively, and extended an additional 17 alleles beyond what had been available in the IMGT database.
- Subisotype-specific repertoire profiling approaches may be the first step toward identification of unique clones that mediate disease pathogenicity or serve as high-resolution biomarkers of disease progression, as well as open the door for functional experiments on subisotype clones of interest, including examining the functional impact of the novel IGHC alleles identified here.
- Expanded population-based FLAIRR-seq profiling and curation of novel IGHC alleles, particularly in conjunction with IGenotyper targeted genomic assembly efforts in IGH, is a significant first step in defining the full extent of variation in a region too long assumed to be relatively invariant.
- the Fc region is known to be critical for modulating differential Ab effector functions. These differential functionalities are currently understood to be regulated by differential posttranslational modifications, such as variable glycosylation.
- FLAIRR-seq profiling is a valuable tool to investigate how genomic variation across IGHC genes impacts residue usage and resultant Fc receptor binding, signaling potential, crosslinking, and posttranslational modification, all of which alter downstream effector functions, such as ADCC, ADCP, and complement fixation.
- FLAIRR-seq can effectively examine clonal expansion and CSR in longitudinal samples, demonstrating the feasibility of using FLAIRR-seq to resolve Ab repertoire dynamics. This increased resolution furthers the understanding of Ab repertoire evolution in the transition from acute to chronic disease states, many of which are associated with overall IgG subisotype distribution changes that are thought to reflect the inflammatory milieu.
- One example is advanced melanoma, where late-stage disease is characterized by elevated IgG4 compared to IgG1, which is believed to reflect a more tolerizing, pro-tumor environment.
- FLAIRR-seq can also provide insights into longitudinal Ab repertoire dynamics in COVID-19 infection following exposure to different viral variants of concern, and can be used to assess Ab responses to viral vaccines.
- FLAIRR-seq examination of these samples may identify specific repertoire distribution patterns that act as biomarkers of disease progression. Moving forward it is critical to account for all variability within the Ab repertoire for the most comprehensive understanding of repertoire dynamics and the myriad factors impacting Ab effector function. Together, the data presented here demonstrate that the FLAIRR-seq method provides a comprehensive characterization of allele-resolved IgG and IgM repertoires, detailing variable region gene usage and measurements of maturation, isotype and subisotype identification, and the unappreciated extent of constant region variation, which will be necessary to fully appreciate the impact of IG genomic variation in health and disease.
- Example 2 Extension of FLAIRR-SEQ for all Seven Immunoglobulin Chains. While initial work was focused on benchmarking and demonstrating capability with immunoglobulins IgG and IgM, the FLAIRR-seq parameters were initially expanded to efficiently capture and profile all seven immunoglobulin chains necessary for an effective systemic humoral immune response, including IgM, IgD, IgG, IgA, IgE, IGL, IGK, and all subisotypes within (e.g., IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2).
- Targeted primers were designed and, in the cases of IgG and IgM mentioned above, redesigned to incorporate concatenation linkers and represent population-based diversity in these genetic regions. All 5’ primers are the same, targeting the 5’ template switch oligonucleotide (TSO) incorporated into the cDNA during the reverse transcription and rapid amplification of cDNA ends (RACE). This TSO-specific oligo is further linked to a unique molecular identifier (UMI) necessary to remove amplification artifacts during data processing. All 3’ primers are specific to the immunoglobulin transcripts themselves and act to target and enrich Ab or Ig transcripts.
- TSO template switch oligonucleotide
- RACE rapid amplification of cDNA ends
- UMI unique molecular identifier
- the 3’ primer targets the constant region exon 3 (CH3) within 100 bp of the polyA tail.
- the 3’ primer targets the constant region exon 4 (CH4) within 100 bp of the polyA tail.
- Both the 5’ and 3’ gene-specific primers include a concatenation tag enabling directional ligation post-amplification, such that multiple Ig or Ab transcripts are sequenced per molecule to enhance overall depth of sequencing.
- All transcripts are concatenated in arrays of 8, creating complete, concatenated Ig profiling libraries that range in size from 5600 bp (for an array of 8 light chain transcripts at 700 bp each) to approximately 16,800 bp (for an array of 8 IgM heavy chain transcripts at 2100 bp each).
- FLAIRR-seq identifies additional variation within isotypes and/or subisotypes leveraging the single molecule nature of the Ig or Ab transcript resolution.
- Healthy donors that carried a hinge deletion in their IgG3-specific repertoire were observed, which was represented in the FLAIRR-seq data as two unique Ig transcript alleles, Allele 1 carrying the hinge segment 3 (“H3”; Figure 9B, top) and Allele 2 clearly showing the absence of H3 (Figure 9B, bottom).
- Another structural variation observed and enabled by FLAIRR-seq analysis is a duplication of the IgG4 constant region allele (Figure 10). Normally, each individual carries two alleles of a gene - one donated by the maternal chromosome and the other provided by the paternal copy. In the case of a gene duplication, such as this, three alleles are seen if one chromosome includes the duplication (as seen in Figure 10), or four alleles if the duplication is carried by both chromosomes. This creates a third, or even fourth, distinct IgG4-specific Ig or Ab repertoire in these individuals, ostensibly broadening their IgG4-specific immune response.
- Subisotype-specific repertoires are also found within the IgA-specific antibody profile, in which two subisotypes, IgA1 and IgA2, are found. Although relatively similar in that both subisotypes play a role in mucosal immunity, the two subisotype forms differ in protein structure and in downstream effector function. In particular, IgA1 is thought to be necessary for immune homeostasis, or balancing immune responses to control immune-regulated inflammation, whereas IgA2 modulates mucosal inflammation and drives active immune responses. Structurally, these subisotypes differ only by the presence (IgA1) or absence (IgA2) of an extended hinge domain in the constant region (Figure 10B). Given their dramatically different biological roles, FLAIRR-seq resolution would allow investigation of subisotype-specific repertoires in the context of health and disease, providing clear separation of antibodies that belong to the regulatory versus inflammatory populations. This level of resolution is impossible with other current technologies.
- Figure 11 (B & C) provides an example of how FLAIRR-seq is used to resolve gene-level frequency estimates of the IGLC3 genes and its close duplicate copies; no other technology can allow for this information to be linked directly to recombined lambda chain V and J genes in full-length transcripts.
- FLAIRR-seq provides a new modality of profiling immune systems to identify not only variable regions responsible for antigen-specific interactions, but also the linked constant region exons that determine the downstream function once the antibody is bound. Incorporating these variants will expand the understanding of how the genetics underlying the antibody response sets the stage for development of immune responses in the context of vaccination and disease, as well as identify highly reactive alleles that are poised to develop neoantigen reactivity or autoimmune responses.
- MG Myasthenia Gravis
- The autoantibodies that mediate acetylcholine-specific MG have been shown to be primarily of the IgG1 and IgG3 subisotypes. These autoantibodies mediate disease through one of three major pathogenic mechanisms: (i) bind and crosslink upon the acetylcholine receptor, initiating the complement cascade and destroying the neuromuscular junction; (ii) physically bind the acetylcholine receptor and block interaction with the acetylcholine ligand, interfering with the signaling potential in the neuromuscular junction; or (iii) bind the acetylcholine receptor and promote receptor internalization, removing acetylcholine receptor availability for acetylcholine binding and interrupting signal propagation.
- Autoantibodies appear to be able to act through one or more of these mechanisms, and often there is a polyclonal, or multi-mechanistic, reactivity at play in MG-affected individuals. All three mechanisms require Ab or Ig binding to the acetylcholine receptor, but only mechanism (i) depends on a specific constant region mediated cascade of events to trigger complement activity and tissue destruction.
- the study aimed to perform immunogenomic profiling using FLAIRR-seq to investigate if IgG3-specific Ig genes or alleles were present in affected individuals that could confer increased complement reactivity.
- FLAIRR-seq was used to profile IgG subisotypes in 48 MG-affected individuals. Together, a total of 89 constant region alleles were identified, with 70 (79%) of these being novel compared to existing databases. When broken down by subisotype, the largest number of new alleles were identified in the IgG3 and IgG4 repertoires ( Figure 11 A). This again underscores just how little is understood about the variation in the constant region and demonstrates that the constant region varies considerably in those individuals with disease. Due to the single molecule nature of FLAIRR-seq data, we can investigate the profile of full-length Ig or Ab transcripts within a particular subisotype compartment.
- FIG. 11 schematizes those alleles identified, “M” or “major” allele represents IgG3 transcripts that contain all 4 hinge regions; “S” or “short” indicates those missing a single hinge domain (similar to that shown in Figure 9B); additionally, a third or “E” allele was identified retaining only one hinge domain.
- HD healthy donors
- Table 4A (Continued). sr-AIRR-seq pRESTO and Change-O Pipeline Read Counts.
- Table 4B FLAIRR-seq pRESTO and Change-O Pipeline Read Counts.
- Table 4B (Continued) FLAIRR-seq pRESTO and Change-O Pipeline Read Counts.
Abstract
The present disclosure provides a near full-length long read AIRR-seq (FLAIRR-Seq) method that utilizes targeted amplification by 5' rapid amplification of cDNA ends (RACE), combined with single molecule, real-time sequencing to generate highly accurate immunoglobulin (Ig) heavy chain and light chain transcripts.
Description
METHODS FOR SINGLE MOLECULE RESOLUTION OF NEAR FULL-LENGTH IMMUNOGLOBULIN HEAVY AND LIGHT CHAIN REPERTOIRES
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 63/535,326, filed August 30, 2023, which is incorporated by reference herein in its entirety.
REFERENCE TO SEQUENCE LISTING
The sequence listing submitted on August 30, 2024, as an .XML file entitled “11258-011WO1_ST26” created on August 29, 2024, and having a file size of 38,326 bytes is hereby incorporated by reference pursuant to 37 C.F.R. § 1.52(e)(5).
FIELD
The present disclosure relates to long-read sequencing methods to generate highly accurate immunoglobulin heavy and light chain transcripts.
BACKGROUND
Current short-read Adaptive Immune Receptor Repertoire sequencing (sr-AIRR-seq) methods aim to characterize the expressed antibody (Ab) repertoire, through the sequencing of immunoglobulin-encoding transcripts; available technologies/methods meet this objective to differing extents. Profiling of the variable region, even in part, defines variable (V), diversity (D) and joining (J) gene usage while also providing characterization of complementarity determining regions (CDR) 1, 2, and 3, which are hypervariable and directly interact with target antigen. CDR3-targeted profiling approaches allow for V, D, and J gene assignments but do not provide complete resolution of the entire antibody transcript.
These limitations stress the need to develop an end-to-end pipeline to target, profile, and characterize the Ab heavy (IGH) and light chain (IGK and IGL) repertoires in the context of specific isotypes (e.g., IgG, IgM, IgA, IgE) and sub-isotypes (e.g., IgG1, IgG2, IgG3, IgG4, IgA1, IgA2). The methods disclosed herein address these and other needs.
SUMMARY
Disclosed herein is a novel near full-length long read AIRR-seq (FLAIRR-Seq) method that utilizes targeted amplification by 5’ rapid amplification of cDNA ends (RACE), combined with single molecule, real-time (SMRT) sequencing to generate highly accurate immunoglobulin heavy and light chain transcripts.
In one aspect, disclosed herein is a method of sequencing a ribonucleic acid (RNA) encoding an immunoglobulin (Ig) or an antibody (Ab), the method comprising targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts are targeted for amplification, integrating the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts into an array of eight transcripts to generate a nucleic acid comprising at least 5600 base pairs (bps), analyzing the cDNA transcripts using a single molecule, real-time sequencing (SMRT) method, and detecting at least 700 base pairs (bps) within a variable region and a constant region of the Ig.
In some embodiments, the method detects about 700 bps to about 900 bps of the variable region and the constant region, or fragments thereof, of the Ig light chain cDNA transcript. In some embodiments, the method detects about 1500 bps to about 2100 bps of the variable region and the constant region, or fragments thereof, of the Ig heavy chain cDNA transcript.
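As a rough arithmetic check on the lengths recited above, the following sketch computes the size of a concatenated array from a per-transcript length. The function name `array_length_bp` and its `transcripts_per_array` parameter are illustrative only and are not part of the disclosed method; linker and adapter lengths are ignored, which is an assumption.

```python
def array_length_bp(transcript_bp: int, transcripts_per_array: int = 8) -> int:
    """Return the total length of a concatenated array, ignoring linkers."""
    if transcript_bp <= 0 or transcripts_per_array <= 0:
        raise ValueError("lengths and counts must be positive")
    return transcript_bp * transcripts_per_array


if __name__ == "__main__":
    # About 700 bp is the lower bound recited for a light chain transcript,
    # about 2100 bp the upper bound recited for a heavy chain transcript.
    print(array_length_bp(700))
    print(array_length_bp(2100))
```

Under these assumptions, an array of eight 700 bp light chain transcripts gives the 5600 bp minimum recited in the preceding aspect.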
In some embodiments, the method identifies a gene or an allele encoding one or more segments of the variable region comprising a variability (V) peptide, a diversity (D) peptide, a joining (J) peptide, or combinations thereof.
In some embodiments, the method identifies a gene or an allele encoding one or more segments of the constant region of a heavy chain (CH) or light chain (CL) comprising CH1, CH2, CH3, CH4, CL1 or combinations thereof.
In some embodiments, the method identifies a gene or allele encoding an Ig isotype or an Ig subisotype. In some embodiments, the method identifies an expansion of the Ig isotype or Ig subisotype.
In some embodiments, the method identifies a class switch recombination (CSR) event of the Ig subisotype. In some embodiments, the Ig isotype comprises an IgM, IgD, IgG, IgA or an IgE isotype. In some embodiments, the IgG comprises an IgG1, IgG2, IgG3, or IgG4 subisotype. In some embodiments, the IgA comprises an IgA1 or IgA2 subisotype.
In some embodiments, the RNA is isolated from peripheral blood mononuclear cells (PBMCs), purified B cells, solid tumors, healthy tissue, or whole blood.
In some embodiments, the method develops an Ig profile of a subject.
In another aspect, disclosed herein is a method of treating or preventing a disease in a subject in need thereof, the method comprising isolating an RNA sample from the subject, sequencing the RNA using the method of any preceding aspect, developing an Ig profile from the RNA, and administering a therapeutic agent to the subject, wherein the Ig profile indicates the disease.
In some embodiments, the disease comprises an infectious disease, an autoimmune disease, a neoproliferative disease, a neurodegenerative disease, a respiratory disease, a congenital disease, a gastrointestinal (GI) disease, a metabolic disease, or a cardiovascular disease.
BRIEF DESCRIPTION OF FIGURES
The accompanying figures, which are incorporated in and constitute a part of this specification, illustrate several aspects described below.
FIGS. 1A, 1B, 1C, and 1D show an overview of Ab structure, comparative performance of alternate methods, and the FLAIRR-seq molecular method. Figure 1A shows the schematic representation of the IGH locus, heavy chain transcript structure, and functional immunoglobulin protein. Figure 1B shows the comparative schematic showing coverage across the heavy chain transcript of commonly used sr-AIRR-seq and IsoSeq methods compared with FLAIRR-seq. Figure 1C shows the FLAIRR-seq molecular pipeline overview: RNA (green) was converted to first-strand cDNA (red) using the 5’ RACE method, incorporating a 5’ TSO-UMI (pink) via template switch. Second strand amplification specifically targeted immunoglobulin molecules through priming of the 5’ TSO-UMI and the 3’ constant region IGH exon 3 (CH3) for IgG/A/E, or CH4 for IgM. A 16 bp barcode was incorporated into the 3’ CH3/CH4 primers to enable sample multiplexing post-amplification. Figure 1D shows the Integrative Genomics Viewer alignment showing near full-length single molecule structure of IGHG4 (IgG4)-specific FLAIRR-seq resolved transcripts.
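The roles of the sample barcode and the UMI described for Figure 1C can be illustrated with a minimal Python sketch. It is not the disclosed pipeline: it assumes every read is already oriented 5’ to 3’, that the UMI occupies a fixed number of bases at the start of the read (`UMI_LEN` is a hypothetical value, since the text does not give the UMI length), and that the 16 bp barcode is the final 16 bases. Primer trimming, error-tolerant matching, and true consensus building are omitted.

```python
from collections import Counter, defaultdict

BARCODE_LEN = 16  # barcode length given in the Figure 1C description
UMI_LEN = 12      # hypothetical; the UMI length is not stated in the text


def split_read(read):
    """Return (umi, insert, barcode) under the assumed fixed read layout."""
    if len(read) <= UMI_LEN + BARCODE_LEN:
        raise ValueError("read too short for the assumed layout")
    return read[:UMI_LEN], read[UMI_LEN:-BARCODE_LEN], read[-BARCODE_LEN:]


def demultiplex_and_collapse(reads, barcode_to_sample):
    """Group reads by sample barcode, then keep one sequence per UMI.

    Reads with an unknown barcode are skipped. Within a UMI group the most
    common insert is kept, a crude stand-in for consensus building.
    """
    groups = defaultdict(Counter)
    for read in reads:
        umi, insert, barcode = split_read(read)
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            continue
        groups[(sample, umi)][insert] += 1

    collapsed = defaultdict(list)
    for (sample, _umi), inserts in groups.items():
        collapsed[sample].append(inserts.most_common(1)[0][0])
    return dict(collapsed)
```

Collapsing on the UMI is what removes amplification duplicates: several reads that share a sample barcode and a UMI are treated as copies of one original molecule.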
FIGS. 2A, 2B, 2C, 2D, and 2E show robust characterization of V, D, and J genes and additional variable region features using FLAIRR-seq compared to sr-AIRR-seq. Figure 2A shows the Spearman ranked correlations and p-values of V, D, and J gene usage frequencies identified by FLAIRR-seq performed on matched total PBMC and purified B cells. Figure 2B shows the heatmap of Spearman ranked correlations and p-values of V, D and J gene usage frequencies between FLAIRR-seq and sr-AIRR-seq processed samples. Figure 2C shows the violin plots of somatic hypermutation frequencies defined by FLAIRR-seq or sr-AIRR-seq in IgM- and IgG-specific repertoires. Figure 2D shows the CDR3 length defined by FLAIRR-seq or sr-AIRR-seq analysis in both IgM- and IgG-specific repertoires. Significant differences of CDR3 length and SHM were measured by unpaired T-test with Bonferroni correction between FLAIRR-seq and sr-AIRR-seq data indicated by * (p < 0.05) or ** (p < 0.01). Figure 2E shows the Spearman ranked correlation of V, D, and J gene usage frequencies between FLAIRR-seq-based and Iso-Seq-based repertoire profiling. These data indicate that FLAIRR-seq is capable of identifying longer CDR3 regions than those identified with current sr-AIRR-seq methods; longer CDR3 regions are often associated with extensively hypermutated, broadly neutralizing antibodies that develop in diseases such as human immunodeficiency virus (HIV).
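The Spearman ranked correlation used for the gene usage comparisons in Figure 2 can be sketched in plain Python as follows. This is a minimal illustration, not the analysis code used to produce the figures: the function names are illustrative, genes absent from one repertoire are assumed to have a frequency of zero, and no p-value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def usage_correlation(freq_a, freq_b):
    """Correlate two {gene: frequency} dicts over the union of their genes."""
    genes = sorted(set(freq_a) | set(freq_b))
    return spearman([freq_a.get(g, 0.0) for g in genes],
                    [freq_b.get(g, 0.0) for g in genes])
```

Because the statistic depends only on ranks, it compares the ordering of gene usage between two preparations rather than the absolute frequencies.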
FIGS. 3A, 3B, 3C, 3D, and 3E show that FLAIRR-seq provides novel IGHC resolution for allelic discovery and allows variable gene haplotyping. Figure 3A shows the overview of experimental design and analysis pipelines of genotyping by IGenotyper (gDNA) and FLAIRR-seq (RNA) methods. Figure 3B shows the schematic depicting IGHC alleles identified by IGenotyper, partitioned by identification as (i) exact matches to documented IMGT alleles, (ii) novel alleles that are not in IMGT, or (iii) extended alleles. Figure 3C shows the pie chart and stacked bar graph representing the total number of alleles and fraction of each category identified per IGHC gene as identified by either IGenotyper or FLAIRR-seq. Bar charts showing number of IGHC alleles from FLAIRR-seq that were resolved, ambiguous or unresolved when compared to IGenotyper alleles. * Indicates additional allele found due to IGHG4 duplication. Figure 3D shows the table visualizing novel and extended alleles resolved by IGenotyper data. Extended alleles are denoted by *(allele number)-FL and novel alleles are denoted by FL_(number); alleles resolved by FLAIRR-seq are marked with a dot (•). Figure 3E shows usage of the haplotyping tool RAbHIT, FLAIRR-seq data to enable inference of germline haplotypes. Venn diagrams show number of IGHV haplotype allele/deletion calls when using IGHJ6 (standard input) or IGHM (novel input, enabled by FLAIRR-seq method) anchors for each haplotype. Tile plots showing IGHV gene haplotypes inferred using either IGHJ6 anchors or IGHM anchors for one sample. Dark gray represents a deletion (DEL), off-white a non-reliable allele annotation (NRA), and light gray represents an unknown allele (Unk).
FIGS. 4A, 4B, and 4C show that FLAIRR-seq resolves subisotype-specific repertoire diversity. Figure 4A shows the bar plots showing highly resolved distribution of unique VDJ sequences across isotypes, subisotypes, and subisotype alleles in one representative sample “1013” characterized by FLAIRR-seq; standard sr-AIRR-seq methods resolve sequences only to the isotype level (IGHG, red bar). Figure 4B shows the circos plots showing V family gene usage frequency within IgG- and IgM-isotypes and subisotype-specific alleles for “1013”. Figure 4C shows the left, boxplots of V gene family usage frequencies within IGHG1, IGHG2, IGHG3, and IGHG4 repertoires across all ten individuals. Top right, principal component analysis of V gene family usage by subisotype; plot includes the first two principal components, and individual repertoires are colored by IGHG subisotype. Bottom right, boxplots showing sequence frequency of IGHV1 and IGHV3 family genes by subisotype across all 10 samples.
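The three allele categories named for Figure 3B can be expressed as a simple decision rule. The sketch below is a simplification under stated assumptions and is not the IGenotyper or FLAIRR-seq implementation: alleles are compared as plain strings, and an “extended” allele is assumed to be one that fully contains a shorter reference sequence. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def classify_allele(candidate, reference_alleles):
    """Return the category label for one candidate allele sequence."""
    candidate = candidate.upper()
    refs = [r.upper() for r in reference_alleles if r]
    if candidate in refs:
        return "exact match"
    # A shorter reference fully contained in the candidate is treated as a
    # partial database allele that the candidate extends.
    if any(len(r) < len(candidate) and r in candidate for r in refs):
        return "extended"
    return "novel not in IMGT"


def tally_categories(candidates, reference_alleles):
    """Count how many candidate alleles fall into each category."""
    return Counter(classify_allele(c, reference_alleles) for c in candidates)
```

A production comparison would align sequences exon by exon rather than rely on substring matching, so this rule should be read only as a statement of the category definitions.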
FIGS. 5A, 5B, 5C, 5D, 5E, 5F, and 5G show that FLAIRR-seq resolves subisotype-specific clonal expansion and facilitates haplotype analysis of CSR in a patient hospitalized for COVID-19. Figure 5A shows the overview of experimental design: whole blood-derived RNA was collected on days 1, 4, 8 and 13 post-hospitalization and used for FLAIRR-seq profiling. Figure 5B shows the bar plot showing the percentage of clones represented by each subisotype across timepoints. Figure 5C shows the Simpson’s diversity index (q=2) for all clones in each subisotype across four timepoints; IgG4 not included due to low sequence counts. Figure 5D shows the polarity, or the fraction of clones needed to comprise 80% of the repertoire, reported as fraction of total subisotype-specific repertoires across time. Figure 5E shows the distribution of a single clone “9900” across isotypes and subisotypes over time, suggesting ongoing CSR of this clone, as identified by variable region usage across multiple isotype and subisotype-specific constant regions. Figure 5F shows the phylogenetic tree constructed from sequences associated with the clone 9900 lineage, with the inferred germline sequence as the outgroup (star). Shapes and colors of tips (sequences) indicate time point and isotype/subisotype. The scale bar represents the number of mutations between each node in the tree. The subclade within the red box is represented by multiple time points and subisotypes, providing evidence of CSR. Figure 5G shows the tile plot showing the assignment of IGHC alleles to their respective haplotypes, based on the frequency of observations in which each IGHC allele was linked to each respective allele of heterozygous V genes, IGHV3-7 and IGHV3-48; light gray denotes IGHC alleles for which haplotype assignment was not possible.
Analysis of sequences in (Figure 5F) revealed that the IGHG1 and IGHG2 alleles represented in the phylogeny came from the same haplotype (IGHG1*02/*07, IGHG2*08, IGHM_FL_2), further confirming that different isotype and subisotype usage was a result of CSR.
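The two clonality measures named for Figures 5C and 5D can be sketched from a list of clone sizes (sequence counts per clone). This is an illustrative sketch rather than the analysis code behind the figures. In particular, “Simpson’s diversity index (q=2)” is interpreted here as the Hill number of order 2 (the inverse of the summed squared clone frequencies), which is an assumption; the figure may use a different normalization. The function names are hypothetical.

```python
def hill_diversity_q2(clone_sizes):
    """Hill number of order 2: 1 / sum(p_i^2) over clone frequencies p_i."""
    total = sum(clone_sizes)
    if total <= 0:
        raise ValueError("at least one sequence is required")
    return 1.0 / sum((n / total) ** 2 for n in clone_sizes if n > 0)


def polarity(clone_sizes, coverage=0.8):
    """Fraction of clones, largest first, needed to reach `coverage`.

    A value near 0 means a few expanded clones dominate the repertoire;
    a value near `coverage` means the clones are of similar size.
    """
    sizes = sorted((n for n in clone_sizes if n > 0), reverse=True)
    if not sizes:
        raise ValueError("at least one non-empty clone is required")
    total = sum(sizes)
    running = 0
    for count, n in enumerate(sizes, start=1):
        running += n
        if running >= coverage * total:
            return count / len(sizes)
    return 1.0
```

Tracking both values per subisotype and per time point is one way to see whether an expansion is confined to a single compartment of the repertoire.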
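The linkage-based haplotype assignment described for Figure 5G can be sketched as a counting rule over single-molecule reads, each of which carries both a V allele and an IGHC allele. The sketch below is an assumption-laden illustration, not the disclosed procedure: the `min_fraction` threshold is hypothetical, and the allele labels used with it are placeholders.

```python
from collections import Counter, defaultdict


def assign_constant_alleles(reads, v_allele_to_haplotype, min_fraction=0.8):
    """Assign each IGHC allele to a haplotype by its linked V alleles.

    reads: iterable of (v_allele, ighc_allele) pairs, one per transcript.
    v_allele_to_haplotype: haplotype label for each heterozygous V allele.
    Returns {ighc_allele: haplotype}, with None where no haplotype reaches
    `min_fraction` of the informative reads.
    """
    linkage = defaultdict(Counter)
    for v_allele, c_allele in reads:
        haplotype = v_allele_to_haplotype.get(v_allele)
        if haplotype is not None:  # ignore reads without an informative V allele
            linkage[c_allele][haplotype] += 1

    calls = {}
    for c_allele, counts in linkage.items():
        haplotype, n = counts.most_common(1)[0]
        supported = n / sum(counts.values()) >= min_fraction
        calls[c_allele] = haplotype if supported else None
    return calls
```

An allele returned as `None` corresponds to the case in the figure where haplotype assignment was not possible.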
FIG. 6 shows the agreement in calling IGHV gene usage in sample “1013” when analyzed with either the Immcantation or MiXCR analysis pipelines, demonstrating applicability of FLAIRR-seq data within multiple available analysis tools.
FIG. 7 shows ongoing optimization and modification to the FLAIRR-seq molecular method to enhance overall yield and accuracy given the advancing throughput of sequencing technology.
FIGS. 8A, 8B, 8C, 8D, 8E, 8F, and 8G show successful targeted enrichment of all near full-length seven antibody or immunoglobulin heavy and light transcript chains, including IgM (Figure 8A), IgG (Figure 8B), IgD (Figure 8C), IgA (Figure 8D), IgE (Figure 8E), IGL (Figure 8F), and IGK (Figure 8G).
FIGS. 9A and 9B show the characterization of variation in IgG-specific repertoires in healthy donors (n=48) using FLAIRR-seq. The majority of alleles (44/70, 63%) identified in these individuals were novel compared to the IMmunoGeneTics Database (IMGT), a standard database in the field. Figure 9A shows that as a function of the total, more novel alleles were discovered in IgG4 than any other subisotype. Figure 9B shows that mapping of the novel allele sequences identified single nucleotide variation (represented by colored, vertical lines where each color used represents a different nucleotide base, A, C, T or G) and structural variation (represented by present or absent grey boxes, which represent coding exons). FLAIRR-seq enables allele-specific identification of both these variant types, including deletions of whole exons of the hinge regions (H1, H2, H3 or H4), which are known to contribute to antibody effector functions, such as complement fixation.
FIGS. 10A and 10B show additional examples of Ig or Ab structural variation only able to be seen with FLAIRR-seq: Figure 10A demonstrates duplication in the IgG4 antibody subisotype gene, which results in the expression of three distinct antibody alleles in the circulating repertoire. Single nucleotide variants are represented by colored, vertical lines and are used to distinguish one transcript allele from the other two in the gene alignment. Three distinct alleles can be visualized when mapping the data with the Integrative Genomics Viewer (IGV). Figure 10B shows an alignment of both subisotypes of IgA, IgA1 and IgA2. IgA1 is known to play a regulatory role in maintaining immune balance (homeostasis) and contains an extended hinge, compared to IgA2, in which the extended hinge region is not present.
FIGS. 11A and 11B show a map of the IGLC-J gene locus on human chromosome 22. Figure 11A highlights a tandem array of duplicated IGLC-J gene cassettes, IGLC-J1 through IGLC-J7. Initial genomic characterization of the IGLC-J locus identified only seven IGLC and IGLJ genes; however, through our recent genomic studies, we have identified 3 additional copies of the IGLC-J3 cassette, totaling 4 copies (green boxes). Figure 11B (left panel) shows a bar plot of the frequency of IGLC genes in the IGL repertoire of a single healthy donor, determined using FLAIRR-seq. Whereas previous techniques may have been able to differentiate frequency differences between only IGLC1, IGLC2, IGLC3, IGLC4, IGLC5, IGLC6, and IGLC7, with FLAIRR-seq, we can now resolve frequency patterns to the level of gene duplicates (as shown for IGLC3-1, IGLC3-2, IGLC3-3, and IGLC3-4). Figure 11B (right panel) shows further resolution of IGLC3 gene duplicate frequencies in the IGL repertoire; the bar plot reveals a dominant contribution of IGLC3-3 and IGLC3-4 expression in the specific IGLC3 repertoire.
FIGS. 12A, 12B, and 12C show the IgG-specific antibody variation in a cohort of individuals (n=48) presenting with Myasthenia Gravis (MG) across disease states. Figure 12A shows the distribution of novel IgG subisotype-specific alleles revealed by FLAIRR-seq analysis compared to those alleles known in the standard IMGT database, demonstrating that the majority of alleles (70/89, 79%) identified were novel compared to standard databases, highlighting the need for constant region profiling, which is only enabled by FLAIRR-seq. Specifically, most new alleles were identified in the IgG3 and IgG4 subisotype-specific repertoires. IgG3, in particular, is known to mediate acetylcholine-specific MG. Figure 12B schematizes the varied IgG3 hinge-specific structural variations identified in the total cohort, showing the various deletion patterns observed, including the 4 hinge-containing major (“M”) allele, the 3 hinge-containing “small” (“S”) allele, and the novel one hinge-containing “E” allele. Figure 12C quantifies the distribution of these allelic variants among the MG individuals (n=48) compared to age- and sex-matched healthy donors (n=48). More structural variants, including hinge deletions and IgG3 gene duplications leading to the expression of three alleles rather than two, were observed in MG subjects compared to healthy donors.
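The novel-allele percentages quoted in the descriptions of FIGS. 9 and 12 can be reproduced from the stated counts. The following is a minimal Python sketch of that arithmetic only; the function name is hypothetical and is not part of the disclosed method.

```python
def percent_novel(novel: int, total: int) -> int:
    """Return the share of novel alleles as a whole-number percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * novel / total)


# Counts taken from the figure descriptions above.
healthy_pct = percent_novel(44, 70)  # healthy donors, FIGS. 9A and 9B
mg_pct = percent_novel(70, 89)       # Myasthenia Gravis cohort, FIG. 12A
```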
DETAILED DESCRIPTION
The following description of the disclosure is provided as an enabling teaching of the disclosure in its best, currently known embodiment(s). To this end, those skilled in the relevant art will recognize and appreciate that many changes can be made to the various embodiments of the invention described herein, while still obtaining the beneficial results of the present disclosure. It will also be apparent that some of the desired benefits of the present disclosure can be obtained by selecting some of the features of the present disclosure without utilizing other features. Accordingly, those who work in the art will recognize that many modifications and adaptations to the present disclosure are possible and can even be desirable in certain circumstances and are a part of the present disclosure. Thus, the following description is provided as illustrative of the principles of the present disclosure and not in limitation thereof.
Reference will now be made in detail to the embodiments of the invention, examples of which are illustrated in the drawings and the examples. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
Terminology
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this disclosure belongs. The term “comprising” and variations thereof as used herein is used synonymously with the term “including” and variations thereof and are open, non-limiting terms. Although the terms “comprising” and “including” have been used herein to describe various embodiments, the terms “consisting essentially of” and “consisting of” can be used in place of “comprising” and “including” to provide for more specific embodiments and are also disclosed.
In this specification and in the claims which follow, reference will be made to a number of terms which shall be defined to have the following meanings:
As used in this disclosure and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed. It is also understood that when a value is disclosed, “less than or equal to” the value, “greater than or equal to” the value, and possible ranges between values are also disclosed, as appropriately understood by the skilled artisan. For example, if the value “10” is disclosed, then “less than or equal to 10” as well as “greater than or equal to 10” are also disclosed. It is also understood that throughout the application, data is provided in a number of different formats, and that this data represents endpoints and starting points, and ranges for any combination of the data points. For example, if a particular data point “10” and a particular data point “15” are disclosed, it is understood that greater than, greater than or equal to, less than, less than or equal to, and equal to 10 and 15 are considered disclosed as well as between 10 and 15. It is also understood that each unit between two particular units is also disclosed. For example, if 10 and 15 are disclosed, then 11, 12, 13, and 14 are also disclosed.
The following definitions are provided for the full understanding of terms used in this specification.
The terms "about" and "approximately" are defined as being “close to” as understood by one of ordinary skill in the art. In one non-limiting embodiment the terms are defined to be within 10%. In another non-limiting embodiment, the terms are defined to be within 5%. In still another non-limiting embodiment, the terms are defined to be within 1%.
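As an illustration of the three non-limiting tolerances given for “about” and “approximately”, the following minimal Python sketch checks whether a value falls within a given percentage of a target. The function name and the choice to measure the tolerance relative to the target value are assumptions made for illustration; the disclosure does not prescribe a particular calculation.

```python
def is_about(value: float, target: float, tolerance: float = 0.10) -> bool:
    """Return True if value lies within +/- tolerance (a fraction) of target."""
    return abs(value - target) <= tolerance * abs(target)


# The three non-limiting tolerances named in the definition: 10%, 5%, and 1%.
TOLERANCES = (0.10, 0.05, 0.01)

# For a target of 10, the value 10.8 is "about 10" at 10% but not at 5% or 1%.
results = [is_about(10.8, 10, t) for t in TOLERANCES]
```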
As used herein, the terms "may," "optionally," and "may optionally" are used interchangeably and are meant to include cases in which the condition occurs as well as cases in which the condition does not occur. Thus, for example, the statement that a formulation "may include an excipient" is meant to include cases in which the formulation includes an excipient as well as cases in which the formulation does not include an excipient.
An "increase" can refer to any change that results in a greater amount of a symptom, disease, composition, condition, or activity. An increase can be any individual, median, or average increase in a condition, symptom, activity, or composition in a statistically significant amount. Thus, the increase can be a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% increase so long as the increase is statistically significant.
A "decrease" can refer to any change that results in a smaller amount of a symptom, disease, composition, condition, or activity. A substance is also understood to decrease the genetic output of a gene when the genetic output of the gene product with the substance is less relative to the output of the gene product without the substance. Also, for example, a decrease can be a change in the symptoms of a disorder such that the symptoms are less than previously observed. A decrease can be any individual, median, or average decrease in a condition, symptom, activity, or composition in a statistically significant amount. Thus, the decrease can be a 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% decrease so long as the decrease is statistically significant.
By “prevent” or other forms of the word, such as “preventing” or “prevention,” is meant to stop a particular event or characteristic, to stabilize or delay the development or progression of a particular event or characteristic, or to minimize the chances that a particular event or characteristic will occur. Prevent does not require comparison to a control as it is typically more absolute than, for example, reduce. As used herein, something could be reduced but not prevented, but something that is reduced could also be prevented. Likewise, something could be prevented but not reduced, but something that is prevented could also be reduced. It is understood that where reduce or prevent are used, unless specifically indicated otherwise, the use of the other word is also expressly disclosed.
The term “subject” refers to any individual who is the target of administration or treatment. The subject can be a vertebrate, for example, a mammal. In one aspect, the subject can be human, non-human primate, bovine, equine, porcine, canine, or feline. The subject can also be a guinea pig, rat, hamster, rabbit, mouse, or mole. Thus, the subject can be a human or veterinary patient. The term “patient” refers to a subject under the treatment of a clinician, e.g., physician.
The term “treatment” refers to the medical management of a patient with the intent to cure, ameliorate, stabilize, or prevent a disease, pathological condition, or disorder. This term includes active treatment, that is, treatment directed specifically toward the improvement of a disease, pathological condition, or disorder, and also includes causal treatment, that is, treatment directed toward removal of the cause of the associated disease, pathological condition, or disorder. In addition, this term includes palliative treatment, that is, treatment designed for the relief of symptoms rather than the curing of the disease, pathological condition, or disorder; preventative treatment, that is, treatment directed to minimizing or partially or completely inhibiting the development of the associated disease, pathological condition, or disorder; and supportive treatment, that is, treatment employed to supplement another specific therapy directed toward the improvement of the associated disease, pathological condition, or disorder.
As used herein, “diagnose”, “diagnosed”, “diagnosing”, and any grammatical variations thereof refer to the act or process of identifying the nature of an illness, disease, disorder, or condition in a subject by examination or monitoring of symptoms.
As used herein, “whole blood” refers to the composition of blood as it flows through the circulatory system, to include the red blood cells, white blood cells, and platelets suspended in plasma.
The term “administer,” “administering”, or derivatives thereof refer to delivering a composition, substance, inhibitor, or medication to a subject or object by one or more of the following routes: oral, topical, intravenous, subcutaneous, transcutaneous, transdermal, intramuscular, intra-joint, parenteral, intra-arteriole, intradermal, intraventricular, intracranial, intraperitoneal, intralesional, intranasal, rectal, vaginal, by inhalation or via an implanted reservoir. The term “parenteral” includes subcutaneous, intravenous, intramuscular, intraarticular, intra-synovial, intrasternal, intrathecal, intrahepatic, intralesional, and intracranial injections or infusion techniques.
Antibodies (Abs) and immunoglobulins (Igs) are glycoproteins having the same structural characteristics. The term "antibody" is used in the broadest sense, and specifically covers monoclonal antibodies (including full length monoclonal antibodies), polyclonal antibodies, and multispecific antibodies (e.g., bispecific antibodies). While antibodies exhibit binding specificity to a specific target, immunoglobulins include both antibodies and other antibody-like molecules which lack target specificity. Native antibodies and immunoglobulins are usually heterotetrameric glycoproteins of about 150,000 daltons, composed of two identical light (L) chains and two identical heavy (H) chains. Each heavy chain has at one end a variable domain (VH) followed by a number of constant domains. Each light chain has a variable domain at one end (VL) and a constant domain at its other end.
The terms “immunoglobulin fragment” or "antibody fragment" refer to a portion of a full-length antibody or immunoglobulin, generally the target binding or variable region. Examples of antibody or immunoglobulin fragments include Fab, Fab', F(ab')2 and Fv fragments. The phrase "functional fragment or analog" of an antibody or immunoglobulin is a compound having qualitative biological activity in common with a full-length antibody or immunoglobulin. For example, a functional fragment or analog of an anti-IgE is one which can bind to an IgE immunoglobulin in such a manner so as to prevent or substantially reduce the ability of such molecule to bind to the high affinity receptor, FcεRI. As used herein, "functional fragment" with respect to antibodies, refers to Fv, F(ab) and F(ab')2 fragments. An "Fv" fragment is the minimum antibody fragment which contains a complete target recognition and binding site. This region consists of a dimer of one heavy and one light chain variable domain in a tight, non-covalent association (VH-VL dimer). It is in this configuration that the three CDRs of each variable domain interact to define a target binding site on the surface of the VH-VL dimer. Collectively, the six CDRs confer target binding specificity to the antibody. However, even a single variable domain (or half of an Fv comprising only three CDRs specific for a target) has the ability to recognize and bind the target, although at a lower affinity than the entire binding site. "Single-chain Fv" or "sFv" antibody fragments comprise the VH and VL domains of an antibody, wherein these domains are present in a single polypeptide chain. Generally, the Fv polypeptide further comprises a polypeptide linker between the VH and VL domains which enables the sFv to form the desired structure for target binding.
The term “monoclonal antibody” as used herein refers to an antibody obtained from a substantially homogeneous population of antibodies, i.e., the individual antibodies within the population are identical except for possible naturally occurring mutations that may be present in a small subset of the antibody molecules.
The terms “immunotherapy” and “immunotherapeutic” refer to the treatment of disease by activating or suppressing the immune system. In cancer treatment, the most effective immunotherapies are cell-based immunotherapies that utilize lymphocytes, macrophages, dendritic cells, natural killer cells, cytotoxic T lymphocytes, etc. to defend the body against cancer by targeting abnormal antigens expressed on the surface of tumor cells.
The terms “treat,” “treating,” and grammatical variations thereof as used herein, include partially or completely delaying, alleviating, mitigating, or reducing the intensity of one or more attendant symptoms of a disorder or condition and/or alleviating, mitigating, or impeding one or more causes of a disorder or condition. Treatments according to the disclosure may be applied preventively, prophylactically, palliatively, or remedially. Treatments are administered to a subject prior to onset (e.g., before obvious signs of disease), during early onset (e.g., upon initial signs and symptoms of disease), or after an established development of disease.
A “nucleic acid” is a chemical compound that serves as the primary information-carrying molecule in cells and makes up the cellular genetic material. Nucleic acids comprise nucleotides, which are monomers made of a 5-carbon sugar (usually ribose or deoxyribose), a phosphate group, and a nitrogenous base. A nucleic acid can be a deoxyribonucleic acid (DNA) or a ribonucleic acid (RNA). It should be noted that the preferred nucleic acid used herein comprises an RNA. A “nucleotide” is a compound consisting of a nucleoside, which consists of a nitrogenous base and a 5-carbon sugar, linked to a phosphate group, forming the basic structural unit of nucleic acids, such as DNA or RNA. The four nitrogenous bases found in DNA nucleotides are adenine (A), cytosine (C), guanine (G), and thymine (T); in RNA, uracil (U) takes the place of thymine. Nucleotides are bound together by phosphodiester bonds to form a nucleic acid molecule.
A “full length” polynucleotide sequence is one containing at least a translation initiation codon (e.g., methionine) followed by an open reading frame and a translation termination codon. A “full length” polynucleotide sequence encodes a “full length” polypeptide sequence.
Methods of Sequencing
Disclosed herein is a novel near full-length long read AIRR-seq (FLAIRR-Seq) method that utilizes targeted amplification by 5’ rapid amplification of cDNA ends (RACE), combined with single molecule, real-time (SMRT) sequencing to generate highly accurate immunoglobulin (Ig) heavy and light chain transcripts across all isotypes and sub-isotypes.
In one aspect, disclosed herein is a method of sequencing a ribonucleic acid (RNA) encoding an immunoglobulin (Ig) or an antibody (Ab), the method comprising targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts are targeted for selective amplification, integrating the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts into an array of eight (8) transcripts to generate a nucleic acid comprising at least 5600 base pairs (bps), analyzing the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts using a single molecule, real-time sequencing (SMRT) method, and detecting at least 700 base pairs (bps) within a variable region and a constant region of the Ig.
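One way to read the figures in the preceding aspect is that an array of eight transcripts, each of at least 700 bps, yields a concatenated nucleic acid of at least 5600 bps. The minimal Python sketch below illustrates only that arithmetic on placeholder strings. The function names are hypothetical, and the sketch assumes plain end-to-end joining; it does not model any adapters, linkers, or barcodes that a laboratory concatenation protocol may add.

```python
ARRAY_SIZE = 8           # transcripts per array, as stated in the aspect above
MIN_TRANSCRIPT_BP = 700  # minimum detected length per transcript


def min_array_length(array_size: int = ARRAY_SIZE,
                     min_bp: int = MIN_TRANSCRIPT_BP) -> int:
    """Lower bound on array length when every transcript meets the minimum."""
    return array_size * min_bp


def concatenate(transcripts: list[str]) -> str:
    """Join exactly ARRAY_SIZE transcript sequences end to end (no linkers)."""
    if len(transcripts) != ARRAY_SIZE:
        raise ValueError(f"expected {ARRAY_SIZE} transcripts, got {len(transcripts)}")
    return "".join(transcripts)


# Placeholder sequences: eight 700-base strings, not real Ig transcripts.
array = concatenate(["A" * MIN_TRANSCRIPT_BP] * ARRAY_SIZE)
```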
In one aspect, disclosed herein is a method of sequencing a ribonucleic acid (RNA) encoding an immunoglobulin (Ig), the method comprising targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the cDNA transcripts are targeted for amplification, analyzing the cDNA transcripts using a single molecule, real-time sequencing (SMRT) method, and detecting at least 700 base pairs (bps) within a variable region and a constant region of the Ig.
In one aspect, disclosed herein is a method of sequencing a ribonucleic acid (RNA) encoding an immunoglobulin (Ig), the method comprising targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and/or an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the cDNA transcripts are targeted for selective amplification, and wherein the cDNA transcripts are concatenated and then analyzed using a single molecule, real-time sequencing (SMRT) method.
In some embodiments, the method detects about 700 bps to about 900 bps of the variable region and the constant region, or fragments thereof, of the Ig light chain cDNA transcript. In some embodiments, the method detects about 1500 bps to about 2100 bps of the variable region and the constant region, or fragments thereof, of the Ig heavy chain cDNA transcript.
In some embodiments, the method comprises identifying at least 1500, 1600, 1700, 1800, 1900, 2000, 2100, or more base pairs (bps) within the variable region and a constant region of the heavy chain transcripts, and 700, 800, 900, or more bps within the variable region and constant region of light chain transcripts.
In some embodiments, the method detects a portion of, or a nearly complete, constant region of the heavy chain transcript. In some embodiments, the method detects a portion of, or a nearly complete, constant region of the light chain transcript. It should be understood that the terms “a portion”, “nearly complete”, or “near full-length” refer to detecting a part or fragments of an Ig. The method can detect a non-limiting percentage of an Ig of about 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, or 99%. It should also be understood that the terms “a portion”, “nearly complete”, or “near full length” can be used interchangeably throughout the disclosure.
In some embodiments, the method comprises identifying at least 1500 bps of an IgD, IgG, IgA, or IgE isotype. In some embodiments, the method comprises identifying at least 2000 bps of an IgM isotype. In some embodiments, the method comprises identifying at least 700 bps of an IgL or IgK light chain isotype.
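The isotype-specific minimum lengths in the preceding embodiments can be expressed as a simple lookup. The following minimal Python sketch is illustrative only: the dictionary and function names are hypothetical, and the idea of using these minimums as a read-length check is an assumption rather than a step stated in the disclosure.

```python
# Minimum identified lengths (bps) per isotype, taken from the embodiments above.
MIN_BP_BY_ISOTYPE = {
    "IgD": 1500,
    "IgG": 1500,
    "IgA": 1500,
    "IgE": 1500,
    "IgM": 2000,
    "IgL": 700,
    "IgK": 700,
}


def meets_minimum(isotype: str, read_length_bp: int) -> bool:
    """Return True if a read of the given isotype reaches the stated minimum."""
    if isotype not in MIN_BP_BY_ISOTYPE:
        raise KeyError(f"unknown isotype: {isotype}")
    return read_length_bp >= MIN_BP_BY_ISOTYPE[isotype]
```

For example, a 1900 bps read would satisfy the minimum for IgG but not for IgM.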
In some embodiments, the method identifies a gene or an allele encoding one or more segments of the variable region comprising a variability (V) peptide, a diversity (D) peptide, a joining (J) peptide, or combinations thereof. In some embodiments, the method can be used to directly identify the V, D, and J genes and alleles encoding the variable regions of heavy and light chain transcripts.
In some embodiments, the method identifies a gene or an allele encoding one or more segments of the constant region of a heavy chain (CH) or light chain (CL/CK) comprising CH1, CH2, CH3, CH4, CL1, CK1, or combinations thereof. In some embodiments, the method can be used to directly identify the constant genes and alleles encoding the constant regions of heavy and light chain transcripts.
In some embodiments, the method identifies a gene or allele encoding an Ig isotype or an Ig subisotype. In some embodiments, the method identifies an expansion of the Ig isotype or Ig subisotype compared to current knowledge. In some embodiments, the method allows for resolution of the isotype and subisotype of the expressed immunoglobulin heavy and light chain transcripts, including detailing relative frequency of specific immunoglobulin isotypes or immunoglobulin subisotypes in a total antibody repertoire.
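The relative frequency of isotypes or subisotypes in a repertoire can be computed as each label's share of all assigned reads. The sketch below is a minimal Python illustration under the assumption that every read has already been assigned one isotype or subisotype label; the function name and the example labels are hypothetical and do not represent data from the disclosure.

```python
from collections import Counter


def relative_frequencies(calls: list[str]) -> dict[str, float]:
    """Map each isotype or subisotype label to its fraction of all reads."""
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical per-read labels, for illustration only.
example = relative_frequencies(["IgG1", "IgG1", "IgM", "IgA2"])
```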
In some embodiments, the method identifies a class switch recombination (CSR) event of the Ig subisotype. In some embodiments, the Ig isotype comprises an IgD, IgG, IgA, IgE, or an IgM isotype. In some embodiments, the IgG comprises an IgG1, IgG2, IgG3, or IgG4 subisotype. In some embodiments, the method identifies a class switch recombination (CSR) event within an immunoglobulin isotype or subisotype. CSR events are observed in the data as the same variable region genes transitioning from usage of IgM isotype constant region genes to constant region genes associated with the IgD, IgG, IgA, or IgE isotypes, including their subisotypes (e.g., IgG1, IgG2, IgG3, IgG4, IgA1, or IgA2 subisotypes). In some embodiments, the Ig isotype comprises an IgM, IgD, IgA, IgG or an IgE isotype. In some embodiments, the IgA comprises an IgA1 or IgA2 subisotype, and the IgG comprises an IgG1, IgG2, IgG3, or IgG4 subisotype.
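The description of CSR events above, in which the same variable region appears with an IgM constant region and with a constant region of another isotype, can be illustrated with a simple grouping step. This is a minimal Python sketch under stated assumptions: each read is reduced to a hypothetical pair of (variable region key, isotype or subisotype label), and an identical key is taken to mean the same variable region. How that key is derived is left open here, and a real analysis may rely on clonal lineage inference rather than exact matching.

```python
from collections import defaultdict


def find_csr_candidates(reads: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return variable region keys seen with IgM and at least one other isotype.

    Each read is a (variable_region_key, isotype_label) pair. The result maps
    each qualifying key to the sorted non-IgM labels observed with it.
    """
    isotypes_by_key: dict[str, set[str]] = defaultdict(set)
    for key, isotype in reads:
        isotypes_by_key[key].add(isotype)
    return {
        key: sorted(isotypes - {"IgM"})
        for key, isotypes in isotypes_by_key.items()
        if "IgM" in isotypes and len(isotypes) > 1
    }


# Hypothetical reads, for illustration only.
reads = [
    ("cloneA", "IgM"), ("cloneA", "IgG1"), ("cloneA", "IgA2"),
    ("cloneB", "IgM"),
    ("cloneC", "IgA1"),
]
candidates = find_csr_candidates(reads)
```

In this example only cloneA qualifies, because cloneB is seen with IgM alone and cloneC is never seen with IgM.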
In some embodiments, this method can identify select genomic variation, including but not limited to single nucleotide variants and structural variations, associated with known impact on antibody function.
In some embodiments, the RNA is isolated from peripheral blood mononuclear cells (PBMCs), purified B cells, solid tumors, healthy tissue, or whole blood.
In some embodiments, the method develops an Ig profile of a subject. In some embodiments, the method characterizes the expressed immunoglobulin transcript profile of a subject.
In another aspect, disclosed herein is a method of treating or preventing a disease in a subject in need thereof, the method comprising isolating an RNA sample from the subject, sequencing the RNA using the method of any preceding aspect, developing an Ig profile and/or antibody repertoire profile from the RNA, and administering a therapeutic agent to the subject, wherein the Ig profile and/or antibody repertoire profile indicates the disease.
In another aspect, disclosed herein is a method of treating or preventing a disease in a subject in need thereof, the method comprising isolating an RNA sample from the subject, sequencing the RNA using the method of any preceding aspect, developing an Ig profile and/or antibody repertoire profile from the RNA, and using said profile to predict the responsiveness to vaccination, therapeutic treatment, and/or natural resolution(s) of disease.
It should be understood that multiple RNA samples can be isolated longitudinally over one or more days prior to sequencing and developing the Ig profile. In some embodiments, RNA samples are isolated for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or more days prior to sequencing and developing the Ig profile.
In some embodiments, the disease comprises an infectious disease, an autoimmune disease, a neoproliferative disease, a neurodegenerative disease, a respiratory disease, a congenital disease, a gastrointestinal (GI) disease, a metabolic disease, or a cardiovascular disease.
In some embodiments, the cancer includes, but is not limited to acoustic neuroma, adenocarcinoma, adrenal gland cancer, anal cancer, angiosarcoma (e.g., lymphangiosarcoma, lymphangioendotheliosarcoma, hemangiosarcoma), appendix cancer, benign monoclonal gammopathy, biliary cancer (e.g., cholangiocarcinoma), bladder cancer, breast cancer (e.g., adenocarcinoma of the breast, papillary carcinoma of the breast, mammary cancer, medullary carcinoma of the breast), brain cancer (e.g., meningioma; glioma, e.g., astrocytoma, oligodendroglioma; medulloblastoma), bronchus cancer, carcinoid tumor, cervical cancer (e.g., cervical adenocarcinoma), choriocarcinoma, chordoma, craniopharyngioma, colorectal cancer (e.g., colon cancer, rectal cancer, colorectal adenocarcinoma), epithelial carcinoma, ependymoma, endotheliosarcoma (e.g., Kaposi's sarcoma, multiple idiopathic hemorrhagic sarcoma), endometrial cancer (e.g., uterine cancer, uterine sarcoma), esophageal cancer (e.g., adenocarcinoma of the esophagus, Barrett's adenocarcinoma), Ewing's sarcoma, eye cancer (e.g., intraocular melanoma, retinoblastoma), familial hypereosinophilia, gall bladder cancer, gastric cancer (e.g., stomach adenocarcinoma), gastrointestinal stromal tumor (GIST), head and neck cancer (e.g., head and neck squamous cell carcinoma, oral cancer (e.g., oral squamous cell carcinoma (OSCC)), throat cancer (e.g., laryngeal cancer, pharyngeal cancer, nasopharyngeal cancer, oropharyngeal cancer)), hematopoietic cancers (e.g., leukemia such as acute lymphocytic leukemia (ALL) (e.g., B-cell ALL, T-cell ALL), acute myelocytic leukemia (AML) (e.g., B-cell AML, T-cell AML), chronic myelocytic leukemia (CML) (e.g., B-cell CML, T-cell CML), and chronic lymphocytic leukemia (CLL) (e.g., B-cell CLL, T-cell CLL); lymphoma such as Hodgkin lymphoma (HL) (e.g., B-cell HL, T-cell HL) and non-Hodgkin lymphoma (NHL) (e.g., B-cell NHL such as diffuse large cell lymphoma (DLCL) (e.g., diffuse large B-cell lymphoma (DLBCL)), follicular lymphoma, chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL), mantle cell lymphoma (MCL), marginal zone B-cell lymphomas (e.g., mucosa-associated lymphoid tissue (MALT) lymphomas, nodal marginal zone B-cell lymphoma, splenic marginal zone B-cell lymphoma), primary mediastinal B-cell lymphoma, Burkitt lymphoma, lymphoplasmacytic lymphoma (i.e., “Waldenstrom's macroglobulinemia”), hairy cell leukemia (HCL), immunoblastic large cell lymphoma, precursor B-lymphoblastic lymphoma and primary central nervous system (CNS) lymphoma; and T-cell NHL such as precursor T-lymphoblastic lymphoma/leukemia, peripheral T-cell lymphoma (PTCL) (e.g., cutaneous T-cell lymphoma (CTCL) (e.g., mycosis fungoides, Sezary syndrome), angioimmunoblastic T-cell lymphoma, extranodal natural killer T-cell lymphoma, enteropathy type T-cell lymphoma, subcutaneous panniculitis-like T-cell lymphoma, anaplastic large cell lymphoma); a mixture of one or more leukemia/lymphoma as described above; and multiple myeloma (MM)), heavy chain disease (e.g., alpha chain disease, gamma chain disease, mu chain disease), hemangioblastoma, inflammatory myofibroblastic tumors, immunocytic amyloidosis, kidney cancer (e.g., nephroblastoma a.k.a. Wilms' tumor, renal cell carcinoma), liver cancer (e.g., hepatocellular cancer (HCC), malignant hepatoma), lung cancer (e.g., bronchogenic carcinoma, small cell lung cancer (SCLC), non-small cell lung cancer (NSCLC), adenocarcinoma of the lung), leiomyosarcoma (LMS), mastocytosis (e.g., systemic mastocytosis), myelodysplastic syndrome (MDS), mesothelioma, myeloproliferative disorder (MPD) (e.g., polycythemia vera (PV), essential thrombocytosis (ET), agnogenic myeloid metaplasia (AMM) a.k.a. myelofibrosis (MF), chronic idiopathic myelofibrosis, chronic myelocytic leukemia (CML), chronic neutrophilic leukemia (CNL), hypereosinophilic syndrome (HES)), neuroblastoma, neurofibroma (e.g., neurofibromatosis (NF) type 1 or type 2, schwannomatosis), neuroendocrine cancer (e.g., gastroenteropancreatic neuroendocrine tumor (GEP-NET), carcinoid tumor), osteosarcoma, ovarian cancer (e.g., cystadenocarcinoma, ovarian embryonal carcinoma, ovarian adenocarcinoma), papillary adenocarcinoma, pancreatic cancer (e.g., pancreatic adenocarcinoma, intraductal papillary mucinous neoplasm (IPMN), Islet cell tumors), penile cancer (e.g., Paget's disease of the penis and scrotum), pinealoma, primitive neuroectodermal tumor (PNT), prostate cancer (e.g., prostate adenocarcinoma), rectal cancer, rhabdomyosarcoma, salivary gland cancer, skin cancer (e.g., squamous cell carcinoma (SCC), keratoacanthoma (KA), melanoma, basal cell carcinoma (BCC)), small bowel cancer (e.g., appendix cancer), soft tissue sarcoma (e.g., malignant fibrous histiocytoma (MFH), liposarcoma, malignant peripheral nerve sheath tumor (MPNST), chondrosarcoma, fibrosarcoma, myxosarcoma), sebaceous gland carcinoma, sweat gland carcinoma, synovioma, testicular cancer (e.g., seminoma, testicular embryonal carcinoma), thyroid cancer (e.g., papillary carcinoma of the thyroid, papillary thyroid carcinoma (PTC), medullary thyroid cancer), urethral cancer, vaginal cancer and vulvar cancer (e.g., Paget's disease of the vulva).
In some embodiments, the autoimmune disease includes, but is not limited to Myasthenia Gravis, diabetes mellitus Type 1, Multiple Sclerosis, Rheumatic Heart Fever, and Rheumatoid Arthritis.
In some embodiments, the neurodegenerative disease includes, but is not limited to Alzheimer’s disease, ataxia, Huntington’s disease, Parkinson’s disease, amyotrophic lateral sclerosis (ALS), Friedreich ataxia, Lewy body disease, spinal muscular atrophy, Alpers’ disease, Batten disease, cerebro-oculo-facio-skeletal syndrome, Leigh syndrome, prion diseases, monomelic amyotrophy, multiple system atrophy, striatonigral degeneration, motor neuron disease, multiple sclerosis (MS), Creutzfeldt-Jakob disease, Parkinsonism, spinocerebellar ataxia, dementia, and other related diseases.
In some embodiments, the infectious disease includes, but is not limited to common cold, influenza (including, but not limited to human, bovine, avian, porcine, and simian strains of influenza), measles, acquired immune deficiency syndrome/human immunodeficiency virus (AIDS/HIV), anthrax, botulism, cholera, Campylobacter infections, chickenpox, chlamydia infections, cryptosporidiosis, dengue fever, diphtheria, hemorrhagic fevers, Escherichia coli (E. coli) infections, ehrlichiosis, gonorrhea, hand-foot-mouth disease, hepatitis A, hepatitis B, hepatitis C, legionellosis, leprosy, leptospirosis, listeriosis, malaria, meningitis, meningococcal disease, mumps, pertussis, polio, pneumococcal disease, paralytic shellfish poisoning, rabies, Rocky Mountain spotted fever, rubella, salmonella, shigellosis, smallpox, syphilis, tetanus, trichinosis (trichinellosis), tuberculosis (TB), typhoid fever, typhus, West Nile virus, yellow fever, yersiniosis, and Zika.
In some embodiments, the cardiovascular disease includes, but is not limited to coronary artery disease, high/low blood pressure, cardiac arrest/heart failure, congestive heart failure, congenital heart defects/diseases (including, but not limited to atrial septal defects, atrioventricular septal defects, coarctation of the aorta, double-outlet right ventricle, d-transposition of the great arteries, Ebstein anomaly, hypoplastic left heart syndrome, and interrupted aortic arch), arrhythmia, peripheral artery disease, stroke, cerebrovascular disease, renal artery stenosis, aortic aneurysm, cardiomyopathies, hypertensive heart disease, pulmonary heart disease, cardiac dysrhythmias, endocarditis, inflammatory cardiomegaly, myocarditis, eosinophilic myocarditis, valvular heart diseases, rheumatic heart diseases, and other related cardiovascular diseases.
In some embodiments, the respiratory disease includes, but is not limited to asthma, chronic obstructive pulmonary disease (COPD), pulmonary fibrosis, pneumonia, bronchitis (chronic or acute bronchitis), emphysema, cystic fibrosis/bronchiectasis, pleural effusion, acute chest syndrome, acute respiratory distress syndrome, asbestosis, aspergillosis, severe acute respiratory syndrome (including, but not limited to SARS-CoV-1 and SARS-CoV-2), respiratory syncytial virus (RSV), Middle East respiratory syndrome (MERS), mesothelioma, pneumothorax, pulmonary arterial hypertension, pulmonary hypertension, pulmonary embolism, sarcoidosis, sleep apnea, and other respiratory diseases.
In some embodiments, the congenital disease includes, but is not limited to albinism, amniotic band syndrome, anencephaly, Angelman syndrome, Barth syndrome, chromosomal abnormalities (including, but not limited to abnormalities to chromosome 9, 10, 16, 18, 20, 21, 22, X chromosome, and Y chromosome), cleft lip/palate, club foot, congenital adrenal hyperplasia, congenital hyperinsulinism, congenital sucrase-isomaltase deficiency (CSID), cystic fibrosis, De Lange syndrome, fetal alcohol syndrome, first arch syndrome, gestational diabetes, haemophilia, heterochromia, Jacobsen syndrome, Katz syndrome, Klinefelter syndrome, Kabuki syndrome, kyphosis, Larsen syndrome, Laurence-Moon syndrome, macrocephaly, Marfan syndrome, microcephaly, Nager’s syndrome, neonatal jaundice, neurofibromatosis, Noonan syndrome, Pallister-Killian syndrome, Pierre Robin syndrome, Poland syndrome, Prader-Willi syndrome, Rett syndrome, sickle cell disease, Smith-Lemli-Opitz syndrome, spina bifida, congenital syphilis, teratoma, Treacher Collins syndrome, Turner syndrome, umbilical hernia, Usher syndrome, Waardenburg syndrome, Werner syndrome, Wolf-Hirschhorn syndrome, Wolff-Parkinson-White syndrome, and other congenital diseases or disorders.
In some embodiments, the gastrointestinal disease includes, but is not limited to heartburn, irritable bowel syndrome, lactose intolerance, gallstones, cholecystitis, cholangitis, anal fissure, hemorrhoids, proctitis, colon polyps, infective colitis, ulcerative colitis, ischemic colitis, Crohn’s disease, radiation colitis, celiac disease, diarrhea (chronic or acute), constipation (chronic or acute), diverticulosis, diverticulitis, acid reflux (gastroesophageal reflux (GER) or gastroesophageal reflux disease (GERD)), Hirschsprung disease, abdominal adhesions, achalasia, acute hepatic porphyria (AHP), anal fistulas, bowel incontinence, centrally mediated abdominal pain syndrome (CAPS), Clostridioides difficile infection, cyclic vomiting syndrome (CVS), dyspepsia, eosinophilic gastroenteritis, globus, inflammatory bowel disease, malabsorption, scleroderma, volvulus, and other gastrointestinal diseases.
In some embodiments, the metabolic disease includes, but is not limited to diabetes mellitus Type I, diabetes mellitus Type II, familial hypercholesterolemia, Gaucher disease, Hunter syndrome, Krabbe syndrome, metachromatic leukodystrophy, Niemann-Pick syndrome, phenylketonuria (PKU), Tay-Sachs disease, Wilson’s disease, hemochromatosis, mitochondrial disorders or diseases (including, but not limited to Alpers Disease; Barth syndrome; beta-oxidation defects; carnitine-acyl-carnitine deficiency; carnitine deficiency; coenzyme Q10 deficiency; Complex I deficiency; Complex II deficiency; Complex III deficiency; Complex IV deficiency; Complex V deficiency; cytochrome c oxidase (COX) deficiency; LHON Leber Hereditary Optic Neuropathy; MM Mitochondrial Myopathy; LIMM Lethal Infantile Mitochondrial Myopathy; MMC Maternal Myopathy and Cardiomyopathy; NARP Neurogenic muscle weakness, Ataxia, and Retinitis Pigmentosa; Leigh Disease; FICP Fatal Infantile Cardiomyopathy Plus, a MELAS-associated cardiomyopathy; MELAS Mitochondrial Encephalomyopathy with Lactic Acidosis and Stroke-like episodes; LDYT Leber's hereditary optic neuropathy and Dystonia; MERRF Myoclonic Epilepsy and Ragged Red Muscle Fibers; MHCM Maternally inherited Hypertrophic CardioMyopathy; CPEO Chronic Progressive External Ophthalmoplegia; KSS Kearns Sayre Syndrome; DM Diabetes Mellitus; DMDF Diabetes Mellitus+DeaFness; CIPO Chronic Intestinal Pseudoobstruction with myopathy and Ophthalmoplegia; DEAF Maternally inherited DEAFness or aminoglycoside-induced DEAFness; PEM Progressive encephalopathy; SNHL SensoriNeural Hearing Loss; Encephalomyopathy; Mitochondrial cytopathy; Dilated Cardiomyopathy; GER Gastrointestinal Reflux; DEMCHO Dementia and Chorea; AMDF Ataxia, Myoclonus; Exercise Intolerance; ESOC Epilepsy, Strokes, Optic atrophy, & Cognitive decline; FBSN Familial Bilateral Striatal Necrosis; FSGS Focal Segmental Glomerulosclerosis; LIMM Lethal Infantile Mitochondrial Myopathy; MDM Myopathy and Diabetes Mellitus; MEPR Myoclonic Epilepsy and Psychomotor Regression; MERME MERRF/MELAS overlap disease; MHCM Maternally Inherited Hypertrophic CardioMyopathy; MICM Maternally Inherited Cardiomyopathy; MILS Maternally Inherited Leigh Syndrome; Mitochondrial Encephalocardiomyopathy; Multisystem Mitochondrial Disorder (myopathy, encephalopathy, blindness, hearing loss, peripheral neuropathy); NAION Nonarteritic Anterior Ischemic Optic Neuropathy; NIDDM Non-Insulin Dependent Diabetes Mellitus; PEM Progressive Encephalopathy; PME Progressive Myoclonus Epilepsy; RTT Rett Syndrome; SIDS Sudden Infant Death Syndrome; MIDD Maternally Inherited Diabetes and Deafness; MODY Maturity-Onset Diabetes of the Young; and MNGIE), and other metabolic diseases.
The present disclosure also provides methods that resolve limitations of current approaches to profiling and/or resolving Ig repertoires.
Multiplexed primer-based short read (sr)-AIRR-seq strategies may generate full variable region content but require specific primers to known targets and therefore may miss novel genes and alleles. 5’ RACE sr-AIRR-seq methods capture the full-length VDJ transcript without variable region-targeted multiplexed primer pools, therefore limiting the impact of primer bias and enabling discovery of novel V, D, and J genes and alleles, but do not include resolution of the complete constant region. For heavy chain transcript sequencing, 5’ RACE methods often prime from the first immunoglobulin heavy chain locus constant (IGHC) gene exon (CH1), allowing for determination of heavy chain isotype only. Additional methods have been developed that shift amplification strategies by capturing additional IGHC region sequence to enable subisotype resolution; however, these methods sacrifice full and contiguous characterization of the V gene. All commonly used sr-AIRR-seq methods are further technically limited by the length restrictions (< 600 nt) of short-read next-generation sequencing. As a result, no current sr-AIRR-seq strategy resolves the complete heavy chain transcript, including all IGHC exons (e.g., CH1, CH2, and CH3 for IgD, IgG, IgA, and IgE, and additionally CH4 for IgM) alongside the recombined V, D, and J genes. These same limitations exist for the light chain transcripts as well, which are encoded by the genes within the immunoglobulin lambda (IGL) and kappa (IGK) loci. A notable consideration is that population-based polymorphisms within the IGH, IGL, and IGK V, D (IGH only), and J loci are far more extensive than previously known; similarly, IGHC, IGLC, and IGKC regions have also been shown to contain genomic diversity, although the extent of this diversity has not been fully explored.
Although it is understood that the constant region of the light and heavy chains mediates chain pairing and many Ab effector function(s), respectively, there is very limited knowledge as to how genetic variation in this region may impact functional capabilities or posttranslational modification. As such, there is a growing need to understand genomic variation across the complete Ab molecule.
Alternatively, and in contrast to sr-AIRR-seq methods, the Iso-Seq method utilizes long-read sequencing to capture full-length transcripts expressing a poly(A) tail through oligo dT-based priming. This approach can be used to characterize full-length immunoglobulin transcripts; however, it provides lower throughput and sequencing depth, at increased cost, compared to all targeted AIRR-seq methods. This is because Iso-Seq generates a complete transcriptome per sample without any enrichment of immunoglobulin heavy or light chain transcript sequences, and thus non-immunoglobulin sequences make up the majority of the preparation and data. If the data are only to be used for immunoglobulin profiling, these excess data are then filtered out and discarded (Figure 1B).
A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.
By way of non-limiting illustration, examples of certain embodiments of the present disclosure are given below.
EXAMPLES
The following examples are set forth below to illustrate the compositions, devices, methods, and results according to the disclosed subject matter. These examples are not intended to be inclusive of all aspects of the subject matter disclosed herein, but rather to illustrate representative methods and results. These examples are not intended to exclude equivalents and variations of the present invention which are apparent to one skilled in the art.
Example 1: FLAIRR-SEQ of IgG and IgM
Current Adaptive Immune Receptor Repertoire sequencing strategies that use short-read sequencing (sr-AIRR-seq) resolve expressed antibody (Ab) transcripts with limited resolution of the constant region. Here we present a novel near full-length long-read AIRR-seq (FLAIRR-seq) method that utilizes targeted amplification by 5’ rapid amplification of cDNA ends (RACE), combined with single molecule, real-time sequencing to generate highly accurate (greater than Q40, or > 99.99% accurate) IG heavy chain transcripts. FLAIRR-seq was benchmarked by comparing IG heavy chain variable (IGHV), diversity (IGHD), and joining (IGHJ) gene usage, complementarity-determining region 3 (CDR3) length, and somatic hypermutation to matched datasets generated with standard 5’ RACE sr-AIRR-seq and full-length isoform sequencing. Together these data demonstrate robust FLAIRR-seq performance using RNA samples derived from peripheral blood mononuclear cells, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving novel IG heavy chain constant (IGHC) gene features. FLAIRR-seq data provide, for the first time, simultaneous, single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class-switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk expressed Ab repertoires to date.
Antibodies (Abs) or immunoglobulins (IGs) are the primary effectors of humoral immunity and are found as both membrane-bound receptors on B cells and circulating, secreted proteins. Both membrane-bound B cell receptors (BCRs) and secreted Abs act to recognize and bind antigen. All Abs and BCRs are composed of two identical heavy and light chains that are post-translationally associated. The heavy chain is comprised of two distinct domains: (i) the variable domain (Fab), which allows for antigen binding, and (ii) the constant domain (Fc), which modulates downstream effector functions. The light chain also includes a variable domain that, once post-translationally associated with the heavy chain variable domain, interacts with cognate antigen. In humans, Abs are grouped into discrete isotypes and subisotypes (i.e., IgM, IgD, IgG1, IgG2, IgG3, IgG4, IgA1, IgA2, and IgE), based on the expression of specific constant (C) genes within the IG heavy chain locus (IGH). Each isotype and subisotype has unique effector properties that together represent the wide diversity of Ab-mediated functions, including binding of Fc receptors (FCR), activation of complement, opsonization, antibody-dependent cellular cytotoxicity (ADCC) and antibody-dependent cellular phagocytosis (ADCP).
To facilitate the development of diverse Ab repertoires capable of recognizing the wide range of pathogens humans encounter, the IG genomic loci are highly polymorphic and harbor diverse and complex sets of genes that recombine in each B cell to encode up to 10^13 unique specificities. B cells create this expansive catalog of specificities through somatic recombination of the variable (V), diversity (D), and joining (J) genes in IGH, and V and J genes from the corresponding light chain loci, lambda (IGL) and kappa (IGK). During VDJ recombination in IGH, a single D and J gene are first recombined, while the intervening and unselected D and J gene sequences are removed by RAG recombinase. After D and J genes are joined, further recombination of a specific V gene to the DJ gene cassette completes the formation of the full VDJ rearrangement. Following transcription of the recombined VDJ, a single constant region gene is spliced together with the VDJ cassette to generate the completed heavy chain transcript. Recombination at IGL and IGK occurs similarly, recombining V and J genes only, followed by splicing to light chain constant region genes. Heavy and light chain transcripts are independently translated and linked via covalent cysteine bonds resulting in a fully functional protein prior to B cell cell-surface expression or secretion (Figure 1A). Naive B cells, which develop in the bone marrow from hematopoietic stem cell progenitors, have undergone VDJ recombination but have not yet encountered antigen, and solely express IgM and IgD. These naive B cells then migrate to B cell zones in secondary lymphoid tissues where they encounter antigen, driving further maturation and class switch recombination (CSR) to enable the most effective humoral responses. CSR mediates the excision of IGHC genes at the DNA level, which leads to the utilization and linkage of different IGHC genes to the same VDJ, ultimately resulting in class switching to alternate isotypes (IgG, IgA, or IgE) and their respective subisotypes (IgG1, IgG2, IgG3, IgG4, IgA1, or IgA2).
The IgG isotype class is represented by four subisotypes: IgG1, IgG2, IgG3, and IgG4. Each IgG subisotype circulates at varied frequencies and facilitates unique immune functions. For example, IgG1 is typically the most abundant circulating IgG and mediates proinflammatory responses; IgG2 targets bacterial polysaccharides, providing protection from bacterial pathogens; IgG3 confers protection against intracellular bacterial infections and enables clearing of parasites; and IgG4 contains exclusive structural and functional characteristics often resulting in anti-inflammatory and tolerance-inducing effects. In the case of IgA, which is most often associated with conferring mucosal immunity, subisotypes IgA1 and IgA2 differ only in the extent of hinge regions found in the constant regions. Serum IgA1 is thought to be necessary to regulate immune homeostasis, while IgA2 modulates inflammatory effects. Multiple studies have identified Ab-mediated subisotype-specific pathogenicity in the context of autoimmune diseases and cancer, highlighting the need for further investigation of subisotype-specific repertoires.
Here, FLAIRR-seq, a targeted 5’ RACE-based amplification of near full-length IgG and IgM heavy chain transcripts paired with single molecule real time (SMRT) sequencing, is presented, resulting in highly accurate (mean read accuracy ~Q60, 99.9999% accurate), near full-length Ab sequences from RNA derived from whole blood, isolated PBMC, and purified B cells. When analyzed with the Immcantation sr-AIRR-seq tool suite, FLAIRR-seq performs comparably to standard 5’ RACE sr-AIRR-seq methods and single-molecule isoform sequencing (Iso-Seq) strategies for characterizing the expressed Ab variable region-based repertoire. The features of FLAIRR-seq data, including phased identification of IGHV, IGHD, IGHJ, and IGHC genes, which facilitates the profiling of subisotype- and IGHC allele-specific repertoires and CSR characterization, are further highlighted.
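The read accuracies quoted in this example follow the standard Phred relationship between a quality score Q and the per-base error probability, P = 10^(-Q/10). As a minimal sketch (the function names are illustrative and not part of the disclosed method), the conversion can be written as:

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred quality score Q: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality score implied by a per-base accuracy (must be < 1)."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # Q20 is the sr-AIRR-seq trimming threshold; Q40 and Q60 are the
    # FLAIRR-seq accuracies mentioned in the text.
    for q in (20, 40, 60):
        print(f"Q{q}: {phred_to_accuracy(q) * 100:.4f}% accurate")
```

Under this relationship Q40 corresponds to one expected error in 10,000 bases and Q60 to one in 1,000,000, which matches the percentages given in the text.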
Materials and Methods
Sample collections
Experiments were conducted using healthy donor peripheral blood mononuclear cells (PBMC), purified B cells from healthy donors, or whole blood collected from hospitalized COVID-19 patients (Supplementary Table 1). Commercially available healthy donor PBMC (STEMCELL Technologies) and a subset of matched purified B cells were utilized to generate sr-AIRR-seq and FLAIRR-seq validation datasets. Full-length isoform sequencing (Iso-Seq) was performed using B cells isolated from the PBMC of a healthy, consented 57-year-old male donor at the University of Louisville (UofL) School of Medicine. The UofL Institutional Review Board approved sample collection (IRB 14.0661). For COVID-19-affected patient samples (n=5), whole blood was collected from the Mount Sinai COVID-19 biobank cohort of hospitalized COVID-19 patients, approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai.
PBMC isolation and B cell purification
Frozen healthy donor PBMCs were purchased, thawed, and aliquoted for use in downstream experiments (STEMCELL Technologies). For Iso-Seq analyses, 175 mL of venous blood was collected in a final concentration of 6 mM K3EDTA using standard phlebotomy. PBMCs were isolated using SepMate PBMC Isolation Tubes (STEMCELL Technologies), with an additional granulocyte depletion step using the RosetteSep Human Granulocyte Depletion Cocktail (STEMCELL Technologies) as directed by the manufacturer. B cells from the freshly collected and frozen healthy donor PBMC were isolated using the EasySep Human Pan-B Cell Enrichment Kit, as described by the manufacturer (STEMCELL Technologies). Briefly, B cells, including plasma cells, were isolated by negative selection using coated magnetic particles. First, the B cell enrichment cocktail was added to the sample and mixed for a 5-minute incubation at room temperature, followed by addition of magnetic particles and further incubation for 5 minutes on the benchtop. The sample tube was then placed on an EasySep magnet (STEMCELL Technologies), and purified B cells were carefully eluted from the magnetic particles and immediately used for RNA extraction.
Genomic DNA and RNA extraction
For the healthy frozen PBMC and matched purified B cells, genomic DNA (gDNA) and RNA were co-extracted using the AllPrep DNA/RNA Mini Kit (Qiagen) according to the manufacturer’s instructions. For the freshly processed UofL healthy donor PBMC, purified Pan-B cells were lysed in Buffer RLT Plus, and RNA was extracted with the RNeasy Plus Mini Kit (Qiagen) per the manufacturer’s protocol; no gDNA was collected from this sample. COVID-19 whole blood-derived RNA was extracted from samples collected in Tempus Blood RNA tubes using the Tempus Spin RNA Isolation Kit (ThermoFisher) as described by the manufacturer. For all samples, concentrations of RNA and gDNA (when appropriate) were assessed using the Qubit 4.0 fluorometer, with the RNA HS Assay Kit and Qubit DNA HS Assay Kit, respectively (ThermoFisher Scientific). RNA and gDNA integrity were evaluated using the Bioanalyzer RNA Nano Kit and DNA 1200 Kit, respectively (Agilent Technologies). Extracted RNA and gDNA were stored at -80°C and -20°C, respectively, until used.
FLAIRR-seq targeted amplification of heavy chain transcripts
Extracted RNA was thawed on ice and converted to first strand complementary DNA (cDNA) using the SMARTer RACE 5’/3’ Kit (Takara Bio USA), as described by the manufacturer, and a custom oligonucleotide that contained the template switch oligo and a unique molecular identifier (5’ TSO-UMI) for template switch during first strand cDNA synthesis. The following reaction conditions were used: (i) a primary master mix was prepared with 4.0 µL 5X First-Strand Buffer, 0.5 µL DTT (100 mM), and 1.0 µL dNTPs (20 mM) per reaction and set aside until needed; (ii) in a separate 0.2-mL PCR tube, 10 µL of sample RNA and 1 µL 5’-CDS Primer A were combined and incubated in a thermal cycler at 72°C (lid temperature: 105°C) for 3 minutes, followed by cooling to 42°C for 2 minutes; (iii) after cooling, tubes were spun briefly to collect contents and 1 µL (12 µM) of the 5’ TSO-UMI was added to the RNA; (iv) 0.5 µL of RNase inhibitor and 2.0 µL of SMARTScribe Reverse Transcriptase were added to the primary master mix tube per sample, and 8 µL of the combined master mix was then added to each RNA-containing sample tube. First-strand cDNA synthesis reactions were incubated in a thermal cycler at 42°C (lid temperature: 105°C) for 90 minutes, followed by heat inactivation at 70°C for 10 minutes. Total first strand cDNA generated in this reaction was diluted 1:2 with Tricine-EDTA Buffer before moving on to targeted heavy chain transcript amplification.
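The per-reaction volumes above can be checked and scaled to a batch with a short script. The following is a minimal sketch: the volumes are those stated in the text, while the `scale` helper and its `overage` parameter are hypothetical conveniences for pipetting loss and are not part of the described protocol.

```python
# Per-reaction volumes in microliters, as stated in the first-strand protocol.
PRIMARY_MIX = {
    "5X First-Strand Buffer": 4.0,
    "DTT (100 mM)": 0.5,
    "dNTPs (20 mM)": 1.0,
}
ENZYME_ADDITIONS = {
    "RNase inhibitor": 0.5,
    "SMARTScribe Reverse Transcriptase": 2.0,
}
SAMPLE_TUBE = {
    "sample RNA": 10.0,
    "5'-CDS Primer A": 1.0,
    "5' TSO-UMI (12 uM)": 1.0,
}


def scale(mix, n_reactions, overage=0.10):
    """Scale per-reaction volumes to a batch.

    `overage` is a hypothetical allowance for pipetting loss (10% by default).
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in mix.items()}


# Combined master mix per reaction: primary mix plus the two enzymes.
combined_per_reaction = sum(PRIMARY_MIX.values()) + sum(ENZYME_ADDITIONS.values())
# Final reaction volume: combined master mix plus the RNA/primer/TSO tube.
final_reaction_volume = combined_per_reaction + sum(SAMPLE_TUBE.values())

if __name__ == "__main__":
    print("Combined master mix per reaction (uL):", combined_per_reaction)
    print("Final reaction volume (uL):", final_reaction_volume)
    print("Primary mix for 8 reactions:", scale(PRIMARY_MIX, 8))
```

The combined master mix sums to the 8 µL that the protocol adds to each sample tube, which is a useful consistency check on the stated volumes.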
To specifically amplify heavy chain transcripts from total first-strand cDNA, targeted IgG and IgM transcript amplification reactions were performed using barcoded IgG (3’ primer binding in the constant region exon 3, CH3) or IgM (3’ primer binding in the constant region exon 4, CH4)-specific primers and the following conditions: (i) 5 µL of diluted first-strand cDNA was added to 0.2-mL PCR tubes; (ii) a master mix was generated using 10 µL 5X PrimeSTAR GXL Buffer, 4 µL GXL dNTP mixture, 28 µL PCR-grade water, 1 µL PrimeSTAR GXL Polymerase and 1 µL 10X UPM from the SMARTer RACE 5’/3’ Kit per reaction; (iii) 44 µL master mix was added to each reaction tube followed by 1 µL of the appropriate barcoded IgG (CH3) or IgM (CH4) primer. Different temperatures were used for annealing of IgG- (63.5°C) and IgM-specific primers (60°C) to account for primer-specific melting temperatures and to enhance targeted amplification specificity. Amplification conditions for full-length IgG were: 1 minute at 95°C, followed by 35 amplification cycles of 95°C for 30 sec, 63.5°C for 20 sec, and 2 minutes at 68°C, followed by a final extension for 3 minutes at 68°C and hold at 4°C. Amplification conditions for full-length IgM were: 1 minute at 95°C, followed by 35 amplification cycles of 95°C for 30 sec, 60°C for 20 sec, and 2 minutes at 68°C, followed by a final extension for 3 minutes at 68°C and hold at 4°C. Final amplification reactions were purified using a 1.1X (vol:vol) cleanup with ProNex magnetic beads (Promega). Successfully amplified products were quantified with the Qubit dsDNA HS assay (ThermoFisher Scientific) and length was evaluated with the Fragment Analyzer Genomic DNA HS assay (Agilent). Samples were equimolar pooled in 8-plexes for SMRTbell library preparation and sequencing.
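Equimolar pooling requires converting each amplicon's mass concentration and mean length into a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The text does not state how pooling volumes were calculated, so the function names and the target amount per sample are assumptions for illustration only.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation)


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(samples: dict, fmol_each: float) -> dict:
    """Volume (uL) of each sample that contributes `fmol_each` femtomoles.

    `samples` maps a sample name to (concentration in ng/uL, mean length in bp).
    Because 1 nM equals 1 fmol/uL, volume = fmol / nM.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, length)
        for name, (conc, length) in samples.items()
    }


if __name__ == "__main__":
    # Hypothetical Qubit and Fragment Analyzer readings for two amplicons.
    readings = {"IgG_sample": (6.6, 1000), "IgM_sample": (13.2, 1000)}
    for name, volume in equimolar_volumes(readings, fmol_each=50).items():
        print(f"{name}: pool {volume:.2f} uL")
```

A sample at twice the molar concentration contributes half the volume, so each barcoded amplicon is represented by the same number of molecules in the pool.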
FLAIRR-seq SMRTbell library preparation and sequencing
Eight-plex pools of targeted IgG or IgM amplicons were prepared into SMRTbell sequencing templates according to the “Procedure and Checklist for Iso-Seq Express Template for Sequel and Sequel II systems” protocol starting at the “DNA Damage Repair” step and using the SMRTbell Express Template Prep Kit 2.0, with some modifications (Pacific Biosciences). Briefly, targeted IgG and IgM amplicons underwent enzymatic DNA damage and end repair, followed by ligation with overhang SMRTbell adapters as specified in the protocol. To increase consistency in SMRTbell loading on the Sequel IIe system, we further treated the SMRTbell libraries with a nuclease cocktail to remove unligated amplified products, using the SMRTbell Enzyme Cleanup Kit, as recommended by the manufacturer (Pacific Biosciences). Briefly, after heat-killing the ligase with an incubation at 65°C, samples were treated with a nuclease cocktail at 37°C for 1 hour, and then purified with a 1.1X ProNex cleanup. Final SMRTbell libraries were evaluated for quantity and quality using the Qubit dsDNA HS assay and Fragment Analyzer Genomic DNA assay, respectively. Sequencing of each 8-plex, barcoded sample pool was performed on one SMRTcell 8M using primer v4 and polymerase v2.1 on the Sequel IIe system with 30 hr movies. Demultiplexed, high-fidelity circular consensus sequence reads (“HiFi reads”) were generated on the instrument for downstream analyses.
sr-AIRR-seq SMARTer Human BCR IgG/IgM sequencing
Matched healthy donor RNA was used to generate targeted IgG and IgM sr-AIRR-seq libraries using the SMARTer Human BCR IgG IgM H/K/L Profiling Kit (Takara Bio USA) according to the manufacturer’s instructions with no modifications. Briefly, for each sample, proprietary IgG and IgM primers were used to amplify heavy chain transcripts following a 5’ RACE reaction. sr-AIRR-seq libraries were then quality controlled using the 2100 Bioanalyzer High Sensitivity DNA Assay Kit (Agilent) and the Qubit 3.0 Fluorometer dsDNA High Sensitivity Assay Kit. Sequencing on the MiSeq platform using 300 bp paired-end reads was performed using the 600-cycle MiSeq Reagent Kit v3 (Illumina) according to the manufacturer’s instructions, and FASTQ reads were generated using the associated DRAGEN software package (Illumina).
B cell Iso-Seq
RNA extracted from healthy sorted B cells was used to generate Iso-Seq SMRTbell libraries following the “Procedure & Checklist Iso-Seq Express Template Preparation for the Sequel II System” with minor adaptations compared to the manufacturer’s instructions. Briefly, Iso-Seq libraries were generated using 500 ng high-quality (RIN > 8) RNA as input into oligo-dT primed cDNA synthesis (NEB). Barcoded primers were incorporated into the cDNA during second strand synthesis. Following double-stranded cDNA amplification, transcripts from two samples sourced from purified B cells and NK cells were equimolar pooled. SMRTbells were generated from the pooled cDNA as described above for the FLAIRR-seq amplification products, including the addition of a nuclease digestion step. Quantity and quality of the final Iso-Seq libraries were assessed with the Qubit dsDNA High Sensitivity Assay Kit and Agilent Fragment Analyzer Genomic DNA assay, respectively. This 2-plex Iso-Seq pool was sequenced using primer v4 and polymerase v2.1 on the Sequel IIe system with a 30-hour movie. HiFi reads were generated on instrument before analyses. Demultiplexing of barcoded samples and generation of full-length non-concatemer (FLNC) predicted transcripts were performed using the Iso-Seq v3 pipeline available through SMRTLink (v.10.2). B cell-derived FLNC reads were mapped to the human genome using the GMAP reference database, and reads derived from chromosome 14 were extracted for downstream IGH transcript characterization via Immcantation, as described below.
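The extraction of chromosome 14 reads can be illustrated with a plain-text filter over alignment records. This is a minimal sketch that assumes the alignments are available as SAM text and that the reference names chromosome 14 either `chr14` or `14`. The function names are hypothetical, and the text does not specify the tool actually used for this step.

```python
def is_chr14_alignment(sam_line: str, chr14_names=("chr14", "14")) -> bool:
    """True if a SAM alignment record maps to chromosome 14.

    Header lines (starting with '@') are never counted as alignments.
    Column 3 of a SAM record (index 2) is the reference sequence name.
    """
    if sam_line.startswith("@"):
        return False
    fields = sam_line.rstrip("\n").split("\t")
    return len(fields) >= 3 and fields[2] in chr14_names


def extract_chr14(sam_lines):
    """Return only the alignment records that map to chromosome 14."""
    return [line for line in sam_lines if is_chr14_alignment(line)]


if __name__ == "__main__":
    # Toy records with only the first few SAM columns filled in.
    records = [
        "@SQ\tSN:chr14\tLN:107043718",
        "read1\t0\tchr14\t105586437\t60\t100M",
        "read2\t0\tchr2\t88857361\t60\t100M",
    ]
    for line in extract_chr14(records):
        print(line.split("\t")[0])
```

Filtering on the IGH-containing chromosome removes the bulk of non-immunoglobulin transcripts before the reads are passed to the repertoire analysis tools.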
Immcantation analyses of IgG and IgM repertoires
Analyses of FLAIRR-seq, sr-AIRR-seq, and Iso-Seq datasets were performed using Immcantation tools. Demultiplexed barcoded HiFi (for SMRT sequencing data) or FASTQ (for sr-AIRR-seq) reads were first processed using the pRESTO tool for quality control, UMI processing, and error profiling. For sr-AIRR-seq pRESTO analysis, paired-end reads (“R1” and “R2”) were trimmed to remove bases with < Q20 read quality and/or reads of < 125 bp length using “FilterSeq trimqual” and “FilterSeq length”, respectively. IgG and IgM CH3 or CH4 primer sequences were identified with an error rate of 0.2, and primers identified were then noted in FASTQ headers using “MaskPrimers align”. Next, 12 base pair (bp) UMIs were located and extracted using “MaskPrimers extract”. Sequences found to have the same UMIs were grouped and aligned using “AlignSets muscle”, with a consensus sequence generated for each UMI using “BuildConsensus”. Mate pairing of sr-AIRR-seq reads was conducted using a reference-guided alignment requiring a minimum of a 5 bp overlap via “AssemblePairs sequential”. After collapsing consensus reads with the same UMI (“conscount”) using “CollapseSeq”, reads with < 2 supporting sequences were removed from downstream analysis. For pRESTO processing of FLAIRR-seq, single HiFi reads (“R1”) did not require trimming due to > Q20 sequence quality across all bases. 5’ TSO-UMI and CH3 or CH4 region primers were identified along with a 22 bp UMI with an error rate of 0.3 using “MaskPrimers align”. Reads were then grouped and aligned using “AlignSets muscle”. Due to the single molecule nature of FLAIRR-seq reads, no mate pairing was required. Consensus reads were then generated as described above, including removal of sequences with < 2 supporting reads. Read counts following each step of data filtration for sr-AIRR-seq and FLAIRR-seq are represented in Supplementary Table 3. pRESTO-filtered reads for both sr-AIRR-seq and FLAIRR-seq data were then input into the Change-O tool (Table 1). Iso-Seq required no initial processing from pRESTO and was input into Change-O for IG gene reference alignment along with sr-AIRR-seq and FLAIRR-seq data using “igblastn”, clonal clustering using “DefineClones”, and germline reconstruction and conversion using “CreateGermlines.py” and the GRCh38 chromosome 14 germline reference. Fully processed and annotated data were then converted into a TSV format for use in downstream analyses. The Alakazam Immcantation tool suite was then used to quantify gene usage, calculate CDR3 length, and assess somatic hypermutation (SHM) frequencies, which were compared by unpaired t-tests with Bonferroni multiple testing correction. Alakazam was also used to analyze clonal diversity.
SCOPer clonal assignment by spectral clustering was conducted for COVID-19 patient time course samples. For clonal lineage tree analysis, the Dowser tool was used to examine clonal diversity and CSR over time. After trimming of 3’ primers, FLAIRR-seq data were also processed using the MiXCR generic BCR amplicon pipeline with visualization using the immunarch R package (FIG. 6).
Targeted IG gDNA capture, long-read sequencing, and IGenotyper analyses.
FLAIRR-seq validation samples also underwent IGHC targeted enrichment and long-read sequencing as previously described. Briefly, gDNA was mechanically sheared and size selected to include 5-9 kb fragments using the BluePippin (Sage Science). Samples were then end repaired and A-tailed using the standard KAPA library preparation protocol (Roche). Universal priming sequences and barcodes were then ligated onto samples for multiplexing (Pacific Biosciences). Barcoded gDNA libraries were captured using IGH-specific probes following a SeqCap protocol (Roche). 26 IGH-enriched samples were purified and pooled together for SMRTbell library prep as described above, including the final nuclease digestion step. Pooled SMRTbells were annealed to primer v4, bound to polymerase v2.0, and sequenced on the Sequel IIe system with 30 h movies. After sequencing, HiFi reads were generated and analyzed by the IGenotyper pipeline. In brief, IGenotyper was used to detect single nucleotide variants and assemble sequences into haplotype-specific assemblies for downstream IGHC gene genotyping. Alleles were then extracted from assemblies using a BED file containing coordinates for each IGHC gene exon. The extracted sequences were then aligned to the IMGT database (downloaded on 2/21/22) and assigned as an “exact match” to IMGT, “novel” if there was no match to the IMGT database, or “novel, extended” if a match was detected to a partial allele found in IMGT but the IMGT allele was a substring of the IGenotyper-identified allele. This set of alleles was then used as a ground truth dataset.
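The three-way allele categorization described above can be sketched in Python as follows. This is an illustrative simplification, not the IGenotyper code: the function name `classify_allele` and the use of exact string comparison in place of alignment to the IMGT database are assumptions.

```python
def classify_allele(allele, imgt_alleles):
    """Return 'exact match', 'novel, extended', or 'novel' for an assembled allele.

    Hypothetical helper: imgt_alleles is a list of reference sequences, and
    plain string comparison stands in for sequence alignment.
    """
    if allele in imgt_alleles:
        return "exact match"
    # A partial reference allele that is a substring of the assembled allele
    if any(ref in allele for ref in imgt_alleles):
        return "novel, extended"
    return "novel"

# Toy reference set and query alleles (made-up sequences)
imgt = ["ACGTACGT", "GGGCCC"]
labels = [classify_allele(a, imgt) for a in ["ACGTACGT", "TTGGGCCCAA", "TTTTTTTT"]]
```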
IGHC gene genotyping with FLAIRR-seq
To genotype IGHC genes and alleles from FLAIRR-seq data, productive reads were filtered by IGHC length (900-1100 bp) and aligned to the chromosome 14 hg38 reference using minimap2, with SAMtools used to generate sorted and indexed BAM files. WhatsHap was used to identify, genotype, and phase single nucleotide variants (SNVs). Phased SNVs were used to assign each read to a haplotype using MsPAC. Reads from each haplotype and gene were clustered using CD-HIT with a 100% identity clustering threshold, and a single representative read from the largest cluster was aligned to the IGenotyper-curated alleles using BLAST to determine the closest matching IGHC gene and allele. The representative read was selected based on 100% identity to all other sequences in that cluster.
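The length filter and representative-read selection can be sketched as below. This is a simplification under stated assumptions rather than the pipeline itself: grouping identical strings stands in for CD-HIT clustering at 100% identity (CD-HIT can also absorb shorter sequences contained in longer ones, which is ignored here), and the function name `representative_read` is hypothetical.

```python
from collections import Counter

def representative_read(ighc_seqs, min_len=900, max_len=1100):
    """Return (sequence, cluster_size) for the largest group of identical reads.

    Hypothetical helper: exact-duplicate grouping approximates clustering at a
    100% identity threshold after filtering reads by IGHC length.
    """
    kept = [s for s in ighc_seqs if min_len <= len(s) <= max_len]
    if not kept:
        return None
    return Counter(kept).most_common(1)[0]

# Toy reads (homopolymers used only to control length)
a, b, c = "A" * 1000, "C" * 950, "G" * 500
best = representative_read([a, a, b, c, c, c])
```

The 500 bp read is the most frequent in the toy input but falls outside the 900-1100 bp window, so the 1000 bp read is returned as the representative.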
Inference of IGHV, IGHD, and IGHJ gene haplotypes from FLAIRR-seq data using IGHC gene anchors
To test the ability of IGHC genes to be used for the inference of IGHV, IGHD, and IGHJ haplotypes from FLAIRR-seq data, one sample that was heterozygous for both IGHM and IGHJ6 (IGHJ6 is standardly used for sr-AIRR-seq haplotype inference) was chosen. For this sample, TIgGER was employed to infer novel IGHV alleles and generate sample-level IGHV genotypes. Rearranged sequences within the Change-O table were then reannotated, taking into account the sample genotype and detected novel alleles. Updated annotations were then used to infer haplotypes using RAbHIT version 0.2.4. Both IGHJ6 and IGHM were used as anchor points for haplotyping, and the resulting haplotypes were compared.
Data availability:
All IGenotyper, FLAIRR-seq, sr-AIRR-seq and Iso-Seq data generated for this study are available in the Sequence Read Archive under BioProject ID PRJNA922682 (www.ncbi.nlm.nih.gov/bioproject/?term=922682).
Results
Gene usage, CDR3, and SHM profiles characterized from FLAIRR-seq data are comparable to AIRR-seq and Iso-Seq
Current methods for commercially available 5’ RACE sr-AIRR-seq utilize targeted amplification of the variable region and, in some cases, a small portion of the first constant region exon (CH1), in conjunction with short-read sequencing to characterize IG repertoires. However, this minimal examination of the IGHC gene sequence is primarily used to define isotypes. No current method defines both the heavy chain variable and constant regions in a way that allows for both subisotype classification and IGHC allele-level resolution. To address these technical limitations, we developed the FLAIRR-seq method (Figure 1C), a targeted 5’ RACE approach combined with SMRT sequencing to generate highly accurate, near full-length IgG (~1500 bp) and/or IgM (~2000 bp) sequences, allowing for direct, simultaneous analysis of the heavy chain variable and constant regions (Figure 1D), including gene/allele identification for IGHV, IGHD, IGHJ and IGHC segments, and isotype- and subisotype-specific repertoire profiling.
To evaluate and validate the capabilities of FLAIRR-seq, FLAIRR-seq and sr-AIRR-seq analyses were performed on ten healthy donor PBMC samples to compare both library preparation methods. FLAIRR-seq data were filtered from the initial HiFi reads (>Q20) to include only >Q40 reads. The average read quality of these filtered reads was >Q60 (99.9999%), with a pass filter rate ranging from 88% to 93% of total reads. sr-AIRR-seq FASTQ bases were trimmed to retain sequences with an average quality of Q20 (99%). These filtered reads were used as input into the Immcantation suite, specifically the pRESTO and Change-O tools, for IGHV, IGHD, and IGHJ gene assignment and repertoire feature analyses, including identification of clones, extent of somatic hypermutation, and evaluation of CDR3 lengths. As shown in Table 1, fewer overall FLAIRR-seq reads were used as input into the Immcantation analyses after required filtration and read assembly steps (which were not needed for the high-quality single-molecule FLAIRR-seq reads). FLAIRR-seq resulted in a comparable or, in many cases, increased number of unique VDJ sequences, clones, and CDR3 sequences identified compared to the matched sr-AIRR-seq-derived samples in both the IgG and IgM repertoires (schematized in Figure 2A). These basic sequencing and initial analysis metrics demonstrated that FLAIRR-seq produced high-quality variable region data for detailed Ab repertoire analyses and is amenable to analysis using existing sr-AIRR-seq analysis tools. Comparative costs were calculated based on the cost per “actionable read”, defined as the read number per sample and per method after quality filtration and assembly, but prior to cluster consensus. This calculation represented the total unique single molecule or assembled templates captured by either method that passed all necessary quality control irrespective of biologic repertoire diversity or clonality, removing technology/platform biases. To remove the impact of pooling differences, this “per actionable read” price was used to calculate the cost for 15,000 “actionable reads” as a standard sample. sr-AIRR-seq cost $25.50 per sample, whereas the FLAIRR-seq cost was $33.57 per sample. These costs reflect reagents and consumables only; instrumentation and labor are not included.
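The quality thresholds and the cost normalization used in this comparison are simple arithmetic, sketched below in Python. The Phred relationship (accuracy = 1 - 10^(-Q/10)) is standard; the function names and the cost inputs in the example (a total of $100 spread over 50,000 actionable reads) are hypothetical and are not figures from this study.

```python
import math

def phred_to_accuracy(q):
    """Convert a Phred quality score to per-base accuracy."""
    return 1.0 - 10 ** (-q / 10)

def accuracy_to_phred(accuracy):
    """Convert per-base accuracy back to a Phred quality score."""
    return -10 * math.log10(1.0 - accuracy)

def cost_per_standard_sample(total_cost, actionable_reads, standard_reads=15_000):
    """Scale a per-actionable-read price to a standard sample size."""
    return total_cost / actionable_reads * standard_reads

# Q20, Q40 and Q60 correspond to 99%, 99.99% and 99.9999% accuracy
accuracies = {q: phred_to_accuracy(q) for q in (20, 40, 60)}

# Hypothetical inputs for illustration only
example_cost = cost_per_standard_sample(total_cost=100.0, actionable_reads=50_000)
```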
While optimizing FLAIRR-seq sample preparation, it was examined whether upstream isolation of B cells before FLAIRR-seq molecular preparation would enhance the ability to detect IGHV, IGHD, and IGHJ gene usage. To do this, aliquots of PBMC (n=4) were split into two groups for RNA extraction: (i) RNA derived from bulk PBMC, and (ii) RNA isolated from purified pan B-cells, followed by FLAIRR-seq preparation, SMRT sequencing, and Immcantation analysis of both groups. IGHV, IGHD, and IGHJ gene usage correlations between groups are shown in Figure 2A and demonstrate a significant association (p-values ranging from 0.033 to 4.1e-16), strongly supporting the conclusion that B cell isolation before RNA extraction was not necessary to achieve comparable gene usage metrics. The limited differences that were observed could be explained by template sampling differences between the two experiments. Due to the strong associations observed and the ease of processing PBMC in bulk, RNA derived directly from PBMC aliquots was used for the remainder of the analyses.
FLAIRR-seq performance was established by comparing its output to the commonly used 5’ RACE sr-AIRR-seq method. 5’ RACE sr-AIRR-seq was chosen as it provides resolution of the complete variable region and a small portion of IGHC, allowing for isotype differentiation. Following preparation of both FLAIRR-seq and sr-AIRR-seq libraries from healthy donor PBMC samples (n=10), multiple repertoire features were compared to benchmark FLAIRR-seq performance. First, IGHV, IGHD, and IGHJ gene usage frequencies were evaluated. Significant correlations between FLAIRR-seq and sr-AIRR-seq datasets in IGHV, IGHD and IGHJ gene usage were observed for both IgM (V genes: r = 0.93-0.97, p < 2.2e-16; D genes: r = 0.98-0.99, p < 2.2e-16; J genes: r = 0.94-1.0, p = 0.0028-0.017) and IgG isotypes (V genes: r = 0.90-0.96, p < 2.2e-16; D genes: r = 0.87-0.99, p < 2.2e-16 to 6.1e-14; J genes: r = 0.89-1.0, p = 0.0028-0.033) (Figure 2B), indicating that FLAIRR-seq comparably resolves IGHV, IGHD, and IGHJ gene usage profiles. Of note, IGHJ genes showed lower levels of significance (larger p-values) across all comparisons due to the relatively few genes that make up the IGHJ gene family compared to the more diverse IGHV and IGHD families. Next, the performance of both methods was investigated in terms of resolving SHM frequencies (Figure 2C) and complementarity determining region 3 (CDR3) lengths (Figure 2D), which are often used as measures for evaluating B cell affinity maturation. Although occasional statistically significant differences in SHM frequency were observed between sr-AIRR-seq and FLAIRR-seq data from the same samples, these differences were not seen across all samples, suggesting that sample-to-sample variation rather than technology-based discrepancies may drive this observation. CDR3 lengths were consistently longer in the FLAIRR-seq datasets for both the IgM and IgG isotypes in most donors. The characterization of unusually long CDR3 regions (> 40 nt) in the IgG sequences with FLAIRR-seq is likely due to the higher contiguity and quality afforded by the longer read lengths; such regions are less likely to be spanned by short-read 2x300 bp paired-end sequencing strategies. Together, these data demonstrate that FLAIRR-seq achieves comparable gene usage profiles and improved resolution of long CDR3 sequences.
Others have recognized the power of long-read sequencing to resolve B cell repertoires using bulk Iso-Seq methods, allowing for the examination of full-length transcripts from isolated B cells. As mentioned above, the Iso-Seq method captures full-length transcripts expressing a poly(A) tail without bias, though a considerable amount of non-repertoire data is discarded. To investigate whether the untargeted transcriptome-wide Iso-Seq method would resolve a qualitatively different repertoire than FLAIRR-seq, which would have indicated FLAIRR-seq-driven primer bias, matched Iso-Seq and FLAIRR-seq were performed on purified B-cell derived RNA. IGHV, IGHD, and IGHJ gene usage frequencies were compared between Iso-Seq and FLAIRR-seq datasets (Spearman’s rank correlation), revealing significant correlations between usage profiles (V genes: r = 0.94, p = 2.2e-16; D genes: r = 0.92, p = 1.4e-13; J genes: r = 1.0, p = 0.0028; Figure 2E). These data strongly indicate that FLAIRR-seq has very limited to no primer-driven bias compared to whole transcriptome data. Collectively, this benchmarking dataset confirmed that FLAIRR-seq is comparable to other state-of-the-art methods, providing robust characterization of commonly used repertoire metrics, with limited increases in per sample cost.
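The gene usage comparison above relies on Spearman’s rank correlation, which can be sketched in plain Python as the Pearson correlation of average ranks. The gene usage frequencies below are made-up values for illustration, not data from this study, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical usage frequencies for four IGHV genes in two datasets
flairr = {"IGHV1-69": 0.10, "IGHV3-23": 0.30, "IGHV4-34": 0.20, "IGHV3-7": 0.05}
isoseq = {"IGHV1-69": 0.12, "IGHV3-23": 0.28, "IGHV4-34": 0.15, "IGHV3-7": 0.07}
genes = sorted(flairr)
rho = spearman([flairr[g] for g in genes], [isoseq[g] for g in genes])
```

Because the correlation is computed on ranks, two methods that order the genes identically give rho = 1 even when the absolute frequencies differ, as in this toy example.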
IGenotyper and FLAIRR-seq provide constant region gene allele identification and allow for haplotyping of variable genes
The novel value added by FLAIRR-seq is improved resolution of IGHC, including estimation of IGHC gene and allele usage, subisotype identification, and phasing of variable and constant regions for comprehensive repertoire analysis. To evaluate the capabilities and accuracy of IGHC gene and allele identification with FLAIRR-seq, a ground truth dataset of IGHC alleles for all 10 samples was first established by targeted sequencing of the germline IGH locus (Figure 3A) using IGenotyper. IGHG1, IGHG2, IGHG3, IGHG4 and IGHM alleles called by IGenotyper (see Methods) were assigned to one of three categories, schematized in Figure 3B: (i) “exact match” - alleles documented in the IMGT database; (ii) “novel not in IMGT” - alleles not documented in the IMGT database; or (iii) “extended” - alleles that matched partial alleles in the IMGT database (i.e., those only spanning a subset of exons), but were extended by sequences in our dataset. IGenotyper identified a total of 32 unique IGHG1, IGHG2, IGHG3, IGHG4 and IGHM alleles across all individuals, as schematized in Figure 3C. Among these 32 alleles, only 4 were documented in IMGT; the remainder represented novel alleles (n=11) or extensions (n=17) of known alleles. In aggregate, a greater number of alleles was observed for IGHG4 than for any of the other IGHG genes. Among these alleles were 4 sequences represented by suspected duplications of IGHG4. In fact, 3 IGHG4 gene alleles were observed in 4/10 samples, indicating the presence of gene duplications in these donors; in all cases, these alleles were also identified in the FLAIRR-seq data (see below). Given the relatively small size of this proof-of-concept healthy donor cohort, the identification of 28 (87%) novel or extended alleles underscores the extensive polymorphism in this region and reflects the paucity of information regarding this locus in existing immunogenomics databases.
Next, the IGenotyper-derived IGHC gene database was used as the ground truth for evaluating the capability of FLAIRR-seq to identify and resolve IGHC gene alleles. Based on the analysis workflow for identifying IGHC alleles from FLAIRR-seq data, 19/32 (59%) IGenotyper alleles were resolved at 100% identity; no additional false-positive alleles were identified. Of the alleles that were not unambiguously resolved by the FLAIRR-seq pipeline, 8 had allele-defining single nucleotide variants (SNVs) 3’ of the FLAIRR-seq primers. The rate of true-positive allele calls using FLAIRR-seq across all 10 samples ranged from 5% for IGHG1 to 90% for IGHM (Figures 3C and 3D). As a result, the IGHC genotypes inferred by FLAIRR-seq have some limitations, but on the whole allow for much greater resolution of IGHC variation in the expressed repertoire than currently used methods.
Previous studies have demonstrated the use of IGHJ6 heterozygosity to infer haplotypes of V and D genes from sr-AIRR-seq data. However, the frequency of IGHJ6 heterozygotes in the population can vary. Therefore, the utility of leveraging IGHC polymorphism resolved by FLAIRR-seq for haplotyping IGHV alleles was assessed with the publicly available tool RAbHIT. A single donor (“1013”) from the cohort that was heterozygous for both IGHJ6 (*02 and *03) and IGHM (FL_2 and FL_4) was selected. Importantly, each IGHJ6 allele was associated with the respective IGHM allele from the corresponding haplotype (Figure 3E). After defining germline IGHV alleles using TIgGER, IGHV haplotype inferences were generated and compared using either IGHJ6 or IGHM alleles as anchor genes using RAbHIT (Figure 3E). Although some allele assignments were ambiguous (“unknown”) using both methods, a strong consensus between haplotype inferences was observed using the two anchor genes. For haplotype 1, represented by IGHJ6*03 and IGHM_FL_4, the IGHJ6*03-derived haplotype had 39 IGHV genes for which either an allele or deletion call was made. When using IGHM_FL_4, the same allele/deletion calls were made for 39 of these genes; in addition, using IGHM as the anchor gene, assignments were made for an additional 5 IGHV genes that had “unknown” designations using IGHJ6. Similarly, of the allele/deletion calls made for 41 IGHV genes on haplotype 2 assigned to IGHJ6*02, 40 of these genes had identical assignments using IGHM_FL_2. Together these results indicate that IGHC variants can be utilized for haplotype inference from repertoire data when commonly used IGHJ or IGHD genes are homozygous in individuals of interest.
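The comparison of haplotype inferences from two anchor genes amounts to tallying concordant calls and calls gained with the second anchor. A minimal Python sketch follows; the function name `compare_haplotypes`, the dictionary representation of per-gene calls, and the toy calls are assumptions for illustration and do not reflect the RAbHIT output format or the calls made for donor 1013.

```python
def compare_haplotypes(calls_a, calls_b, unknown="unknown"):
    """Compare per-gene allele/deletion calls inferred with two anchor genes.

    Hypothetical helper: each argument maps an IGHV gene name to a call,
    with ambiguous assignments recorded as `unknown`.
    """
    called_a = {g: c for g, c in calls_a.items() if c != unknown}
    concordant = sum(1 for g, c in called_a.items() if calls_b.get(g) == c)
    gained = sum(
        1 for g, c in calls_b.items()
        if c != unknown and calls_a.get(g, unknown) == unknown
    )
    return {"called_a": len(called_a), "concordant": concordant, "gained_with_b": gained}

# Toy calls for one haplotype using IGHJ6 or IGHM as the anchor
ighj6_calls = {"IGHV1-2": "*02", "IGHV3-7": "*01", "IGHV4-34": "unknown"}
ighm_calls = {"IGHV1-2": "*02", "IGHV3-7": "*01", "IGHV4-34": "*01"}
summary = compare_haplotypes(ighj6_calls, ighm_calls)
```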
FLAIRR-seq enables isotype-, subisotype-, and allele-specific repertoire analyses
IGHG and IGHM alleles identified in each sample were used to annotate reads in each respective repertoire. These assignments allowed for partitioning of the repertoire by isotype, subisotype and IGHC allele (Figure 4). To demonstrate this, the same representative sample (“1013”) that was heterozygous for all IGHC genes was utilized. As shown in Figure 4A, IGHC gene assignments allow for subisotype and allele level frequencies to be estimated as a proportion of the overall IgG and IgM repertoires. In addition, detailed analyses of the repertoire can be conducted within each of these compartments. For example, Figure 4B shows the frequencies of IGHV gene subfamilies for each IGHG and IGHM allele identified in these two samples. Importantly, the average subisotype distribution of unique sequences resolved with FLAIRR-seq across our 10 subjects was in line with what has been reported for healthy individuals, with the most prevalent being IGHG1 (52%) and IGHG2 (36%), whereas IGHG3 (6%) and IGHG4 (5%) were seen at lower levels. Using standard sr-AIRR-seq analyses, allele-resolved V gene usage or enrichment within subisotype populations could not be identified; such resolution is important for linking subisotype functionality to particular antigen-specific VDJ clones.
Through the partitioning of repertoire sequences by subisotype and IGHC allele, it was found that FLAIRR-seq also allowed for trends to be assessed in aggregate across donors. To demonstrate this, V gene family usage partitioned by IgG subisotype was further examined across all 10 healthy donors. This analysis revealed expected patterns, in that IGHV1, IGHV3 and IGHV4 subfamily genes were dominant across the 4 subisotypes (Figure 4C). However, significant variation was observed in subfamily proportions between subisotypes, associated with distinct profiles in specific subisotypes (Figure 4C). Specifically, the estimated frequencies of IGHV1 and IGHV3 were statistically different between subisotypes (P<0.01, ANOVA); IGHV1 usage was elevated in IGHG1 and IGHG4, whereas IGHV3 was elevated in IGHG2 and IGHG3. These analyses demonstrate the unique capability of FLAIRR-seq to examine variation in the expressed repertoire at the level of isotype, subisotype, and IGHC allele. As sample sizes increase, it is contemplated that a multitude of additional repertoire features will become accessible to this kind of analysis, leading to novel discoveries linking VDJ and IGHC genetic signatures.
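Partitioning V gene family usage by subisotype, as described above, can be sketched as a grouped tally. The function name `v_family_usage`, the record format (subisotype, V gene call), and the toy records are illustrative assumptions rather than the Alakazam workflow used for the analyses.

```python
from collections import Counter, defaultdict

def v_family_usage(records):
    """Return {subisotype: {V family: fraction of sequences}}.

    Hypothetical helper: each record is a (subisotype, v_gene) pair, and the
    family is taken as the text before the first hyphen of the gene name.
    """
    counts = defaultdict(Counter)
    for subisotype, v_gene in records:
        family = v_gene.split("-")[0]  # e.g. "IGHV3-23*01" -> "IGHV3"
        counts[subisotype][family] += 1
    return {
        sub: {fam: n / sum(c.values()) for fam, n in c.items()}
        for sub, c in counts.items()
    }

# Toy annotated sequences (not study data)
records = [
    ("IGHG1", "IGHV1-69*01"),
    ("IGHG1", "IGHV1-2*02"),
    ("IGHG1", "IGHV3-23*01"),
    ("IGHG2", "IGHV3-23*01"),
    ("IGHG2", "IGHV3-7*01"),
]
usage = v_family_usage(records)
```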
FLAIRR-seq identifies subisotype-specific clonal expansion and CSR in longitudinal samples.
The utility of FLAIRR-seq in clinical samples was investigated, particularly to observe changes in immune repertoires over time. Ab responses are highly dynamic, with specific Ab clones expanding upon activation by antigen. It was contemplated whether class switch recombination could be captured by FLAIRR-seq, given the capability to identify clones with the subisotype resolved. FLAIRR-seq resolved repertoires were evaluated over time in four samples collected from one individual over their >13-day hospitalization for severe COVID-19 disease. Blood draws were taken on days 1, 4, 8, and 13 post-hospitalization (Figure 5A) and analyzed with FLAIRR-seq across all time points. After initial FLAIRR-seq processing and analysis, unique clones were defined using SCOPer, which clusters sequences based on CDR3 similarity and mutations in IGHV and IGHJ genes. This analysis allowed for the estimation of subisotype-specific clone counts across the four timepoints examined. Overall, IgG1 dominated the repertoire at all four time points, but the proportion of subisotypes fluctuated over time (Figure 5B). Specifically, the IGHG2- and IGHG3-specific repertoires expanded from day 1 to day 4, but then contracted in overall frequency from day 8 to day 13 (Figure 5B). To assess clonal diversity within each subisotype repertoire across time, the Simpson’s diversity index (q=2) was calculated using Alakazam. All subisotype-specific repertoires became less diverse from day 1 to day 4, showing clonal expansion across the IGHG repertoire (Figure 5C). Of note, IGHG4 was not included in diversity calculations because IGHG4 would have required higher sequencing depth to ascertain diversity, given the overall lower expression of IgG4 transcripts in this individual. Subisotype-specific repertoire polarity was also assessed by calculating the fraction of clones needed to represent 80% of the total repertoire (Figure 5D), with lower fractions representing more polarized and clonally expanded repertoires. Results of this analysis were consistent with the diversity index, demonstrating an increase in clonal expansion (i.e., a decreased polarity fraction) at day 4 across IGHG1, IGHG2, and IGHG3 compartments, which returned to baseline at later timepoints.
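The two summary statistics used here, diversity at q=2 and repertoire polarity, can be sketched as follows. This is not the Alakazam implementation: the sketch assumes the Hill-number (inverse Simpson) form of the q=2 index, computes it directly from clone sizes without resampling, and the function names and clone sizes are hypothetical.

```python
def hill_diversity_q2(clone_sizes):
    """Inverse Simpson index: 1 / sum(p_i^2) over clone frequencies p_i."""
    total = sum(clone_sizes)
    return 1.0 / sum((n / total) ** 2 for n in clone_sizes)

def polarity(clone_sizes, fraction=0.8):
    """Fraction of clones, largest first, needed to cover `fraction` of sequences."""
    total = sum(clone_sizes)
    covered = 0
    for rank, n in enumerate(sorted(clone_sizes, reverse=True), start=1):
        covered += n
        if covered >= fraction * total:
            return rank / len(clone_sizes)
    return 1.0

# Toy repertoires: four equally sized clones versus one dominant clone
even = [25, 25, 25, 25]
expanded = [85, 5, 5, 5]

diversity_even = hill_diversity_q2(even)          # 4 equally abundant clones
diversity_expanded = hill_diversity_q2(expanded)  # dominated by one clone
polarity_even = polarity(even)
polarity_expanded = polarity(expanded)
```

In the toy example, the clonally expanded repertoire has both a lower q=2 diversity and a lower polarity fraction than the even repertoire, which is the pattern described for day 4.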
CSR mediates the switching of Abs from one sub/isotype to another. This occurs through the somatic recombination of IGHC genes, which brings the switched/selected IGHC genes adjacent to the recombined IGHV, IGHD, and IGHJ segments, facilitating transcription. The switching of isotypes and subisotypes can result in changes to associated effector functions of the Ab while maintaining antigen-specific variable regions. Given the ability of FLAIRR-seq to resolve clones with subisotype and IGHC allele resolution, as proof-of-concept, we sought to assess whether FLAIRR-seq could allow for more detailed haplotype-level analysis of CSR through the course of infection. To do this, the largest clones identified in the dataset that were both represented by multiple isotypes and found across timepoints were selected. In total, using SCOPer, 19 unique clonal lineages were identified that met these criteria. Detailed analysis was focused on one of the largest clones, “9900” (Figure 5E), comprising IGHM, IGHG1, and IGHG2 sequences. On day 1 post-hospitalization, clone 9900 sequences were represented by both IGHM and IGHG2. At day 4, IGHG2 was the only isotype observed, whereas at day 8, both IGHM and IGHG2 were again observed, as well as IGHG1. To visualize CSR, a phylogeny was built using Dowser. Highlighted in the red box on the phylogenetic tree shown in Figure 5F, a single subclade was observed that is represented by IGHM, IGHG1, and IGHG2.
IGHC alleles were also resolved from this individual, with the exception of IGHG1 alleles, which were ambiguous. Critically, both IGHG1 and IGHG2 were heterozygous (Figure 5G). Through the assignment of IGHG alleles to haplotypes within this individual using heterozygous V genes (IGHV3-7 and IGHV3-48), it was determined that sequences from clone 9900 (Figure 5F) utilized IGHG1 and IGHG2 alleles from the same haplotype, associated with IGHV3-7*07 and IGHV3-48*03. This observation offers direct characterization of CSR events occurring on the same chromosome. When looking across the remaining clones (n=8) in this dataset that spanned time points and were represented by IGHG1 and IGHG2 subisotypes, all of these clones were confirmed to use IGHG alleles from the same haplotype.
Together, these data provide proof-of-concept evidence that FLAIRR-seq profiling performs robustly on clinical samples, including RNA directly extracted from whole blood. In addition, these data provide novel repertoire resolution extending what would have been possible with standard sr-AIRR-seq methods, including analysis of subisotype-specific repertoires, evaluation of clonal expansion, and characterization of CSR in the IgG and IgM repertoires.
Discussion
Here the development, validation, and application of FLAIRR-seq, a novel method to resolve near full-length Ab transcripts from bulk PBMC-, isolated B cell- and whole blood-derived total RNA, are presented. FLAIRR-seq enables highly accurate, simultaneous resolution of variable and constant regions and shows that IGHC polymorphism is far more extensive than previously assumed. FLAIRR-seq performed equivalently to or with increased resolution compared to existing standard 5’ RACE AIRR-seq methods when resolving V, D, and J gene calls, CDR3 lengths, and SHM signatures, suggesting that our CH3/CH4 targeting strategies did not compromise variable region characterization while simultaneously adding the capability to resolve IGHC variation. Little to no primer bias was observed when compared to Ab repertoire profiling from total mRNA Iso-Seq methods. FLAIRR-seq provides the novel ability to use IGHC gene usage to identify subisotypes and genotype heavy chain transcripts, linking these data back to evaluate subisotype-specific repertoires, clonal expansion, and CSR. Underscoring the underappreciated extent of IGHC variation, our profiling of a restricted cohort of only 10 individuals from relatively homogenous backgrounds identified 4 and 7 completely novel IGHC alleles in IgM and IgG, respectively, and extended an additional 17 alleles beyond what had been available in the IMGT database.
The unique capabilities of FLAIRR-seq allow for novel examination of Ab repertoires, including the characterization of variable gene usage and clonotype distribution within unique subisotype subsets. This perspective provides key insights into dynamic Ab responses in diseases known to be mediated by subisotype-specific processes or to have skewed subisotype distribution as predictive markers of disease, including Myasthenia gravis (mediated by pathogenic autoantibodies across subisotypes that give rise to varied disease pathologies), Acute Rheumatic Fever (associated with elevated IgG3), and melanoma (skewing towards IgG4 in late-stage disease thought to be indicative of tolerogenic responses and poor prognosis). These subisotype-specific repertoire profiling approaches may be the first step toward identification of unique clones that mediate disease pathogenicity or serve as high-resolution biomarkers of disease progression, as well as open the door for functional experiments on subisotype clones of interest, including examining the functional impact of the novel IGHC alleles identified here. Expanded population-based FLAIRR-seq profiling and curation of novel IGHC alleles, particularly in conjunction with IGenotyper targeted genomic assembly efforts in IGH, is a significant first step in defining the full extent of variation in a region too long assumed to be relatively invariant.
The Fc region is known to be critical for modulating differential Ab effector functions. These differential functionalities are currently understood to be regulated by differential posttranslational modifications, such as variable glycosylation. FLAIRR-seq profiling is a valuable tool to investigate how genomic variation across IGHC genes impacts residue usage and resultant Fc receptor binding, signaling potential, crosslinking, and posttranslational modification, all of which alter downstream effector functions, such as ADCC, ADCP, and complement fixation.
FLAIRR-seq can effectively examine clonal expansion and CSR in longitudinal samples, demonstrating the feasibility of using FLAIRR-seq to resolve Ab repertoire dynamics. This increased resolution furthers the understanding of Ab repertoire evolution in the transition from acute to chronic disease states, many of which are associated with overall IgG subisotype distribution changes that are thought to reflect the inflammatory milieu. One example is advanced melanoma, where late-stage disease is characterized by elevated IgG4 compared to IgG1, which is believed to reflect a more tolerizing, pro-tumor environment. As shown by the data presented here, FLAIRR-seq can also provide insights into longitudinal Ab repertoire dynamics in COVID-19 infection following exposure to different viral variants of concern, or be used to assess Ab responses to viral vaccines. FLAIRR-seq examination of these samples may identify specific repertoire distribution patterns that act as biomarkers of disease progression. Moving forward, it is critical to account for all variability within the Ab repertoire for the most comprehensive understanding of repertoire dynamics and the myriad factors impacting Ab effector function. Together, the data presented here demonstrate that the FLAIRR-seq method provides a comprehensive characterization of allele-resolved IgG and IgM repertoires, detailing variable region gene usage and measurements of maturation, isotype and subisotype identification, and the underappreciated extent of constant region variation, which will be necessary to fully appreciate the impact of IG genomic variation in health and disease.
Example 2: Extension of FLAIRR-SEQ for all Seven Immunoglobulin Chains.
While initial work was focused on benchmarking and demonstrating capability with immunoglobulins IgG and IgM, the FLAIRR-seq parameters were expanded to efficiently capture and profile all seven immunoglobulin chains necessary for an effective systemic humoral immune response, including IgM, IgD, IgG, IgA, IgE, IGL, IGK, and all subisotypes within (e.g., IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2). Modifications have been made to the aforementioned molecular pipelines to enhance overall SMRT sequencing throughput and cost-effectiveness of this method, employing transcript concatenation (Figure 7) prior to long-read sequencing to increase overall sequencing depth (a 6- to 8-fold increase) while maintaining accuracy >99.99% (Q40), an order of magnitude higher than sr-AIRR-seq methods.
Targeted primers were designed and, in the cases of IgG and IgM mentioned above, redesigned to incorporate concatenation linkers and represent population-based diversity in these genetic regions. All 5’ primers are the same, targeting the 5’ template switch oligonucleotide (TSO) incorporated into the cDNA during the reverse transcription and rapid amplification of cDNA ends (RACE). This TSO-specific oligo is further linked to a unique molecular identifier (UMI) necessary to remove amplification artifacts during data processing. All 3’ primers are specific to the immunoglobulin transcripts themselves and act to target and enrich Ab or Ig transcripts. In the cases of IgD, IgG, IgA, and IgE isotypes, the 3’ primer targets the constant region exon 3 (CH3) within 100 bp of the polyA tail. For IgM, the 3’ primer targets the constant region exon 4 (CH4) within 100 bp of the polyA tail. Both the 5’ and 3’ gene-specific primers include a concatenation tag enabling directional ligation post-amplification, such that multiple Ig or Ab transcripts are sequenced per molecule to enhance overall depth of sequencing. All transcripts are concatenated in arrays of 8, creating complete, concatenated Ig profiling libraries that range in size from 5600 bp (for an array of 8 light chain transcripts at 700 bp each) to approximately 16,800 bp (for an array of 8 IgM heavy chain transcripts at 2100 bp each).
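The library size range quoted above follows directly from the array size and the per-transcript length, as the short Python sketch below shows. The function name `array_length` is hypothetical, and the calculation assumes that the stated per-transcript lengths already include any linker sequence.

```python
ARRAY_SIZE = 8  # transcripts per concatenated array, as stated in the text

def array_length(transcript_bp, n=ARRAY_SIZE):
    """Approximate length in bp of an array of n concatenated transcripts."""
    return transcript_bp * n

lengths = {
    "light chain": array_length(700),       # 8 x 700 bp
    "IgM heavy chain": array_length(2100),  # 8 x 2100 bp
}
```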
Successful targeted amplification (Figure 8A-G) and sequencing have been performed for all seven chains. When analyzed, many novel constant region alleles were consistently identified, as compared to those available in current immunological databases. For example, in a recent study, IgG heavy chain transcripts from 48 healthy donor samples were examined with both the DNA-based genotyping pipeline (described above) and FLAIRR-seq. Of the total 70 IGHC alleles identified in this cohort, 44 (63%) were novel compared to existing databases (Figure 9A). These alleles are differentiated by both single nucleotide variation and structural variation, including duplication and deletion of large gene regions in these loci that coincide with novel Ig subisotype genes and alleles. Unlike existing methods, FLAIRR-seq identifies additional variation within isotypes and/or subisotypes by leveraging the single molecule nature of the Ig or Ab transcript resolution. For example, healthy donors that carried a hinge deletion in their IgG3-specific repertoire were observed; this deletion was represented in the FLAIRR-seq data as two unique Ig transcript alleles, Allele 1 carrying hinge segment 3 (“H3”; Figure 9B, top) and Allele 2 clearly showing the absence of H3 (Figure 9B, bottom). This observation is critically important to Ab or Ig function, as variation within constant region exons of antibody transcripts, hinge regions in particular, is known to impact immunoglobulin effector functions, including Fc receptor avidity and affinity, fixation of complement, antibody-dependent cytotoxicity, and antibody-dependent cellular phagocytosis.
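The comparison against existing databases described above can be pictured as a set difference between observed allele sequences and a reference set. The sketch below is an assumption-laden illustration, not the pipeline used in the study: the function names are hypothetical, the sequences are toy strings, and exact string matching stands in for whatever alignment-based comparison is actually applied. Only the counts 44 and 70 come from the text.

```python
def split_known_and_novel(observed, reference):
    """Partition observed allele sequences into (known, novel) sets.

    Exact sequence identity is used here as a simplifying assumption.
    """
    observed = set(observed)
    reference = set(reference)
    return observed & reference, observed - reference


def novel_fraction(n_novel, n_total):
    """Fraction of identified alleles that are absent from the reference."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_novel / n_total


# Toy sequences for illustration only; they are not real IGHC alleles.
toy_reference = {"ACGTAC", "ACGTTC"}
toy_observed = {"ACGTAC", "ACGAAC", "ACGTTG"}
known, novel = split_known_and_novel(toy_observed, toy_reference)

# Counts reported in the text for the healthy donor cohort: 44 novel of 70 total.
healthy_donor_fraction = novel_fraction(44, 70)
```

With the reported counts, the fraction is about 0.63, consistent with the 63% stated for the healthy donor cohort.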
Another structural variation observed and enabled by FLAIRR-seq analysis is a duplication of the IgG4 constant region allele (Figure 10). Normally, each individual carries two alleles of a gene: one donated by the maternal chromosome and the other provided by the paternal copy. In the case of a gene duplication such as this, three alleles are seen if one chromosome includes the duplication (as seen in Figure 10), or four alleles if the duplication is carried by both chromosomes. This creates a third, or even fourth, distinct IgG4-specific Ig or Ab repertoire in these individuals, ostensibly broadening their IgG4-specific immune response. This is particularly intriguing for IgG4, which undergoes antibody chain “swapping” to limit Ab cross-linking and resultant downstream effector functions, such as those described above. An individual carrying additional alleles as a result of gene duplication may show altered antibody activity, a point that is under active investigation. No other sr-AIRR-seq method has the resolution to identify and visualize these constant region alterations. As a result, such methods fail to define this extensive diversity and its potential impact on, or predictive value for, antibody function, and therefore cannot include these data in genetic models of immune-mediated disease or response to vaccination or immunotherapies.
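The allele counts described above (two without a duplication, three when one chromosome carries it, four when both do) can be written as a one-line rule. The sketch below is illustrative only; the function name is hypothetical, and it assumes a single duplication per chromosome with every gene copy carrying a distinct, expressed allele.

```python
def expected_allele_count(chromosomes_with_duplication):
    """Expected number of distinct alleles for a gene in one individual.

    Assumes one baseline copy per chromosome, at most one extra copy per
    chromosome, and that every copy carries a distinct allele.
    """
    if chromosomes_with_duplication not in (0, 1, 2):
        raise ValueError("an individual has two chromosomes carrying the locus")
    return 2 + chromosomes_with_duplication


# 0 duplicated chromosomes gives 2 alleles, 1 gives 3, and 2 gives 4.
allele_counts = [expected_allele_count(n) for n in (0, 1, 2)]
```

Observed counts can be lower than these values when two copies carry identical alleles, which is why the rule is best read as an upper bound under the stated assumptions.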
Subisotype-specific repertoires are also found within the IgA-specific antibody profile, in which two subisotypes, IgA1 and IgA2, are found. Although relatively similar in that both subisotypes play a role in mucosal immunity, the two subisotype forms differ in protein structure and in downstream effector function. In particular, IgA1 is thought to be necessary for immune homeostasis, or balancing immune responses to control immune-regulated inflammation, whereas IgA2 modulates mucosal inflammation and drives active immune responses. Structurally, these subisotypes differ only by the presence (IgA1) or absence (IgA2) of an extended hinge domain in the constant region (Figure 10B). Given their dramatically different biological roles, FLAIRR-seq resolution would allow investigation of subisotype-specific repertoires in the context of health and disease, providing clear separation of antibodies that belong to the regulatory versus inflammatory populations. This level of resolution is impossible with other current technologies.
Genomic variation in the IGLC gene region, and the impact this variation has on the expressed Ab light chain repertoire, remains unexplored. However, extensive genetic diversity was recently reported in this region of the human genome, including the nucleotide resolution of novel IGLC alleles and genes; the latter have arisen through recent gene duplications and are associated with variable numbers of IGLC genes among individuals. The longest haplotype we have characterized includes an additional 3 copies of the IGLC3 gene (Figure 11A). FLAIRR-seq allows for the first exploration of light chain repertoires that considers the potential impact of IGLC gene and allele diversity. Specifically, Figure 11 (B & C) provides an example of how FLAIRR-seq is used to resolve gene-level frequency estimates of the IGLC3 gene and its close duplicate copies; no other technology allows this information to be linked directly to recombined lambda chain V and J genes in full-length transcripts.
Together, these data demonstrate extensive variation across all immunoglobulin transcripts, even in healthy donor samples, and highlight an underappreciated extent of variation in the constant region exons themselves. FLAIRR-seq provides a new modality of profiling immune systems to identify not only variable regions responsible for antigen-specific interactions, but also the linked constant region exons that determine the downstream function once the antibody is bound. Incorporating these variants will expand the understanding of how the genetics underlying the antibody response sets the stage for development of immune responses in the context of vaccination and disease, and will help identify highly reactive alleles that are poised to develop neoantigen reactivity or autoimmune responses.
Example 3: Application of FLAIRR-SEQ to Autoimmune Myasthenia Gravis.
To investigate whether specific Ab or Ig signatures could be identified and associated with the development of autoimmune disease, a pilot study was performed in a cohort of 48 individuals diagnosed with acetylcholine receptor-targeted Myasthenia Gravis (MG). MG is a disease in which autoantibodies target the neuromuscular junction, disrupting the interaction of acetylcholine with its cognate receptor (i.e., the acetylcholine receptor) and leading to delayed, blocked, or missing signaling potential to maintain muscle activity. This disease is most often initially diagnosed due to drooping in the face, specifically the eye, lips, or cheeks and, as it progresses, can lead to life-threatening complications caused by impaired swallowing or breathing. Although much work has been done to understand the mechanisms of action underlying MG pathogenesis, there remains no cure.
The autoantibodies that mediate acetylcholine-specific MG have been shown to be primarily of the IgG1 and IgG3 subisotypes. These autoantibodies mediate disease through one of three major pathogenic mechanisms: (i) binding and crosslinking on the acetylcholine receptor, initiating the complement cascade and destroying the neuromuscular junction; (ii) physically binding the acetylcholine receptor and blocking interaction with the acetylcholine ligand, interfering with the signaling potential in the neuromuscular junction; or (iii) binding the acetylcholine receptor and promoting receptor internalization, removing acetylcholine receptor availability for acetylcholine binding and interrupting signal propagation. Thus far, autoantibodies appear to be able to act through one or more of these mechanisms, and often there is a polyclonal, or multi-mechanistic, reactivity at play in MG-affected individuals. All three mechanisms require Ab or Ig binding to the acetylcholine receptor, but only mechanism (i) depends on a specific constant region-mediated cascade of events to trigger complement activity and tissue destruction. The study aimed to perform immunogenomic profiling using FLAIRR-seq to investigate whether IgG3-specific Ig genes or alleles that could confer increased complement reactivity were present in affected individuals.
FLAIRR-seq was used to profile IgG subisotypes in 48 MG-affected individuals. Together, a total of 89 constant region alleles were identified, with 70 (79%) of these being novel compared to existing databases. When broken down by subisotype, the largest number of new alleles was identified in the IgG3 and IgG4 repertoires (Figure 11A). This again underscores just how little is understood about the variation in the constant region and demonstrates that the constant region varies considerably in those individuals with disease. Due to the single molecule nature of FLAIRR-seq data, the profile of full-length Ig or Ab transcripts within a particular subisotype compartment can be investigated. When further investigating the IgG3 compartment, thought to be the most pathogenic in acetylcholine-specific MG disease, hinge deletions specific to IgG3 antibodies were identified. Figure 11 schematizes the alleles identified: the “M” or “major” allele represents IgG3 transcripts that contain all 4 hinge regions; the “S” or “short” allele indicates those missing a single hinge domain (similar to that shown in Figure 9B); additionally, a third or “E” allele was identified retaining only one hinge domain. Given the known role of hinge deletions in enhancing complement activity, a known mechanism of MG disease, the distribution of hinge alleles was evaluated in MG donors compared to the healthy donors (“HD”) characterized in Figure 9. An increased frequency of hinge deletions was observed in MG autoimmune patients, as seen in the right stacked bar of Figure 11B. Moreover, IgG3 duplications were observed in MG subjects that were not observed in healthy donors. These duplications resulted in the expression of three IgG3 alleles, akin to what was shown in Figure 10 for IgG4. These duplications did contain hinge deletions and could possibly contribute to pathogenesis. Only Ig or Ab profiling with FLAIRR-seq would identify these variants, which may impact overall antibody function in the context of this autoantibody-mediated autoimmune disease. Functional studies are underway.
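The hinge allele labels described above map onto the number of hinge regions retained in an IgG3 transcript: four for “M”, three for “S”, and one for “E”. The sketch below shows one way such labels and a per-cohort hinge deletion frequency could be tallied. It is a hedged illustration: the function names are hypothetical, the per-transcript hinge counts are assumed to be already available from annotation, and the toy cohorts are made-up values that do not reproduce the data in Figure 11B.

```python
from collections import Counter

# Hinge-region count to allele label, as described in the text.
HINGE_CLASS = {4: "M", 3: "S", 1: "E"}


def classify_hinge_allele(n_hinge_regions):
    """Label an IgG3 allele by how many hinge regions it retains."""
    return HINGE_CLASS.get(n_hinge_regions, "unclassified")


def hinge_deletion_frequency(hinge_counts):
    """Fraction of alleles carrying a hinge deletion (label S or E)."""
    labels = Counter(classify_hinge_allele(n) for n in hinge_counts)
    total = sum(labels.values())
    if total == 0:
        raise ValueError("no alleles supplied")
    return (labels["S"] + labels["E"]) / total


# Made-up hinge counts per allele, for illustration only.
toy_healthy_donors = [4, 4, 4, 3]
toy_mg_donors = [4, 3, 3, 1]

hd_frequency = hinge_deletion_frequency(toy_healthy_donors)
mg_frequency = hinge_deletion_frequency(toy_mg_donors)
```

Keeping an explicit "unclassified" label means that an allele with an unexpected hinge count (for example, two hinge regions) is surfaced for review instead of being silently counted as a known class.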
It will be apparent to those skilled in the art that various modifications and variations can be made in the present disclosure without departing from the scope or spirit of the invention. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the methods disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
TABLES
Table 1: Comparative analysis metrics between sr-AIRR-seq and FLAIRR-seq on matched samples
Table 2. FLAIRR-seq Donor Information
Table 3. Primers and Barcodes used for FLAIRR-seq Molecular Method
Table 4A (Continued). sr-AIRR-seq pRESTO and Change-O Pipeline Read Counts.
Table 4A. (Continued) sr-AIRR-seq pRESTO and Change-O Pipeline Read Counts.
Table 4B. FLAIRR-seq pRESTO and Change-O Pipeline Read Counts.
Table 4B. (Continued) FLAIRR-seq pRESTO and Change-O Pipeline Read Counts.
Claims
1. A method of sequencing a ribonucleic acid (RNA) encoding an immunoglobulin (Ig) or an antibody (Ab), the method comprising: targeting and amplifying the RNA using a 5’-Rapid Amplification of cDNA Ends (RACE) method, wherein the RACE method converts the RNA into a complementary deoxyribonucleic acid (cDNA), subsequently targeting an Ig heavy chain cDNA transcript and an Ig light chain cDNA transcript through a primer-directed specific polymerase chain reaction (PCR), wherein the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts are targeted for amplification, integrating the Ig heavy chain cDNA transcripts and the Ig light chain cDNA transcripts into an array of eight transcripts to generate a nucleic acid comprising at least 5600 base pairs (bps), analyzing the cDNA transcripts using a single molecule, real-time sequencing (SMRT) method, and detecting at least 700 base pairs (bps) within a variable region and a constant region of the Ig.
2. The method of claim 1, wherein the method detects about 700 bps to about 900 bps of the variable region and the constant region, or fragments thereof, of the Ig light chain cDNA transcript.
3. The method of claim 1 or 2, wherein the method detects about 1500 bps to about 2100 bps of the variable region and the constant region, or fragments thereof, of the Ig heavy chain cDNA transcript.
4. The method of any one of claims 1-3, wherein the method identifies a gene or an allele encoding one or more segments of the variable region comprising a variability (V) peptide, a diversity (D) peptide, a joining (J) peptide, or combinations thereof.
5. The method of any one of claims 1-4, wherein the method identifies a gene or an allele encoding one or more segments of the constant region of a heavy chain (CH) comprising CH1, CH2, CH3, CH4, or combinations thereof.
6. The method of any one of claims 1-5, wherein the method identifies a gene or allele encoding an Ig isotype or an Ig subisotype.
7. The method of any one of claims 1-6, wherein the method identifies an expansion of the Ig isotype or Ig subisotype.
8. The method of any one of claims 1-7, wherein the method identifies a class switch recombination (CSR) event of the Ig subisotype.
9. The method of any one of claims 1-8, wherein the Ig isotype comprises an IgM, IgD, IgG, IgA or an IgE isotype.
10. The method of any one of claims 1-9, wherein the IgG comprises an IgG1, IgG2, IgG3, or IgG4 subisotype.
11. The method of any one of claims 1-10, wherein the IgA comprises an IgA1 or IgA2 subisotype.
12. The method of any one of claims 1-11, wherein the RNA is isolated from peripheral blood mononuclear cells (PBMCs), purified B cells, solid tumors, healthy tissue, or whole blood.
13. The method of any one of claims 1-12, wherein the method develops an Ig profile of a subject.
14. A method of treating or preventing a disease in a subject in need thereof, the method comprising: isolating an RNA sample from the subject, sequencing the RNA using the method of any one of claims 1-13, developing an Ig profile from the RNA, and administering a therapeutic agent to the subject, wherein the Ig profile indicates the disease.
15. The method of claim 14, wherein the disease comprises an infectious disease, an autoimmune disease, a neoproliferative disease, a neurodegenerative disease, a respiratory disease, a congenital disease, a gastrointestinal (GI) disease, a metabolic disease, or a cardiovascular disease.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363535326P | 2023-08-30 | 2023-08-30 | |
| US63/535,326 | 2023-08-30 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025049921A1 (en) | 2025-03-06 |
Family
ID=94820443
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/044692 Pending WO2025049921A1 (en) | 2023-08-30 | 2024-08-30 | Methods for single molecule resolution of near full-length immunoglobulin heavy and light chain repertoires |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025049921A1 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180371106A1 (en) * | 2012-07-10 | 2018-12-27 | Board Of Regents, The University Of Texas System | Monoclonal antibodies for use in diagnosis and therapy of cancers and autoimmune disease |
| US20220088174A1 (en) * | 2018-10-26 | 2022-03-24 | Dana-Farber Cancer Institute, Inc. | Genomic variants in ig gene regions and uses of same |
Non-Patent Citations (2)
| Title |
|---|
| FORD EASTON E, TIERI DAVID, RODRIGUEZ OSCAR L, FRANCOEUR NANCY J, SOTO JUAN, KOS JUSTIN T, PERES AYELET, GIBSON WILLIAM S, SILVER : "FLAIRR-Seq: A Method for Single-Molecule Resolution of Near Full-Length Antibody H Chain Repertoires", THE JOURNAL OF IMMUNOLOGY, vol. 210, no. 10, 15 May 2023 (2023-05-15), pages 1607 - 1619, XP093290245, ISSN: 0022-1767, DOI: 10.4049/jimmunol.2200825 * |
| RODRIGUEZ OSCAR L., SAFONOVA YANA, SILVER CATHERINE A., SHIELDS KAITLYN, GIBSON WILLIAM S., KOS JUSTIN T., TIERI DAVID, KE HANZHON: "Genetic variation in the immunoglobulin heavy chain locus shapes the human antibody repertoire", NATURE COMMUNICATIONS, vol. 14, no. 1, XP093076165, DOI: 10.1038/s41467-023-40070-x * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24861152; Country of ref document: EP; Kind code of ref document: A1 |