WO2025160484A1 - Microbial and human cell-free DNA biomarkers for diagnosing and assessing the severity of inflammatory bowel disease
- Publication number
- WO2025160484A1 (PCT/US2025/013060)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- subject
- cfna
- disease
- ibd
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/689—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
Definitions
- Massively parallel sequencing (MPS) is a high-throughput technology that can generate an enormous amount of information about the genetic makeup of an organism.
- MPS is particularly useful for genomic studies that analyze sequences across a genome, such as whole genome sequencing.
- MPS can be used to study cell-associated DNA, as well as cell-free DNA shed into a variety of samples including blood.
- MPS can also be used for metagenomic sequencing applications that detect microbial genomes in a sample.
- IBD inflammatory bowel disease
- UC ulcerative colitis
- CD Crohn’s disease
- An adequate treatment plan also benefits from knowledge of the severity of the disease. Nonetheless, accurate diagnosis of inflammatory diseases such as inflammatory bowel disease (IBD) and categorization of their severity remains a challenge, especially when pursued in a non-invasive manner.
- a method disclosed herein comprises: providing a sample from the subject, the sample comprising cell-free nucleic acids (cfNA), wherein the cfNA comprises subject cell-free nucleic acids (subject cfNA), microbial cell-free nucleic acids (mcfNA), or a mixture thereof; performing a sequencing assay on the cfNA to obtain sequence data; applying a classifier to the sequence data, wherein the classifier can detect at least one subtype of IBD based on an mcfNA signature, a subject cfNA signature, or a combination thereof; and determining that the subject has a subtype of inflammatory bowel disorder based at least in part on the applying of the classifier to the sequence data.
- cfNA cell-free nucleic acids
- mcfNA microbial cell-free nucleic acids
- the classifier can detect ulcerative colitis (UC) or Crohn’s disease (CD) in the subject. In some embodiments, the classifier can distinguish between UC and CD in the subject. In some embodiments, the classifier detects UC, detects CD, or distinguishes between UC and CD at an AUC of greater than 0.7.
- prior to (b), the subject is determined to have IBD with an unknown subtype. In some embodiments, the subject has previously had an endoscopy procedure. In some embodiments, the endoscopy procedure indicated that the subject had indeterminate colitis.
- the classifier can detect at least one subtype of IBD based on an mcfNA signature and wherein the sequence data comprises data from mcfNA in the sample. In some embodiments, the classifier can detect at least one subtype of IBD based on an mcfNA signature and on a subject cfNA signature and wherein the sequence data comprises data from mcfNA in the sample and the subject cfNA in the sample.
- a method disclosed herein comprises: providing a sample from the subject, the sample comprising cell-free nucleic acids (cfNA), wherein the cfNA comprise subject cell-free nucleic acids (subject cfNA), microbial cell-free nucleic acids (mcfNA), or a mixture thereof; performing a sequencing assay on the cfNA to obtain sequence data; applying a classifier to the sequence data, wherein the classifier can detect any of the following categories of severity: mild disease, moderate disease, severe disease, remission, or combination thereof; and categorizing severity of disease in the subject based at least in part on the applying of the classifier to the sequence data.
- a method disclosed herein comprises: providing an initial sample from the subject with the IBD of unknown subtype, the initial sample comprising cfNA comprising subject cell-free nucleic acids (subject cfNA), microbial cell-free nucleic acids (mcfNA), or a mixture thereof and wherein the cfNA comprises double-stranded cfNA, single-stranded cfNA, degraded double-stranded cfNA (degraded dscfNA) and degraded single-stranded cfNA (degraded sscfNA); denaturing the double-stranded cfNA and degraded dscfNA into single-stranded fragments such that the cfNA comprises: (i) the single-stranded
- the cfNA comprises cfNA fragments at least 20 bases in length. In some embodiments, the cfNA comprises cfNA fragments less than 200 bases in length. In some embodiments, the cfNA comprises cfNA fragments less than 100 bases in length. In some embodiments, the genetic locus is located within the mcfNA. In some embodiments, the genetic locus is located within single-stranded cfNA or degraded sscfNA in the initial sample. In some embodiments, the genetic locus is located within double-stranded cfNA or degraded dscfNA in the initial sample.
- the preparing a cfNA library comprises attaching 5’ adapters, 3’ adapters, or both 5’ and 3’ adapters to the cfNA.
- the attaching 5’ adapters, 3’ adapters, or both 5’ and 3’ adapters to the cfNA occurs in the initial sample wherein the initial sample has not been subjected to an extraction assay.
- a method disclosed herein comprises: providing a sample from the subject with the IBD, the sample comprising: cell-free nucleic acids (cfNA), wherein the cfNA comprise subject cell-free nucleic acids (subject cfNA), microbial cell-free nucleic acids (mcfNA), or a mixture thereof; physically enriching the sample for subject cfNA, mcfNA fragments, or both, that are less than a cutoff value between 70-200 bases to produce a size-enriched fraction of cfNA; and analyzing a genetic locus in the size-enriched fraction of cfNA, wherein the genetic locus is located within the mcfNA, the subject cfNA, or both.
- the genetic locus comprises a plurality of genetic loci that constitute a signature that distinguishes between ulcerative colitis (UC) and Crohn’s disease (CD).
- the cutoff value is less than about 90 bases.
- the methods disclosed herein further comprise physically enriching the sample for subject cfNA and mcfNA greater than about 10 bases in length.
- the genetic locus analyzed in the size-enriched fraction of cfNA is located within the mcfNA.
- the physically enriching comprises directly physically enriching the sample or indirectly physically enriching the sample.
- the directly physically enriching comprises subjecting the cfNA to a size selection device.
- the size selection device comprises beads or a gel electrophoresis device.
- the indirectly physically enriching results from a process that does not involve use of sharp cutoff values for size selection.
- the methods disclosed herein comprise screening the subject for IBD prior to, simultaneously, or following the providing the sample from the subject in (a).
- the methods disclosed herein comprise performing an endoscopic procedure on the subject that identifies the subject as having indeterminate colitis or obtaining results from an endoscopic procedure performed on the subject, wherein the endoscopic procedure indicated the subject has indeterminate colitis.
- the endoscopy is performed prior to, simultaneously, or following the providing the sample from the subject in (a).
- the physically enriching comprises enriching the sample for degraded cfNA.
- the methods disclosed herein comprise: treating the subject for the subtype of IBD by administering a medicament for the subtype of IBD to the subject, wherein the diagnosis is based at least in part on a method comprising: providing a sample from the subject, wherein the subject has an unknown subtype of inflammatory bowel disease (IBD), the sample comprising cell-free nucleic acids (cfNA), wherein the cfNA comprises a mixture of subject cell-free nucleic acids (subject cfNA) and microbial cell-free nucleic acids (mcfNA); performing a sequencing assay on the cfNA to obtain sequence data; applying a classifier to the sequence data, wherein the classifier can detect at least one subtype of IBD based on a mcfNA signature, a subject cfNA signature, or a combination thereof; and determining that
- the methods disclosed herein comprise: treating the subject for the subtype of IBD by administering a medicament for the subtype of IBD to the subject at a dose consistent with the severity of IBD, wherein the diagnosis is based at least in part on a method comprising: providing a sample from the subject, the sample comprising cell-free nucleic acids (cfNA), wherein the cfNA comprises a mixture of subject cell-free nucleic acids (subject cfNA) and microbial cell-free nucleic acids (mcfNA); performing a sequencing assay on the cfNA to obtain sequence data; applying a classifier to the sequence data, wherein the classifier can categorize severity of a subtype of IBD based on a mcfNA signature, a subject cfNA signature,
- the methods disclosed herein comprise: treating the subject for the IBD subtype in remission by administering a first medicament to the subject at an initial maintenance dose; between 2 weeks to 6 months after (a), providing a sample from the subject, the sample comprising cell-free nucleic acids (cfNA), wherein the cfNA comprises a mixture of subject cell-free nucleic acids (subject cfNA) and microbial cell-free nucleic acids (mcfNA); performing a sequencing assay on the mixture of mcfNA and subject cfNA to obtain sequence data; applying a classifier to the sequence data, wherein the classifier can categorize a subtype of IBD as being severe, moderate, mild, or in remission, based on a mcfNA signature, a subject cfNA signature, or a combination thereof;
- the methods disclosed herein comprise: treating the subject for the IBD subtype by administering a first medicament to the subject at an initial induction dose; about 2 to 6 weeks after (a), providing a sample from the subject, the sample comprising cell-free nucleic acids (cfNA), wherein the cfNA comprises a mixture of subject cell-free nucleic acids (subject cfNA) and microbial cell-free nucleic acids (mcfNA); performing a sequencing assay on the mixture of mcfNA and subject cfNA to obtain sequence data; applying a classifier to the sequence data, wherein the classifier can categorize a subtype of IBD as being severe, moderate, mild, or in remission, based on a mcfNA signature, a subject cfNA signature, or a combination thereof; categorizing the subtype of IBD
- the methods disclosed herein comprise: at least 2-6 weeks following the surgical intervention, providing a sample from the subject, the sample comprising cell-free nucleic acids (cfNA), wherein the cfNA comprises a mixture of subject cell-free nucleic acids (subject cfNA) and microbial cell-free nucleic acids (mcfNA); performing a sequencing assay on the mixture of mcfNA and subject cfNA to obtain sequence data; applying a classifier to the sequence data, wherein the classifier can categorize a subtype of IBD as being severe, moderate, mild, or in remission, based on a mcfNA signature, a subject cfNA signature, or a combination thereof; categorizing the subtype of IBD as severe, moderate, mild or in remission, based at least
- an endoscopic procedure performed prior to (a) has indicated the subject has indeterminate colitis.
- the IBD is considered to be indeterminate because of inaccessible regions of the bowel or because of indeterminate disease manifestation.
- the indeterminate disease manifestation is due to a finding of continuous inflammatory patches that could be consistent with CD or UC.
- the classifier can detect ulcerative colitis (UC) or Crohn’s disease (CD).
- the classifier can distinguish between UC and CD.
- the maintenance therapy comprises an anti-inflammatory drug or corticosteroid.
- the at least two types of IBD comprise disease that is mild, moderate, severe, or in remission, or any combination thereof.
- the subject is determined to be in remission, further comprising administering a maintenance treatment to the subject. In some embodiments, if the subject is determined to have mild, moderate, or severe disease, administering a drug for mild, moderate, or severe disease. In some embodiments, if a subject is determined to have a mild, moderate, or severe subtype of IBD, and the subject is receiving a drug regimen, further comprising adjusting or changing the drug regimen. In some embodiments, if the subject is determined to have mild, moderate, or severe disease, performing an endoscopy on the subject in order to identify regions of inflammation.
- the methods disclosed herein further comprise performing surgery on the patient to remove all of, or a portion of, the anus, rectum, large intestine, small intestine, digestive tract, or any combination thereof.
- the screening of the subject comprises detection of a lesion, ulceration, inflammation, bleeding pattern, or disease location in the gastrointestinal tract.
- the subject has an indeterminate endoscopic biopsy prior to the providing the sample in (a).
- the subject has had an indeterminate biopsy that detected continuous inflammatory regions or lesions along a section of the gastrointestinal tract.
- the subject has had an endoscopic procedure of the gastrointestinal tract that results in a finding of IBD, UC, CD, or combination thereof.
- the method is performed in order to confirm the finding of IBD, UC, CD, or combination thereof.
- the methods disclosed herein further comprise performing an endoscopic procedure on the gastrointestinal tract of the subject if UC, CD, mild disease, moderate disease, or severe disease is detected by the method.
- the methods disclosed herein comprise collecting a sample from a patient with an inflammatory bowel disease (IBD) of unknown subtype, wherein: the sample comprises cell-free nucleic acids (cfNA) comprising human cell-free nucleic acids (human cfNA), microbial cell-free nucleic acids (mcfNA), or a mixture thereof; the patient is planning to undergo an endoscopy procedure; and the sites to be biopsied by the endoscopy procedure have not yet been determined; performing a sequencing assay on the cfNA to obtain sequence data; applying a classifier to the sequence data, wherein the classifier can distinguish between at least two types of inflammatory bowel disorder (IBD) based on a mcfNA signature, a subject cfNA signature, or a combination thereof; and determining that the subject has a first type of IBD based at least in part on the applying of the classifier to the sequence
- the first type of IBD is Crohn’s disease (CD).
- the second type of IBD is ulcerative colitis (UC).
- the targeting of multiple locations along the entirety of the digestive tract comprises performing the endoscopic procedure on an ileum, a large bowel, or both.
- the at least two types of IBD comprise at least two of: mild IBD, moderate IBD, severe IBD, or IBD in remission.
- the methods disclosed herein further comprise screening the subject for IBD.
- the screening comprises a physical examination, blood test, stool test, imaging study, reported symptoms, medications, personal medical history, family medical history, or any combination thereof.
- the screening comprises a clinical evaluation based on one or more features selected from the group consisting of: abdominal pain, rectal bleeding, weight loss, fatigue, personal medical history, family medical history, lifestyle, or any combination thereof.
- the screening is inconclusive for IBD, subtype of IBD, or severity of subtype of IBD.
- the methods disclosed herein further comprise physically enriching the sample for human and microbial cell-free nucleic acid fragments that are less than a cutoff value between 70-200 bases.
- the physically enriching comprises producing a size-enriched fraction of the cfNA by: binding the cfNA to a solid support capable of separating the cfNA based on size and eluting the cfNA from the solid support to obtain size-selected cfNA; subjecting the cfNA to a size-selection electrophoresis; or a combination of (a) and (b).
- the method further comprises analyzing a genetic locus in the size-enriched fraction of cfNA.
- the analyzing the genetic locus in the size-enriched fraction of cfNA comprises sequencing the size-enriched fraction of cfNA to obtain sequence reads.
- the analyzing or determining the genetic locus comprises determining a ratio of a number of sequence reads covering the genetic locus to the number of reads covering a flanking region.
- the number of sequence reads covering the first genomic region comprises the number of sequence reads within 500 bp of the genetic locus, and wherein the number of sequence reads covering the flanking region comprises a number of sequence reads located more than 500 bp from the genetic locus but within 2000 bp in either a 5’ or 3’ direction from the genetic locus.
- the ratio comprises a natural log of the number of sequence reads covering the first genomic region divided by the number of sequence reads covering the flanking region.
- the ratio comprises a peak-to-flank ratio or a trough-to-flank ratio of sequence reads.
- the methods disclosed herein further comprise determining that a gene is differentially enriched when the value of the ratio for the gene is 0.3 or greater.
- the sample comprises a biological fluid.
- the method further comprises analyzing a genetic locus in the mcfNA or the subject cfNA, or a combination thereof.
- the methods disclosed herein further comprise analyzing a genetic locus within a human genome.
- the genetic locus is within microbial cell-free DNA derived from a Propionibacterium, a Lactococcus, a Haemophilus, an Escherichia, a Rothia, a Malassezia, a Streptococcus, an Actinomyces, a Klebsiella, a Dermacoccus, an Acinetobacter, a Lactobacillus, or any combination thereof.
- the genetic locus is within microbial cell-free DNA derived from Acinetobacter, Staphylococcus, Blautia, Anoxybacillus, Paracoccus, or any combination thereof.
- the analyzing comprises performing high throughput sequencing on the sample.
- the biological fluid is a non-fecal biological fluid.
- the sample comprises blood, serum, plasma, cerebrospinal fluid, fluid from lavage, fluid from bronchoalveolar lavage, synovial fluid, or urine.
- the sample is a plasma sample.
- the sample comprises cfNA derived from or purified from a biological fluid.
- the subject with an inflammatory bowel disorder has symptoms consistent with both ulcerative colitis and Crohn’s Disease.
- the symptoms comprise: long-term inflammation of the digestive tract, digestive discomfort, stomach cramps and pain, diarrhea, constipation, urgent need to have a bowel movement, feeling as though a bowel movement was incomplete, rectal bleeding, loss of appetite, weight loss, fatigue, night sweats, irregular periods, or any combination thereof.
- the subject is human.
- the methods disclosed herein further comprise administering a medicament to the subject.
- the methods disclosed herein further comprise distinguishing between a first, a second and a third type of inflammatory bowel disease or disorder.
- the analyzing comprises analyzing data generated from sequence reads obtained from both the human and mcfNA.
- the first type of inflammatory bowel disease or disorder comprises ulcerative colitis and the second type of inflammatory bowel disease or disorder is Crohn’s Disease.
- the first type of inflammatory bowel disease or disorder comprises a severe inflammatory bowel disease or disorder and the second type of inflammatory bowel disease or disorder comprises a moderate inflammatory bowel disease.
- the third type of inflammatory bowel disease or disorder comprises a mild inflammatory bowel disease or disorder.
- the method further comprises determining that the inflammatory bowel disorder comprises Crohn’s disease.
- the method further comprises determining that the inflammatory bowel disorder comprises ulcerative colitis.
- the analyzing the sequence reads comprises use of machine learning.
- the analyzing the sequence reads comprises applying one or more classifiers to sequence data.
- the one or more classifiers are assessed using a receiver operating characteristics curve (ROC).
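The AUC thresholds mentioned in these embodiments (for example, greater than 0.7) refer to the area under the ROC curve. As a minimal sketch of how such a figure could be computed, the function below uses the rank-based definition of AUC: the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The function name, the labels, and the scores are all hypothetical and are not taken from the publication.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (here, hypothetically, CD), 0 for the
            negative class (hypothetically, UC).
    scores: classifier outputs, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six samples (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
meets_threshold = auc > 0.7  # the 0.7 cutoff is the one named in the text
```

The pairwise form is quadratic in the number of samples, which is acceptable for small cohorts; a library routine would normally be used for larger data sets.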
- the methods disclosed herein further comprise determining a diversity of overall detected microbes from the microbial cell-free DNA, wherein the subject is determined to have an active disease when the sample from the subject has a higher alpha diversity of overall detected microbes relative to an otherwise comparable sample with a lower alpha diversity of overall detected microbes.
- the methods disclosed herein further comprise determining the disease is severe, and wherein the determining comprises detecting a higher alpha diversity of overall detected microbes in the sample from the subject relative to an otherwise comparable sample with a lower alpha diversity of overall detected microbes.
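The text does not state which alpha diversity metric is used. As an illustration only, the sketch below assumes the Shannon index computed from per-taxon read counts; the function name, taxa, and counts are hypothetical.

```python
import math


def shannon_diversity(read_counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    read_counts: mapping of taxon name -> number of microbial reads assigned.
    The choice of the Shannon index is an assumption; the source only says
    "alpha diversity".
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in read_counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts for two samples (illustrative only).
sample_even = {"Escherichia": 50, "Klebsiella": 50}
sample_skewed = {"Escherichia": 99, "Klebsiella": 1}

h_even = shannon_diversity(sample_even)      # evenly spread reads
h_skewed = shannon_diversity(sample_skewed)  # one taxon dominates
higher_diversity = h_even > h_skewed
```

Under this assumed metric, the evenly distributed sample has the higher alpha diversity, which is the kind of relative comparison the embodiments describe.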
- the methods disclosed herein further comprise determining the disease comprises ulcerative colitis, wherein the determining comprises analyzing the sequence reads to detect in the sample a presence of cell-free DNA derived from a Propionibacterium, a Lactococcus, a Haemophilus, an Escherichia, a Rothia, a Malassezia, a Streptococcus, an Actinomyces, a Klebsiella, a Dermacoccus, an Acinetobacter, a Lactobacillus, or any combination thereof.
- the methods disclosed herein further comprise determining the disease comprises ulcerative colitis, wherein the determining comprises analyzing the sequence reads to detect in the sample a presence of cell-free DNA derived from an Afipia, a Bifidobacterium, a Brochothrix, a Companilactobacillus, a Coprobacillus, an Escherichia, a Leptospira, a Methylobacterium, a Pantoea, a Parasutterella, a Pichia, a Proteus, a Sphingobacterium, an Achromobacter, an Acinetobacter, an Actinomyces, an Aeribacillus, an Alishewanella, an Alistipes, an Alphacoronavirus, an Alphainfluenzavirus, an Anoxybacillus, an Aureobasidium, a Bacillus, a Betacoronavirus, a Blautia, a Burkholderia,
- Incertae Sedis UnClass an Eubacteriales UnClass UnClass, an Eubacterium, a Finegoldia, a Francisella, a Geobacillus, an Inovirus, an Intestinibacter, a Kingella, a Klebsiella, a Kocuria, a Lachnoanaerobaculum, a Lachnoclostridium, a Lentivirus, a Leuconostoc, a Mediterraneibacter, a Megasphaera, a Melampsora, a Mesorhizobium, a Methanococcus, a Methanosarcina, a Methanothrix, a Methylorubrum, a Moraxella, a Mycobacterium, a Mycolicibacterium, a Nesterenkonia, an Orthopoxvirus, a Paenibacillus, a Paenirhodobacter, a Paraburkholderia, a Paracoccus
- the methods disclosed herein further comprise determining the disease comprises Crohn’s disease, wherein the determining comprises analyzing the sequence reads to detect in the sample a presence of cell-free DNA derived from Propionibacterium, Lactococcus, Haemophilus, Escherichia, Rothia, Malassezia, Streptococcus, Actinomyces, Klebsiella, Dermacoccus, Acinetobacter, Lactobacillus, or any combination thereof.
- the methods disclosed herein further comprise determining the disease is Crohn’s disease, wherein the determining comprises analyzing the sequence reads to detect in the sample a presence of cell-free DNA derived from an Afipia, a Bifidobacterium, a Brochothrix, a Companilactobacillus, a Coprobacillus, an Escherichia, a Leptospira, a Methylobacterium, a Pantoea, a Parasutterella, a Pichia, a Proteus, a Sphingobacterium, an Achromobacter, an Acinetobacter, an Actinomyces, an Aeribacillus, an Alishewanella, an Alistipes, an Alphacoronavirus, an Alphainfluenzavirus, an Anoxybacillus, an Aureobasidium, a Bacillus, a Betacoronavirus, a Blautia, a Burkholderia,
- Incertae Sedis UnClass an Eubacteriales UnClass UnClass, an Eubacterium, a Finegoldia, a Francisella, a Geobacillus, an Inovirus, an Intestinibacter, a Kingella, a Klebsiella, a Kocuria, a Lachnoanaerobaculum, a Lachnoclostridium, a Lentivirus, a Leuconostoc, a Mediterraneibacter, a Megasphaera, a Melampsora, a Mesorhizobium, a Methanococcus, a Methanosarcina, a Methanothrix, a Methylorubrum, a Moraxella, a Mycobacterium, a Mycolicibacterium, a Nesterenkonia, an Orthopoxvirus, a Paenibacillus, a Paenirhodobacter, a Paraburkholderia, a Paracoccus
- the methods disclosed herein further comprise determining the disease comprises mild, moderate, or severe ulcerative colitis, wherein the determining comprises analyzing the sequence reads to detect an elevated or decreased level of cell-free DNA in the sample derived from Acinetobacter, Staphylococcus, Blautia, Anoxybacillus, Paracoccus, or any combination thereof, relative to an otherwise comparable sample from a subject that is asymptomatic or in remission for ulcerative colitis.
- the methods disclosed herein further comprise determining the disease comprises mild, moderate, or severe ulcerative colitis, wherein the determining comprises analyzing the sequence reads to detect an elevated or decreased level of cell-free DNA in the sample derived from an Uroviricota, a Chromadorea, a Clostridia, a Coriobacteriia, a Sordariomycetes, a Eubacteriales, a Propionibacteriales, a Sordariales, an Alicyclobacillaceae, a Bacteroidaceae, a Burkholderiales UnClass, a Caudoviricetes UnClass UnClass, a Clostridiaceae, an Eubacteriales UnClass, a Lachnospiraceae, a Micrococcaceae, an Onchocercidae, an Oscillospiraceae, a Phyllobacteriaceae, a Propionibacteriaceae, a Sordariaceae,
- the methods disclosed herein further comprise determining the disease comprises mild, moderate, or severe Crohn’s disease, wherein the determining comprises analyzing the sequence reads to detect an elevated presence of cell-free DNA in the sample derived from Malassezia, Acinetobacter, Streptococcus, Lactobacillus, or any combination thereof, relative to an otherwise comparable sample from a subject that is asymptomatic or in remission for Crohn’s disease.
- the methods disclosed herein further comprise determining the disease comprises mild, moderate, or severe Crohn’s disease, wherein the determining comprises analyzing the sequence reads to detect an elevated presence of cell-free DNA in the sample derived from Malassezia restricta, Acinetobacter baumannii, Streptococcus sanguinis, Lactobacillus plantarum, Lactobacillus crispatus, or any combination thereof, relative to an otherwise comparable sample from a subject that is asymptomatic or in remission for Crohn’s disease.
- the methods disclosed herein further comprise determining that the subject has a disease based at least in part on the analyzing the genetic locus.
- the determining that the subject has a disease based on the biomarkers comprises a sensitivity of AUC >0.95.
- the sequencing comprises sequencing 100 million to 400 million paired-end reads per sample.
- the sequencing comprises sequencing 300 million to 400 million paired-end reads per sample.
- the sequencing comprises from 5X to 10X average coverage per base pair per sample.
- the sequencing further comprises enriching for cell-free nucleic acids of less than 170 base pairs in length.
- the analyzing the sequence reads comprises calculating a log2 effect size, wherein a positive number is an increase, and a negative number is a decrease.
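The text specifies only the sign convention for the log2 effect size (positive for an increase, negative for a decrease). A minimal sketch, assuming the effect size is a log2 ratio of a taxon's abundance in a case group to its abundance in a comparison group, might look like the following. The function name, the pseudocount, and the abundance values are hypothetical.

```python
import math


def log2_effect_size(case_abundance, control_abundance, pseudocount=1e-6):
    """log2 ratio of case to control abundance.

    Positive result: the feature is increased in the case group.
    Negative result: the feature is decreased in the case group.
    The pseudocount (an assumption, not from the source) avoids division
    by zero when a taxon is absent from one group.
    """
    return math.log2((case_abundance + pseudocount) /
                     (control_abundance + pseudocount))


# Hypothetical mean relative abundances (illustrative only).
increase = log2_effect_size(4.0, 1.0)   # roughly +2: four-fold higher in cases
decrease = log2_effect_size(1.0, 4.0)   # roughly -2: four-fold lower in cases
unchanged = log2_effect_size(2.0, 2.0)  # 0: no difference
```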
- the methods disclosed herein further comprise generating a DNA library from the cell-free DNA prior to the sequencing.
- the generating the DNA library comprises attaching an adapter to one or both ends of the DNA to produce adapted DNA.
- the medicament comprises an aminosalicylate, a mesalamine, a 5-aminosalicylic acid (5-ASA), a steroid, a prednisolone, a thiopurine, a JAK inhibitor, an anti-TNF antibody, a TNF inhibitor, a ustekinumab, a tofacitinib, an infliximab, a golimumab, a vedolizumab, a mirikizumab, an α4β7 integrin, an IL-12 cytokine, an IL-23 cytokine, a small molecule targeting sphingosine-1-phosphate, or any combination thereof.
- the subject prior to (b), the subject is determined to have IBD with an unknown subtype. In some embodiments, the subject has previously had an endoscopy procedure. In some embodiments, the endoscopy procedure indicated that the subject had indeterminate colitis.
- the cfNA comprises cfDNA. In some embodiments, the cfNA comprises cfRNA, mRNA. In some embodiments, the cfNA comprises bacterial cfDNA, fungal cfDNA, parasitic cfDNA, viral cfDNA, or any combination thereof. In some embodiments, the cfNA is not mitochondrial DNA.
- the method does not comprise analysis of total cfNA, total subject cfNA, total mcfNA, or any combination thereof.
- the cfNA does not comprise fetal nucleic acids.
- the subject is not a transplant recipient.
- the cfNA does not comprise a mixture of donor-derived cfDNA and host-derived cfDNA.
- the analyzing or determining the genetic locus comprises determining a ratio of a number of sequence reads covering the genetic locus to the number of reads covering a flanking region.
- the number of sequence reads covering the first genomic region comprises the number of sequence reads within 500 bp of the genetic locus, and wherein the number of sequence reads covering the flanking region comprises a number of sequence reads located more than 500 bp from the genetic locus but within 2000 bp in either a 5’ or 3’ direction from the genetic locus.
- the ratio comprises a natural log of the number of sequence reads covering the first genomic region divided by the number of sequence reads covering the flanking region. In some embodiments, the ratio comprises a peak-to-flank ratio or a trough-to-flank ratio of sequence reads.
- the methods disclosed herein further comprise determining that a gene is differentially enriched when the value of the ratio for the gene is 0.3 or greater. In some embodiments, the methods disclosed herein further comprise analyzing a genetic locus in the mcfNA or the subject cfNA, or a combination thereof. In some embodiments, the methods disclosed herein further comprise analyzing a genetic locus within a human genome. In some embodiments, the method comprises detecting: a relative enrichment, wherein a relative enrichment comprises a peak-to-flank ratio of at least 0.1; a relative depletion, wherein the relative depletion comprises a trough-to-flank ratio of at least 0.1; or a combination of (a) and (b).
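The read-count ratio described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: each read is reduced to a single coordinate (for example its midpoint), "within 500 bp" is treated as inclusive, and no correction is made for the differing widths of the two windows. The function names are hypothetical; the 500 bp and 2000 bp windows, the natural log, and the 0.3 threshold come from the text.

```python
import math
from typing import Iterable


def log_read_ratio(read_positions: Iterable[int], locus: int,
                   core_bp: int = 500, flank_bp: int = 2000) -> float:
    """Natural log of (reads near the locus) / (reads in the flanking region)."""
    core_reads = 0
    flank_reads = 0
    for position in read_positions:
        distance = abs(position - locus)
        if distance <= core_bp:
            core_reads += 1    # within 500 bp of the locus
        elif distance <= flank_bp:
            flank_reads += 1   # more than 500 bp but within 2000 bp, 5' or 3'
    if core_reads == 0 or flank_reads == 0:
        raise ValueError("ratio is undefined when either read count is zero")
    return math.log(core_reads / flank_reads)


def is_differentially_enriched(ratio: float, threshold: float = 0.3) -> bool:
    """Apply the 0.3 cut-off mentioned in the text."""
    return ratio >= threshold


# Hypothetical read coordinates around a locus at position 10,000:
# four reads fall in the core window and two in the flanking region.
example_positions = [10_000, 10_100, 9_900, 10_400, 11_000, 11_600]
example_ratio = log_read_ratio(example_positions, locus=10_000)
```

With four core reads and two flanking reads the ratio is ln(2), roughly 0.69, which exceeds the 0.3 threshold.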
- the method comprises distinguishing between the subject having Crohn’s disease and being healthy based at least in part on detecting: a relative enrichment of GPIHBP1, SLC6A4, ZCWPW1, LINC00482, FENDRR, or any combination thereof; a relative depletion of C3, SNURF, SNRPN, NM_001371415, ACE2, or any combination thereof; or any combination of (a) and (b).
- the methods disclosed herein comprise distinguishing between the subject having Ulcerative colitis and being healthy based at least in part on detecting: a relative enrichment of ACE2, C3, NM_001371415, SNURF, TMEM259, or any combination thereof; a relative depletion of LINC00482, NR2F1, FENDRR, APOH, SLC6A4, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having Ulcerative colitis and having Crohn’s disease based at least in part on detecting: a relative enrichment of HTT, PDGFA, HRAT92, MS4A10, KANK2, or any combination thereof; a relative depletion of SNORA23, AIF1, FANCF, ASF1B, TMEM259, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having mild Ulcerative colitis and having moderate Ulcerative colitis based at least in part on detecting: a relative enrichment of RHO, CASC4, KLK9, HSPA12B, NM_001371415, or any combination thereof; a relative depletion of HCN2, LINC01410, MBNL3, PUS1, LTBP3, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having mild Ulcerative colitis and being in remission for Ulcerative colitis based at least in part on detecting: a relative enrichment of MIA3, RHO, OLIG2, MAML3, SDC2, or any combination thereof; a relative depletion of HCN2, TBX10, RUBCNL, KANK2, ARHGEF18, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having mild ulcerative colitis and having severe Ulcerative colitis based at least in part on detecting: a relative enrichment of GDPD5, FBXO27, FAM110C, GFI1B, IFI27L2, or any combination thereof; a relative depletion of NM_001371343, DNAJB8-AS1, EYS, TPSB2, TTLL10, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having moderate Ulcerative colitis and being in remission for Ulcerative colitis based at least in part on detecting: a relative enrichment of ARHGAP33, KDM1A, MYB, SDC2, RPS6KA2, or any combination thereof; a relative depletion of SIDT1, RUBCNL, DMRTB1, TMEM143, E2F3, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having moderate Ulcerative colitis and having severe Ulcerative colitis based at least in part on detecting: a relative enrichment of HPSE, ANGPTL4, HSPB2-C11orf52, IFI27L2, KLHL22, or any combination thereof; a relative depletion of ADAMTS14, TBX15, MYO18B, NM_001371417, SLC22A18AS, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having severe Ulcerative colitis and being in remission for Ulcerative colitis based at least in part on detecting: a relative enrichment of EXOC2, IGSF3, PTGIS, ECHS1, HOXC8, or any combination thereof; a relative depletion of DNAJB8-AS, DMRTB1, MS4A10, MYO18B, C8orf74, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having mild Crohn’s disease and having moderate Crohn’s disease based at least in part on detecting a relative enrichment of POP7, FGF6, or a combination thereof.
- the method comprises distinguishing between the subject having mild Crohn’s disease and being in remission for Crohn’s disease based at least in part on detecting a relative enrichment of ST8SIA2, TNKS1BP1, POP7, or any combination thereof. In some embodiments, the method comprises distinguishing between the subject having mild Crohn’s disease and having severe Crohn’s disease based at least in part on detecting: a relative enrichment of SMYD4, LPCAT2, AADACL3, DPYSL5, TDRP, LOC100240728, or any combination thereof; a relative depletion of SOX11, HSPB9, MIR378I, GAL3ST4, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having moderate Crohn’s disease and being in remission for Crohn’s disease based at least in part on detecting a relative enrichment of MUC12, NAA40, UBE2E2-AS1, MHENCR, UBE2E2, or any combination thereof. In some embodiments, the method comprises distinguishing between the subject having moderate Crohn’s disease and having severe Crohn’s disease based at least in part on detecting: a relative enrichment of KPTN, SHISA5, PRSS38, SLCO2A1, AMDHD2, ST8SIA2, or any combination thereof; a relative depletion of SOX11, CLEC4G, MIR378I, or any combination thereof; or any combination of (a) and (b).
- the method comprises distinguishing between the subject having severe Crohn’s disease and being in remission for Crohn’s disease based at least in part on detecting: a relative enrichment of ST8SIA2, CLIP3, LOC101929243, AMDHD2, UBE2E2, or any combination thereof; a relative depletion of GAL3ST4, CLEC4G, MIR378I, HSPB9, SOX11, or any combination thereof; or any combination of (a) and (b).
- Disclosed herein in some embodiments is a system configured to perform any one of the methods disclosed herein. In some embodiments, the system is configured to present a result of the analysis of the sequence reads to the healthcare provider.
- the system is configured to provide the healthcare provider with a clinical interpretation of the result of the analysis of the sequence reads.
- the clinical interpretation of the result of the analysis of the sequence reads comprises that the patient has ulcerative colitis, Crohn’s disease, or any other IBD syndrome or disease.
- the system is configured to recommend to the healthcare provider an administration of a therapy for the disease.
- FIG. 1 shows a Pilot Study Biosample Outline.
- FIG. 2A shows a density plot of MPM abundances per sample.
- FIG. 2B shows the frequency of microbes within a given MPM range per sample.
- FIG. 3A shows the fraction of microbes detected in each sample at various down-sampling levels relative to the original full sampling.
- FIG. 3B shows a Pearson correlation between down sampling and original sampling.
- FIG. 4A shows the number of detected microbes unique to the disclosed methods versus the number of detected microbes unique to stool whole genome shotgun sequencing (WGS) versus the overlap between both.
- FIG. 4C shows samples with elevated number of microbes unique to KT are associated with severe diseases.
- FIG. 4D shows a ratio of number of overlapping microbes versus unique to stool.
- FIG. 5A shows a number of detected microbes, specifically eukaryotes, unique to the disclosed methods versus unique to stool by ITS2 sequencing versus overlap between both.
- FIG. 5B shows samples with elevated number of microbes, specifically eukaryotes/fungi, unique to KT are associated with severe disease states.
- FIGs. 6A-6D show the alpha diversity measurements for UC and asymptomatic patient populations.
- FIG. 6A and FIG. 6C show a species level assessment of unique detected microbes and Simpson’s evenness index.
- FIG. 6B and FIG. 6D show a genus level assessment of unique detected microbes and Simpson’s evenness index.
- FIG. 7A shows beta diversity assessment via PCoA with top three features.
- FIG. 7B shows a number of differentially detected microbes between asymptomatic and remission, mild, moderate, severe disease states for UC.
- FIG. 7C shows a critical feature assessment by PERMANOVA.
- FIG. 8A and FIG. 8B show alpha diversity measurements for UC and asymptomatic patient populations.
- FIG. 8A shows a species level assessment of unique detected microbes.
- FIG. 8B shows a genus level assessment of unique detected microbes.
- FIG. 9A shows a beta diversity assessment via PCoA.
- FIG. 9B shows a number of differentially detected microbes between asymptomatic and remission, mild, moderate, severe disease states for UC.
- FIG. 9C shows a critical feature assessment of PERMANOVA.
- FIG. 10A, FIG. 10B, and FIG. 10C show a Binary Classifier for asymptomatic/remission versus mild/moderate/severe disease for Ulcerative colitis. The classifier is assessed by a receiver operating characteristic (ROC) curve, as shown in FIG. 10A, and a precision-recall curve, as shown in FIG. 10B.
- FIG. 10C shows a list of critical features as determined by SHapley Additive exPlanations (SHAP).
- FIG. 11A, FIG. 11B, and FIG. 11C show a Binary Classifier for asymptomatic/remission versus mild/moderate/severe disease for Crohn’s Disease. The classifier is assessed by ROC, as shown in FIG. 11A, and a precision-recall curve, as shown in FIG. 11B.
- FIG. 11C shows a list of critical features as determined by SHAP.
- FIG. 12A, FIG. 12B, and FIG. 12C show sample distribution and disease characterization for UC patients.
- FIG. 12A shows the number of samples from a site with a given disease label.
- FIG. 12B shows the association between disease severity label (Mayo score) and modified Mayo endoscopy score (MMES).
- FIG. 12C shows the Mayo Endoscopy sub-score for each region of the colon for each patient (columns).
- FIG. 13A, FIG. 13B, and FIG. 13C show Ulcerative colitis patient treatment profiles.
- FIG. 13A shows counts for patients on a given treatment.
- FIG. 13B shows a correlation matrix between treatment and disease severity label.
- FIG. 13C shows unsupervised clustering of patients based on treatment profiles.
- FIG. 14A shows the correlation of high sensitivity CRP with disease severity (Mayo score label) for UC samples.
- the Y axis bar break occurs and continues at 200 pg/g and is used for visualization.
- FIG. 14B shows the correlation between fecal calprotectin with disease severity label (Mayo score label) for UC samples.
- the Y axis bar break occurs and continues at 10 mg/L and is used for visualization.
- FIG. 15A and FIG. 15B show the distribution of abundance prior to regression of site-specific signal.
- FIG. 15C and FIG. 15D show the distribution of abundance post regression of site-specific signal.
- FIG. 16 shows the sum of MPM across disease severity groups for UC samples stratified by clustered treatment groups.
- FIG. 17A and FIG. 17B show the enrichment of signal for stool-associated microbes from UC samples.
- FIG. 18A, FIG. 18B, and FIG. 18C show a linear discriminant analysis of UC samples. Remission, mild, severe are separated upon three dimensions in a supervised manner.
- FIGs. 19A-19F show a differential abundance analysis between disease severity labels for UC cohort using phylogenetically clustered microbes.
- FIG. 20A and FIG. 20B show Classifier performance characteristics for the UC cohort.
- FIG. 20A shows an AUC-ROC plot for predicted disease severity label. AUC is determined via one versus rest methodology.
- FIG. 20B shows a confusion matrix of true labels versus predicted labels.
- FIGs. 21A-21D show SHAP values to assess feature importance for UC based classifier.
- FIG. 22A, FIG. 22B, and FIG. 22C show sample distribution and disease characterization for CD patients.
- FIG. 22A shows the number of samples from a site with a given disease label.
- FIG. 22B shows an association between disease severity label (CDAI) and simple endoscopic score for CD (SES-CD).
- FIG. 22C shows an SES-CD subscore for each region of the colon and ileum for each patient (Columns).
- FIG. 23A, FIG. 23B, and FIG. 23C show the treatment profiles of Crohn’s disease patients.
- FIG. 23A shows the counts for patients on a given treatment.
- FIG. 23B shows a correlation matrix between treatment and CDAI.
- FIG. 23C shows unsupervised clustering of patients based on treatment profiles.
- FIG. 24A and FIG. 24B show a correlation of fecal calprotectin and high sensitivity CRP with disease severity (CDAI).
- Y axis bar breaks occur and continue at 200 pg/g in FIG. 24A and at 20 mg/L in FIG. 24B, respectively. The Y axis bar breaks are used for visualization.
- FIG. 25A and FIG. 25B show distribution of abundance prior to regression of site-specific signal.
- FIG. 25C and FIG. 25D show distribution of abundance post-regression of site-specific signal.
- FIG. 26 shows the sum of MPM across disease severity groups stratified by clustered treatment groups.
- FIG. 27A and FIG. 27B show the enrichment of signal for stool-associated microbes from CD samples.
- FIG. 28A, FIG. 28B, and FIG. 28C show a linear discriminant analysis of CD samples. Remission, mild, moderate, severe are separated upon three dimensions in a supervised manner.
- FIGs. 29A-29F show a differential analysis between disease severity labels for a CD cohort.
- FIG. 29A shows a differential abundance analysis for the CD cohort with a disease severity label of mild-remission.
- FIG. 29B shows a differential abundance analysis for the CD cohort with a disease severity label of moderate-remission.
- FIG. 29C shows a differential abundance analysis for the CD cohort with a disease severity label of remission-severe.
- FIG. 29D shows a differential abundance analysis for the CD cohort with a disease severity label of mild-moderate.
- FIG. 29E shows a differential abundance analysis for the CD cohort with a disease severity label of mild-severe.
- FIG. 29F shows a differential abundance analysis for the CD cohort with a disease severity label of moderate-severe.
- FIG. 30A and FIG. 30B show Classifier performance characteristics for the CD cohort.
- FIG. 30A shows an AUC-ROC plot for predicted severity label. AUC is determined via one versus rest methodology.
- FIG. 30B shows a confusion matrix of true labels versus predicted labels.
- FIG. 31A, FIG. 31B, FIG. 31C, and FIG. 31D show SHAP values to assess feature importance for CD based classifier.
- FIG. 32A and FIG. 32B show classifier performance for predicting ulcerative colitis based on down-sampled reads (10% of original).
- FIG. 32A is an AUC-ROC plot.
- FIG. 32B is a confusion matrix.
- FIG. 32C and FIG. 32D show classifier performance for predicting Crohn’s disease severity based on down-sampled reads (10% of original).
- FIG. 32C is an AUC-ROC plot.
- FIG. 32D is a confusion matrix.
- FIGs. 33A-33I demonstrate plasma mcfDNA as an analyte to distinguish between Crohn’s disease, ulcerative colitis, and healthy/asymptomatic individuals.
- FIG. 33A shows a projection of samples via Bray-Curtis PCoA.
- FIG. 33B shows a projection of samples via linear discriminant analysis.
- FIG. 33C shows a differential abundance analysis of samples from Crohn’s Disease patients and asymptomatic individuals.
- FIG. 33D shows a differential abundance analysis of samples from ulcerative colitis patients and asymptomatic individuals.
- FIG. 33E shows a differential abundance analysis of samples from Crohn’s disease patients and ulcerative colitis patients.
- FIG. 33F shows a classifier performance via x-cross validation.
- FIG. 33H shows an AUC-ROC plot to predict CD, UC, and asymptomatic labels.
- Classifier performance via LOSO (leave one site out).
- FIG. 33G and FIG. 33I show confusion matrix performance.
- FIG. 34A, FIG. 34C, and FIG. 34E show a summary of quality control metrics for ulcerative colitis cohorts.
- FIG. 34A shows reads passing a QC filter.
- FIG. 34C shows reads aligning to the human reference.
- FIG. 34E shows the number of deduplicated pathogen reads.
- FIG. 34B, FIG. 34D, and FIG. 34F show a summary of quality control metrics for Crohn’s disease cohorts.
- FIG. 34B shows reads passing a QC filter.
- FIG. 34D shows reads aligning to the human reference.
- FIG. 34F shows the number of deduplicated pathogen reads.
- “AC” refers to analytical controls used in each sequencing run.
- RD refers to research samples of interest.
- EC refers to negative, non-template controls.
- AC refers to assay controls which contained a set concentration of a given set of microbes.
- FIGs. 35A-35F show disease type classification performance between healthy, ulcerative colitis (UC), and Crohn’s disease (CD) (all classes of disease, remission, mild, moderate, severe). Performance was assessed via 10-fold cross-validation repeated five times across different partitions (FIGs. 35B, 35D, and 35F), and using a leave one clinical site out strategy (LOSO) (FIGs. 35A, 35C, and 35E).
- FIGs. 35A-B Microbial Classifier
- FIGs. 35C-D Human Classifier
- FIGs. 35E-F Joint Classifier
- FIGs. 36A-36C show disease type classification performance between healthy, active UC, and active CD (mild, moderate, severe disease) (FIG. 36A: Microbial Classifier; FIG. 36B: Human Classifier; FIG. 36C: Joint Classifier). Performance was assessed via 10-fold cross-validation repeated five times across different partitions.
- FIGs. 37A-37C show disease activity classification performance between UC disease activity groups (remission, mild, moderate, severe) (FIG. 37A: Microbial Classifier; FIG. 37B: Human Classifier; FIG. 37C: Joint Classifier). Performance was assessed via 10-fold cross-validation repeated three times across different partitions.
- FIGs. 38A-38C show disease activity classification between UC disease activity groups (remission, mild) versus (moderate, severe) (FIG. 38A: Microbial Classifier; FIG. 38B: Human Classifier; FIG. 38C: Joint Classifier). Performance was assessed via 10-fold cross-validation repeated three times across different partitions.
- FIGs. 39A-39C show disease type classification performance between CD disease activity (remission, mild, moderate, severe) (FIG. 39A: Microbial Classifier; FIG. 39B: Human Classifier; FIG. 39C: Joint Classifier). Performance was assessed via 10-fold cross-validation repeated three times across different partitions.
- FIGs. 40A-40C show disease type classification performance between CD disease activity groups (remission, mild) versus (moderate, severe) (FIG. 40A: Microbial Classifier; FIG. 40B: Human Classifier; FIG. 40C: Joint Classifier). Performance was assessed via 10-fold cross-validation repeated three times across different partitions.
- FIG. 41 shows a computer control system that is programmed or otherwise configured to implement methods provided herein.
DETAILED DESCRIPTION OF THE INVENTION
- the term “or” is used to refer to a nonexclusive “or”; as such, “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.
- the term “about” generally means plus or minus ten percent (10%) of a value, inclusive of the value, unless otherwise indicated by the context of the usage.
- “about 100” refers to any number from 90 to 110 and includes the number 100, unless otherwise indicated by the context in which the term is used.
- the term “about” a range refers to that range minus 10% of its lowest value and plus 10% of its greatest value.
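The arithmetic behind the "about" definitions can be written out as a short sketch. The function names are hypothetical, and the 10% tolerance is the default stated above, which the text notes can be overridden by context.

```python
def about_value(value: float, tolerance: float = 0.10) -> tuple[float, float]:
    """Bounds for 'about value': plus or minus 10% of the value, inclusive."""
    margin = abs(value) * tolerance
    return (value - margin, value + margin)


def about_range(low: float, high: float,
                tolerance: float = 0.10) -> tuple[float, float]:
    """Bounds for 'about low to high': the lower end is reduced by 10% of the
    lowest value and the upper end is raised by 10% of the greatest value."""
    return (low - abs(low) * tolerance, high + abs(high) * tolerance)


bounds_for_100 = about_value(100)   # corresponds to "about 100"
bounds_for_5x_10x = about_range(5, 10)
```

For "about 100" this gives bounds of 90 and 110, matching the example in the definition; for "about 5 to 10" the bounds widen to 4.5 and 11.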
- the terms “increased,” “increasing,” or “increase” are used herein to generally mean an increase by a statistically significant amount.
- the terms “increased” or “increase” mean an increase of at least 10% as compared to a reference level, for example an increase of at least about 10%, at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90% or up to and including a 100% increase or any increase between 10-100% as compared to a reference level, standard, or control.
- Other examples of “increase” include an increase of at least 2-fold, at least 5-fold, at least 10-fold, at least 20-fold, at least 50-fold, at least 100-fold, at least 1000-fold or more as compared to a reference level.
- decreased means a reduction by at least 10% as compared to a reference level, for example a decrease by at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90% or up to and including a 100% decrease (e.g., absent level or non-detectable level as compared to a reference level), or any decrease between 10-100% as compared to a reference level.
- for a marker or symptom, these terms mean a statistically significant decrease in such level.
- the decrease can be, for example, at least 10%, at least 20%, at least 30%, at least 40% or more, and is preferably down to a level accepted as within the range of normal for an individual without a given disease.
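The percentage thresholds in the "increase" and "decrease" definitions can be checked with a small sketch. It covers only the percent-change arithmetic against a reference level; it does not test statistical significance, which the definitions also mention. The function names are hypothetical.

```python
def percent_change(value: float, reference: float) -> float:
    """Signed percent change of value relative to a non-zero reference level."""
    if reference == 0:
        raise ValueError("percent change is undefined for a zero reference level")
    return 100.0 * (value - reference) / reference


def is_increase(value: float, reference: float, min_percent: float = 10.0) -> bool:
    """True when value exceeds the reference by at least min_percent."""
    return percent_change(value, reference) >= min_percent


def is_decrease(value: float, reference: float, min_percent: float = 10.0) -> bool:
    """True when value falls below the reference by at least min_percent."""
    return percent_change(value, reference) <= -min_percent
```

For example, a level of 120 against a reference of 100 is a 20% increase, and a non-detectable level (0) against the same reference is a 100% decrease.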
- range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range.
- “abundance” refers to the quantity of something, such as, for example, the quantity or number of molecules, such as nucleic acids.
- “relative abundance” is the abundance of a molecule or molecules of interest per abundance of a reference molecule or molecules of interest.
- for example, the relative abundance of target nucleic acid molecules (e.g., microbial cell-free nucleic acids) can be expressed per abundance of reference nucleic acids (e.g., host nucleic acids, synthetic nucleic acid added to the sample, etc.).
- absolute abundance is the abundance of molecules per a defined unit of initial sample or sample quantity.
- absolute abundance of target nucleic acid molecules refers to the abundance per defined unit of sample quantity (e.g., sample volume, sample mass etc.).
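The difference between relative and absolute abundance can be made concrete with a minimal sketch. The function names and the example numbers are hypothetical and serve only to illustrate the two definitions above.

```python
def relative_abundance(target_count: float, reference_count: float) -> float:
    """Target molecules per reference molecule (e.g., per host or
    spiked-in synthetic nucleic acid molecule)."""
    if reference_count <= 0:
        raise ValueError("reference count must be positive")
    return target_count / reference_count


def absolute_abundance(target_count: float, sample_quantity: float) -> float:
    """Target molecules per defined unit of sample (e.g., per mL or per mg)."""
    if sample_quantity <= 0:
        raise ValueError("sample quantity must be positive")
    return target_count / sample_quantity


# Hypothetical numbers: 250 microbial molecules, 1000 reference molecules,
# measured in 0.5 mL of sample.
relative = relative_abundance(250, 1000)
absolute_per_ml = absolute_abundance(250, 0.5)
```

Here the relative abundance is 0.25 target molecules per reference molecule, while the absolute abundance is 500 target molecules per mL of sample.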
- “adapter” or “portions of an adapter” refers to a chemically synthesized, single-stranded, or double-stranded oligonucleotide that can be attached, e.g., covalently (e.g., ligation, primer extension) or non-covalently (e.g., hybridization), to the ends of nucleic acid molecules, such as DNA or RNA molecules.
- Adapter sequences can be of any length.
- Adapter can refer to either a full-length adapter or a portion of the adapter, e.g., partial adapters can be attached in some embodiments before the full-length adapters are introduced by, e.g., indexing primers in amplification steps.
- 3'-end adapters and 5'-end adapters can be full-length or a portion of an adapter sequence that are attached to the opposite ends of a target nucleic acid, a copy of a target nucleic acid, or a target nucleic acid complement.
- 3'-end adapters and 5'-end adapters sequences end up being attached to the opposite ends of e.g., a template that can be sequenced that comprises target nucleic acid, a copy of a target nucleic acid, and/or a target nucleic acid complement.
- the 3'-end adapter and 5'-end adapter sequences can be the same or they can be different.
- antibody refers to a type of immunoglobulin molecule and is used in the broadest sense to include intact antibodies as well as antibody fragments.
- antibodies comprise at least one antigen-binding domain.
- an antibody as described herein can have an antigen binding domain or antigen binding region, the antigen binding domain or antigen binding region being specific for an antigen.
- the antigen is a bulky moiety, such as digoxigenin.
- “bulky moiety” refers to a molecule that takes up more space than is conventionally required or a molecule that forms a complex that takes up more space than is conventionally required.
- a bulky moiety may comprise any reactive group capable of forming covalent, non-covalent, or coordinating chemical bonds.
- the bulky moiety comprises one or more azide groups and products of reactions with azide groups, one or more small molecules, one or more polyhistidine tags, one or more antigens, and/or one or more proteins.
- the bulky moiety comprises digoxigenin.
- a bulky moiety for example, can include a functional group that is sterically hindering and can prevent certain enzymatic or chemical reactions from occurring.
- a bulky group can block a position.
- a bulky group can also affect a molecule's shape and reactivity and so prevent a reaction from occurring through the steric hindrance.
- Moieties attached to the groups by covalent, non-covalent or coordinating bonds can provide the bulkiness of the groups.
- a bulky molecule such as a protein or polymer can be attached covalently to an azide group; a bulky entity such as a bead, protein or polymer can be attached to a his tag through coordinating bonds using Ni ions; or an antibody can be attached to an antigen that is attached to an adapter, such as an anti-digoxigenin antibody attached to digoxigenin.
- Examples of bulky moieties can also include, but are not limited to, complexes between any of the molecules disclosed herein and their respective binding and reaction partners including, for example, without limitation, complexes between digoxigenin and an anti-digoxigenin antibody; polyhistidine tag and a Ni-NTA-containing polymer; a protein and a binding partner; an azide group and a covalently bound large molecule; and the biotin and streptavidin complex.
- Additional examples of bulky molecules include, for example, without limitation, biotin, azide groups and products of reactions with azide groups, one or more small molecules, one or more polyhistidine tags, and/or one or more proteins.
- Some embodiments further comprise introducing a bulky moiety into the splint oligonucleotide. Some embodiments further comprise introducing a bulky moiety on the template switching oligos; such bulky moieties can reduce concatemer formation.
- the bulky moiety can be introduced at a position that has the lowest effect on adapter attachment efficiency, such as the 5'-end region of the splint oligonucleotide, close to the 5'-end region of the splint oligonucleotide, or away from the ligation junction in ligation-based adapter attachment reactions.
- control refers to a standard of comparison.
- a “negative control” refers to a standard of comparison that is used to identify contaminants from samples, or to identify the nature of a signal in the absence of a sample (e.g., a background signal).
- a “positive control” refers to a standard of comparison that is designed to produce a positive result or signal. Generally, the presence of a substance (e.g., nucleic acid) is detected in a positive control that is run during an assay.
- Some embodiments of the disclosure comprise a positive and/or negative control. Some embodiments of the disclosure comprise an initial sample or samples without a positive and/or negative control. Some embodiments of the disclosure comprise an initial sample or samples without a positive control. Some embodiments of the disclosure comprise an initial sample or samples without a negative control.
- “denaturing” refers to a process in which biomolecules, such as proteins or nucleic acids, lose their native or higher order structure.
- Native and higher order structure can include, for example, without limitation, quaternary structure, tertiary structure, or secondary structure.
- a double-stranded nucleic acid molecule can be denatured into two single-stranded molecules.
- dephosphorylation or “dephosphorylating” refers to removal of a terminal phosphate group, such as the 5'- and/or 3'-end phosphate, from a nucleic acid, such as DNA to generate 5'- and/or 3'-hydroxyl groups.
- detect refers to quantitative or qualitative detection, including, without limitation, detection by identifying the presence, absence, quantity, frequency, concentration, sequence, form, structure, origin, or amount of an analyte.
- digoxigenin refers to a bulky molecule or its complex comprising the structure:
- GC-bias refers to differential performance (e.g., amplification) or treatment of nucleic acids having different GC content but identical length.
- GC-content or “guanine-cytosine content” refer to the percentage or quantity of nitrogenous bases in a nucleic acid, such as a DNA or RNA molecule, that are either guanine or cytosine or their chemical modifications.
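GC-content expressed as a percentage can be computed as in the following sketch. It assumes a plain nucleotide string and counts only unmodified G and C characters; chemically modified bases, which the definition also includes, are not handled. The function name is hypothetical.

```python
def gc_content_percent(sequence: str) -> float:
    """Percentage of bases in the sequence that are guanine or cytosine."""
    bases = sequence.upper()
    if not bases:
        raise ValueError("sequence must not be empty")
    gc_bases = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_bases / len(bases)


half_gc = gc_content_percent("ATGC")   # two of four bases are G or C
all_gc = gc_content_percent("ggcc")    # lower-case input is accepted
```

Two fragments of identical length but different values of this percentage are the kind of molecules between which GC-bias, as defined above, would show up.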
- host refers to an organism that harbors another organism or microbe.
- a living thing (e.g., a mammal such as a human being) can be a host that harbors a microbe or pathogen, the microbe or pathogen being the non-host.
- identifying sequence element or “identifying tag” refers to an element of a sequence that identifies an index, a code, a barcode, a random sequence, an adapter, an overhang of non-templated nucleic acids, a tag comprising one or more non-templated nucleotides, a priming sequence, unique molecular identifiers, or any combination thereof.
- Klenow fragment refers to a large protein fragment of DNA polymerase I that retains the 5'→3' polymerase activity and the 3'→5' exonuclease activity for removal of precoding nucleotides and proofreading but loses its 5'→3' exonuclease activity.
- ligating refers to the joining of two ends of nucleic acid fragments through the action of an enzyme. DNA molecules and RNA molecules can be ligated. There are many methods of ligation and one skilled in the art would readily understand methods of ligation other than those disclosed herein.
- length bias refers to a bias with respect to length of a particular nucleic acid size or fragment length created by a sequencing library generation process as opposed to another size or fragment length. It can be preferable to reduce length bias for consistent or more accurate results. In some aspects, it can be preferable to increase a length bias for a certain range or against a certain range.
- pathogen refers to a microbe that can cause a disease, ailment, or an infection.
- microbes include one or more species or strains from one or more of the following genera: Coniosporium, Hantavirus, Talaromyces, Machlomovirus, Betatetravirus, Raoultella, Aeromonas, Ephemerovirus, Empedobacter, Loa, Macluravirus, Stenotrophomonas, Alfamovirus, Rosavirus, Emmonsia, Aggregatibacter, Orthopneumovirus, Weeksella, Nairovirus, Salivirus, Weissella, Mosavirus, Gammapartitivirus, Strongyloides, Passerivirus, Erysipelatoclostridium, Bacillarnavirus, Iotatorquevirus, Taenia, Trypanosoma, Olsenella, Cladosporium, Rhizobium, Prevotella, Leclercia, Paracoccus, Ilarvirus, Lagovirus, Rasamsonia, Plasmodium, Acremonium, Chlamydia, Clon
- Microbes or pathogens can include archaea, bacteria, yeast, fungi, molds, protozoans, nematodes, eukaryotes, and/or viruses.
- Microbes or pathogens can also include DNA viruses, RNA viruses, culturable bacteria, additional fastidious and unculturable bacteria, mycobacteria, and eukaryotic pathogens (See, Bennett et al., Mandell, Douglas, and Bennett's Principles and Practice of Infectious Diseases, Ninth Edition; Elsevier, 2019; and Netter's Infectious Disease, 2nd Edition, Jong and Stevens, eds. Elsevier, 2021). Microbes or pathogens can also include any of the microbes known to a person of skill.
- microbe generally refers to bacteria, fungi, protists, parasites, viruses, or other entities that are usually detectable using a microscope.
- microorganism refers to a uni- or multicellular organism, such as, for example, a microscopic organism or macroscopic organism including but not limited to bacteria, fungi, protists, and parasites.
- Microbes are often pathogens responsible for disease, but can also exist in a non-pathogenic, symbiotic, commensalistic, mutualistic, or amensalistic relationship with a host, such as a human.
- nucleic acid refers to a polymer or oligomer of nucleotides and is generally synonymous with the term “polynucleotide” or “oligonucleotide.”
- Nucleic acids may comprise a deoxyribonucleotide, a ribonucleotide, a deoxyribonucleotide analog, chemically modified canonical deoxyribonucleotides, ribonucleotides, and/or ribonucleotide analog, nucleic acids with modified backbones, or any combination thereof.
- Nucleic acids can be of any length.
- a nucleic acid may comprise one or more modified nucleotides, such as methylated nucleotides or methylated nucleotide analogs. If present, modifications to the structure can be imparted before or after assembly of the polymer.
- the sequence of nucleotides can be interrupted by non-nucleotide components.
- a nucleic acid can be further modified after polymerization, such as by conjugation with a labeling component.
- a nucleic acid can be single-stranded, double-stranded, have higher numbers of strands (e.g., triple-stranded), and/or have a higher order of structure (e.g., tertiary or quaternary structure).
- a target nucleic acid can be any type, category, or subcategory of nucleic acids.
- removal or extraction refers to steps prior to the start of generating or preparing a nucleic acid library that separate nucleic acids from at least one component with which they are normally associated. Removal or extraction of nucleic acids can refer to the process of creating an initial sample from a raw biological sample. For example, without limitation, the fractionation of whole blood into its component parts, such as plasma, can be considered to involve removal or extraction. Similarly, purification or isolation of DNA from a sample (e.g., plasma sample) can be considered extraction.
- Sequencing generally refers to methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides. Sequencing can involve basic methods including Maxam-Gilbert sequencing and chain-termination methods, or de novo sequencing methods including shotgun sequencing and bridge PCR, or next-generation sequencing (NGS) methods (or massively parallel sequencing methods) including but not limited to polony sequencing, pyrosequencing, sequencing-by-synthesis, sequencing by ligation, ion semiconductor sequencing, single molecule sequencing, single-molecule real-time sequencing, nanopore sequencing, and others.
- Sequencing can be performed by various systems currently available, such as, without limitation, a sequencing system by Illumina®, Pacific Biosciences®, Oxford Nanopore®, Genia Technologies®, or Life Technologies® and others.
- Such devices can provide a plurality of raw genetic data corresponding to the genetic information of a host (e.g., human), a non-host (e.g., a pathogen, an organ donor), a host-derived variant genetic sequence (e.g., a single nucleotide polymorphism), and/or combinations thereof as generated by the device from a sample provided by the subject.
- derived from encompasses the terms “originated from,” “obtained from,” “obtainable from” and “created from,” and generally indicates that one specified material finds its origin in another specified material or has features that can be described with reference to the specified material.
- a sample can be derived from a blood draw
- a nucleic acid can be derived from a sample
- a sequence read can be derived from sequencing a nucleic acid, or any combination thereof.
- the phrase “uniformly distributed” refers to a distribution that is continuous or uniform between members of a family such that for each member of a family there is a predictable or symmetric interval between them.
- the term “non-uniformly distributed” refers to a distribution of members of a family that does not have a predictable or symmetric interval between them.
- This disclosure provides, in some embodiments, methods of using a sample comprising cell-free nucleic acids (cfNA) to assess the type and/or severity of gastrointestinal disorders such as inflammatory bowel disease.
- the subject may be a human subject.
- the methods can be practiced using samples that are not feces, such as plasma.
- the methods can distinguish between two or more different types of conditions, such as two different subtypes of inflammatory bowel disease.
- the methods provided herein can distinguish between ulcerative colitis (UC) and Crohn’s disease (CD), or can distinguish between UC, CD, and healthy.
- Distinguishing between different types of disease can, in some embodiments, inform a treatment regimen so that the treatment can be better tailored for the type of disease identified.
- the methods provided herein can also differentiate diseases based on relative severity.
- the methods can distinguish between severe, moderate, mild, in remission, or healthy conditions.
- differentiation based on severity can inform a treatment regimen.
- the method comprises enriching a sample for cfNA of a particular size, degradation status, and strandedness. For example, in some cases, the method comprises enriching cfNA for lengths between about 10 bases and 200 bases.
- the method comprises producing a single-stranded library by denaturing double-stranded and single-stranded fragments in a sample of cfNA so that the sample comprises single-stranded fragments present in the original sample as well as single-stranded fragments resulting from denaturing the double-stranded fragments. In some embodiments, the method does not comprise enriching a sample.
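
The enrichment described above is a physical step performed on the sample. Purely as a computational analogue, the sketch below keeps only fragment lengths that fall inside a chosen window. The function name `select_by_length` is hypothetical, and the default bounds simply mirror the "about 10 bases and 200 bases" example; neither is taken from the disclosed protocol.

```python
def select_by_length(lengths, min_len=10, max_len=200):
    """Keep fragment lengths (in bases) within [min_len, max_len], inclusive.

    Illustrative in silico filter only; it does not model bead-based or
    other physical size selection.
    """
    return [n for n in lengths if min_len <= n <= max_len]
```

Such a filter could be applied to the read lengths of an unenriched library to approximate the size window of an enriched one, under the assumption that read length reflects fragment length.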
- This disclosure also provides microbial and/or mammalian (e.g., human) classifiers for differentiating between types or severity of disease.
- the classifiers are based on microbial cell-free nucleic acid (e.g., mcfDNA) and/or mammalian (e.g., human) cfNA (e.g., human cfDNA, circulating mRNA) signatures that are able to distinguish between different types of conditions (e.g., between ulcerative colitis, Crohn’s disease and/or healthy conditions).
- the classifiers are based on microbial cell-free nucleic acid (e.g., mcfDNA) and/or mammalian (e.g., human) cfNA (e.g., human cfDNA, circulating mRNA) signatures that are able to distinguish between different severities of a condition (e.g., between healthy, remission, mild disease, moderate disease, or severe disease).
- microbial cell-free nucleic acid (e.g., mcfDNA) signatures reflect microbes at the level of species, strain, genus, family, order, class, phylum, and/or kingdom.
- the mammalian or human signatures contain one or more biomarkers that are gene-level biomarkers, pathway-level biomarkers, or a combination thereof.
- a plurality of biomarkers are controlled by a common regulatory element.
- the classifiers provided herein may also, in some cases, be based on mammalian or human fragmentomics signatures.
- the mammalian or human fragmentomic signature reflects non-random patterns present in cell-free nucleic acids (e.g., cfDNA).
- Some examples of fragmentomic features include, but are not limited to, location, size, orientation, motifs, and/or cut-site motifs.
- the classifiers can be based on analysis of cell-free nucleic acids (e.g., human cfDNA) associated with genes, promoters, and/or enhancers, as an indication of gene activation.
- the fragmentomics markers are gene-level biomarkers, pathway-level biomarkers, or a combination thereof.
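
To make the fragmentomic feature types listed above more concrete, the following sketch summarizes two of them, fragment size and 5' end motif, for a small set of aligned fragments. The `Fragment` record, the function name `fragment_features`, and the 4-base motif length are assumptions for illustration; the disclosure names the feature types but this is not its stated method of computing them.

```python
from collections import Counter
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical record for one aligned cfDNA fragment."""
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive
    sequence: str   # fragment sequence, 5' to 3'


def fragment_features(fragments, motif_len=4):
    """Return the mean fragment length and 5' end-motif frequencies."""
    lengths = [f.end - f.start for f in fragments]
    motifs = Counter(
        f.sequence[:motif_len].upper()
        for f in fragments
        if len(f.sequence) >= motif_len
    )
    total = sum(motifs.values())
    return {
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "motif_freq": {m: c / total for m, c in motifs.items()} if total else {},
    }
```

Restricting the input to fragments that overlap promoters or enhancers would give region-level features of the kind the text associates with gene activation.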
- the classifiers are based on microbial cell-free nucleic acid (e.g., mcfDNA) signatures and subject cell-free nucleic acid (e.g., subject cfDNA) signatures that, in combination, are able to distinguish between different types of conditions (e.g., between ulcerative colitis, Crohn’s disease and healthy conditions).
- the classifiers are based on microbial cell-free nucleic acid (e.g., mcfDNA) signatures and subject cell-free nucleic acid (e.g., subject cfDNA) signatures that, in combination, are able to distinguish between different severities of a condition (e.g., between healthy, remission, mild disease, moderate disease, and/or severe disease, or a combination thereof).
- the methods comprise combining data obtained from processing of subject cell-free DNA and data obtained from processing of microbial cell-free DNA.
- a subject may be a human subject.
- the methods disclosed herein may comprise combining data obtained from processing of subject cell-free DNA or microbial cell-free DNA and data obtained from any one or more other diagnostic methods or medical means, such as medical records, symptoms, or vital signs, etc.
- the cell-free nucleic acids are sequenced (e.g., by massively parallel sequencing, NGS, or beyond NGS) to obtain sequence reads.
- the sequence reads can be used to identify microbial sequences and/or human sequences in a sample.
- the microbial sequences can be used to detect microbes at a species, strain, genus, family, order, class, phylum, and/or kingdom level.
- the human sequences can be used to identify human genes that are dysregulated (e.g., upregulated or downregulated) in a particular condition.
- the identification of dysregulated genes is based on a fragmentomics analysis, particularly fragmentomics of cfDNA at regulatory elements such as promoters.
- the human and microbial cell-free nucleic acids are present as a combination within a single sample and are maintained in the single sample through sequencing analysis.
- a single sample containing a combination of human and microbial cell-free nucleic acids is divided and separately processed, with one of the samples analyzed for microbial cell-free nucleic acids, while another sample is analyzed for subject cell-free nucleic acids.
- samples can be processed differently; for example, one sample can be physically enriched for a certain length of DNA or RNA, and one sample can be physically enriched for a different length of DNA or RNA.
- one or more features of data may be identified via processing of subject cell-free DNA and/or microbial cell-free DNA, and such data can, in some instances, be aggregated or combined.
- a classifier is trained on input of data obtained from processing of subject cell-free DNA or microbial cell-free DNA.
- a classifier may be trained on input of data obtained from processing of both subject cell-free DNA and microbial cell-free DNA.
- the subject may be a human subject.
- the classifier is trained on data indicating a type of microbe and/or the identity of one or more dysregulated genes.
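
The idea of training one classifier on combined microbial and subject-derived data can be sketched as follows. Feature names, labels, and values are placeholders, and the nearest-centroid rule is a deliberately simple stand-in chosen so the example needs only the standard library; the disclosure does not state that this model is used.

```python
import math


def combine_features(microbial, human, microbial_keys, human_keys):
    """Concatenate microbial and human feature dicts into one fixed-order vector."""
    return ([microbial.get(k, 0.0) for k in microbial_keys]
            + [human.get(k, 0.0) for k in human_keys])


def train_centroids(vectors, labels):
    """Compute the mean feature vector (centroid) for each class label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy usage with made-up feature names and values (not real data).
microbial_keys = ["microbe_A", "microbe_B"]
human_keys = ["gene_X"]
training = [
    combine_features({"microbe_A": 1.0}, {"gene_X": 0.0}, microbial_keys, human_keys),
    combine_features({"microbe_B": 1.0}, {"gene_X": 1.0}, microbial_keys, human_keys),
]
centroids = train_centroids(training, ["UC", "CD"])
```

Keeping the key lists explicit fixes the feature order, so a sample lacking a given microbe or gene still maps to a vector of the same length.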
- This disclosure provides, in some embodiments, methods of using a sample physically enriched for microbial cell-free nucleic acids (e.g., microbial cell-free DNA) or relatively short fragments of mammalian cell-free nucleic acids (e.g., relatively short fragments of subject cell-free DNA) to assess the type and/or severity of gastrointestinal disorders such as inflammatory bowel disease.
- the methods comprise physically enriching cfNA by size selection using a size-selection method described herein.
- size selection comprises applying the cfNA to a solid support (e.g., magnetic beads).
- the methods can comprise enriching for degraded nucleic acid fragments.
- the degraded nucleic acid fragments comprise degraded double-stranded cfNA (degraded dscfNA) or degraded single-stranded cfNA (degraded sscfNA).
- the subject may be a human subject.
- the methods can be practiced using samples that are not feces, such as plasma.
- the methods can distinguish between two or more different types of conditions, such as two different types of inflammatory bowel disease.
- the methods provided herein can distinguish between ulcerative colitis and Crohn’s disease, or can distinguish between ulcerative colitis, Crohn’s disease, and healthy.
- Distinguishing between different types of disease can, in some embodiments, inform a treatment regimen so that the treatment can be better tailored for the type of disease identified.
- the methods provided herein can also differentiate disease based on relative severity. For example, the methods provided herein can, in some cases, distinguish between severe, moderate, mild, in remission, or healthy conditions. Differentiation based on severity can, in some embodiments, inform a treatment regimen. In some embodiments, the method does not comprise enriching a sample.
- This disclosure provides, in some embodiments, methods of using a sample comprising a plurality of cell-free nucleic acids (cfNA) of different sources or strandedness to assess the type and/or severity of gastrointestinal disorders, such as inflammatory bowel disease.
- the plurality of the cfNA can comprise subject cell-free nucleic acids (subject cfNA), microbial cell-free nucleic acids (mcfNA), or a mixture thereof.
- the cfNA comprises double-stranded cfNA, single-stranded cfNA, degraded double-stranded cfNA (degraded dscfNA) and/or degraded single-stranded cfNA (degraded sscfNA).
- the methods comprise denaturing the double-stranded cfNA and/or the degraded dscfNA into single-stranded fragments.
- the methods comprise preparing a single-stranded DNA library from the denatured cfDNA.
- the methods comprise physically enriching short or degraded nucleic acids.
- the methods can distinguish between two or more different types of conditions, such as two different types of inflammatory bowel disease.
- the methods provided herein can distinguish between ulcerative colitis and Crohn’s disease, or can distinguish between ulcerative colitis, Crohn’s disease, and healthy. Distinguishing between different types of disease can, in some embodiments, inform a treatment regimen so that the treatment can be better tailored for the type of disease identified.
- the methods provided herein can also differentiate disease based on relative severity. For example, the methods provided herein can, in some cases, distinguish between severe, moderate, mild, in remission, or healthy conditions. Differentiation based on severity can, in some embodiments, inform a treatment regimen. In some embodiments, the method does not comprise enriching a sample.
- the methods provided herein can be applied in various ways to aid the care of patients with IBD.
- the methods described herein can identify a specific subtype of IBD. For example, certain methods can detect that a patient has ulcerative colitis (UC). The patient can then be treated with medications and doses specifically tailored to UC, potentially avoiding medications or doses that are not optimal for treating UC.
- Another advantage of identifying the subtype of IBD is that it can guide which regions of the gastrointestinal tract are assessed during an endoscopy procedure. For instance, endoscopies for UC often target the large intestine and rectum, whereas endoscopies for CD are often more invasive and may involve parts of the small intestine, such as the ileum.
- if the methods described herein detect UC in the subject, the subject may undergo an endoscopy focused on the large intestine and rectum, potentially avoiding the more invasive endoscopy typically indicated for CD.
- the methods provided herein can distinguish UC from CD in a non-surgical way and help patients avoid unnecessary procedures.
- the methods provided herein can detect the severity of a subtype of IBD (e.g., severe, moderate, mild, or in remission). Understanding the severity of a subtype of IBD is highly valuable for determining an appropriate course of treatment. For example, if a patient is identified as having severe or moderate disease, a clinician may prioritize ordering an endoscopy for further evaluation. Conversely, if the patient is identified as being in remission or having mild disease, the clinician may opt to manage the condition using standard medical care before considering an endoscopy.
- the methods provided herein can monitor disease progression or assess treatment efficacy in a subject diagnosed with a subtype of IBD.
- the methods can inform selection of initial treatment regimen in a treatment-naive subject.
- the subject has received a treatment for the subtype of IBD and the methods can inform refinement of this treatment regimen.
- the subject may be in remission at the time of sample collection.
- the subject may have a flare of worsening symptoms at the time of sample collection.
- the methods described herein can categorize the severity of the subject’s IBD at the time of sample collection and inform subsequent clinical decisions.
- if the classifier categorizes the disease state as in remission, the current treatment regimen (e.g., a maintenance dose of the medication) may be continued.
- the maintenance dose of the initial medication can also be reduced or discontinued. In this case, a less aggressive treatment may be adopted in the subject in remission.
- if the classifier categorizes the subtype of IBD as mild, moderate, or severe, more aggressive treatments may be recommended based on the severity of the disease.
- These treatments may include increasing the dose of the initial medication above the maintenance dose, administering a second medication in addition to, or as a replacement for, the initial medication, performing an endoscopic procedure to further evaluate the severity and identify the location of the disease, surgically removing a portion of the GI tract if the endoscopy confirms the presence of moderate or severe IBD, or any combination thereof.
- the methods provided herein can monitor disease progression or treatment efficacy in a subject having a subtype of IBD in remission.
- the methods can analyze a sample comprising cfNA from a subject who has been in remission on a specific IBD treatment over a period of time (e.g., 2-6 weeks).
- the methods can categorize the severity of the subject’s IBD and inform subsequent clinical decisions. If the subtype of IBD is categorized as being in remission, the current treatment regimen (e.g., a maintenance dose of the initial medication) should be continued. However, if the classifier categorizes the IBD as mild, moderate, or severe, more aggressive treatments may be recommended based on the severity of the disease.
- the methods provided herein can monitor the response to surgical intervention in a subject with a subtype of IBD.
- the methods can analyze a sample from the subject a few weeks (e.g., 2-6 weeks) after the surgical procedure, and the classifier described herein can categorize the severity of the subtype of IBD. If the subtype of IBD is categorized as being in remission, a maintenance therapy can be initiated or continued. If the subtype of IBD is categorized as mild, moderate, or severe, the treatment regimen may be adjusted by modifying either the type of medication or its dosage. Additionally, an endoscopic procedure may be performed to further evaluate the severity of the IBD or to identify the location of the lesions.
- if the endoscopic procedure confirms that the subject is suffering from a moderate or severe subtype of IBD, another surgery may be indicated to remove a portion of the subject’s digestive tract.
- the methods described herein can ensure ongoing management of the disease based on its progression and response to treatment.
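
The way a severity category might feed into follow-up options, as described in the passages above, can be sketched as a simple lookup. The function name `suggest_next_steps` and the option strings are illustrative paraphrases of the text; this is a sketch of the described logic and not a clinical decision rule.

```python
# Options paraphrased from the text for mild, moderate, or severe categories.
ESCALATION_OPTIONS = (
    "increase the dose of the initial medication above the maintenance dose",
    "add a second medication or replace the initial medication",
    "perform an endoscopic procedure to evaluate severity and location",
)


def suggest_next_steps(category: str) -> list:
    """Map a classifier severity category to the follow-up options in the text."""
    normalized = category.strip().lower()
    if normalized in {"remission", "in remission"}:
        return ["continue the current treatment regimen (e.g., maintenance dose)"]
    if normalized in {"mild", "moderate", "severe"}:
        return list(ESCALATION_OPTIONS)
    raise ValueError(f"unrecognized severity category: {category!r}")
```

Raising an error on an unrecognized category, rather than returning a default, keeps an unexpected classifier output from being silently treated as remission.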
- detecting or identifying an inflammatory disease comprises identifying an inflammatory disease subtype, or distinguishing between two or more subtypes (e.g., distinguishing between ulcerative colitis (UC) and Crohn’s disease).
- detecting or identifying an inflammatory disease comprises identifying a disease severity or distinguishing between two or more disease severities (e.g., between moderate and severe disease or disorder).
- identifying an inflammatory disease severity may comprise classifying a disease as mild, moderate, severe, in remission, and/or otherwise healthy, or any combination thereof.
- an inflammatory disease disclosed herein is an inflammatory disease that affects the digestive system, gastrointestinal system or tract, or gut, or a combination thereof.
- an inflammatory disease e.g., inflammatory bowel disorder (IBD)
- an inflammatory disease comprises an inflammatory bowel disease (IBD).
- an inflammatory disease may comprise Crohn’s disease, ulcerative colitis, microscopic colitis, autoimmune enteropathy, celiac disease, chronic radiation enteritis, and/or diverticulitis.
- IBD may comprise one or more diseases including but not limited to ulcerative colitis (UC), Crohn’s disease, bowel inflammation, microscopic colitis, and/or indeterminate colitis.
- microscopic colitis may comprise lymphocytic colitis.
- microscopic colitis may comprise collagenous colitis.
- microscopic colitis may comprise lymphocytic colitis and collagenous colitis.
- IBD can be associated with inflammation of bowels, inflammation of a colon, or a combination thereof.
- IBD may be localized to a single segment of a digestive tract; in some instances, IBD may be present in multiple sites of the digestive tract. In some embodiments, IBD may be present in alternating segments of the digestive tract separated by healthy tissue.
- Crohn’s disease may affect any part of the gastrointestinal tract. Inflammation in Crohn’s disease can occur in patches, often with healthy tissue between affected areas.
- the gastrointestinal tract comprises the mouth and/or the anus.
- Crohn’s disease may affect the small bowel, particularly the end of the small bowel (ileum).
- Crohn’s disease may affect the end of the small bowel.
- Crohn’s disease may affect the colon.
- Crohn’s disease may affect the beginning of the colon.
- Crohn’s disease may affect the end of the small bowel and the beginning of the colon.
- Crohn’s disease may comprise inflammation of the intestinal wall.
- Crohn’s disease may comprise inflammation of one or more layers of the intestinal wall. In some embodiments, Crohn’s disease may comprise inflammation of all layers of the intestinal wall. In some embodiments, ulcerative colitis may affect the colon. In some embodiments, ulcerative colitis may affect the rectum.
- ulcerative colitis may affect the colon and the rectum.
- ulcerative colitis may comprise inflammation of the intestinal wall.
- ulcerative colitis may comprise inflammation of the large intestine.
- ulcerative colitis may comprise inflammation of the superficial lining of the large intestine.
- ulcerative colitis may comprise ulcers.
- ulcerative colitis may comprise ulcers in the large intestine.
- ulcerative colitis may comprise ulcers along the superficial lining of the large intestine.
- microscopic colitis may comprise inflammation of the colon.
- microscopic colitis may comprise inflammation of the colon that is only visible under a microscope.
- diverticulitis may comprise inflammation and/or infection of small pouches (e.g., diverticula) along the walls of the colon.
- celiac disease may comprise an autoimmune disorder.
- celiac disease may comprise an intolerance to gluten.
- celiac disease may comprise an intolerance to gluten which leads to damage of the small intestine.
- autoimmune enteropathy may comprise immune-mediated damage of the gastrointestinal tract.
- autoimmune enteropathy may comprise severe diarrhea and/or malabsorption.
- autoimmune enteropathy may affect the small intestine.
- chronic radiation enteritis may comprise chronic inflammation of the bowel.
- chronic radiation enteritis may comprise chronic inflammation of the bowel due to previous radiation therapy to the abdomen, pelvis, and/or rectum.
- an IBD may affect one or more of a colon, a right colon, a transverse colon, an ileum, a small intestine, a large intestine, a stomach, a mouth, a rectum, an anus, an esophagus, a large bowel, and/or a pharynx.
- ulcerative colitis may be restricted to the colon.
- ulcerative colitis may present with continuous inflammation of the mucosal layer.
- Crohn’s disease may affect any part of the gastrointestinal tract.
- Crohn’s disease may affect one or more deeper layers of the bowel wall. In some embodiments, Crohn’s disease may affect the large bowel. In some embodiments, Crohn’s disease may affect the small bowel. In some embodiments, Crohn’s disease may affect the large bowel and the small bowel. In some embodiments, Crohn’s disease may result in one or more fistulas. In some embodiments, Crohn’s disease may result in one or more strictures. In some embodiments, Crohn’s disease may comprise complications. In some embodiments, complications resulting from Crohn’s disease may comprise one or more fistulas and/or one or more strictures. In some embodiments, Crohn’s disease may require one or more surgical interventions.
- complications resulting from Crohn’s disease may require one or more surgical interventions.
- ulcerative colitis may require one or more surgical interventions.
- a surgical intervention may be curative for ulcerative colitis.
- a colectomy may be curative for ulcerative colitis.
- a surgical intervention may not be curative for Crohn’s disease.
- one or more surgical interventions may be used to manage symptoms of Crohn’s disease.
- IBD may comprise extraintestinal manifestations.
- ulcerative colitis may comprise extraintestinal manifestations.
- Crohn’s disease may comprise extraintestinal manifestations.
- extraintestinal manifestations may comprise but are not limited to one or more of arthritis, a skin condition, erythema nodosum, pyoderma gangrenosum, an eye condition, uveitis, episcleritis, a liver condition, a blood vessel condition, joint pain, a skin lesion, inflammation, inflammation of the eye, fatigue, osteoporosis, hepatitis, thrombosis, and any combination thereof.
- one or more extraintestinal manifestations of Crohn’s disease may differ from one or more extraintestinal manifestations of ulcerative colitis.
- a particular set of one or more extraintestinal manifestations may inform a treatment regimen for an IBD.
- Crohn’s disease may comprise arthritis. In some embodiments, Crohn’s disease may comprise one or more skin conditions. In some embodiments, Crohn’s disease may be treated using methotrexate. In some embodiments, Crohn’s disease manifesting with rheumatologic symptoms may be treated using methotrexate.
- UC is characterized by inflammation of a colon.
- UC is associated with ulcers of the digestive tract.
- UC may be confined to the mucosal layer of the colon.
- UC may present with continuous inflammation.
- UC can affect the large intestine, rectum, colon, or any combination thereof.
- UC may result in an increased risk of colorectal cancer.
- a total colectomy may be used to treat UC.
- a total colectomy may be curative for UC.
- the methods provided herein are particularly useful for detecting a subtype of IBD (e.g., UC or CD) when a subject has indeterminate IBD.
- Indeterminate colitis can share characteristics of both UC and Crohn’s Disease (CD) and is generally not distinguishable as either UC or CD using an endoscopy and/or colonoscopy.
- the methods and compositions disclosed herein may distinguish a type or subtype of IBD when other diagnostic techniques indicate an IBD is indeterminate. For example, in some cases, an endoscopy may yield indeterminate results. Potential causes of indeterminate results include a patient having continuous regions of inflammation. Crohn’s disease is associated with patches of inflammation.
- Severe CD can sometimes present as “continuous” if there is a high density of CD patches that makes the patches appear continuous. In such cases, differentiation between CD and UC is difficult. Indeterminate results from an endoscopy can also arise when a region of the colon is inaccessible to the endoscope. The region can be inaccessible due to a number of factors including stricture or blockage.
- the methods provided herein are particularly useful for detecting a subtype of IBD (e.g., UC or CD) when a subject has IBD with an unknown subtype.
- IBD with an unknown subtype occurs when a clinical evaluation is inconclusive as to whether the subject has UC or CD. This may be, for example, because the subject has symptoms that present in both UC and CD.
- the methods and compositions disclosed herein may allow for distinguishing between UC and CD without performing a colonoscopy, or in cases of indeterminate colitis. In some embodiments, the methods and compositions disclosed herein may be more effective at differentiating between CD and UC than a colonoscopy. In some embodiments, the methods and compositions disclosed herein may be used to distinguish between CD and UC to avoid an incorrect treatment regimen.
- Crohn’s disease is characterized by inflammation of one or more parts of a digestive tract.
- Crohn’s disease may comprise swelling of the bowels.
- Crohn’s disease may affect the small intestine, the large intestine, or a combination thereof.
- Crohn’s disease may present with transmural inflammation.
- Crohn’s disease may affect the entire depth of the intestinal wall.
- Crohn’s disease may result in one or more of strictures, fistulas, and abscesses.
- Crohn’s disease may result in one or more perianal fistulas.
- Crohn’s disease may result in skip lesions.
- a skip lesion may comprise areas of disease separated by healthy tissue.
- Crohn’s disease may result in an increased risk of colorectal cancer.
- a surgery may be used to treat Crohn’s disease.
- a surgery may be used to manage symptoms of Crohn’s disease.
- Crohn’s disease may recur following a surgery.
- IBD may comprise an imbalance of one or more strains of bacteria in a digestive tract.
- an imbalance of one or more strains of bacteria in a digestive tract may be localized to a segment of the digestive tract.
- an imbalance of one or more strains of bacteria in a digestive tract may be localized to multiple segments of the digestive tract.
- the methods provided herein comprise detecting or diagnosing an inflammatory disease such as IBD, at least in part, based on symptoms, patient presentation, personal medical history, and/or family medical history.
- symptoms and/or patient presentation can be used along with the methods provided herein to detect or diagnose a disease or disorder (e.g., inflammatory disease).
- the methods comprise collecting a sample from a subject or patient with one or more symptoms of IBD and analyzing nucleic acids within the sample (e.g., cfDNA, mcfDNA) to determine whether the symptoms are caused by a particular subtype of inflammatory disease (e.g., a subtype of IBD).
- the methods comprise collecting a sample from a subject or patient with one or more symptoms of IBD and analyzing nucleic acids within the sample (e.g., cfDNA, mcfDNA) to determine whether the symptoms are associated with a certain severity of inflammatory disease (e.g., IBD, a subtype of IBD).
- an IBD may be diagnosed via a physical examination, blood test, stool test, imaging study, reported symptoms, medications, personal medical history, family medical history, or any combination thereof.
- symptoms of IBD may include but are not limited to: pain, discomfort, cramps, diarrhea, blood in the stool, mucus in the stool, stool urgency, weight loss, fever, poor appetite, mouth sores, low blood count, iron deficiency anemia, fatigue, joint pain, rashes, swelling, kidney stones, dehydration, malnutrition, bowel blockage, fistula, vomiting, colon cancer, rectal cancer, reduced bone density, eye irritation, skin changes, inflammation, inflammation of the skin, inflammation of the eye, inflammation of the joints, inflammation of the liver, inflammation of the bile ducts, delayed and/or impaired growth (in children), depression, anxiety, distress, mental health disturbance, and/or disruptions in daily functioning, or any combination thereof.
- symptoms of Crohn’s disease may include but are not limited to: abdominal pain, diarrhea, weight loss, fatigue, fever, blood in stool, reduced appetite, nausea, bloating, joint pain, skin rashes, and/or mouth sores, or any combination thereof.
- symptoms of ulcerative colitis may include but are not limited to: abdominal pain, diarrhea (often with blood), fatigue, weight loss, urgency to have bowel movements, rectal bleeding, fever, nausea, loss of appetite, and/or anemia, or any combination thereof.
- symptoms of an inflammatory bowel condition may include but are not limited to one or more of: long-term inflammation of the digestive tract, digestive discomfort, stomach cramps and pain, diarrhea, constipation, urgent need to have a bowel movement, feeling as though a bowel movement was incomplete, rectal bleeding, loss of appetite, weight loss, fatigue, night sweats, irregular periods, or any combination thereof.
- IBD may have a genetic component.
- IBD may comprise cell death in the intestinal tract.
- IBD may comprise release of cell-free DNA into the bloodstream at the intestinal tract.
- Crohn’s disease may have a genetic component.
- ulcerative colitis may have a genetic component.
- IBD in a subject may be classified as mild, moderate, severe, or in remission.
- ulcerative colitis in a subject may be classified as mild, moderate, severe, or in remission, or otherwise healthy, or any combination thereof.
- mild ulcerative colitis may comprise fewer than four rectal bleeding episodes per day.
- moderate ulcerative colitis may comprise more than four rectal bleeding episodes per day.
- severe ulcerative colitis may comprise more than four rectal bleeding episodes per day.
- severe ulcerative colitis may comprise more than four rectal bleeding episodes per day in addition to one or more systemic symptoms including but not limited to fever and anemia.
- UC may start with mild symptoms and gradually worsen.
- Crohn’s disease in a subject may be classified as mild, moderate, severe, or in remission.
- mild Crohn’s disease may comprise abdominal pain and/or diarrhea.
- moderate Crohn’s disease may comprise abdominal pain, diarrhea, weight loss, and/or nutritional deficiency.
- severe Crohn’s disease may comprise abdominal pain, diarrhea, weight loss, nutritional deficiency, one or more intestinal obstructions, and/or limited daily function.
- Crohn’s disease may be progressive.
- Crohn’s disease may start with mild symptoms and gradually worsen.
- distinguishing an inflammatory disease may comprise distinguishing between subtypes of disease that have one or more overlapping symptoms.
- distinguishing between subtypes of disease may comprise distinguishing between inflammatory bowel disorder and a healthy state.
- distinguishing between subtypes of disease may comprise distinguishing between inflammatory bowel disorder and a state of remission.
- distinguishing between subtypes of disease may comprise distinguishing between Crohn’s disease and ulcerative colitis.
- distinguishing between subtypes of disease may comprise distinguishing between Crohn’s disease and a state of remission.
- distinguishing between subtypes of disease may comprise distinguishing between Crohn’s disease and a healthy state. In some embodiments, distinguishing between subtypes of disease may comprise distinguishing between ulcerative colitis and a state of remission. In some embodiments, distinguishing between subtypes of disease may comprise distinguishing between ulcerative colitis and a healthy state.
- identifying or detecting an inflammatory disease may comprise distinguishing between one or more severity levels of a disease.
- a severity level of a disease may comprise a diagnosed severity.
- a diagnosed severity may comprise a diagnosed severity based on patient symptoms.
- severity of a disease may change over time.
- severity of a disease may increase over time.
- severity of a disease may decrease over time.
- severity of a disease may remain the same over time.
- distinguishing between one or more severity levels of a disease may comprise distinguishing between a severe disease, a moderate disease, a mild disease, a disease in remission, and a healthy state.
- distinguishing between one or more severity levels of a disease may comprise distinguishing between a severe disease, a moderate disease, a mild disease, and a disease in remission. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a severe disease, a moderate disease, and a mild disease. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a severe disease and a moderate disease. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a severe disease and a mild disease. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a moderate disease and a mild disease.
- distinguishing between one or more severity levels of a disease may comprise distinguishing between a severe disease and a disease in remission. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a moderate disease and a disease in remission. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a mild disease and a disease in remission. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a moderate disease and a healthy state. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a mild disease and a healthy state.
- distinguishing between one or more severity levels of a disease may comprise distinguishing between a severe disease and a healthy state. In some embodiments, distinguishing between one or more severity levels of a disease may comprise distinguishing between a disease in remission and a healthy state.
- a severity level of an IBD may be determined according to a scoring tool.
- a severity level of Crohn’s disease may be determined according to a scoring tool.
- a severity level of ulcerative colitis may be determined according to a scoring tool.
- a Simplified Endoscopic Score for Crohn’s Disease (SES-CD) may be used to determine a severity level of Crohn’s disease.
- an SES-CD may comprise findings obtained via endoscopy.
- an SES-CD may comprise findings obtained via endoscopic observation of intestinal mucosa.
- an SES-CD may be used in clinical practice. In some embodiments, an SES-CD may be used to assess disease activity. In some embodiments, an SES-CD may be used to assess response to treatment. In some embodiments, an SES-CD may be used to assess disease progression. In some embodiments, an SES-CD may assess one or more of the following segments of bowel: ileum, right colon, transverse colon, and/or left colon/sigmoid.
- an SES-CD may comprise the following scoring:
- an SES-CD may comprise scoring four parameters.
- the four parameters of an SES-CD may comprise size of ulcers, ulcerated surface, affected surface, and stenosis.
- an SES-CD may comprise scoring four parameters separately.
- an SES-CD may comprise scoring four parameters separately for each segment of bowel.
- scores of separate parameters evaluated in an SES-CD may be added together across all segments.
- a total score may be calculated in an SES-CD.
- a total score calculated in an SES-CD may comprise the sum of scores obtained for four parameters as described herein across all examined segments.
- an SES-CD total score may comprise a minimum total score of 0. In some embodiments, an SES-CD total score of 0 may indicate no disease activity. In some embodiments, an SES-CD total score may comprise a value of from 0 to 12. In some embodiments, an SES-CD total score may comprise 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12. In some embodiments, an SES-CD may be used to guide therapeutic decisions. In some embodiments, an SES-CD may be used to monitor disease progression. In some embodiments, an SES-CD may be used to monitor disease remission in response to treatment.
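The summation described above can be sketched as follows. This is a minimal illustration only: the parameter keys, the segment keys, the example scores, and the per-parameter range of 0 to 3 (`max_per_parameter`) are assumptions introduced here and are not stated in the text, which gives only the four parameters, the four segments, and a total computed as a sum.

```python
from typing import Dict

# The four SES-CD parameters named in the text (key names are hypothetical).
PARAMETERS = ("ulcer_size", "ulcerated_surface", "affected_surface", "stenosis")


def ses_cd_total(segment_scores: Dict[str, Dict[str, int]],
                 max_per_parameter: int = 3) -> int:
    """Sum the four parameter scores across all examined bowel segments.

    max_per_parameter is an assumed upper bound for a single parameter score.
    """
    total = 0
    for segment, scores in segment_scores.items():
        for name in PARAMETERS:
            value = scores[name]
            if not 0 <= value <= max_per_parameter:
                raise ValueError(
                    f"{segment}.{name}={value} is outside 0..{max_per_parameter}"
                )
            total += value
    return total


# Hypothetical endoscopy findings for the four segments listed in the text.
example = {
    "ileum": {"ulcer_size": 1, "ulcerated_surface": 1,
              "affected_surface": 2, "stenosis": 0},
    "right_colon": {"ulcer_size": 0, "ulcerated_surface": 0,
                    "affected_surface": 1, "stenosis": 0},
    "transverse_colon": {"ulcer_size": 0, "ulcerated_surface": 0,
                         "affected_surface": 0, "stenosis": 0},
    "left_colon_sigmoid": {"ulcer_size": 1, "ulcerated_surface": 0,
                           "affected_surface": 1, "stenosis": 0},
}

print(ses_cd_total(example))  # 4 + 1 + 0 + 2 = 7
```

A total of 0 from this function corresponds to the "no disease activity" case described above.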
- a Mayo Endoscopic Score may be used to determine an inflammation severity of a disease.
- a Mayo Endoscopic Score may comprise four categories of severity.
- a Mayo Endoscopic Score may be determined via endoscopy.
- each category of a Mayo Endoscopic Score may be assigned according to worst affected area observed during an endoscopy.
- a Mayo Endoscopic Score may provide a visual-based assessment of inflammation severity.
- a Mayo Endoscopic Score may be used to determine extent of disease activity.
- a Mayo Endoscopic Score may be used to guide one or more treatment decisions.
- a Mayo Endoscopic Score may be used to evaluate a response to a therapy.
- a Mayo Endoscopic Score may comprise a score of 0, 1, 2, or 3.
- a Mayo Endoscopic Score may comprise the following scoring:
- distinguishing an inflammatory disease may comprise distinguishing between one or more severity levels of a disease. In some embodiments, distinguishing an inflammatory disease may comprise distinguishing between one or more severity levels of an ulcerative colitis. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a severe ulcerative colitis, a moderate ulcerative colitis, a mild ulcerative colitis, an ulcerative colitis in remission, and a healthy state. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a severe ulcerative colitis, a moderate ulcerative colitis, a mild ulcerative colitis, and an ulcerative colitis in remission.
- distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a severe ulcerative colitis, a moderate ulcerative colitis, and a mild ulcerative colitis. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a severe ulcerative colitis and a moderate ulcerative colitis. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a severe ulcerative colitis and a mild ulcerative colitis. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a moderate ulcerative colitis and a mild ulcerative colitis.
- distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a severe ulcerative colitis and an ulcerative colitis in remission. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a moderate ulcerative colitis and an ulcerative colitis in remission. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a mild ulcerative colitis and an ulcerative colitis in remission. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a moderate ulcerative colitis and a healthy state.
- distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a mild ulcerative colitis and a healthy state. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between a severe ulcerative colitis and a healthy state. In some embodiments, distinguishing between one or more severity levels of an ulcerative colitis may comprise distinguishing between an ulcerative colitis in remission and a healthy state.
- distinguishing an inflammatory disease may comprise distinguishing between one or more severity levels of a disease. In some embodiments, distinguishing an inflammatory disease may comprise distinguishing between one or more severity levels of a Crohn’s disease.
- distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a severe Crohn’s disease and a moderate Crohn’s disease. In some embodiments, distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a severe Crohn’s disease and a mild Crohn’s disease. In some embodiments, distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a moderate Crohn’s disease and a mild Crohn’s disease. In some embodiments, distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a severe Crohn’s disease and a Crohn’s disease in remission.
- distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a moderate Crohn’s disease and a Crohn’s disease in remission. In some embodiments, distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a mild Crohn’s disease and a Crohn’s disease in remission. In some embodiments, distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a moderate Crohn’s disease and a healthy state. In some embodiments, distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a mild Crohn’s disease and a healthy state.
- distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a severe Crohn’s disease and a healthy state. In some embodiments, distinguishing between one or more severity levels of a Crohn’s disease may comprise distinguishing between a Crohn’s disease in remission and a healthy state.
- RNA enrichment may comprise physical enrichment.
- physical enrichment can refer to a method by which nucleic acids are separated, filtered, amplified, or otherwise selected for a characteristic.
- a characteristic can comprise a size, a length, a sequence, a physical modification (such as, but not limited to, phosphorylation, methylation, etc.), a strandedness, and/or a species.
- a strandedness can comprise a single-strandedness or a double-strandedness.
- physically enriching can comprise increasing a percentage of nucleic acids within a plurality of nucleic acids that contain a characteristic.
- cell-free DNA in a sample may be physically enriched before sequencing. In some embodiments, cell-free DNA in a sample may be computationally enriched after sequencing. In some embodiments, cell-free DNA in a sample may not be enriched before sequencing. In some embodiments, cell-free DNA in a sample may not be enriched after sequencing. In some embodiments, cell-free DNA in a sample may not be enriched before or after sequencing. In some embodiments, cell-free DNA may be physically enriched for cell-free DNA fragments of a certain size and/or a range of sizes.
- cell-free DNA may be physically enriched for cell-free DNA fragments of a certain size, or range of sizes, to distinguish between subtypes of IBD and/or to distinguish between severity of IBD.
- the methods comprise enriching for mcfDNA in a sample comprising human and mcfDNA to distinguish between subtypes of IBD and/or to distinguish between severity of IBD.
- enriching for mcfDNA may comprise physically enriching for a particular size or length value to distinguish between subtypes of IBD and/or to distinguish between severity of IBD.
- the methods comprise enriching for human cfDNA in a sample comprising human and mcfDNA to distinguish between subtypes of IBD and/or to distinguish between severity of IBD.
- enriching for human cfDNA may comprise physically enriching for human cfDNA of a particular size or length value.
- cell-free DNA may be physically enriched for cell-free DNA fragments of a certain size or range of sizes.
- cell-free DNA may be physically enriched for cell-free DNA fragments of a certain size prior to sequencing.
- physically enriching for cell-free DNA fragments of a certain size may increase the analytical power of an assay.
- physically enriching for cell-free DNA fragments of a certain size may be critical to detecting or identifying a disease and/or a disease severity in a subject.
- physically enriching for cell-free DNA fragments of a certain size may comprise enriching for fragments of a certain length meeting a cutoff value.
- a cutoff value may encompass a range of values.
- a cutoff value may encompass any of the values between a range of values.
- this range can include physical enrichment ranges such as 5-100 bases, 6-120 bases, 8-75 bases, etc.
- physically enriching for less than 100 bases includes a method that physically enriches for less than 60 bases.
- a cutoff value may encompass a range of base pair lengths. In some embodiments, a cutoff value may encompass an upper limit. In some embodiments, a cutoff value may encompass a lower limit. In some embodiments, a cutoff value may encompass an upper limit and a lower limit.
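The effect of a length cutoff with a lower limit, an upper limit, or both can be illustrated in silico. The sketch below is not the physical enrichment method described here; it only computes the fraction of fragments that fall inside a cutoff, before and after a hypothetical enrichment step. The function names, the inclusive treatment of the limits, and the fragment lengths are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def within_cutoff(length_bp: int,
                  lower: Optional[int] = None,
                  upper: Optional[int] = None) -> bool:
    """Return True if a fragment length meets the lower and/or upper limit.

    Limits are treated as inclusive (an assumption); None means no limit.
    """
    if lower is not None and length_bp < lower:
        return False
    if upper is not None and length_bp > upper:
        return False
    return True


def fraction_within(lengths: Iterable[int],
                    lower: Optional[int] = None,
                    upper: Optional[int] = None) -> float:
    """Fraction of fragments whose length satisfies the cutoff."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    kept = sum(within_cutoff(n, lower, upper) for n in lengths)
    return kept / len(lengths)


# Hypothetical fragment lengths (bp) before and after an enrichment step.
before = [35, 48, 72, 95, 167, 168, 170, 320, 340]
after = [35, 48, 72, 95, 167]

print(fraction_within(before, lower=10, upper=100))  # 4 of 9 fragments
print(fraction_within(after, lower=10, upper=100))   # 4 of 5 fragments
```

An increase in this fraction after the step is one way to express "increasing a percentage of nucleic acids that contain a characteristic" for the size characteristic.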
- an upper limit and/or a lower limit may comprise a direct limit according to a physical technique.
- a direct limit according to a physical technique may comprise using gel-electrophoresis to remove fragments below and/or above a set length.
- an indirect limit according to a physical technique may comprise using a technique that has a tendency for enriching for a particular size.
- using bead-based size selection to physically enrich for an approximate fragment size can be an indirect technique.
- bead-based size selection may comprise a solid support capable of separating cell-free nucleic acids based on size.
- bead-based size selection may comprise eluting cell-free nucleic acids from a solid support.
- the solid support comprises beads; in some cases, a solid support can comprise magnetic beads.
- cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are greater than 5 bp, greater than 10 bp, greater than 15 bp, greater than 20 bp, greater than 25 bp, greater than 30 bp, greater than 35 bp, greater than 40 bp, greater than 45 bp, or greater than 50 bp in length.
- cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are less than 200 bp, less than 190 bp, less than 180 bp, less than 170 bp, less than 160 bp, less than 150 bp, less than 140 bp, less than 130 bp, less than 120 bp, less than 110 bp, less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 60 bp, less than 50 bp, less than 40 bp, less than 30 bp, less than 20 bp, or less than 10 bp in length.
- cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are about 200 bp, about 190 bp, about 180 bp, about 170 bp, about 160 bp, about 150 bp, about 140 bp, about 130 bp, about 120 bp, about 110 bp, about 100 bp, about 90 bp, about 80 bp, about 70 bp, about 60 bp, about 50 bp, about 40 bp in length, about 30 bp in length, about 20 bp in length, or about 10 bp in length.
- microbial cell-free DNA fragments may be physically enriched for microbial cell-free DNA fragments that are greater than 5 bp, greater than 10 bp, greater than 15 bp, greater than 20 bp, greater than 25 bp, greater than 30 bp, greater than 35 bp, greater than 40 bp, greater than 45 bp, or greater than 50 bp in length.
- sequence reads generated from microbial cell-free DNA may be physically enriched for microbial cell-free DNA fragments that are less than 200 bp, less than 190 bp, less than 180 bp, less than 170 bp, less than 160 bp, less than 150 bp, less than 140 bp, less than 130 bp, less than 120 bp, less than 110 bp, less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 60 bp, less than 50 bp, less than 40 bp, less than 30 bp, less than 20 bp, or less than 10 bp in length.
- sequence reads generated from microbial cell-free DNA may be physically enriched for fragments that are about 200 bp, about 190 bp, about 180 bp, about 170 bp, about 160 bp, about 150 bp, about 140 bp, about 130 bp, about 120 bp, about 110 bp, about 100 bp, about 90 bp, about 80 bp, about 70 bp, about 60 bp, about 50 bp, about 40 bp in length, about 30 bp in length, about 20 bp in length, or about 10 bp in length.
- subject cell-free DNA fragments may be physically enriched for subject cell-free DNA fragments that are greater than 5 bp, greater than 10 bp, greater than 15 bp, greater than 20 bp, greater than 25 bp, greater than 30 bp, greater than 35 bp, greater than 40 bp, greater than 45 bp, or greater than 50 bp in length.
- sequence reads generated from subject cell-free DNA may be physically enriched for fragments that are less than 200 bp, less than 190 bp, less than 180 bp, less than 170 bp, less than 160 bp, less than 150 bp, less than 140 bp, less than 130 bp, less than 120 bp, less than 110 bp, less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 60 bp, less than 50 bp, less than 40 bp, less than 30 bp, less than 20 bp, or less than 10 bp in length.
- sequence reads generated from subject cell-free DNA may be physically enriched for fragments that are about 200 bp, about 190 bp, about 180 bp, about 170 bp, about 160 bp, about 150 bp, about 140 bp, about 130 bp, about 120 bp, about 110 bp, about 100 bp, about 90 bp, about 80 bp, about 70 bp, about 60 bp, about 50 bp, about 40 bp in length, about 30 bp in length, about 20 bp in length, or about 10 bp in length.
- cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are 10 - 200 bp in length.
- cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are 50 - 200 bp in length. In some embodiments, cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are 10 - 100 bp in length. In some embodiments, cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are 50 - 150 bp in length. In some embodiments, cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are 150 - 200 bp in length. In some embodiments, cell-free DNA fragments may be physically enriched for cell-free DNA fragments that are less than 170 bp in length.
- any of the lengths disclosed herein may comprise native DNA. In some embodiments, any of the lengths disclosed herein may comprise native DNA and one or more adapters. In some embodiments, DNA may be physically enriched for single-stranded DNA (ssDNA). In some embodiments, DNA may be physically enriched for ssDNA of a particular length of any of the lengths disclosed herein. In some embodiments, DNA may be physically enriched for double-stranded DNA (dsDNA). In some embodiments, DNA may be physically enriched for dsDNA of a particular length of any of the lengths disclosed herein.
- enrichment of cell-free DNA fragments may comprise bead-based size selection.
- enrichment of cell-free DNA fragments may comprise the use of a solvent, for example, an acetate, alcohol, such as methanol, ethanol, or isopropanol.
- enrichment of cell-free DNA fragments (e.g., mcfDNA or relatively short human or host cfDNA) may comprise size-selection electrophoresis (e.g., gel-electrophoresis).
- enrichment may comprise use of size-selection electrophoresis (e.g., gel-electrophoresis).
- enrichment may comprise contacting a sample with a solid support (e.g., beads) to separate nucleic acids based on size or length.
- enrichment may comprise contacting a solid support (e.g., beads) with a solution comprising solvent, for example, an acetate, alcohol, such as methanol, ethanol, or isopropanol, such as after nucleic acids within a sample are bound to the solid support.
- enrichment may comprise the use of binding buffer.
- sequence reads generated from microbial cell-free DNA may be physically enriched for short fragments. In some embodiments, sequence reads generated from microbial cell-free DNA may be physically enriched for ultra-degraded fragments.
- physical enrichment may comprise bead-based size selection. In some embodiments, physical enrichment of subject cell-free DNA fragments may comprise gel-electrophoresis. In some embodiments, physical enrichment may comprise the use of a solvent, for example, an acetate, alcohol, such as methanol, ethanol, or isopropanol. In some embodiments, physical enrichment may comprise the use of binding buffer.
- relative enrichment or relative depletion of microbial cell-free DNA may be detected, calculated, or a combination thereof after physical enrichment of the microbial cell-free DNA. In some embodiments, relative enrichment or relative depletion of subject cell-free DNA may be detected, calculated, or a combination thereof after physical enrichment of the subject cell-free DNA.
- cell-free DNA fragments are sequenced following enrichment.
- sequencing may comprise high throughput sequencing.
- sequencing may comprise massively parallel sequencing, Next Generation sequencing, and/or post-Next Generation sequencing.
- cell-free DNA fragments are sequenced via deep sequencing.
- cell-free DNA fragments are sequenced using ultra deep sequencing.
- sequencing cell-free DNA fragments may comprise sequencing from 100 million to 200 million reads per sample of cell-free DNA fragments.
- ultra deep sequencing of cell-free DNA fragments may comprise sequencing from 100 million to 200 million reads per sample of cell-free DNA fragments. In some embodiments, sequencing cell-free DNA fragments may comprise sequencing from 100 million to 300 million reads per sample of cell-free DNA fragments. In some embodiments, ultra deep sequencing of cell-free DNA fragments may comprise sequencing from 200 million to 300 million reads per sample of cell-free DNA fragments. In some embodiments, sequencing cell-free DNA fragments may comprise sequencing from 200 million to 400 million reads per sample of cell-free DNA fragments. In some embodiments, ultra deep sequencing of cell-free DNA fragments may comprise sequencing from 200 million to 400 million reads per sample of cell-free DNA fragments.
- sequencing cell-free DNA fragments may comprise sequencing from 300 million to 400 million reads per sample of cell-free DNA fragments.
- ultra deep sequencing of cell-free DNA fragments may comprise sequencing from 300 million to 400 million reads per sample of cell-free DNA fragments.
- ultra deep sequencing of cell-free DNA fragments may comprise sequencing greater than 100 million, 200 million, 300 million, 400 million, 500 million, 600 million, 700 million, or 800 million reads per sample of cell-free DNA fragments.
- cell-free DNA fragments can be sequenced using 10-20 million, 20-50 million, or 50-100 million reads per sample.
- cell-free DNA fragments can be sequenced at an average depth of about 5-10, about 10-20, about 20-30, about 30-40, about 40-50, or about 50-100 reads per base.
- sequencing of cell-free DNA fragments may comprise 5X coverage per base pair. In some embodiments, sequencing of cell-free DNA fragments may comprise 10X coverage per base pair. In some embodiments, sequencing of cell-free DNA fragments may comprise 15X coverage per base pair. In some embodiments, sequencing of cell-free DNA fragments may comprise 5X, 10X, 15X, 20X, 25X, 30X, 35X, 40X, 45X, 50X, 55X, 60X, 65X, 70X, 75X, 80X, 85X, 90X, 95X, or 100X coverage per base pair. In some embodiments, sequencing of cell-free DNA fragments may comprise at least 2X, 3X, 4X, 5X, 6X.
- sequencing of cell-free DNA fragments may comprise at most 8X, 9X, 10X, 15X, 20X, 25X, 30X, 35X, 40X, 45X, 50X, 55X, 60X, 65X, 70X, 75X, 80X, 85X, 90X, 95X, or 100X coverage per base pair.
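The read counts and the per-base coverage values given above can be related by a simple average-depth estimate: total sequenced bases divided by the size of the region being covered. The sketch below assumes this standard approximation; the read length of 150 bp and the target size of 3,000,000,000 bp are hypothetical round numbers chosen for illustration, not values taken from the text, and the estimate ignores unmapped reads, duplicates, and fragments shorter than the read length.

```python
import math


def mean_depth(n_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Average per-base coverage: total sequenced bases / target size."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return n_reads * read_length_bp / target_size_bp


def reads_needed(depth: float, read_length_bp: int, target_size_bp: int) -> int:
    """Smallest whole number of reads that reaches the requested average depth."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(depth * target_size_bp / read_length_bp)


# Hypothetical run: 200 million reads of 150 bp over a 3 Gbp target.
print(mean_depth(200_000_000, 150, 3_000_000_000))  # 10.0, i.e. about 10X
print(reads_needed(5, 150, 3_000_000_000))          # reads for about 5X
```

Under these assumed values, 200 million reads correspond to roughly 10X average coverage; different read lengths or target sizes change the result proportionally.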
- one or more samples of cell-free DNA fragments may be grouped for sequencing.
- sequencing of cell-free DNA fragments may comprise Illumina® sequencing.
- one or more samples of cell-free DNA fragments may be grouped for sequencing on an Illumina® 6000 S4 flowcell.
- at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 samples of cell-free DNA fragments may be grouped for sequencing.
- sequencing of cell-free DNA fragments may generate at least 100 million, at least 150 million, at least 200 million, at least 250 million, at least 300 million, at least 350 million, at least 400 million, at least 450 million, or at least 500 million paired-end reads per dataset.
- a genetic locus can comprise a microbial genetic locus.
- a locus can comprise a human genetic locus.
- a genetic locus can comprise one or more regions of DNA in a genome.
- a genetic locus may comprise a genomic locus.
- a genome may comprise a human genome.
- a genome may comprise a microbial genome.
- detecting or identifying relative enrichment and/or relative depletion of cell-free DNA may comprise identifying one or more relatively enriched and/or relatively depleted promoters. In some embodiments, detecting or identifying relatively enriched and/or relatively depleted cell-free DNA may comprise identifying one or more relatively enriched and/or relatively depleted transcription start sites (TSSs). In some embodiments, detecting or identifying relatively enriched and/or relatively depleted cell-free DNA may comprise generating sequence reads from the cell-free DNA.
- determining that a genetic locus is relatively enriched and/or relatively depleted may comprise determining a natural log ratio of a number of sequence reads. In some embodiments, determining a natural log ratio of a number of sequence reads may comprise determining a natural log ratio of a number of sequence reads covering a region within 500 bp of a genetic locus to a number of sequence reads located more than 500 bp but within 2000 bp in either the 5’ direction or the 3’ direction from the genetic locus.
- determining a natural log ratio of a number of sequence reads may comprise determining a natural log ratio of a number of sequence reads covering a region within 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, or 600 bp of a genetic locus to a number of sequence reads located more than 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, or 600 bp, but within 500, 1000, 1500, 2000, or 2500 bp in either the 5’ direction or the 3’ direction from the genetic locus.
- detecting or identifying relatively enriched cell-free DNA may comprise calculating a peak-to-flank natural log ratio.
- detecting or identifying relatively depleted cell-free DNA may comprise calculating a trough-to-flank natural log ratio.
- a peak-to-flank natural log ratio can have a value of 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, or greater.
- a trough-to-flank natural log ratio can have a value of -0.1, -0.15, -0.2, -0.25, -0.3, -0.35, -0.4, -0.45, -0.5, -0.55, -0.6, -0.65, -0.7, -0.75, -0.8, -0.85, -0.9, -0.95, -1, or less.
- a natural log ratio of sequence reads may indicate a relative enrichment at a given genetic locus.
- a natural log ratio of sequence reads may indicate a relative depletion at a given genetic locus.
- an absolute value of a peak-to-flank natural log ratio or of a trough-to-flank natural log ratio of 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, or greater may indicate a relative enrichment of a genetic locus.
- a peak-to-flank natural log ratio of 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, or greater may indicate a relative enrichment of a genetic locus.
- an absolute value of a trough-to-flank natural log ratio or of a trough-to-flank natural log ratio of -0.1, -0.15, -0.2, -0.25, -0.3, -0.35, -0.4, -0.45, -0.5, -0.55, -0.6, -0.65, -0.7, -0.75, -0.8, -0.85, -0.9, -0.95, -1, or less may indicate a relative depletion of a genetic locus.
- a trough-to-flank natural log ratio of -0.1, -0.15, -0.2, -0.25, -0.3, -0.35, -0.4, -0.45, -0.5, -0.55, -0.6, -0.65, -0.7, -0.75, -0.8, -0.85, -0.9, -0.95, -1, or less may indicate a relative depletion of a genetic locus.
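The peak-to-flank and trough-to-flank calculation described above can be sketched as follows. This is an illustrative sketch only: the function names, the use of one position per read (for example a fragment midpoint), the pseudocount, and the optional width normalization are assumptions that the text does not specify. The 500 bp and 2000 bp windows and the 0.1 threshold are taken from the values listed above.

```python
import math


def log_flank_ratio(read_positions, locus, inner_bp=500, outer_bp=2000,
                    pseudocount=1.0, normalize_by_width=False):
    """Natural log of (reads near the locus) / (reads in the flanks).

    A read counts toward the central window if it lies within inner_bp of
    the locus, and toward the flanks if it lies more than inner_bp but
    within outer_bp on either the 5' or the 3' side.
    """
    center = flank = 0
    for pos in read_positions:
        dist = abs(pos - locus)
        if dist <= inner_bp:
            center += 1
        elif dist <= outer_bp:
            flank += 1
    # Assumption: a pseudocount keeps the ratio defined when a window is empty.
    center_val = center + pseudocount
    flank_val = flank + pseudocount
    if normalize_by_width:
        # Assumption: optionally convert counts to per-base densities.
        center_val /= 2 * inner_bp + 1
        flank_val /= 2 * (outer_bp - inner_bp)
    return math.log(center_val / flank_val)


def call_locus(log_ratio, threshold=0.1):
    """Label a locus using a symmetric threshold on the log ratio."""
    if log_ratio >= threshold:
        return "enriched"
    if log_ratio <= -threshold:
        return "depleted"
    return "neither"


# Synthetic read positions for illustration only (not data from the disclosure).
positions = [10_000] * 30 + [11_000] * 10
ratio = log_flank_ratio(positions, locus=10_000, pseudocount=0.0)
print(round(ratio, 3), call_locus(ratio))
```

A positive value corresponds to a peak relative to the flanks and a negative value to a trough. With `pseudocount=0.0` the function raises an error if either window is empty, which is why the default is nonzero.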
- one or more relatively enriched and/or relatively depleted genomic loci may constitute a signature.
- a signature may indicate a presence of an inflammatory disorder.
- a signature may indicate a presence of an inflammatory bowel disorder.
- a signature may indicate a presence of ulcerative colitis.
- a signature may indicate a presence of Crohn’s disease.
- a signature may indicate a disease severity.
- a signature may indicate a mild, moderate, or severe disease severity or a disease in remission.
- subject cell-free DNA may be sequenced to generate sequence reads.
- sequence reads generated from subject cell-free DNA may be mapped onto one or more reference genomes.
- sequence reads generated from subject cell-free DNA may be analyzed to identify one or more relatively enriched or relatively depleted promoters.
- sequence reads generated from subject cell-free DNA may be analyzed to identify one or more relatively enriched or relatively depleted regulatory elements.
- sequence reads generated from subject cell-free DNA may be analyzed to identify one or more relatively enriched or relatively depleted transcription start sites (TSSs).
- sequence reads generated from subject cell-free DNA may be analyzed to identify one or more relatively enriched or relatively depleted genes.
- sequence reads generated from subject cell-free DNA may be analyzed via paired-end sequencing.
- paired-end alignments may be generated from subject cell-free DNA sequence reads.
- analysis of paired-end alignments may be used to identify one or more relatively enriched or relatively depleted genes.
- one or more relatively enriched or relatively depleted genes may comprise one or more gene-specific biomarkers.
- gene-specific biomarkers may be computed using one or more fragmentomic-based algorithms.
- one or more fragmentomic-based algorithms may comprise, for example, refTSS and/or ENCODE.
- additional analysis of data obtained from processing of nucleic acids as described herein may be performed using machine learning.
- machine learning may comprise the use of a machine-learning classifier.
- a machine-learning classifier is also known as a trained algorithm.
- performance of a machine-learning classifier as disclosed herein may be assessed using a 10-fold cross-validation.
- a 10-fold cross-validation may be performed multiple times across different partitions.
- a 10-fold cross-validation may comprise a leave one clinical site out (LOSO) strategy.
- a machine-learning classifier may comprise a generative model.
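A leave one clinical site out (LOSO) split, as mentioned above, holds out every sample from one site per fold and trains on the remaining sites. The sketch below is a plain-Python illustration under stated assumptions: the function name `leave_one_site_out` and the site labels are hypothetical, and the text does not state how LOSO is combined with the 10-fold partitioning (under pure LOSO the number of folds equals the number of sites).

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one fold per site.

    `sites` holds one site label per sample, in sample order.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        # Train on every sample that does not come from the held-out site.
        train_idx = sorted(
            i for site, idxs in by_site.items() if site != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical site labels for six samples.
sample_sites = ["A", "A", "B", "C", "C", "C"]
for site, train_idx, test_idx in leave_one_site_out(sample_sites):
    print(site, train_idx, test_idx)
```

Holding out whole sites, rather than random samples, gives an estimate of how a classifier might perform on a site it has never seen, at the cost of folds with unequal sizes.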
- a patient may be a human patient.
- a patient may be healthy.
- a patient may have a disease in remission.
- a patient may have a disease.
- a patient may have an inflammatory disorder.
- a patient may have an autoimmune condition.
- a patient may have an inflammatory bowel disorder.
- an inflammatory bowel disorder may comprise ulcerative colitis.
- an inflammatory bowel disorder may comprise Crohn’s disease.
- one or more genes listed in Table 2 may be analyzed to determine a type and/or severity of an inflammatory bowel disorder.
- one or more of the following genes may be analyzed to determine a type and/or severity of an inflammatory bowel disorder: 'C3', 'SNURF', 'SNRPN', 'ACE2', 'TMEM259', 'FANCF', 'C21orf62-AS1', 'STXBP1', 'ZCCHC10', and/or 'ZNF148'.
- one or more of the following genes may be analyzed to determine a type and/or severity of an inflammatory bowel disorder: 'GPIHBP1', 'PAXBP1', 'AIF1', 'POFUT2', 'SCAT1', 'LINC00205', 'MMGT1', 'SIGLEC15', 'SNAPC4', and/or 'LINC01970'.
- one or more of the following genes may be analyzed to determine a type and/or severity of an inflammatory bowel disorder: 'SLC6A4', 'C8orf58', 'ZCWPW1', 'NR2F1-AS1', 'LINC00482', 'AURKC', 'CCDC120', 'FENDRR', 'KIF9', and/or 'TESMIN'.
- one or more of the following genes may be analyzed to determine a type and/or severity of an inflammatory bowel disorder: 'WDR26', 'TMEM88B', 'NR2F1', 'PLXNB2', 'PXDN', 'FAM155A', 'KCNQ2', 'BAIAP2L1', 'CST3', and/or 'SYCE3'.
- one or more of the following genes may be analyzed to determine a type and/or severity of an inflammatory bowel disorder: 'ECHS1', 'UNC80', 'PNKD', 'CRYBB2', 'KIF25-AS1', 'APOH', 'TSPYL2', 'CELF4', 'HAR1B', and/or 'SNORA23'.
- one or more of the genes disclosed herein may be fed into a classifier to determine a type and/or severity of an inflammatory bowel disorder.
- relative enrichment or relative depletion of 'C3' may be analyzed to determine a type and/or a severity of an inflammatory bowel disorder.
- relative enrichment or relative depletion of 'SNURF' may be analyzed to determine a type and/or a severity of an inflammatory bowel disorder.
- relative enrichment or relative depletion of 'NR2F1' may be analyzed to determine a type and/or a severity of an inflammatory bowel disorder.
- relative enrichment or relative depletion of 'GPIHBP1' may be analyzed to determine a type and/or a severity of an inflammatory bowel disorder.
- relative enrichment or relative depletion of 'HTT' may be analyzed to determine a type and/or a severity of an inflammatory bowel disorder.
- relative enrichment or relative depletion of 'SNORA23' may be analyzed to determine a type and/or a severity of an inflammatory bowel disorder.
- one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has a disease as compared to a healthy control sample. In some embodiments, one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has a disease as compared to a sample from a patient in remission. In some embodiments, one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has a disease in remission as compared to a healthy control sample. In some embodiments, one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has Crohn’s disease as compared to a healthy control sample.
- one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has ulcerative colitis as compared to a healthy control sample. In some embodiments, one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has Crohn’s disease as compared to a sample from a patient who has ulcerative colitis. In some embodiments, one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has a severe disease as compared to a healthy control sample. In some embodiments, one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has a moderate disease as compared to a healthy control sample.
- one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has a mild disease as compared to a healthy control sample. In some embodiments, one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has a severe disease as compared to a sample from a patient who has a moderate disease. In some embodiments, one or more genes may be relatively enriched or relatively depleted in a patient sample for a patient who has a severe disease as compared to a sample from a patient who has a mild disease. In some embodiments, one or more relatively enriched or relatively depleted genes may be fed into a classifier or a trained algorithm.
- a relatively enriched or relatively depleted gene may meet a statistical significance of p < 0.05 or p < 0.01.
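The text does not say which statistical test yields the p-value for a given gene. Purely as an illustration, the sketch below applies an exact two-sided permutation test to per-sample log ratios from two groups; the choice of test, the function name `exact_permutation_p`, and the input values are assumptions and are not drawn from the disclosure.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    Enumerates every way of assigning len(group_a) of the pooled values to
    group A, so it is only practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    as_extreme = 0
    n_perm = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_perm += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(sum(pooled[i] for i in idx))) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_perm


# Synthetic log ratios for one hypothetical gene (illustration only).
disease = [0.5, 0.6, 0.7, 0.8]
control = [-0.1, 0.0, 0.1, 0.2]
p = exact_permutation_p(disease, control)
print(f"p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

When many genes are tested at once, a multiple-testing correction would normally be applied before comparing against the 0.05 or 0.01 thresholds; the text does not specify one.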
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has Crohn’s disease compared to a healthy control sample: 'NR0B1', 'NDP', 'ZBTB8B', 'PRICKLE3', 'SYP-AS1', 'TSPYL2', 'ZMYM3', 'NEXMIF', 'MAGEE2', 'BRWD3', 'TAF7L', 'GUCY2F', 'SIX3', 'RHOXF1P1', 'RHOXF1', 'FAM122B', 'MMGT1', 'L1CAM', 'HCFC1', 'LINC01119', 'ANKRD36BP2', 'C1
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has Crohn’s disease compared to a healthy control sample: 'C3', 'SNURF', 'SNRPN', 'NM_001371415', 'ACE2', 'TMEM259', 'FANCF', 'C21orf62-AS1', 'STXBP1', 'ZCCHC10', 'ZNF148', 'GPIHBP1', 'PAXBP1', 'AIF1', 'POFUT2', 'SCAT1', 'LINC00205', 'MMGT1', 'SIGLEC15', 'SNAPC4', 'LINC01970', 'SLC6A4', 'C8orf58', 'ZCWPW1', 'NR2F1-AS1', 'LIN
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has ulcerative colitis compared to a healthy control sample: 'LOC101928626', 'TMEM88B', 'CDKN2A', 'ELAVL2', 'FAM205BP', 'DNAJB5-DT', 'FAM221B', 'LINC01410', 'NXNL2', 'NM_001371194', 'HDHD3', 'BRINP1', 'RC3H2', 'SCAI', 'FIBCD1', 'SURF1', 'NOTCH1', 'LCN8', 'PNPLA7', 'LINC00707', 'OPTN', 'MSRB2', 'NRBF2', 'COL13A1', 'CHST3', 'MARVELD1', 'AD
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has ulcerative colitis compared to a healthy control sample: 'ACE2', 'C3', 'NM_001371415', 'SNURF', 'TMEM259', 'SNRPN', 'FANCF', 'ZNF148', 'ZCCHC10', 'C21orf62-AS1', 'SCAT1', 'LINC00482', 'PAXBP1', 'STXBP1', 'MMGT1', 'NR2F1-AS1', 'NR2F1', 'SYCE3', 'KIF9', 'SNAPC4', 'FENDRR', 'AIF1', 'POFUT2', 'APOH', 'LINC01970', 'SIGLEC15', 'AUR
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has ulcerative colitis as compared to a sample from a patient who has Crohn’s disease: 'LOC101928626', 'TMEM88B', 'CDKN2A', 'FAM205BP', 'FAM221B', 'NXNL2', 'NM_001371194', 'HDHD3', 'RC3H2', 'SCAI', 'FIBCD1', 'SURF1', 'NOTCH1', 'LCN8', 'PNPLA7', 'MSRB2', 'NRBF2', 'COL13A1', 'CHST3', 'MARVELD1', 'ADRB1', 'GPR26', 'CAMK2N1', 'ITIH5', 'UPF2', 'LOC105
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has ulcerative colitis as compared to a sample from a patient who has Crohn’s disease: 'SNORA23', 'AIF1', 'HTT', 'FANCF', 'ASF1B', 'TMEM259', 'ERG', 'LOC105372633', 'SNURF', 'C3', 'SNRPN', 'TMEM164', 'PLEKHG3', 'LAMTOR4', 'FAM221B', 'PDGFA', 'HRAT92', 'ELMO2', 'JPH4', 'NM_001371249', 'LOC105374338', 'MS4A10', 'CHIC1', 'BMS1P14', 'KANK2', 'TICAM1',
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has mild ulcerative colitis as compared to a sample from a patient who has moderate ulcerative colitis: 'RHO', 'CASC4', 'KLK9', 'HSPA12B', 'NM_001371415', 'LIF', 'RTN4RL1', 'MAML3', 'HCN2', 'FLCN', 'MIA3', 'LINC01410', 'CRYBB2', 'LOC112268114', 'CHST3', 'MBNL3', 'C3', 'PUS1', 'ACE2', 'LTBP3', and/or 'OLIG2'.
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has mild ulcerative colitis as compared to a sample from a patient who is in remission from ulcerative colitis: 'MIA3', 'RHO', 'HCN2', 'OLIG2', 'MAML3', 'SDC2', 'TBX10', 'FBXO3', 'MAGEF1', 'RUBCNL', 'KANK2', 'ARHGEF18', 'ACE2', 'NUP155', 'ULK1', 'SUOX', 'LRMDA', 'CRYBB2', 'FBXO3-DT', 'RAB44', 'AHCY', 'LTBP3', 'MIR7-3HG', 'MIR212', 'ASGR2', 'PUS1', 'TMEM259',
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has mild ulcerative colitis as compared to a sample from a patient who has severe ulcerative colitis: 'GDPD5', 'FBXO27', 'FAM110C', 'NM_001371343', 'GFI1B', 'IFI27L2', 'KLHL22', 'CARMIL1', 'NOVA1', 'EXOC2', 'CDKN2A', 'HPSE', 'COLEC11', 'HIVEP2', 'OSER1', 'SNCB', 'C6orf132', 'ANGPTL4', 'CDKN2B-AS1', 'LOC105376453', 'C19orf81', 'GPR182', 'CHMP1A', 'SURF2', 'HCK
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has moderate ulcerative colitis as compared to a sample from a patient who is in remission from ulcerative colitis: 'SIDT1', 'ARHGAP33', 'KDM1A', 'RUBCNL', 'MYB', 'SDC2', 'HOXC8', 'LINC00661', 'RPS6KA2', 'DMRTB1', 'SPRED3', 'TMEM143', 'MEX3A', 'LOC643339', 'MAPK15', 'MEGF6', and/or 'E2F3'.
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has moderate ulcerative colitis as compared to a sample from a patient who has severe ulcerative colitis: 'HPSE', 'ANGPTL4', 'HSPB2-C11orf52', 'IFI27L2', 'KLHL22', 'MARVELD3', 'ADRB1', 'NCOA7-AS1', 'LINC00539', 'ADAMTS14', 'EIF4E1B', 'HIVEP2', 'FAAP20', 'ZNF558', 'APCDD1L', 'EFCAB2', 'LOC105374338', 'TBX15', 'HIF3A', 'LDB2', 'MYO18B', 'CDKN2B-AS1', 'MAB21L2', 'SNCB'
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who is in remission from ulcerative colitis as compared to a sample from a patient who has severe ulcerative colitis: 'EXOC2', 'IGSF3', 'DNAJB8-AS1', 'DMRTB1', 'PTGIS', 'ECHS1', 'HOXC8', 'CARMIL1', 'MS4A10', 'ANGPTL4', 'HPSE', 'LOC105376453', 'MYO18B', 'LINC00706', 'C8orf74', 'ADRB1', 'E2F3', 'KDM1A', 'CDKN2A', 'LOC105374338', 'PDGFA', 'TJP3', 'MDGA2', 'TLCD3B',
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has mild Crohn’s disease as compared to a sample from a patient who has moderate Crohn’s disease: 'FGF6' and/or 'POP7'. In some embodiments, one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has mild Crohn’s disease as compared to a sample from a patient who is in remission from Crohn’s disease: 'ST8SIA2', 'TNKS1BP1', and/or 'POP7'.
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has mild Crohn’s disease as compared to a sample from a patient who has severe Crohn’s disease: 'LPCAT2', 'AADACL3', 'DPYSL5', 'TDRP', 'SOX11', 'HSPB9', 'LOC100240728', 'FGF6', 'TOR1A', 'SHISA5', 'PRSS38', 'SLCO2A1', 'MIR378I', 'GAL3ST4', 'LOC101929243', 'MARVELD3', 'SLC12A7', 'CLIP3', 'MUC12', 'PRDM12', 'ILRUN', 'UBE2E2', 'KPTN', 'AMDHD2', 'NOVA1', and/
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has moderate Crohn’s disease as compared to a sample from a patient who is in remission from Crohn’s disease: 'MUC12', 'NAA40', 'UBE2E2-AS1', 'MHENCR', and/or 'UBE2E2'.
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who has moderate Crohn’s disease as compared to a sample from a patient who has severe Crohn’s disease: 'SOX11', 'PRSS38', 'SLCO2A1', 'AMDHD2', 'CLEC4G', 'ST8SIA2', 'LOC101929243', 'AADACL3', 'RAB1A', 'SCNM1', 'SHISA5', 'MIR378I', and 'KPTN'.
- one or more of the following genes may be relatively enriched or relatively depleted in a patient sample for a patient who is in remission from Crohn’s disease as compared to a sample from a patient who has severe Crohn’s disease: 'ST8SIA2', 'CLIP3', 'LOC101929243', 'AMDHD2', 'GAL3ST4', 'UBE2E2', 'RAB1A', 'SMYD4', 'MUC12', 'KPTN', 'PRDM12', 'UBE2E2-AS1', 'AADACL3', 'PRSS38', 'ILRUN', 'TDRP', 'TOR1A', 'ZSWIM4', 'CLEC4G', 'SLC12A7', 'MAPK8IP3', 'SLCO2A1', 'MIR378I', 'NOVA1', 'NA
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has ulcerative colitis: 'C3', 'SNURF', 'SNRPN', 'NM_001371415', 'ACE2', 'TMEM259', 'FANCF', 'C21orf62-AS1', 'STXBP1', 'ZCCHC10', 'ZNF148', 'PAXBP1', 'AIF1', 'POFUT2', 'SCAT1', 'LINC00205', 'MMGT1', 'SIGLEC15', 'SNAPC4', 'LINC01970', 'C8orf58', 'NR2F1-AS1', 'AURKC', 'CCDC120', 'KIF9', 'TESMIN', 'WDR26', 'NR2F1',
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has ulcerative colitis: 'C3', 'SNURF', 'NM_001371415', 'ACE2', and/or 'TMEM259'.
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has ulcerative colitis: 'GPIHBP1', 'SLC6A4', 'ZCWPW1', 'FENDRR', 'FAM155A', 'MS4A10', 'AGR3', 'GPR139', and 'CYP2A6'.
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has ulcerative colitis: 'LINC00482', 'NR2F1', 'FENDRR', 'APOH', and/or 'SLC6A4'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has mild ulcerative colitis: 'RHO', 'CASC4', 'KLK9', 'HSPA12B', 'NM_001371415', 'LIF', 'RTN4RL1', 'MAML3', 'FLCN', 'MIA3', 'CRYBB2', 'LOC112268114', 'CHST3', 'C3', 'ACE2', 'OLIG2', 'SDC2', 'FBXO3', 'MAGEF1', 'FBXO3-DT', 'MIR7-3HG', 'MIR212', 'ASGR2', 'TMEM259', 'PAPPA', 'ANKRD36BP2', 'CCDC120', 'EIF2AK2', 'RSPH6A',
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has mild ulcerative colitis: 'HCN2', 'LINC01410', 'MBNL3', 'PUS1', 'LTBP3', 'TBX10', 'RUBCNL', 'KANK2', 'ARHGEF18', 'SUOX', 'LRMDA', 'RAB44', 'AHCY', 'SIDT1', 'SLC22A18AS', 'IKZF2', and 'MS4A10'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has moderate ulcerative colitis: 'RHO', 'CASC4', 'KLK9', 'HSPA12B', 'NM_001371415', 'LIF', 'RTN4RL1', 'MAML3', 'FLCN', 'MIA3', 'CRYBB2', 'LOC112268114', 'CHST3', 'C3', 'ACE2', 'OLIG2', 'SDC2', 'FBXO3', 'MAGEF1', 'FBXO3-DT', 'MIR7-3HG', 'MIR212', 'ASGR2', 'TMEM259', 'PAPPA', 'ANKRD36BP2', 'CCDC120', 'EIF2AK2', 'RSPH6A', 'EDNRB',
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has moderate ulcerative colitis: 'HCN2', 'LINC01410', 'MBNL3', 'PUS1', 'LTBP3', 'TBX10', 'RUBCNL', 'KANK2', 'ARHGEF18', 'SUOX', 'LRMDA', 'RAB44', 'AHCY', 'SIDT1', 'SLC22A18AS', 'IKZF2', and 'MS4A10'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has ulcerative colitis in remission: 'RHO', 'CASC4', 'KLK9', 'HSPA12B', 'NM_001371415', 'LIF', 'RTN4RL1', 'MAML3', 'FLCN', 'MIA3', 'CRYBB2', 'LOC112268114', 'CHST3', 'C3', 'ACE2', 'OLIG2', 'SDC2', 'FBXO3', 'MAGEF1', 'FBXO3-DT', 'MIR7-3HG', 'MIR212', 'ASGR2', 'TMEM259', 'PAPPA', 'ANKRD36BP2', 'CCDC120', 'EIF2AK2', 'RSPH6A', '
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has ulcerative colitis in remission: 'HCN2', 'LINC01410', 'MBNL3', 'PUS1', 'LTBP3', 'TBX10', 'RUBCNL', 'KANK2', 'ARHGEF18', 'SUOX', 'LRMDA', 'RAB44', 'AHCY', 'SIDT1', 'SLC22A18AS', 'IKZF2', and 'MS4A10'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has severe ulcerative colitis: 'RHO', 'CASC4', 'KLK9', 'HSPA12B', 'NM_001371415', 'LIF', 'RTN4RL1', 'MAML3', 'FLCN', 'MIA3', 'CRYBB2', 'LOC112268114', 'CHST3', 'C3', 'ACE2', 'OLIG2', 'SDC2', 'FBXO3', 'MAGEF1', 'FBXO3-DT', 'MIR7-3HG', 'MIR212', 'ASGR2', 'TMEM259', 'PAPPA', 'ANKRD36BP2', 'CCDC120', 'EIF2AK2', 'RSPH6A',
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has severe ulcerative colitis: 'HCN2', 'LINC01410', 'MBNL3', 'PUS1', 'LTBP3', 'TBX10', 'RUBCNL', 'KANK2', 'ARHGEF18', 'SUOX', 'LRMDA', 'RAB44', 'AHCY', 'SIDT1', 'SLC22A18AS', 'IKZF2', and 'MS4A10'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has Crohn’s disease: 'C3', 'SNURF', 'SNRPN', 'NM_001371415', 'ACE2', 'TMEM259', 'FANCF', 'C21orf62-AS1', 'STXBP1', 'ZCCHC10', 'ZNF148', 'PAXBP1', 'AIF1', 'POFUT2', 'SCAT1', 'LINC00205', 'MMGT1', 'SIGLEC15', 'SNAPC4', 'LINC01970', 'C8orf58', 'NR2F1-AS1', 'AURKC', 'CCDC120', 'KIF9', 'TESMIN', 'WDR26', 'NR2F1'
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has Crohn’s disease: 'C3', 'SNURF', 'SNRPN', 'NM_001371415', and/or 'ACE2'.
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has Crohn’s disease: 'GPIHBP1', 'SLC6A4', 'ZCWPW1', 'FENDRR', 'FAM155A', 'MS4A10', 'AGR3', 'GPR139', and 'CYP2A6'.
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has Crohn’s disease: 'GPIHBP1', 'SLC6A4', 'ZCWPW1', 'LINC00482', and/or 'FENDRR'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has mild Crohn’s disease: 'FGF6', 'POP7', 'ST8SIA2', 'TNKS1BP1', 'LPCAT2', 'AADACL3', 'DPYSL5', 'TDRP', 'LOC100240728', 'TOR1A', 'SHISA5', 'PRSS38', 'SLCO2A1', 'LOC101929243', 'MARVELD3', 'SLC12A7', 'CLIP3', 'MUC12', 'PRDM12', 'ILRUN', 'UBE2E2', 'KPTN', 'AMDHD2', 'NOVA1', 'SMYD4', 'NAA40', 'UBE2E2-AS1', 'MHENCR',
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has mild Crohn’s disease: 'SOX11', 'HSPB9', 'MIR378I', 'GAL3ST4', and 'CLEC4G'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has moderate Crohn’s disease: 'FGF6', 'POP7', 'ST8SIA2', 'TNKS1BP1', 'LPCAT2', 'AADACL3', 'DPYSL5', 'TDRP', 'LOC100240728', 'TOR1A', 'SHISA5', 'PRSS38', 'SLCO2A1', 'LOC101929243', 'MARVELD3', 'SLC12A7', 'CLIP3', 'MUC12', 'PRDM12', 'UBE2E2', 'KPTN', 'AMDHD2', 'NOVA1', 'SMYD4', 'NAA40', 'UBE2E2-AS1', 'MHENCR', 'RAB1A',
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has moderate Crohn’s disease: 'SOX11', 'HSPB9', 'MIR378I', 'GAL3ST4', and 'CLEC4G'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has severe Crohn’s disease: 'FGF6', 'POP7', 'ST8SIA2', 'TNKS1BP1', 'LPCAT2', 'AADACL3', 'DPYSL5', 'TDRP', 'LOC100240728', 'TOR1A', 'SHISA5', 'PRSS38', 'SLCO2A1', 'LOC101929243', 'MARVELD3', 'SLC12A7', 'CLIP3', 'MUC12', 'PRDM12', 'UBE2E2', 'KPTN', 'AMDHD2', 'NOVA1', 'SMYD4', 'NAA40', 'UBE2E2-AS1', 'MHENCR', 'RAB1A',
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has severe Crohn’s disease: 'SOX11', 'HSPB9', 'MIR378I', 'GAL3ST4', and 'CLEC4G'.
- one or more of the following genes may be relatively enriched in a patient sample for a patient who has Crohn’s disease in remission: 'FGF6', 'POP7', 'DPYSL5', 'TDRP', 'LOC100240728', 'TOR1A', 'SHISA5', 'PRSS38', 'SLCO2A1', 'LOC101929243', 'MARVELD3', 'SLC12A7', 'CLIP3', 'MUC12', 'PRDM12', 'UBE2E2', 'KPTN', 'AMDHD2', 'NOVA1', 'SMYD4', 'NAA40', 'UBE2E2-AS1', 'MHENCR', 'RAB1A', 'SCNM1', 'ZSWIM4', and 'MAPK8IP3'.
- one or more of the following genes may be relatively depleted in a patient sample for a patient who has Crohn’s disease in remission: 'SOX11', 'HSPB9', 'MIR378I', 'GAL3ST4', and 'CLEC4G'.
Detecting or identifying microbes to detect or identify disease
- microbial cell-free DNA may be derived from one or more microbes.
- microbial cell-free DNA may be shorter than human cell-free DNA.
- microbial cell-free DNA may be sequenced to determine abundance of one or more microbes.
- microbial cell-free DNA may be sequenced to generate sequence reads.
- sequence reads generated from microbial cell-free DNA may be mapped onto one or more reference genomes.
- mapping of sequence reads generated from microbial cell-free DNA may be used to identify one or more microbes.
- differential abundance analysis may be performed on data generated from analysis of sequence reads derived from microbial cell-free DNA.
- differential abundance analysis may compare groups of interest.
- groups of interest may comprise subjects in disease remission, subjects with mild disease, subjects with moderate disease, subjects with severe disease, and/or healthy subjects.
- effect size may be determined after differential abundance analysis is performed.
- one or more microbial features may be selected according to effect size.
- one or more microbial features may comprise one or more taxonomic ranks.
- one or more microbial features may comprise but are not limited to: species, genus, family, order, class, phylum, and kingdom.
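The selection of microbial features by effect size described above can be sketched as follows. The text does not name an effect-size measure, so Cohen's d is used here purely as an assumption; the function names, the 0.8 cutoff, the taxon labels, and the abundance values are likewise hypothetical and synthetic.

```python
import math
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (needs >= 2 values per group)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * stdev(a) ** 2 + (n_b - 1) * stdev(b) ** 2) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def select_features(abundance_a, abundance_b, min_abs_d=0.8):
    """Keep taxa whose absolute effect size between two groups meets the cutoff.

    Each argument maps a taxon name to its per-sample relative abundances.
    """
    selected = {}
    for taxon in abundance_a:
        d = cohens_d(abundance_a[taxon], abundance_b[taxon])
        if abs(d) >= min_abs_d:
            selected[taxon] = d
    return selected


# Synthetic relative abundances for two groups of interest (illustration only).
group_disease = {"taxon_A": [0.020, 0.025, 0.030, 0.035],
                 "taxon_B": [0.05, 0.06, 0.04, 0.05]}
group_healthy = {"taxon_A": [0.080, 0.085, 0.090, 0.095],
                 "taxon_B": [0.05, 0.05, 0.06, 0.04]}
print(select_features(group_disease, group_healthy))
```

The same comparison could be repeated for each pair of groups of interest (for example remission versus severe disease) and at each taxonomic rank; a taxon with zero variance in both groups would need separate handling because the pooled standard deviation would be zero.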
- additional analysis may be performed using machine learning.
- machine learning may comprise the use of a machine-learning classifier or a trained algorithm.
- performance of a machine-learning classifier as disclosed herein may be assessed using a 10-fold cross-validation.
- a 10-fold cross-validation may be performed multiple times across different partitions.
- a 10-fold cross-validation may comprise a leave one clinical site out (LOSO) strategy.
- IBD may result from the presence of or imbalance in an amount of one or more microbes in a subject.
- an imbalance in an amount of one or more microbes in a subject may comprise a statistically significant difference in detection of the one or more microbes relative to a control sample.
- one or more microbes may include but are not limited to: Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides vulgatus, Firmicutes, Faecalibacterium prausnitzii, Clostridium, Clostridium leptum, Clostridium difficile, Roseburia hominis, Eubacterium rectale, Lactobacillus acidophilus, Lactobacillus rhamnosus, Bifidobacterium bifidum, Bifidobacterium adolescentis, Streptococcus thermophilus, Escherichia coli, Enterococcus faecalis, Enterococcus faecium, Fusobacterium nucleatum, Prevotella copri, Prevotella histicola, Ruminococcus bromii, Ruminococcus torques, Parabacteroides distasonis, Akkermansia muciniphil
- an Acidovorax sp. SD340, an Acinetobacter baumannii, an Acinetobacter bereziniae, an Acinetobacter guillouiae, an Acinetobacter harbinensis, an Acinetobacter johnsonii, an Acinetobacter junii, an Acinetobacter lwoffii, an Acinetobacter nosocomialis, an Acinetobacter sp. PT1, an Acinetobacter ursingii, an Actinomyces graevenitzii, an Actinomyces johnsonii, an Actinomyces oris, an Actinomyces sp. ICM47, an Actinomyces sp. ICM58, an Actinomyces sp. oral taxon 170, an Actinomyces sp. oral taxon 172, an Actinomyces sp. oral taxon 175, an Actinomyces sp. oral taxon 448, an Actinomyces viscosus, an Aeribacillus pallidus, an Afipia broomeae, an Afipia sp. NBIMC P1-C1, an Afipia sp. OHSU I-C6, an Aggregatibacter aphrophilus, an Aggregatibacter segnis, an Aggregatibacter sp. oral taxon 458, an Agrobacterium sp.
- an Alicycliphilus sp. B1, an Alicyclobacillus acidocaldarius, an Alloprevotella rava, an Alloprevotella sp. oral taxon 473, an Alloprevotella tannerae, an Alternaria alternata, an Anaerobutyricum hallii, an Anaerococcus obesiensis, an Anaerostipes hadrus, an Anoxybacillus ayderensis, an Anoxybacillus flavithermus, an Anoxybacillus gonensis, an Aquabacterium parvum, an Aspergillus niger, an Aspergillus sydowii, an Aureobasidium pullulans, an Avian endogenous retrovirus EAV-HP, an Azospira oryzae, a Bacillus licheniformis, a Bacillus smithii, a Bacillus subtilis, a Bacteroides cacca
- KLE 1732, a Blautia wexlerae, a Bradyrhizobium cosmicum, a Bradyrhizobium elkanii, a Bradyrhizobium embrapense, a Bradyrhizobium japonicum, a Bradyrhizobium pachyrhizi, a Bradyrhizobium sp. 17-4, a Bradyrhizobium sp. BTAi1, a Bradyrhizobium sp. Leaf396, a Bradyrhizobium sp. YR681, a Bradyrhizobium viridifuturi, a Brevibacterium mcbrellneri, a Brevundimonas sp. KM4, a Brevundimonas vesicularis, a Brochothrix thermosphacta, a Brucella anthropi, a Brussowvirus bv2972, a Burkholderia contaminans, a Burkholderia vietnamiensis, a Caldibacillus debilis, a Caldibacillus thermoamylovorans, a Campylobacter concisus, a Campylobacter gracilis, a Campylobacter showae, a Candida tropicalis, a Candidatus Methylopumilus planktonicus, a Capnocytophaga gingivalis, a Capnocytophaga granulosa, a Capnocytophaga ochracea, a Capnocytophaga sp. CM59, a Capnocytophaga sp. oral taxon 329, a Capnocytophaga sp. oral taxon 332, a Capnocytophaga sploisa, a Cardiobacterium hominis, a Carnobacterium maltaromaticum, a Catonella morbi, a Ceduovirus bIL67, a Ceduovirus c2, a Chryseobacterium indologenes, a Citrobacter freundii, a Clostridia bacterium UC5.1-1D10, a Clostridia bacterium UC5.1-2H11, a Clostridiales bacterium KLE1615, a Clostridiales bacterium VE202-03, a Clostridiales bacterium VE202-07, a Clostridioides difficile, a Clostridium butyricum, a Clostridium celatum, a Clostridium para
- HPP0074, a Corynebacterium accolens, a Corynebacterium afermentans, a Corynebacterium aurimucosum, a Corynebacterium kroppenstedtii, a Corynebacterium matruchotii, a Corynebacterium minutissimum, a Corynebacterium nuruki, a Corynebacterium pseudodiphtheriticum, a Corynebacterium pseudogenitalium, a Corynebacterium sp. KPL1818, a Corynebacterium sp. KPL1859, a Cupriavidus pauculus, a Curvibacter americanus, a Cutibacterium acnes, a Cutibacterium namnetense, a Cyberlindnera jadinii, a Debaryomyces hansenii, a Deinococcus wulumuqiensis, a Delftia acidovorans, a Delftia lacustris, a Delftia sp. ZNC0008, a Delftia tsuruhatensis, a Dermabacter vaginalis, a Dermacoccus nishinomiyaensis, a Dermacoccus sp. PE3, a Diaphorobacter nitroreducens, a Diaphorobacter sp. J5-51, a Dorea longicatena, an Eikenella corrodens, an Eimeria mitis, an Empedobacter falsenii, an Enhydrobacter aerosaccus, an Enterobacter cloacae, an Enterobacter hormaechei, an Enterobacter sp. BIDMC 27, an Enterobacter sp. BIDMC 109, an Enterobacter sp. BIDMC87, an Enterobacter sp. BIDMC93, an Enterobacter sp. BWH64, an Enterobacter sp. MGH120, an Enterocloster clostridioformis, an Enterococcus cecorum, an Enterococcus italicus, an Escherichia coli, an Escherichia phage HK630, an Escherichia phage T7, an Eubacterium ramulus, an Exophiala oligosperma, a Facklamia hominis, a Faecalibacterium prausnitzii, a Filifactor alocis, a Finegoldia magna, a Flavonifractor plautii, a Francisella tularensis, a Fructilactobacillus sanfranciscensis, a Fusarium graminearum, a Fusobacterium hwasookii, a Fusobacterium nucleatum, a Fusobacterium periodonticum, a Gemella haemolysans, a Gemella morbil
- L2C054A000 a Mesorhizobium sp. L2C084A000, a Mesorhizobium sp. L2C089B000, a Mesorhizobium sp. LNHC209A00, a Mesorhizobium sp. LNHC220B00, a Mesorhizobium sp. LNHC229A00, a Mesorhizobium sp. LNHC252B00, a Mesorhizobium sp. LNJC384A00, a Mesorhizobium sp. LNJC405B00, a Mesorhizobium sp.
- LSHC412B00 a Mesorhizobium sp. LSHC420B00, a Mesorhizobium sp. LSHC432A00, a Mesorhizobium sp. LSJC268A00, a Mesorhizobium sp. LSJC269B00, a Mesorhizobium sp. LSJC277A00, a Methanosarcina sp. 1.H. T.1A.1, a Methanosarcina sp. 2.H. T.1A.15, a Methylobacterium sp.
- Leaf361 a Methylorubrum populi, a Methyloversatilis universalis, a Microbacterium maritypicum, a Microbacterium oxydans, a Microbacterium sp. H83, a Micrococcus aloeverae, a Micrococcus luteus, a Micrococcus sp. CH 3, a Micrococcus sp.
- MS-ASIII-49 a Modestobacter marinus, a Moineauvirus Abc2, a Moineauvirus DTI, a Moraxella catarrhalis, a Moraxella osloensis, a Morococcus cerebrosus, a Mycobacterium gastri, a Mycobacterium tuberculosis variant bovis, a Mycolicibacterium mucogenicum, a Mycolicibacterium obuense, a Neisseria cinerea, a Neisseria flavescens, a Neisseria macacae, a Neisseria mucosa, a Neisseria sicca, a Nesterenkonia massiliensis, a Nesterenkonia sp.
- AN1 a Nesterenkonia sp. JCM 19054, a Neurospora terricola, a Nocardioides sp. Root122, a Novosphingobium sp. AAP93, a Novosphingobium subterraneum, an Oribacterium sinus, a Paenibacillus terrigena, a Paenirhodobacter enshiensis, a Pantoea agglomerans, a Pantoea ananatis, a Pantoea dispersa, a Pantoea sp. PSNIH2, a Pantoea sp.
- aB a Pantoea vagans, a Parabacteroides distasonis, a Parabacteroides merdae, a Paracoccus sphaerophysae, a Parvimonas micra, a Pediococcus acidilactici, a Pelomonas sp. Root1217, a Pelomonas sp. Root1237, a Pelomonas sp. Root1444, a Pelomonas sp.
- Root405 a Penicillium paxilli, a Penicillium roqueforti, a Peptostreptococcus stomatis, a Phocaeicola dorei, a Phocaeicola massiliensis, a Phocaeicola vulgatus, a Photobacterium phosphoreum, a Porphyromonas catoniae, a Porphyromonas endodontalis, a Porphyromonas sp.
- KLE 1280 a Prevotella aurantiaca, a Prevotella baroniae, a Prevotella buccae, a Prevotella conceptionensis, a Prevotella denticola, a Prevotella histicola, a Prevotella intermedia, a Prevotella jejuni, a Prevotella melaninogenica, a Prevotella nigrescens, a Prevotella oris, a Prevotella pallens, a Prevotella salivae, a Prevotella sp. C561, a Prevotella sp. F0091, a Prevotella sp.
- a Pseudomonas sp. AU 11447 a Pseudomonas sp. NBRC 111130, a Pseudomonas sp. NBRC 111131, a Pseudomonas sp. NBRC 111133, a Pseudomonas sp. P818, a Pseudomonas sp. W15Feb9B, a Pseudomonas toyotomiensis, a Pseudomonas weihenstephanensis, a Pseudoxanthomonas sp.
- GW2 a Pseudoxanthomonas suwonensis, a Psychrobacter sp. 1501(2011), a Ralstonia insidiosa, a Ralstonia mannitolilytica, a Ralstonia pickettii, a Raoultella planticola, a Rhizobium pusense, a Roseburia faecis, a Roseburia hominis, a Roseburia intestinalis, a Roseburia sp.
- UNK.MGS-15 a Roseomonas gilardii, a Rothia aeria, a Rothia dentocariosa, a Rothia mucilaginosa, a Ruminococcus sp.
- a Sphingobium xenophagum a Sphingomonas elodea, a Sphingomonas sp. Ant H11, a Sphingopyxis sp. HO 57, a Staphylococcus arlettae, a Staphylococcus capitis, a Staphylococcus caprae, a Staphylococcus cohnii, a Staphylococcus epidermidis, a Staphylococcus gallinarum, a Staphylococcus haemolyticus, a Staphylococcus hominis, a Staphylococcus lugdunensis, a Staphylococcus pettenkoferi, a Staphylococcus saprophyticus, a Staphylococcus warneri, a Staphylococcus xylosus, a Stenotrophomonas maltophilia, a
- Streptococcus parasanguinis a Streptococcus parauberis
- Streptococcus peroris a Streptococcus pneumoniae
- Streptococcus pyogenes a Streptococcus salivarius
- Streptococcus sanguinis a Streptococcus sp. 263 SSPC
- Streptococcus sp. 343 SSPC a Streptococcus sp. A12, a Streptococcus sp. C150, a Streptococcus sp. F0442, a Streptococcus sp.
- OMZ 838 a Trichuris muris, a Trypanosoma cruzi, a Turicibacter sp. H121, a Veillonella atypica, a Veillonella dispar, a Veillonella parvula, a Veillonella sp. ACP1, a Veillonella sp. HPA0037, a Veillonella tobetsuensis, a Weissella paramesenteroides, a Weizmannia coagulans, a Williamsia muralis, a Williamsia sp.
- ulcerative colitis may comprise an imbalance in one or more microbes including Propionibacterium acnes, Lactococcus lactis, Haemophilus parainfluenzae, Escherichia coli, Rothia dentocariosa, Malassezia restricta, Streptococcus thermophilus, Actinomyces oris, Klebsiella phage JD18, Klebsiella pneumoniae, Streptococcus aureus, Streptococcus epidermidis, Streptococcus saprophyticus, Dermacoccus nishinomiyaensis, Acinetobacter baumannii, Streptococcus sanguinis, Lactobacillus plantarum, Lactobacillus crispatus, and any combination thereof.
- ulcerative colitis may comprise an imbalance in one or more microbes including but not limited to: a Propionibacterium, a Lactococcus, a Haemophilus, an Escherichia, a Rothia, a Malassezia, a Streptococcus, an Actinomyces, a Klebsiella, a Dermacoccus, an Acinetobacter, a Lactobacillus, and any combination thereof.
- ulcerative colitis may comprise an imbalance in one or more microbes including but not limited to: an Afipia, a Bifidobacterium, a Brochothrix, a Companilactobacillus, a Coprobacillus, an Escherichia, a Leptospira, a Methylobacterium, a Pantoea, a Parasutterella, a Pichia, a Proteus, a Sphingobacterium, an Achromobacter, an Acinetobacter, an Actinomyces, an Aeribacillus, an Alishewanella, an Alistipes, an Alphacoronavirus, an Alphainfluenzavirus, an Anoxybacillus, an Aureobasidium, a Bacillus, a Betacoronavirus, a Blautia, a Burkholderia, a Caldibacillus, a Caldicellulosiruptor, a
- Incertae Sedis UnClass an Eubacteriales UnClass UnClass, an Eubacterium, a Finegoldia, a Francisella, a Geobacillus, an Inovirus, an Intestinibacter, a Kingella, a Klebsiella, a Kocuria, a Lachnoanaerobaculum, a Lachnoclostridium, a Lentivirus, a Leuconostoc, a Mediterraneibacter, a Megasphaera, a Melampsora, a Mesorhizobium, a Methanococcus, a Methanosarcina, a Methanothrix, a Methylorubrum, a Moraxella, a Mycobacterium, a Mycolicibacterium, a Nesterenkonia, an Orthopoxvirus, a Paenibacillus, a Paenirhodobacter, a Paraburkholderia, a Paracoccus
- Crohn’s disease may comprise an imbalance in one or more microbes including but not limited to: Propionibacterium, Lactococcus, Haemophilus, Escherichia, Rothia, Malassezia, Streptococcus, Actinomyces, Klebsiella, Dermacoccus, Acinetobacter, Lactobacillus, and any combination thereof.
- Crohn’s disease may comprise an imbalance in one or more microbes including but not limited to: Propionibacterium acnes, Lactococcus lactis, Haemophilus parainfluenzae, Escherichia coli, Rothia dentocariosa, Malassezia restricta, Streptococcus thermophilus, Actinomyces oris, Klebsiella phage JD18, Klebsiella pneumoniae, Streptococcus aureus, Streptococcus epidermidis, Streptococcus saprophyticus, Dermacoccus nishinomiyaensis, Acinetobacter baumannii, Streptococcus sanguinis, Lactobacillus plantarum, Lactobacillus crispatus, and any combination thereof.
- Crohn’s disease may comprise an imbalance in one or more microbes including but not limited to: an Afipia, a Bifidobacterium, a Brochothrix, a Companilactobacillus, a Coprobacillus, an Escherichia, a Leptospira, a Methylobacterium, a Pantoea, a Parasutterella, a Pichia, a Proteus, a Sphingobacterium, an Achromobacter, an Acinetobacter, an Actinomyces, an Aeribacillus, an Alishewanella, an Alistipes, an Alphacoronavirus, an Alphainfluenzavirus, an Anoxybacillus, an Aureobasidium, a Bacillus, a Betacoronavirus, a Blautia, a Burkholderia, a Caldibacillus, a Caldicellulosiruptor, a Camp
- Incertae Sedis UnClass an Eubacteriales UnClass UnClass, an Eubacterium, a Finegoldia, a Francisella, a Geobacillus, an Inovirus, an Intestinibacter, a Kingella, a Klebsiella, a Kocuria, a Lachnoanaerobaculum, a Lachnoclostridium, a Lentivirus, a Leuconostoc, a Mediterraneibacter, a Megasphaera, a Melampsora, a Mesorhizobium, a Methanococcus, a Methanosarcina, a Methanothrix, a Methylorubrum, a Moraxella, a Mycobacterium, a Mycolicibacterium, a Nesterenkonia, an Orthopoxvirus, a Paenibacillus, a Paenirhodobacter, a Paraburkholderia, a Paracoccus
- methods and compositions for combining data obtained from processing of human cell-free DNA and data obtained from processing of microbial cell-free DNA may comprise combining analysis of relatively enriched and/or relatively depleted human genes with analysis of an imbalance in one or more microbes to detect or identify a disease.
- combining human data obtained from processing of human cell-free DNA and data obtained from processing of microbial cell-free DNA may result in more powerful analysis than analyzing human cell-free DNA alone or microbial cell-free DNA alone.
- features determined via analysis of human cell-free DNA and features determined via analysis of microbial cell-free DNA may be fed into a joint classifier or a joint trained algorithm.
- a joint classifier may comprise machine learning. In some embodiments, a joint classifier may comprise machine learning and may be trained on input of data obtained from analysis of human cell-free DNA and data obtained from analysis of microbial cell-free DNA. In some embodiments, the machine learning model may take as input data obtained from processing human cell-free DNA concatenated with data obtained from processing microbial cell-free DNA. In some embodiments, the machine learning model may comprise two input channels and take as input data obtained from processing human cell-free DNA in a first input channel and data obtained from processing microbial cell-free DNA in a second input channel.
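The concatenated-input variant described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the disclosed classifier: the feature values, the labels, and the helper names `concatenate_features`, `train_logistic`, and `predict` are all hypothetical, and a plain logistic regressor stands in for whichever model an embodiment might use. The two-channel variant, where each data type enters through its own input channel, is not shown.

```python
import math

def concatenate_features(human_rows, microbial_rows):
    """Join each sample's human and microbial cfDNA feature vectors into one row."""
    return [h + m for h, m in zip(human_rows, microbial_rows)]

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit a logistic regressor by batch gradient descent; returns (weights, bias)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict(rows, weights, bias):
    """Return 1 where the linear score is positive, else 0."""
    return [1 if bias + sum(w * v for w, v in zip(weights, x)) > 0 else 0 for x in rows]

# Hypothetical per-sample features (made-up values for illustration only).
human = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]]      # e.g. human gene signals
microbial = [[0.7, 0.1], [0.6, 0.3], [0.2, 0.8], [0.3, 0.9]]  # e.g. microbial abundances
labels = [1, 1, 0, 0]                                          # 1 = disease, 0 = control

joint_rows = concatenate_features(human, microbial)
weights, bias = train_logistic(joint_rows, labels)
predictions = predict(joint_rows, weights, bias)
```

Concatenation keeps the model agnostic about which features came from which data type; a two-channel design would instead let each channel be processed separately before the results are merged.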
- machine learning may be used to analyze data obtained from processing of human cell-free DNA.
- machine learning may be used to analyze data obtained from processing of microbial cell-free DNA.
- machine learning may be used to apply findings obtained from analysis of human cell-free DNA to predict findings that may be obtained from analysis of microbial cell-free DNA in a sample or in a subject.
- machine learning may be used to apply findings obtained from analysis of microbial cell-free DNA to predict findings that may be obtained from analysis of human cell-free DNA in a sample or in a subject.
- machine learning may comprise an extreme gradient boosting (XGBoost) model.
- XGBoost extreme gradient boosting
- machine learning may comprise a gradient boosted random forest.
- machine learning may comprise a random forest.
- machine learning may comprise a gradient boosted model.
- machine learning may comprise a linear regressor.
- machine learning may comprise a logistic regressor.
- machine learning may comprise a deep learning model.
- machine learning may comprise a decision tree-based method.
- machine learning may comprise a supervised learning algorithm.
- machine learning may comprise using training data to predict a target variable.
- training data may comprise data obtained from processing of human cell-free DNA.
- training data may comprise data obtained from processing of microbial cell-free DNA.
- machine learning may comprise the use of one or more decision trees. In some embodiments, machine learning may comprise the use of multiple decision trees. In some embodiments, machine learning may comprise one or more generative models. In some embodiments, one or more generative models may be used to generate in silico data. In some embodiments, one or more generative models may be used to expand training data. In some embodiments data may be augmented. In some embodiments, one or more generative models may be used to improve prediction performance of a classifier. In some embodiments, a classifier may comprise use of feature selection. In some embodiments, one or more generative models may be combined with a classifier when using feature selection. In some embodiments, a deep learning model may be developed utilizing in silico data generated using one or more generative models. In some embodiments, a deep learning model developed utilizing in silico data does not use feature selection.
- performance of a machine-learning classifier as disclosed herein may be assessed using cross-validation.
- performance of a machine-learning classifier as disclosed herein may be assessed using a 10-fold cross-validation.
- a 10-fold cross-validation may be performed multiple times across different partitions of the dataset.
- partitions may be random.
- the partitions may be predetermined.
- the partitions may be predetermined by the clinical site from which the data in the partitions were obtained.
- a 10-fold cross-validation may comprise a leave one clinical site out (LOSO) strategy.
- performance of a machine-learning classifier as disclosed herein may be assessed using a greater than 5-fold, 10-fold, or 20-fold cross-validation.
- a greater than 5-fold, 10-fold, or 20-fold cross-validation may be performed multiple times across different partitions.
- the number of partitions of a dataset, also called folds, may be greater than 5, 10, or 20.
- cross-validation may be a leave-one-out cross-validation, wherein a single sample is withheld from the training data and used to validate the machine learning model.
- a greater than 5-fold, 10-fold, or 20-fold cross-validation may comprise a leave one clinical site out (LOSO) strategy.
- LOSO leave one clinical site out
- a cross-validation strategy may comprise analyzing a subset of a dataset to make one or more predictions about a remainder of the dataset. In some embodiments, a cross-validation strategy may comprise analyzing a subset of a dataset to make predictions about a remainder of the dataset until all components of a dataset are predicted upon.
- a LOSO strategy may comprise datasets obtained from one or more clinical centers. In some embodiments, a LOSO strategy may comprise training a classifier using data obtained from all but one of the one or more clinical centers to make one or more predictions about the data obtained from the one clinical center not used for training (i.e., the left out clinical center). In some embodiments, a LOSO strategy may be repeated until all clinical sites are predicted upon. In some embodiments, performance of a LOSO strategy across all clinical sites may be aggregated.
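The LOSO procedure described above (train on all sites but one, predict on the left-out site, repeat, then aggregate) can be sketched as follows. This is an illustrative sketch only: the site names, outcome labels, and the helpers `leave_one_site_out` and `majority_class` are hypothetical, and a trivial majority-class rule stands in for a real classifier.

```python
from collections import defaultdict

def leave_one_site_out(site_labels):
    """Yield (held_out_site, train_indices, test_indices) once per clinical site."""
    by_site = defaultdict(list)
    for index, site in enumerate(site_labels):
        by_site[site].append(index)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        train_idx = [i for s in sorted(by_site) if s != held_out for i in by_site[s]]
        yield held_out, train_idx, test_idx

def majority_class(labels):
    """Placeholder 'classifier': predict the most common training label."""
    return max(sorted(set(labels)), key=labels.count)

# Hypothetical dataset: one entry per sample.
site_labels = ["site_A", "site_A", "site_B", "site_B", "site_B", "site_C", "site_C"]
outcomes = [1, 0, 1, 1, 0, 1, 1]  # 1 = IBD, 0 = control (made-up values)

predictions = [None] * len(outcomes)
held_out_order = []
for site, train_idx, test_idx in leave_one_site_out(site_labels):
    held_out_order.append(site)
    guess = majority_class([outcomes[i] for i in train_idx])
    for i in test_idx:
        predictions[i] = guess  # every sample is predicted exactly once

# Aggregate performance across all left-out sites.
accuracy = sum(p == y for p, y in zip(predictions, outcomes)) / len(outcomes)
```

Because each sample is predicted only by a model that never saw its clinical site, the aggregated score reflects how well the classifier might transfer to an unseen site.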
- performance of a machine-learning classifier as disclosed herein may be measured with a performance metric.
- a performance metric may comprise at least one of accuracy, sensitivity, specificity, area under the curve (AUC), a receiver operating characteristic (ROC) curve, area under the receiver operating characteristic curve (AUCROC), F-measure, negative predictive value (NPV), positive predictive value (PPV), or a confusion matrix.
- AUC area under the curve
- ROC receiver operating characteristic
- AUCROC area under the receiver operating characteristic curve
- NPV negative predictive value
- PPV positive predictive value
- an AUC may be 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.
- an AUC may be greater than 0.5, greater than 0.6, greater than 0.7, greater than 0.8, or greater than 0.9.
- an AUC may be greater than 0.50, greater than 0.51, greater than 0.52, greater than 0.53, greater than 0.54, greater than 0.55, greater than 0.56, greater than 0.57, greater than 0.58, greater than 0.59, greater than 0.60, greater than 0.61, greater than 0.62, greater than 0.63, greater than 0.64, greater than 0.65, greater than 0.66, greater than 0.67, greater than 0.68, greater than 0.69, greater than 0.70, greater than 0.71, greater than 0.72, greater than 0.73, greater than 0.74, greater than 0.75, greater than 0.76, greater than 0.77, greater than 0.78, greater than 0.79, greater than 0.80, greater than 0.81, greater than 0.82, greater than 0.83, greater than 0.84, greater than 0.85, greater than 0.86, greater than 0.87, greater than 0.88, greater than 0.89, greater than 0.90, greater than 0.91, greater than 0.92, greater than 0.93, greater than 0.94, greater than 0.95, greater than 0.96, greater than 0.97, greater than 0.98,
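The performance metrics listed above can be computed from true labels, predicted labels, and classifier scores. The sketch below is a minimal illustration using made-up labels and scores; the function names and the 0.5 decision threshold are assumptions, and the code assumes both classes are present so that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def summary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = summary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

With these example values, 15 of the 16 positive/negative pairs are ranked correctly, so the AUC is 0.9375, while sensitivity, specificity, PPV, NPV, and accuracy each come out to 0.75.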
- treating a disease may comprise administering a drug, or performing surgery on the subject.
- medical management of a disease or disorder may also encompass performing an endoscopy on the subject in order to further inform treatment options.
- “medication” and “medicament” are generally used interchangeably.
- a subject diagnosed using the disclosed methods may be treated with a small molecule drug.
- a drug may comprise an aminosalicylate, such as mesalamine (5-aminosalicylic acid, 5-ASA), which can be used to treat UC or CD.
- the patient can be treated with a steroid.
- a steroid may comprise prednisolone.
- ulcerative colitis may comprise steroid-dependent ulcerative colitis.
- ulcerative colitis may comprise steroid refractory ulcerative colitis.
- drugs that are used to treat steroid-dependent ulcerative colitis may also be used to treat steroid-refractory ulcerative colitis.
- thiopurines may be used to treat ulcerative colitis.
- one or more rapidly effective drugs may be used in subjects with high disease activity.
- one or more rapidly effective drugs may comprise but are not limited to JAK inhibitors, TNF antibodies, ustekinumab, and/or tofacitinib.
- an antibody drug may comprise but is not limited to one or more of infliximab, golimumab, vedolizumab, and/or mirikizumab.
- an effective treatment for mild to moderate UC may be for example 5-aminosalicylic acid.
- a moderate to severe UC may be treated with advanced therapies that target specific inflammation pathways.
- advanced therapies that target specific inflammation pathways may comprise for example monoclonal antibodies to TNF, α4β7 integrins, and IL-12 and IL-23 cytokines, as well as oral small molecule therapies targeting JAK or sphingosine-1-phosphate.
- a treatment for Crohn’s disease may comprise but is not limited to one or more of a corticosteroid, an immunosuppressant, a biologic, an antibiotic, an aminosalicylate, and/or methotrexate.
- a treatment for Crohn’s disease may comprise but is not limited to one or more of: medications (such as, for example, 5-ASAs, corticosteroids, immunosuppressants, biologics like TNF inhibitors, integrin inhibitors, JAK inhibitors, antibiotics), corticosteroid enemas, topicals, nutritional therapy (such as, for example, enteral nutrition, TPN, specific diets like low FODMAP), probiotics, prebiotics, surgery (resection, ostomy, strictureplasty), bowel rest, immunomodulators, corticosteroid injections, mesalamine, azathioprine, methotrexate, cyclosporine, tacrolimus, anti-TNF agents (infliximab, adalimumab), anti-integrin therapies (vedolizumab), Janus kinase inhibitors (tofacitinib), fecal microbiota transplantation, fecal transplants, vitamin and/or
- 5-ASA medications e.g., mesalamine, balsalazide, sulfasalazine
- a topical steroid e.g., budesonide foam
- additional treatments can be initiated alone or in combination with existing 5-ASA therapy.
- these additional treatments may include systemic corticosteroids (e.g., prednisone), immunomodulators (e.g., azathioprine, 6-MP), biologics (e.g., anti-TNF, anti-integrin agents), or a combination of these therapies.
- systemic corticosteroids e.g., prednisone
- immunomodulators e.g., azathioprine, 6-MP
- biologics e.g., anti-TNF, anti-integrin agents
- IV intravenous
- More aggressive treatments can be added as needed.
- these more aggressive treatments may include biologics (e.g., infliximab, adalimumab, golimumab, vedolizumab), immunomodulators (e.g., azathioprine, 6-MP), or calcineurin inhibitors (e.g., cyclosporine, tacrolimus), either alone or in combination.
- biologics e.g., infliximab, adalimumab, golimumab, vedolizumab
- immunomodulators e.g., azathioprine, 6-MP
- calcineurin inhibitors e.g., cyclosporine, tacrolimus
- budesonide can be a preferred treatment for mild ileocecal or right-sided CD.
- Conventional steroids e.g., prednisone
- Antibiotics e.g., metronidazole, ciprofloxacin
- 5-ASA or sulfasalazine may provide some benefit in mild colonic CD, though they are generally less effective for CD compared to UC.
- systemic corticosteroids e.g., prednisone
- Immunomodulators e.g., azathioprine, 6-MP, methotrexate
- azathioprine, 6-MP, or methotrexate may be introduced for steroid-dependent patients or those with frequent relapses. Biologics may be utilized in cases of inadequate response to steroids or immunomodulators, or in patients with high-risk features.
- anti-TNF agents e.g., infliximab, adalimumab, certolizumab pegol
- anti-integrin agents e.g., vedolizumab
- IL-12/23 inhibitors e.g., ustekinumab
- systemic corticosteroids e.g., IV methylprednisolone
- Biologics e.g., anti-TNF agents, vedolizumab, ustekinumab
- calcineurin inhibitors e.g., cyclosporine, tacrolimus
- Antibiotics are indicated when there is a suspicion of abscesses, fistulas, or perianal complications. Surgery may be necessary for complications such as obstruction, perforation, abscesses that cannot be drained, or massive hemorrhage.
- certain dosages of medications may be administered to treat mild to moderate inflammatory bowel disease (IBD).
- IBD inflammatory bowel disease
- medications for mild to moderate IBD may be used to reduce inflammation.
- aminosalicylates (5-ASAs) such as, for example, mesalamine
- 5-ASAs may be administered orally.
- 5-ASAs may be administered as a rectal enema for distal inflammation.
- sulfasalazine may be administered for mild to moderate IBD at 2 to 4 grams per day, divided into 2-4 doses.
- budesonide may be administered for mild to moderate IBD at 9 mg per day.
- a corticosteroid such as, for example, prednisone
- probiotics such as, for example, Saccharomyces boulardii
- an antibiotic such as metronidazole may be administered for mild to moderate IBD at 500 mg twice a day or ciprofloxacin may be administered for mild to moderate IBD at 500 mg twice a day.
- certain dosages of medications may be administered to treat severe inflammatory bowel disease (IBD).
- biologics and immunosuppressants may be recommended for severe IBD.
- Infliximab may be administered for severe IBD at 5 mg/kg.
- Infliximab may be administered at weeks 0, 2, and 6, followed by maintenance doses every 8 weeks.
- adalimumab may be administered for severe IBD at 160 mg initially, followed by 80 mg at week 2, then 40 mg every other week.
- vedolizumab may be administered for severe IBD at 300 mg at weeks 0, 2, and 6, then every 8 weeks.
- immunosuppressants including but not limited to azathioprine (1.5-2.5 mg/kg per day) or mercaptopurine (1-1.5 mg/kg per day) may be administered for severe IBD.
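The weight-based figures above reduce to simple arithmetic, sketched below. The per-kilogram rates and the week 0, 2, 6 then every-8-weeks schedule are taken from the preceding lines; the 70 kg body weight, the 30-week horizon, and the helper names `weight_based_dose_mg` and `dosing_weeks` are hypothetical values chosen only to make the arithmetic concrete.

```python
def weight_based_dose_mg(weight_kg, mg_per_kg):
    """Return a dose in mg for a per-kilogram dosing rate."""
    return weight_kg * mg_per_kg

def dosing_weeks(induction_weeks, maintenance_interval, horizon_weeks):
    """List dosing weeks: fixed induction weeks, then a repeating maintenance interval."""
    weeks = [w for w in induction_weeks if w <= horizon_weeks]
    next_week = induction_weeks[-1] + maintenance_interval
    while next_week <= horizon_weeks:
        weeks.append(next_week)
        next_week += maintenance_interval
    return weeks

weight_kg = 70  # hypothetical subject weight

# Infliximab: 5 mg/kg at weeks 0, 2, and 6, then every 8 weeks.
infliximab_dose_mg = weight_based_dose_mg(weight_kg, 5)
infliximab_weeks = dosing_weeks([0, 2, 6], 8, horizon_weeks=30)

# Azathioprine: 1.5 to 2.5 mg/kg per day, expressed as a daily range in mg.
azathioprine_range_mg = (
    weight_based_dose_mg(weight_kg, 1.5),
    weight_based_dose_mg(weight_kg, 2.5),
)
```

For the assumed 70 kg subject this gives 350 mg of infliximab per infusion at weeks 0, 2, 6, 14, 22, and 30, and an azathioprine range of 105 to 175 mg per day.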
- surgical options such as, for example, resection surgery or strictureplasty
- Cimzia certolizumab pegol
- Colazal balsalazide disodium
- one or more biologic drugs may be used to treat Crohn’s disease and/or ulcerative colitis.
- one or more biologic drugs may comprise but are not limited to: Humira (adalimumab), Entyvio (vedolizumab), and Skyrizi (risankizumab).
- Medications for IBD may sometimes be used to address variabilities in disease presentation.
- dosing of medicaments for Crohn’s disease may require flexibility due to complexity and/or variability of disease presentation.
- CD can be associated with, in some cases, transmural inflammation, strictures, and/or fistulas.
- Dosing of medicaments for UC may need to manage mucosal inflammation such as mucosal inflammation localized to the colon.
- Crohn’s disease may affect one or more segmental regions of the gastrointestinal tract.
- medicaments may be used to target one or more wide areas for treatment of Crohn’s disease.
- Crohn’s disease may manifest as a systemic disease.
- Crohn’s disease may be treated as a systemic disease.
- a personalized treatment plan may be used to treat Crohn’s disease.
- a personalized treatment plan may be used to treat ulcerative colitis.
- kits which may facilitate making a patient diagnosis and/or informing a treatment regimen.
- the methods and compositions as described herein may be used to distinguish a patient with IBD from a healthy patient.
- the methods and compositions as described herein may be used to distinguish a patient with IBD from a patient in remission from IBD.
- the methods and compositions may be further used to differentiate between types and/or subtypes of IBD.
- the methods and compositions may be further used to differentiate between severities of types and/or subtypes of IBD.
- a subject may already have received a diagnosis from a physician.
- the diagnosis may be based upon a physical examination, blood test, stool test, imaging study, reported symptoms, medications, personal medical history, family medical history, or any combination thereof.
- the subject may have not yet received a diagnosis.
- the subject may have received a colonoscopy and/or an endoscopy.
- the methods and compositions disclosed herein may facilitate the determination of the need for a targeted colonoscopy and/or a targeted treatment regimen.
- the methods and compositions disclosed herein may inform the use of one or more specific drugs and/or dosages of one or more specific drugs to treat an IBD.
- the methods and compositions disclosed herein may improve the success of treatment of an IBD.
- the methods and compositions disclosed herein may reduce the cost, invasiveness, negative effects, and/or time of treatment of an IBD, and/or improve the efficacy of that treatment. In some embodiments, the methods disclosed herein may be used to inform the use of a specific surgery for an IBD. In some embodiments, the methods and compositions disclosed herein may reduce the time to reach an accurate diagnosis for an IBD. In some embodiments, the methods and compositions disclosed herein may prevent the development of a severe IBD.
- the methods and compositions disclosed herein may shorten the timeline of treating an IBD by at least 1 month, 2 months, 3 months, 4 months, 5 months, 6 months, 7 months, 8 months, 9 months, 10 months, 11 months, 12 months, 1 year, 2 years, 3 years, 4 years, 5 years, 6 years, or more.
- the methods and compositions disclosed herein may inform a monitoring process of an IBD.
- the methods and compositions disclosed herein may be used as part of a monitoring process of an IBD.
- the methods and compositions disclosed herein may clarify a diagnosis of IBD.
- the methods and compositions disclosed herein may clarify a diagnosis of IBD when an endoscopy result is unclear.
- the methods and compositions disclosed herein may distinguish a severity of an IBD when a subject’s symptoms are mild.
- the methods and compositions disclosed herein may indicate an IBD is moderate or severe when patient symptoms are mild.
- a treatment regimen for IBD may begin with an induction phase.
- an induction phase may comprise the reduction of symptoms.
- the induction phase may achieve remission.
- the induction phase may comprise aggressive treatment.
- the induction phase may be intended to quickly control inflammation.
- a treatment regimen for IBD may comprise a maintenance phase.
- the maintenance phase may begin after remission is achieved.
- the maintenance phase may be intended to maintain remission, prevent flares, and/or minimize use of corticosteroids.
- a treatment regimen for IBD may comprise transitioning from an induction phase to a maintenance phase.
- the transitioning may be personalized to a patient’s response to treatment, side effect profiles of one or more medicaments, and/or disease severity.
- the transitioning may comprise tapering off potent induction agents such as but not limited to corticosteroids.
- the transitioning may comprise the optimizing of dosages of one or more maintenance medications.
- the subject may receive the first medicament at an initial induction dose before the application of the methods and compositions described herein.
- the methods and compositions described herein may inform whether the subject ought to continue receiving the first medicament at the initial induction dose, or whether the dose ought to be reduced or increased, or whether the medicament ought to be discontinued.
- the methods and compositions described herein may inform whether a subject ought to receive a second medicament in addition to or in place of the first medicament.
- the methods and compositions disclosed herein may be applied one or more times over the course of a treatment regimen for an IBD. In some embodiments, the methods and compositions disclosed herein may be applied to determine the site at which to perform a site-specific endoscopy. In some embodiments, the methods and compositions disclosed herein may reduce the number of endoscopies that a patient may need to receive in order for an IBD to be treated.
- the methods and compositions disclosed herein may be used to differentiate a first type of IBD and a second type of IBD.
- the first type of IBD may be for example associated with inflammatory regions along the entirety of the digestive tract.
- the second type of IBD may be for example associated with inflammatory regions localized to the large intestine, the rectum, or both the large intestine and the rectum.
- if the first type of IBD is detected, an endoscopic biopsy procedure is performed that targets multiple locations along the entirety of the digestive tract; and if the second type of IBD is detected, an endoscopic biopsy procedure is performed that targets a region limited to the large intestine and the rectum.
- an IBD may target one or more regions of a digestive tract. In some embodiments, an IBD may target one or more of an ileum, a large bowel, or both. In some embodiments, an endoscopic procedure may be performed on one or more of an ileum, a large bowel, or both.
- a medication is administered orally, intrathecally, or subcutaneously.
- a drug may be administered orally.
- a drug may be administered rectally, e.g., in the form of a suppository, foam, or enema.
- UC ulcerative colitis
- exemplary surgical interventions for UC can include: proctocolectomy (removal of both colon and rectum), particularly for severe forms of UC. In some cases, the surgery also includes removal of the anus and creation of an external ostomy.
- Exemplary surgical interventions for CD can include strictureplasty to widen narrowed areas of the intestine caused by strictures, proctocolectomy, colectomy, fistula removal, small and large bowel resection, abscess drainage, and/or ostomy surgery.
- the methods provided herein can be used to monitor a patient subsequent to surgery.
- the monitoring can be used, for example, to determine the efficacy of the surgery and for medication management post-surgery.
- the monitoring can be performed periodically, e.g., every week, month, three months, six months, or more. If a worsening of disease is revealed during monitoring, the methods may comprise adjusting the medication of the patient or performing an endoscopy to further illuminate the nature of the disease.
- the methods of diagnosis and detection disclosed herein may be used as a diagnostic to complement treatments disclosed herein.
- methods of combined diagnosis and therapy may include determining the length or duration of therapy, determining the range of effective dosage, monitoring a patient’s reaction to changes in treatment, and/or assessing safety or efficacy of a new method of treatment in clinical trials.
- disclosed diagnostics can be used before, during, and/or after a treatment phase, including surgery.
- disclosed diagnostics can be used in determining a standard of care for a patient or human subject before or during the onset of symptoms of Crohn’s disease, UC, or inflammatory bowel disease.
- the methods of diagnosis and detection disclosed herein may inform the use of an intervention. In some embodiments, the methods of diagnosis and detection disclosed herein may inform the use of a surgical intervention. In some embodiments, the methods of diagnosis and detection disclosed herein may inform the use of a therapy. In some embodiments, the methods of diagnosis and detection disclosed herein may inform the use of targeted therapy. In some embodiments, the methods of diagnosis and detection disclosed herein may inform the use of a targeted therapy comprising one or more of a drug, a surgery, a behavioral modification, a lifestyle change, a change in diet, and/or a therapeutic regimen. In some embodiments, the methods disclosed herein may inform a dosage of a drug used.
- the methods disclosed herein may be used to inform the administration of an initial induction dose of a drug. In some embodiments, the methods disclosed herein may inform the use of an endoscopic procedure. In some embodiments, the methods disclosed herein may inform the use of a site-specific endoscopy procedure.
- the methods disclosed herein may guide the selection of the type of endoscopy performed on a subject.
- the types of endoscopies can include, but are not limited to, ileoscopy, colonoscopy, ileocolonoscopy, sigmoidoscopy, upper endoscopy, esophagogastroduodenoscopy (EGD), capsule endoscopy, or balloon-assisted endoscopy.
- an endoscopy procedure (e.g., colonoscopy)
- endoscopies can provide a visual image of the intestinal lining that can inform a diagnosis.
- the endoscopy may also be used to biopsy tissue in order to further diagnose the subtype of IBD.
- the decision to perform a specific type of endoscopy is typically based on the patient's symptoms and suspected diagnosis. For example, if a patient presents with symptoms such as diarrhea, weight-loss, and/or abdominal pain and has IBD with an unknown subtype, the clinician may order a colonoscopy and/or ileoscopy in order to determine the precise subtype and location of the inflammation. Often, the first type of endoscopy performed is a colonoscopy in order to look for the contours of inflammation within the intestinal lining of the colon (large intestine and rectum). The clinician may determine that the patient has UC if certain hallmarks of UC are observed, such as continuous regions or bands of inflammation.
- CD may be diagnosed if, for example, inflamed patches are detected. If, during the colonoscopy, the clinician begins to suspect that the subject has CD, an ileoscopy may be needed, which is a more invasive procedure than a colonoscopy. In some cases, the ileoscopy may also require a separate visit. Conversely, if continuous inflammation characteristic of ulcerative colitis (UC) is observed during the colonoscopy, the clinician may decide not to proceed with an ileoscopy.
- the IBD can be found to be indeterminate after the endoscopy, depending on the manifestation of the disease or the accessibility of the inflamed region to the scope. For example, continuous patches of inflammation might be consistent with UC, but could also be consistent with a severe CD in which multiple “patches” of inflammation have become interconnected, giving an appearance of continuity.
- the clinician may not be able to reach the affected areas. For example, the shape of the bowel or constrictions in it may obstruct the ability of the scope to traverse it.
- the methods disclosed herein can diagnose the subtype of IBD in a non-invasive fashion and avoid certain endoscopic procedures. For example, if the methods provided herein detect that a subject has UC, the clinician can likely plan to just perform a colonoscopy. If the methods provided herein detect CD, the patient may have both a colonoscopy and an ileoscopy during the initial procedure.
- a sample provided herein may comprise a nucleic acid molecule to be sequenced by a method described herein.
- a sample generally refers to any material comprising nucleic acids that has been derived from a subject described herein.
- a sample may comprise a raw biological sample, such as whole blood.
- raw biological sample refers to an unmanipulated or unprocessed sample obtained from a subject, e.g., a host, containing or presumed to contain target nucleic acids.
- a raw biological sample has not been subjected to any extraction methods after being obtained from a subject.
- a raw biological sample can be processed or manipulated to produce an initial sample.
- a raw biological sample may comprise whole blood which is centrifuged to produce an initial sample of plasma for a sequencing assay.
- the term “initial sample” refers to a sample comprising nucleic acids derived from a raw biological sample.
- an initial sample may comprise a sample that has been processed or manipulated, such as plasma or serum.
- an initial sample may comprise target or desired nucleic acids obtained or extracted from a raw biological sample.
- an initial sample can be subjected to a sequencing assay as described herein.
- a raw biological sample or an initial sample can be used directly in a sequencing assay as described herein without extraction of a nucleic acid.
- a nucleic acid as described herein can be extracted from a raw biological sample or an initial sample for use in a sequencing assay as described herein.
- an extraction method may comprise an alcohol-based extraction, a column purification, a filtration, a size separation, or any combination thereof.
- removal or extraction of nucleic acids refers to steps prior to the start of generating or preparing a nucleic acid library that separates nucleic acids from at least one component with which they are normally associated.
- removal or extraction of nucleic acids can refer to the process of creating an initial sample from a raw biological sample.
- the fractionation of whole blood into its component parts, such as plasma can be considered to involve removal or extraction.
- purification or isolation of DNA from a sample (e.g., a plasma sample)
- a nucleic acid extracted from a sample can be subjected to a sequencing assay as described herein.
- a raw biological sample or an initial sample may comprise a biological sample.
- a sample may comprise a biological sample obtained or collected from a subject.
- a biological sample may comprise cells.
- a biological sample can be substantially cell-free.
- a biological sample may comprise a biological fluid.
- a biological fluid may comprise a bodily fluid of a subject (e.g., blood), a fluid obtained from the subject via a medical procedure (e.g., lavage, bronchoalveolar lavage), or any fluid obtained from processing a biopsy of the subject (e.g., serous fluid).
- a biological fluid may comprise a bodily fluid.
- a bodily fluid may comprise a non-fecal bodily fluid.
- a bodily fluid may comprise a whole blood, a plasma, a serum, a lymph, a synovial fluid, a cerebrospinal fluid (CSF), a saliva, a gastric juice, a bile, a pancreatic juice, an intestinal fluid, a respiratory tract mucosal secretion, a semen, a cervical mucus, a vaginal secretion, a urine, a sebum, a breast milk, an amniotic fluid, a pericardial fluid, a pleural fluid, a peritoneal fluid, or any combination thereof.
- a biological fluid can be processed from a bodily fluid.
- a biological fluid may comprise a plasma sample.
- a biological fluid may comprise a lavage from diagnosing, treating, or cleaning an area of a body of a subject.
- a lavage may comprise a bronchoalveolar lavage (BAL), a gastric lavage, a peritoneal lavage, a nasal lavage, a bladder lavage, a rectal lavage, a wound lavage, a joint lavage (arthrocentesis), an eye lavage, a sinus lavage, or any combination thereof.
- a biological fluid may comprise an amniotic fluid.
- a biological fluid may comprise a BAL. In some embodiments, a biological fluid may comprise a joint lavage. In some embodiments, a biological fluid may comprise a fluid obtained from processing a biopsy of a subject. In some embodiments, a biological fluid may comprise a needle aspiration fluid, a serous fluid, a microdialysis fluid, an exudate fluid, or any combination thereof.
- plasma or “blood plasma” refers to the liquid component or fraction of blood. Plasma is generally obtained by spinning a whole blood sample and removing the liquid component.
- process control molecules refers to molecules that are added to a sample before or during nucleic acid library generation to aid in the identification or quantification of nucleic acids in a sample.
- process control molecules may comprise nucleic acids.
- process control molecules may comprise synthetic nucleic acids.
- process control molecules are separate from and not integrated in the target molecules.
- process control molecules can have special features such as specific sequences, lengths, GC content, degrees of degeneracy, degrees of sequence diversity, different secondary, tertiary, or quaternary structures, and/or known starting concentrations.
- process control molecules can be used for normalizing the signal in a sample to account for variations in sample processing or to control process performance.
- process control molecules can include sample identifiers.
- process control molecules may comprise dephosphorylation control molecules, denaturation control molecules, and/or ligation control molecules.
- multiple different types or sets of control molecules can be added to a sample.
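One way to picture the normalization role of process control molecules with known starting concentrations is a simple proportional scaling of read counts. The sketch below is an illustrative assumption, not the normalization procedure of the disclosure: the function name `normalize_to_spike_in`, its parameters, and the example counts are all hypothetical.

```python
def normalize_to_spike_in(
    target_reads: int,
    spike_reads: int,
    spike_input_molecules: float,
) -> float:
    """Estimate target molecules by scaling target reads by spike-in recovery.

    Assumes (hypothetically) that target and spike-in molecules are recovered
    and sequenced with the same efficiency, so reads scale linearly with input.
    """
    if spike_reads <= 0:
        raise ValueError("spike-in reads must be positive to normalize")
    return target_reads * spike_input_molecules / spike_reads


# Hypothetical counts; not data from the source.
estimate = normalize_to_spike_in(
    target_reads=1200, spike_reads=400, spike_input_molecules=1000.0
)
print(estimate)
```

Because the spike-in passes through the same processing steps as the sample, a drop in spike-in recovery lowers `spike_reads` and raises the estimate proportionally, which is the sense in which such controls can account for variation in sample processing.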
- adapter attachment control molecule refers to a control molecule that allows monitoring of the efficiency of an adapter attachment reaction.
- an adapter attachment reaction can be ligation-based, TdT-based, template-switching-based, primer-extension-based, amplification-based, or a combination thereof.
- degradation assessment molecules refers to a control molecule used to evaluate sample and spiked sample integrity during processing.
- spiked initial sample refers to an initial sample to which process control molecules (or synthetic spike-ins) have been added prior to the start of generating a sequencing library.
- sequence diversity controls refers to degenerate pools, or pools of nucleic acids with diverse sequences, which can often be used for diversity assessment, abundance calculation, and/or determination of information transfer efficiency.
- size controls refers to nucleic acids that are size, length, or GC-content markers, which can be used for abundance normalization, development, analysis, and/or other purposes.
- ID Spike(s) refers to identification spikes that can be used, for example without limitation, for sample identification tracking, cross-contamination detection, reagent tracking, and/or reagent lot tracking (See, for example, United States patent 9,976,181).
- samples or biological fluids derived from a subject may comprise nucleic acids.
- samples or biological fluids derived from a subject may comprise cell-free nucleic acids.
- a subject may comprise a human or a non-human animal.
- a subject may comprise a male or a female.
- a subject can be of any age.
- a subject can be a child.
- a subject may comprise an embryo or a fetus.
- a subject may comprise but is not limited to Homo sapiens, Caenorhabditis elegans (Nematode), Drosophila melanogaster (Fruit fly), Mus musculus (House mouse), Danio rerio (Zebrafish), Arabidopsis thaliana (Plant model), Xenopus laevis (African clawed frog), Gallus gallus (Chicken), Rattus norvegicus (Rat), Cricetus cricetus (Golden hamster), Schizosaccharomyces pombe (Fission yeast), Tetraodon nigroviridis (Pufferfish), Trichoplax adhaerens (Simple animal), Chlamydomonas reinhardtii (Green alga), Ailuropoda melanoleuca (Giant panda), Panthera leo (Lion), Canis lupus familiaris (Dog), Felis catus (
- a subject may comprise an animal.
- an animal may comprise a vector for disease transmission from which a sample is being tested to determine a presence or absence of a pathogen in the animal.
- a disease vector may comprise an animal that has come into contact with a human subject.
- an animal coming into contact with a human subject may comprise an animal biting a human, a human ingesting an animal or a secretion of an animal, or a combination thereof.
- an animal may comprise a mammal, a bird, a reptile, an amphibian, a fish, an insect, or an arachnid.
- an animal may comprise a research animal, an animal for medical use (e.g., xenotransplant donor), a companion animal, a farm animal, a working animal, a performance animal, or a wild animal.
- a mammal may comprise a non-human primate (e.g., a macaque or rhesus monkey), a rodent, a carnivore (e.g., a canine or a feline), a bat, a cetacean (e.g., a dolphin), an ungulate, or an insectivore (e.g., a hedgehog).
- an ungulate may comprise a swine, a sheep, a cow, a deer, or a horse.
- a subject may comprise a healthy subject.
- a subject can have, be suspected of having, or be at risk of having a disease or a disorder described herein.
- a disease or disorder may comprise an inflammatory bowel disorder.
- an inflammatory bowel disorder may comprise an abnormal immune response.
- an inflammatory bowel disorder may comprise but is not limited to one or more of the following: Crohn’s disease, ulcerative colitis, microscopic colitis, autoimmune enteropathy, celiac disease, chronic radiation enteritis, and/or diverticulitis.
- a medical indication related to an inflammatory bowel disorder may comprise any disease, disorder, or procedure (e.g., medical or surgical) that renders a subject wholly or partially immunocompromised or abnormally susceptible to infections.
- a subject can have or be at an elevated risk for developing an infection.
- a subject can have or be at an elevated risk for developing a cancer.
- a subject can be receiving an immunosuppressant (e.g., chemotherapy, radiation, corticosteroids, transplant medications, or certain biologics), an anti-infective agent, antibiotic, antiviral agent, or antifungal agent.
- an immunosuppressant (e.g., chemotherapy, radiation, corticosteroids, transplant medications, or certain biologics)
- an anti-infective agent antibiotic, antiviral agent, or antifungal agent.
- a subject can be eligible as a recipient of transplantation or is an actual recipient of a transplanted organ or graft.
- a subject can be an organ donor or preparing to be an organ donor.
- a subject may comprise an animal organ donor for use in a xenotransplant or an animal being prepared for organ donation in a xenotransplant.
- host refers to an organism that harbors another organism or microbe.
- a living thing (e.g., a mammal such as a human being) can be a host that harbors a microbe, the microbe being the non-host.
- host nucleic acids and all derivative terms such as “host cell-free nucleic acids”, “host cell-free DNA”, etc. refer to nucleic acids derived from the host genome.
- a host genome may comprise nucleic acids derived from a nucleus, a mitochondrion, a cytoplasm, an exosome, cell-free nucleic acids derived from any of these, or any combination thereof.
- target nucleic acids may comprise host nucleic acids. In some embodiments, target nucleic acids may comprise non-host nucleic acids.
- non-host nucleic acids can be derived from a microbe.
- target nucleic acids can be derived from a plurality of microbes.
- a “microbe” can refer to a living microorganism or a non-living microscopic entity.
- a living microorganism may comprise a bacterium, a protozoan, a fungus, an archaeon, an alga, a parasite, or any other living microorganism.
- a non-living microscopic entity may comprise a virus, a live virus, a replicating virus, or an attenuated virus.
- a microbe can be pathogenic to a subject (e.g., a pathogen), a commensal microbe of a subject, or a microbe present in a general environment.
- a pathogen may comprise any pathogenic or virulent microbe.
- a commensal microbe of a subject may comprise a microbe that inhabits any location in or on a subject without causing any symptom of a disease or disorder.
- a microbe of a general environment may comprise a microbe at or near a sample collection site or a microbe at or near a location of a subject.
- a microbe of a general environment comprises a commensal microbe.
- a commensal microbe of a subject may become a pathogen to the subject.
- a commensal microbe of a first subject may be a pathogen to a second subject.
- a microbe of a general environment of a subject may become a pathogen to a subject.
- a microbe of a general environment of a first subject may be a pathogen to a second subject.
- a pathogen can cause an infection or disease comprising gastrointestinal infections (e.g., Escherichia coli, Salmonella spp., Clostridioides difficile), urinary infections (e.g., Escherichia coli), skin infections (e.g., Staphylococcus aureus, including MRSA), strep throat (or scarlet fever, rheumatic fever) (e.g., Streptococcus pyogenes), tuberculosis (e.g., Mycobacterium tuberculosis), gonorrhea (e.g., Neisseria gonorrhoeae), cholera (e.g., Vibrio cholerae), Lyme disease (e.g., Borrelia burgdorferi), ulcers or stomach cancer (e.g., Helicobacter pylori), syphilis (e.g., Treponema pallidum), an infection or disease comprising gastrointestinal infections
- commensal microbes can inhabit a gastrointestinal tract, a skin, a respiratory tract, a urogenital tract, or an oral cavity of a subject.
- commensal microbes may comprise an endogenous virus (e.g., endogenous retroviruses (ERVs)) of a subject.
- ERVs endogenous retroviruses
- a commensal microbe may comprise a Bacteroides fragilis, a Lactobacillus acidophilus, a Bifidobacterium bifidum, an Escherichia coli (non-pathogenic strains), a Staphylococcus epidermidis, a Streptococcus salivarius, a Propionibacterium acnes, a Candida albicans (under normal conditions), an Enterococcus faecalis, a Clostridium difficile (non-toxigenic strains), a Rothia mucilaginosa, a Fusobacterium nucleatum, a Peptostreptococcus anaerobius, a Prevotella melaninogenica, or any combination thereof.
- a commensal microbe may comprise a Lactobacillus spp., a Bacteroides spp., a Faecalibacterium prausnitzii, an Escherichia coli, a Clostridium spp., an Enterococcus spp., a Staphylococcus spp., a Candida spp., an Aspergillus spp., a Porcine Endogenous Retrovirus (PERV), an Eimeria spp., a Bifidobacterium spp., a Fusobacterium spp., a Simian Immunodeficiency Virus (SIV), a Simian Retrovirus (SRV), an Entamoeba spp., or any combination thereof.
- PERVs Porcine Endogenous Retroviruses
- a nucleic acid may comprise a cell-free nucleic acid (cfNA).
- cfNAs may comprise any nucleic acids described herein that are not encapsulated by a cell.
- cfNAs comprise naturally occurring cfNAs.
- cfNAs may comprise fragments of nucleic acids that float freely outside of cells in any body fluid of a subject as described herein.
- cfNAs may comprise plasma cfNAs, cerebrospinal fluid (CSF) cfNAs, saliva cfNAs, bronchoalveolar lavage (BAL) cfNAs, urine cfNAs, amniotic cfNAs, fetal cfNAs, or any combination thereof.
- CSF cerebrospinal fluid
- BAL bronchoalveolar lavage
- cfNAs comprise circulating cfNAs in a subject’s bloodstream.
- the nucleic acids comprise circulating cfDNA, circulating cfRNA, cfDNA, cfRNA, circulating DNA, circulating RNA, or any combination thereof.
- cfNA can be alternatively referred to as free-circulating nucleic acids.
- a cfNA can originate from cell death and other processes that release fragments of nucleic acids into a bloodstream.
- a cfNA can be derived from any source of nucleic acids provided herein.
- cfNA present in a raw biological sample can be isolated from genomic nucleic acid in the raw biological sample by processing the raw biological sample into an initial sample by removing intact cells.
- removing intact cells may comprise centrifuging or filtering a raw biological sample to produce a cell-free fraction of a biological fluid comprising cfNA.
- a cfNA may comprise a host nucleic acid, a non-host nucleic acid, a target nucleic acid, or a combination thereof.
- a cfNA can be derived from a host (host cell-free nucleic acids or “hcfNA”) or a non-host.
- a hcfNA can be derived from nuclear nucleic acids, mitochondrial nucleic acids, exosomal nucleic acids, fetal nucleic acids, or any combination thereof.
- a sample may comprise a host nucleic acid (e.g., a host cell-free nucleic acid).
- a host may comprise any subject provided herein.
- a sample may comprise non-host nucleic acids.
- non-host nucleic acids may comprise microbial nucleic acids.
- microbial nucleic acids may comprise microbial cell-free nucleic acid (mcfNA).
- target nucleic acids as used herein can refer to cfNA.
- target nucleic acids as used herein can refer to a mcfNA.
- an mcfNA can be derived from one or more species of microbe described herein.
- a mcfNA can be derived from a prokaryotic or a eukaryotic microbe.
- an mcfNA may comprise a bacterial cfNA, a fungal cfNA, a viral cfNA, a protozoan cfNA, an archaeal cfNA, an algal cfNA, or any combination thereof.
- a sample may comprise a non-microbial nucleic acid (e.g., a non-microbial cell-free nucleic acid).
- a sample may comprise mcfNAs from one or more species of microbes.
- a sample may comprise mcfNAs from at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 species of microbes.
- a sample may comprise a mixture of nucleic acids.
- a sample may comprise non-cell-free nucleic acids.
- a sample may comprise cell-free nucleic acids.
- a sample may comprise target nucleic acids (e.g., cfNAs) and can further comprise any nucleic acids provided herein.
- a sample can further comprise contaminant nucleic acids.
- contaminant nucleic acids may comprise nucleic acids from a general environment (e.g., a sample collection site).
- cell-free nucleic acids (cfNAs) may comprise a mixture of cfNAs.
- a mixture of cfNAs may comprise cfNAs originated from one or more organisms.
- a mixture of cfNAs may comprise microbial nucleic acids (e.g., mcfNAs) originated from one or more species of microbes described herein.
- an mcfNA may comprise a bacterial-derived cfNA, a fungal-derived cfNA, a viral-derived cfNA, a protozoan-derived cfNA, an archaeal-derived cfNA, an algal-derived cfNA, or any combination thereof.
- a cfNA may comprise a double-stranded nucleic acid (dsNA), a single-stranded nucleic acid (ssNA), or a combination thereof.
- a cfNA may comprise a cell-free DNA (cfDNA), a cell-free RNA (cfRNA), a cell-free DNA-RNA hybrid (cfDNA-RNA), a cell-free double-stranded DNA (cfdsDNA), a cell-free single-stranded DNA (cfssDNA), a cell-free double-stranded RNA (cfdsRNA), a cell-free single-stranded RNA (cfssRNA), or a combination thereof.
- hcfNA may comprise host cell-free DNA (hcfDNA), host cell-free RNA (hcfRNA), host cell-free DNA-RNA hybrid (hcfDNA-RNA), or a combination thereof.
- mcfNA may comprise microbial cell-free DNA (mcfDNA), microbial cell-free RNA (mcfRNA), microbial cell-free DNA-RNA hybrid (mcfDNA-RNA), or any combination thereof.
- microbial cell-free DNA (mcfDNA) may comprise microbial cell-free double-stranded DNA (mcfdsDNA) or microbial cell-free single-stranded DNA (mcfssDNA).
- microbial cell-free RNA (mcfRNA) may comprise microbial cell-free double-stranded RNA (mcfdsRNA) or microbial cell-free single-stranded RNA (mcfssRNA).
- a cfNA as disclosed herein or fragments thereof can be approximately less than about 10 bp, less than about 15 bp, less than about 20 bp, less than about 25 bp, less than about 30 bp, less than about 35 bp, less than about 40 bp, less than about 45 bp, less than about 50 bp, less than about 55 bp, less than about 60 bp, less than about 65 bp, less than about 70 bp, less than about 75 bp, less than about 80 bp, less than about 85 bp, less than about 90 bp, less than about 95 bp, less than about 100 bp, less than about 105 bp, less than about 110 bp, less than about 115 bp, less than about 120 bp, less than about 125 bp, less than about 130 bp, less than about 135 bp, less than about 140 bp, less than about 145 bp,
- cfNAs provided herein or fragments thereof can be approximately about 10 bp, about 15 bp, about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 105 bp, about 110 bp, about 115 bp, about 120 bp, about 125 bp, about 130 bp, about 135 bp, about 140 bp, about 145 bp, about 150 bp, about 155 bp, about 160 bp, about 165 bp, about 170 bp, about 175 bp, about 180 bp, about 185 bp, about 190 b
- cfNAs provided herein or fragments thereof can be from about 10 bp to about 100 bp long. In some embodiments, the cfNAs provided herein or fragments thereof can be from about 30 bp to about 80 bp long. In some embodiments, the cfNAs provided herein or fragments thereof can be from about 40 bp to about 50 bp long.
- mcfNA can be present at higher concentrations relative to hcfNA at lengths that fall outside a nucleosomal interval.
- mcfNA can be enriched relative to hcfNA by enriching for cfNA of less than 180 bp, less than 170 bp, less than 160 bp, less than 150 bp, less than 140 bp, less than 130 bp, less than 120 bp, less than 110 bp, less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 60 bp, less than 50 bp, less than 40 bp, less than 30 bp, or less than 20 bp.
- enriching for mcfNA may comprise enriching for cfNA between 10-180 bp.
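The length-based enrichment described above can be illustrated with a small in-silico filter over fragment lengths. This is only a sketch under stated assumptions: the disclosure does not say whether enrichment is performed computationally or by physical size selection, and the function name `enrich_short_fragments`, the inclusive bounds, and the example lengths are hypothetical.

```python
from typing import Iterable, List


def enrich_short_fragments(
    fragment_lengths: Iterable[int],
    min_bp: int = 10,
    max_bp: int = 180,
) -> List[int]:
    """Keep fragment lengths within [min_bp, max_bp] (inclusive bounds assumed)."""
    return [n for n in fragment_lengths if min_bp <= n <= max_bp]


# Hypothetical fragment lengths in base pairs; not data from the source.
lengths = [8, 25, 47, 62, 110, 167, 180, 195, 340]
kept = enrich_short_fragments(lengths)
print(kept)
```

Lowering `max_bp` (for example to 100) mirrors the narrower "less than" thresholds listed above, trading total yield for a larger share of fragments that fall outside the nucleosomal length interval.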
- a cfNA may comprise any nucleic acid described herein that is not encapsulated by a cell (e.g., a eukaryotic or microbial cell).
- a cfNA can originate from any nucleic acids described herein.
- a cfNA may comprise a plurality of chemical forms of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or DNA/RNA hybrid.
- nucleic acids may comprise a plurality of structural forms of DNA, RNA, or DNA/RNA hybrid.
- a cfNA may comprise linear nucleic acids or circular nucleic acids.
- a cfNA may comprise single-stranded nucleic acids (ssNA), double-stranded nucleic acids (dsNA), or hybrid nucleic acids.
- nucleic acids can be from a genome of an organism or an organelle of a cell (e.g., an exosome or a mitochondrion).
- a cfNA may comprise a mitochondrial DNA, an intercellular signal nucleic acid, an exogenous nucleic acid, a DNA enzyme, a RNA enzyme, a food-derived nucleic acid, any metabolic form of nucleic acid-based therapeutic, or any combination thereof.
- a cfNA can be derived from a member selected from the group consisting of genomic DNA, cDNA, mRNA, cRNA, tRNA, ribosomal RNA, miRNA, siRNA, nuclear DNA, nuclear RNA, mitochondrial DNA, mitochondrial RNA, exosomal DNA, exosomal RNA, fetal DNA, fetal RNA, plasmids, vectors, and any combination thereof.
- nucleic acids may comprise a mixture of nucleic acids from various sources.
- nucleic acids can be derived from a plurality of biological fluids.
- nucleic acids can be from a plurality of organisms.
- nucleic acids can be from a subject described herein.
- nucleic acids can be from one or more species of microbes described herein.
- nucleic acids may comprise environmental nucleic acids.
- environmental nucleic acids may comprise any nucleic acid at or near a sample collection site, or any nucleic acid introduced by personnel, equipment or a reagent used in collecting and/or processing a sample from a subject.
- performance of a machine-learning classifier as disclosed herein can be measured with a performance metric.
- a performance metric may comprise at least one of accuracy, sensitivity, specificity, area under the curve (AUC), a receiver operating characteristic (ROC) curve, area under the receiver operating characteristic curve (AUC-ROC), F-measure, negative predictive value (NPV), positive predictive value (PPV), or a confusion matrix.
- the methods provided herein can detect disease, disorders or infections with a relatively high performance metric. For example, in some embodiments, the methods provided herein can detect disease or infections with a relatively high AUC, accuracy, specificity, negative predictive value (NPV), positive predictive value (PPV) and/or high sensitivity.
- Sensitivity, Positive Percent Agreement (PPA), or true positive rate (TPR) may refer to an equation of TP/(TP+FN) or TP/(total number of infected subjects), where TP is the number of true positives and FN is the number of false negatives.
- PPA Positive Percent Agreement
- TPR true positive rate
- the value can reflect the total number of positive disease results (e.g., UC, CD) or total number of infection results based on a particular independent method of detecting the disease or infection (e.g., blood culture, PCR, biopsy, endoscopy).
- the methods generally may have a very high sensitivity, e.g., a sensitivity of greater than 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 99.5%.
- the methods can detect a subtype of inflammatory bowel disease (IBD) such as ulcerative colitis (UC) or Crohn’s disease (CD) at a very high sensitivity, e.g., a sensitivity of greater than 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 99.5%.
- IBD inflammatory bowel disease
- CD Crohn’s disease
- the methods can detect other inflammatory diseases (e.g., celiac disease) at a very high sensitivity, e.g., a sensitivity of greater than 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 99.5%.
- the methods provided herein may detect a disease or infection that is not detected or detectable by other methods, such as plate culturing, biopsy, scope, or polymerase chain reaction (PCR).
- the methods can detect a subtype of inflammatory bowel disease (IBD) such as ulcerative colitis (UC) or Crohn’s disease (CD) with a very high sensitivity in comparison to standard of care measurements (e.g., a standard of care measurement via fecal calprotectin and high-sensitivity C-reactive protein).
- the methods provided herein can detect a subtype of inflammatory disease with a sensitivity at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, or 70% greater than a standard of care measurement (e.g., a standard of care measurement via fecal calprotectin and high-sensitivity C-reactive protein).
- Specificity, Negative Percent Agreement, or true negative rate may refer to an equation such as TN/(TN+FP) or TN/(total number of uninfected subjects), where TN is the number of true negatives and FP is the number of false positives.
- the value can reflect the total number of actual “non-infections” or “non-disease” as determined by an independent method of detecting disease or infection (e.g., blood culture, PCR, biopsy, endoscopy).
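The sensitivity and specificity equations above can be written out as a short worked example. This is a minimal sketch: the function names and the counts are hypothetical, and the counts are assumed to come from comparing the method's calls against an independent reference method such as endoscopy or biopsy.

```python
def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN): positive percent agreement, or true positive rate."""
    if tp + fn == 0:
        raise ValueError("no reference-positive subjects")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP): negative percent agreement, or true negative rate."""
    if tn + fp == 0:
        raise ValueError("no reference-negative subjects")
    return tn / (tn + fp)


# Hypothetical counts; not results from the source.
tp, fn, tn, fp = 45, 5, 80, 20
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```

Both denominators are fixed by the reference method alone (all reference-positive or all reference-negative subjects), which is why these values are reported as agreement with that independent method.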
- the methods generally may have a high specificity, e.g., a specificity of greater than 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 99.5%.
- the methods provided herein may provide a specificity (or negative percent agreement) and/or sensitivity (or positive percent agreement) that is at least 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more.
- the nominal specificity is greater than or equal to 70%.
- the nominal negative predictive value (NPV) is greater than or equal to 95%.
- the NPV is at least 95%, 95.5%, 96%, 96.5%, 97%, 97.5%, 98%, 98.5%, 99%, 99.5% or more.
- the methods can detect a subtype of inflammatory bowel disease (IBD) such as ulcerative colitis (UC) or Crohn’s disease (CD) at a very high specificity, e.g., a specificity of greater than 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 99.5%.
- the methods can detect other inflammatory diseases (e.g., celiac disease) at a very high specificity, e.g., a specificity of greater than 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 99.5%.
- the methods provided herein may detect a disease or infection that is not detected or detectable by other methods, such as plate culturing, biopsy, scope, or polymerase chain reaction (PCR).
- AUC-ROC (Area Under the Receiver Operating Characteristic Curve) quantifies the overall ability of a model to discriminate between positive and negative classes, integrating the trade-off between sensitivity (true positive rate) and specificity (true negative rate).
- an AUC may be 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.
- an AUC may be greater than 0.5, greater than 0.6, greater than 0.7, greater than 0.8, or greater than 0.9.
- an AUC may be greater than 0.50, greater than 0.51, greater than 0.52, greater than 0.53, greater than 0.54, greater than 0.55, greater than 0.56, greater than 0.57, greater than 0.58, greater than 0.59, greater than 0.60, greater than 0.61, greater than 0.62, greater than 0.63, greater than 0.64, greater than 0.65, greater than 0.66, greater than 0.67, greater than 0.68, greater than 0.69, greater than 0.70, greater than 0.71, greater than 0.72, greater than 0.73, greater than 0.74, greater than 0.75, greater than 0.76, greater than 0.77, greater than 0.78, greater than 0.79, greater than 0.80, greater than 0.81, greater than 0.82, greater than 0.83, greater than 0.84, greater than 0.85, greater than 0.86, greater than 0.87, greater than 0.88, greater than 0.89, greater than 0.90, greater than 0.91, greater than 0.92, greater than 0.93, greater than 0.94, greater than 0.95, greater than 0.96, greater than 0.97, greater than 0.98,
- AUC values for a disease such as UC can be provided as UC positive versus UC negative (Crohn’s disease (CD) positive + healthy).
- AUC values for a disease such as CD can be CD positive versus CD negative (UC positive + healthy).
- AUC values for healthy can be healthy positive versus healthy negative (UC positive + CD positive).
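The one-versus-rest comparisons described above (e.g., UC positive versus CD positive + healthy) can be sketched as follows. The AUC is computed here through its rank interpretation, i.e., the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half. The labels, scores, and function names are hypothetical; the text does not specify a classifier or a scoring scheme.

```python
def auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score); ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def one_vs_rest_auc(labels, scores, positive_class):
    """Treat one class as positive and pool all other classes as negative."""
    pos = [s for lab, s in zip(labels, scores) if lab == positive_class]
    neg = [s for lab, s in zip(labels, scores) if lab != positive_class]
    return auc(pos, neg)


# Hypothetical model scores for the UC class, for illustration only.
labels = ["UC", "UC", "CD", "CD", "healthy", "healthy"]
uc_scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.3]
uc_auc = one_vs_rest_auc(labels, uc_scores, "UC")
print(f"UC vs (CD + healthy) AUC = {uc_auc:.3f}")
```

The same helper can be called with "CD" or "healthy" as the positive class, given scores for that class, to obtain the other two comparisons.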
- the performance metric is a measure of accuracy.
- accuracy can be calculated by the formula (TP+TN)/(TP+TN+FP+FN).
- the sample is identified as reflecting a disease or infection in the subject with an accuracy of greater than 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5% or more. In some cases, the accuracy is greater than 95%.
- the methods can detect a subtype of inflammatory bowel disease (IBD) such as ulcerative colitis (UC) or Crohn’s disease (CD) at a very high accuracy, e.g., an accuracy of greater than 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 99.5%.
- the methods can detect other inflammatory diseases (e.g., celiac disease) at a very high accuracy, e.g., an accuracy of greater than 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 99.5%.
- the methods provided herein may detect a disease or infection that is not detected or detectable by other methods, such as plate culturing, biopsy, scope, or polymerase chain reaction (PCR).
- the results of the sequencing analysis of the methods described herein provide a statistical confidence level that a given diagnosis is correct. In some cases, such statistical confidence level is above 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5%.
- a sample comprising nucleic acids can be prepared prior to a sequencing assay.
- a raw biological sample comprising whole blood can be processed by centrifugation to generate an initial sample of plasma.
- whole blood can be collected in a K2-EDTA tube. In some embodiments, whole blood draws are not pooled. In some embodiments, a tube can be gently inverted multiple times after draw. In some embodiments, a tube can be centrifuged at about 1200 RCF (g), about 1400 RCF (g), about 1600 RCF (g), or more after draw to separate plasma from the blood. The centrifugation can occur at ambient temperature. In some cases, the centrifugation occurs for greater than 5 minutes, 7 minutes, 10 minutes, 15 minutes or 20 minutes. In some embodiments, for tubes containing less than 4 mL, a tube manufacturer’s instructions for centrifugation speed and time can be used.
- the plasma fraction can be transferred into a new tube.
- the plasma is subjected to centrifugation a second time to remove residual cells (e.g., mammalian cells and microbial cells).
- the additional centrifugation can be conducted at, e.g., about 1400 RCF (g), about 1600 RCF (g), about 1800 RCF (g), about 2000 RCF (g), or more.
- a tube can be labelled with a patient’s first and last name, a unique identifier (DOB or MRN), and/or a date and time of specimen collection.
- if a specimen is unlikely to reach a testing facility within 96 hours of collection, it can be frozen directly in K2-EDTA after centrifugation.
- if a gel plug does not rise to separate cells from plasma, then a tube can be re-centrifuged at a higher speed.
- a specimen tube can be shipped to a testing facility.
- a nucleic acid can be extracted from a sample.
- an extraction may comprise separating nucleic acids from other cellular components and contaminants that can be present in a sample.
- a nucleic acid can be extracted from a sample using a liquid extraction (e.g., a Trizol, a DNAzol) technique.
- an extraction can be performed by phenol chloroform extraction or precipitation by organic solvents (e.g., ethanol, or isopropanol).
- an extraction can be performed using a nucleic acid-binding column, a nucleic acid-binding spin column, or a combination thereof.
- an extraction of a cell-free nucleic acid can involve filtration or ultra-filtration.
- a nucleic acid can be extracted or purified by use of magnetic beads that bind nucleic acids.
- a salt concentration or a polyalkylene glycol can be adjusted to control a strength of bonds between functional groups and a nucleic acid, allowing for controlled and reversible binding.
- a nucleic acid can be released from a magnetic particle with an elution buffer.
- the methods described herein may comprise physically enriching a population of cfNA.
- physically enriching a population of cfNA comprises isolating, extracting, or selectively amplifying a desired population of cfNA (e.g., target cfNA) from the initial sample.
- physically enriching a population of cfNA does not comprise isolating or extracting the desired population of cfNA from the initial sample.
- physically enriching a population of cfNA does not comprise amplifying the desired population of cfNA.
- physically enriching a population of cfNA comprises removing the undesired cfNA population (e.g., non-target cfNA or contaminants) from the initial sample.
- physically enriching cfNA may comprise differentiating and/or selecting cfNA by one or more characteristics comprising size, sequence, GC content, secondary structure, biological source, or protein-binding.
- the methods comprise physically enriching microbial cfNA (mcfNA) in the biological sample.
- physically enriching mcfNA comprises physically enriching cfNA that are less than about 60 bp, 80 bp, 90 bp, 100 bp, 110 bp, 120 bp, 130 bp, 140 bp, 150 bp, 160 bp, 170 bp, 180 bp, 190 bp, 200 bp, 210 bp, 220 bp, 230 bp, 240 bp, or 250 bp in length.
- physically enriching mcfNA comprises amplifying the nucleic acids in the initial sample with primers containing non-human nucleic acid sequences. In some embodiments, physically enriching mcfNA comprises removing non-microbial cfNA. In some embodiments, physically enriching mcfNA comprises removing nucleosome-bound cfNA. In some embodiments, the methods comprise physically enriching non-microbial cfNA (e.g., host cfNA) in the biological sample. In some embodiments, the methods comprise physically enriching mammalian cfNA in the biological sample. In some embodiments, the methods comprise physically enriching degraded nucleic acids (e.g., degraded DNA).
- the degraded nucleic acids comprise degraded cfNA. In some embodiments, the degraded nucleic acids comprise degraded double-stranded cfNA (e.g., degraded dscfNA) or degraded single-stranded cfNA (e.g., degraded sscfNA).
- the sample comprising degraded DNA can be ancient, formalin-fixed paraffin-embedded (FFPE) samples, or samples which have undergone many freeze-thaw cycles.
- the methods comprise physically enriching cfNA by size selection.
- size selection may comprise removing nucleic acids not in the desired size range.
- the desired size range comprises an artificial or engineered threshold.
- size selection comprises separating nucleic acids by size via chromatography (e.g., size-exclusion chromatography), electrophoresis (e.g., gel or capillary electrophoresis), centrifugation (e.g., density-gradient centrifugation), filtration (e.g., membrane ultrafiltration), magnetic bead-based methods (e.g., SPRI beads), affinity-based methods (e.g., streptavidin beads), or any combination thereof.
- the methods described herein may comprise discriminating or differentiating cfNA by size and/or selectively isolating or extracting the cfNA of the desired size range.
- the methods comprise differentiating microbial cfNA from the non-microbial cfNA (e.g., host cfNA) in the sample and selectively removing the non-microbial cfNA in the sample by size.
- the non-microbial cfNA comprises mammalian cfNA (e.g., human or animal cfNA).
- the methods may comprise selectively removing nucleic acid fragments greater than about 500 bp, about 450 bp, about 400 bp, about 350 bp, about 300 bp, about 250 bp, about 200 bp, about 150 bp, about 140 bp, about 130 bp, about 120 bp, about 110 bp, about 100 bp, about 90 bp, about 80 bp, about 70 bp, or about 60 bp in length.
- the methods may comprise selectively physically enriching nucleic acid fragments at most about 20 bp, about 30 bp, about 40 bp, about 50 bp, about 60 bp, about 70 bp, about 80 bp, about 90 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 160 bp, about 170 bp, about 180 bp, about 190 bp, about 200 bp, about 210 bp, about 220 bp, about 230 bp, about 240 bp, or about 250 bp in length.
- the methods may comprise selectively physically enriching nucleic acid fragments of about 10 bp to about 20 bp, about 10 bp to about 30 bp, about 10 bp to about 40 bp, about 10 bp to about 50 bp, about 10 bp to about 60 bp, about 10 bp to about 70 bp, about 10 bp to about 80 bp, about 10 bp to about 90 bp, about 10 bp to about 100 bp, about 10 bp to about 110 bp, about 10 bp to about 120 bp, about 10 bp to about 130 bp, about 10 bp to about 140 bp, about 10 bp to about 150 bp, about 10 bp to about 160 bp, about 10 bp to about 170 bp, about 10 bp to about 180 bp, about 10 bp to about 190 bp, about 10 bp to about 200 bp,
- the methods may comprise selectively physically enriching nucleic acid fragments of about 20 bp to about 250 bp, about 20 bp to about 200 bp, about 20 bp to about 150 bp, about 20 bp to about 100 bp, about 20 bp to about 90 bp, about 20 bp to about 80 bp, about 20 bp to about 70 bp, about 20 bp to about 60 bp, about 20 bp to about 50 bp, about 30 bp to about 250 bp, about 30 bp to about 200 bp, about 30 bp to about 150 bp, about 30 bp to about 100 bp, about 30 bp to about 90 bp, about 30 bp to about 80 bp, about 30 bp to about 70 bp, about 30 bp to about 60 bp, about 30 bp to about 50 bp, about 40 bp to about 250 bp, about 40 bp to
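The size thresholds recited above describe a physical enrichment step (e.g., bead-based or gel-based), which has no direct software equivalent. The following sketch only illustrates the threshold logic on a list of fragment lengths, for example to reason about which fragments a given window would retain. The function name, the fragment lengths, and the 20 bp to 100 bp window are assumptions chosen for illustration.

```python
def select_by_size(fragment_lengths, min_bp=None, max_bp=None):
    """Keep fragment lengths inside an inclusive [min_bp, max_bp] window.

    Either bound may be None, which leaves that side of the window open.
    """
    kept = []
    for length in fragment_lengths:
        if min_bp is not None and length < min_bp:
            continue  # shorter than the desired range
        if max_bp is not None and length > max_bp:
            continue  # longer than the desired range (e.g., nucleosome-sized)
        kept.append(length)
    return kept


# Hypothetical fragment lengths in base pairs.
lengths = [35, 62, 98, 147, 167, 310, 520]
short = select_by_size(lengths, min_bp=20, max_bp=100)
no_long = select_by_size(lengths, max_bp=300)
print(short, no_long)
```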
- the methods may comprise denaturing nucleic acids. Denaturation may cause all, most, part, or a sufficient part for detection, of the double-stranded nucleic acids to become single-stranded. Denaturation may occur at any step in the process. In some embodiments, denaturation may remove all, most, or part of the secondary, tertiary, or quaternary structure of double-stranded or single-stranded nucleic acids.
- any type of initial sample may be subjected to the denaturation step, including samples that contain, or are suspected to contain, only double-stranded nucleic acids, only single-stranded nucleic acids, a mixture of double-stranded and single-stranded nucleic acids, or any higher order nucleic acid structure.
- the nucleic acids may be denatured using any method known in the art.
- single-stranded nucleic acids in the sample arise as a result of being subjected to denaturation.
- the nucleic acids in the sample are single-stranded because they were originally single-stranded when they were obtained from the subject, e.g., without limitation, single-stranded viral genomic RNA or single-stranded DNA, or as a result of shipping and handling conditions.
- denaturation is accomplished by applying heat to the sample for an amount of time sufficient to denature double-stranded nucleic acids of interest or to denature secondary, tertiary, or quaternary structures of double-stranded or single-stranded nucleic acids.
- the sample may be denatured by heating at 95 °C, or within a range from about 65 to about 110 °C, such as from about 85 to about 100 °C.
- the sample may be heated at any temperature between about 50 °C and about 110 °C for any length of time sufficient to effectuate the denaturation, e.g., from about 1 second to about 60 minutes.
- long nucleic acids such as intact dsRNA viruses may require longer denaturation times.
- denaturation is performed in order to ensure that all, most, or part of the nucleic acids or nucleic acids of interest within a sample are present in single-stranded form.
- denaturation comprises selective denaturation to enrich certain nucleic acids.
- selective denaturation comprises a denaturation step effective for the selection of fragments of a certain length and/or GC-content.
- selective denaturation comprises incubation at selected or elevated temperatures.
- the selective denaturation step comprises incubation at a temperature of about 45°C, about 50°C, about 55°C, about 60°C, about 65°C, about 70°C, about 75°C, about 80°C, about 85°C, about 90°C, about 95°C, or about 100°C.
- denaturation comprises chemical denaturation.
- chemical denaturation comprises adding one or more denaturing agents for a selective or controlled denaturation.
- the denaturing agent comprises alkaline agents (e.g., NaOH), formamide, guanidinium chloride, guanidine, sodium salicylate, dimethyl sulfoxide (DMSO), propylene glycol, betaine, or urea.
- salts may comprise, consist of, or consist essentially of, for example, without limitation, NaCl and MgCl2.
- denaturation comprises mechanical denaturation.
- mechanical denaturation can comprise sonication, applying a magnetic field (e.g.
- selective denaturation comprises incubation at a selected time.
- the selected time comprises about 5 seconds, about 10 seconds, about 15 seconds, about 20 seconds, about 25 seconds, about 30 seconds, about 35 seconds, about 40 seconds, about 45 seconds, about 50 seconds, about 55 seconds, about 1 minute, about 2 minutes, about 3 minutes, about 4 minutes, about 5 minutes, about 6 minutes, about 7 minutes, about 8 minutes, about 9 minutes, about 10 minutes, about 11 minutes, about 12 minutes, about 13 minutes, about 14 minutes, about 15 minutes, about 16 minutes, about 17 minutes, about 18 minutes, about 19 minutes, about 20 minutes, about 21 minutes, about 22 minutes, about 23 minutes, about 24 minutes, about 25 minutes, about 26 minutes, about 27 minutes, about 28 minutes, about 29 minutes, about 30 minutes, about 31 minutes, about 32 minutes, about 33 minutes, about 34 minutes, about 35 minutes, about 36 minutes, about 37 minutes, about 38 minutes, about 39 minutes, about 40 minutes,
- the sample comprising the cell-free nucleic acids can be treated to denature other components in the sample that are not nucleic acids.
- the sample can be treated with an enzyme.
- the sample can be treated with a protease to denature the proteins in the sample.
- the protease can comprise a serine protease, a cysteine protease, an aspartic protease, a metalloprotease, or any combination thereof.
- the protease can be a broad-spectrum protease.
- the protease can comprise trypsin, chymotrypsin, elastase, papain, carboxypeptidase A, proteinase K, or a combination thereof.
- the protease comprises a thermosensitive protease.
- the methods comprise treating the initial sample with proteinase K.
- the methods further comprise inactivating the thermosensitive protease by heating.
- the proteins and nucleic acids in the initial sample are denatured at the same time.
- Process 1 provides an example of preparing a sequencing library from the double-stranded cfDNA in the original sample.
- a control molecule as described herein can be added to an initial sample of plasma to generate a spiked plasma sample.
- nucleic acid extraction can be performed on a spiked plasma sample to generate purified and concentrated cfDNA.
- a library preparation process can be performed on a purified and concentrated cfDNA sample.
- the library preparation may comprise
- library preparation may comprise performing unbiased amplification on the sample.
- a sample preparation method does not comprise extracting nucleic acids from a raw or initial sample. For example, in some cases, nucleic acids can be extracted during or following library preparation, if at all.
- an adapter pair can be attached to cfDNA fragments in a sample such as by ligation or PCR amplification.
- a pair of adapters may comprise a p5 adapter that is attached to a 5’ end of a molecule and a p7 adapter that is attached to a 3’ end of a molecule.
- a p5 and p7 sequence can allow a nucleic acid library to bind and generate clusters on a flow cell.
- the cfDNA can be attached to adapters comprising identifier sequences that can differentiate between multiple samples.
- the multiple samples comprise a plurality of patient samples and/or control samples (e.g., positive control, negative control).
- samples can be pooled after barcoding, then sequenced, then de-multiplexed to assign each cluster to its sample.
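The pool, sequence, and de-multiplex flow described above can be sketched as a lookup from identifier sequence to sample. This is a simplified illustration: it assumes the identifier sequence is the first few bases of each read and that only exact matches are assigned, whereas actual sequencers often read index sequences separately and may tolerate mismatches. The barcodes, sample names, and function name are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Assign each read to a sample by its leading barcode (exact match)."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unknown barcode are kept aside, not discarded.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes for one patient sample and one negative control.
barcodes = {"ACGT": "patient_1", "TTAG": "negative_control"}
pooled_reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCAAA", "GGGGTTTTAA"]
assigned = demultiplex(pooled_reads, barcodes, barcode_len=4)
print(assigned)
```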
- Exemplary Process 2 provides an example of preparing a sequencing library from the double-stranded and single-stranded cfDNA in the original sample.
- a control molecule as described herein can be added to an initial sample of plasma to generate a spiked plasma sample.
- nucleic acid extraction can be performed on a spiked plasma sample to generate purified and concentrated cfDNA.
- a sample preparation method does not comprise extracting nucleic acids from a raw or initial sample.
- nucleic acids can be extracted during or following library preparation, if at all.
- the sample comprising cell-free nucleic acids can further comprise degraded cell-free nucleic acids (e.g., degraded ds-cfNA or degraded ss-cfNA).
- the sample comprising the cell-free nucleic acids can be treated to denature other components in the sample that are not nucleic acids.
- the sample can be treated with a protease.
- the protease is proteinase K.
- the protease can be inactivated by heating the sample.
- the cfDNA can be denatured to separate strands of double-stranded cfDNA, converting the double-stranded cfDNA into single-stranded cfDNA.
- the resulting sample may comprise single-stranded cfDNA derived from both double-stranded and single-stranded cfDNA in the original sample.
- the cfDNA can be denatured by heating for an amount of time sufficient to denature double-stranded nucleic acids or to denature secondary, tertiary, or quaternary structures of double-stranded or single-stranded nucleic acids.
- the cfDNA can be denatured at any temperature above 80°C. In some embodiments, the cfDNA can be denatured at about 95°C. In some embodiments, the methods comprise physically enriching cfNA by size selection. In some embodiments, the methods comprise physically enriching short cfDNA (e.g., 20 to 100 bases). In some embodiments, size selection comprises separating nucleic acids by size using a solid support. In some embodiments, the solid support comprises magnetic beads.
- the methods described herein may comprise attaching a splint adapter (e.g., a splint oligonucleotide) to the single-stranded cfDNA in the sample.
- the splint adapter may comprise dsDNA with a ssDNA overhang.
- the ssDNA overhang may comprise random or degenerate nucleotides (e.g., NNNNNN).
- the ssDNA overhang may comprise a specific sequence capable of hybridizing to a target molecule.
- the splint adapter can randomly hybridize to cfDNA via the single-stranded DNA overhang within the splint adapter.
- attaching the splint adapter can further comprise ligating the hybridized adapter to cfDNA, e.g., using an enzyme (e.g., ligase).
- the ligase comprises CircLigase II, CircLigase ssDNA Ligase, or Splint ligase.
- an adapter pair can be attached to cfDNA fragments in a sample such as by ligation or PCR amplification.
- a pair of adapters may comprise a p5 adapter that is attached to a 5’ end of a molecule and a p7 adapter that is attached to a 3’ end of a molecule.
- a p5 and p7 sequence can allow a nucleic acid library to bind and generate clusters on a flow cell.
- the cfDNA can be attached to adapters comprising identifier sequences that can differentiate between multiple samples.
- the sample identifiers are incorporated into the splint oligonucleotides.
- the multiple samples comprise a plurality of patient samples and/or control samples (e.g., positive control, negative control).
- multiple samples can be pooled after barcoding, then sequenced, then de-multiplexed to assign each cluster to its sample.
- nucleic acid library refers to a collection of nucleic acid fragments. In some embodiments, a collection of nucleic acid fragments can be used, for example, for sequencing.
- methods comprising subjecting nucleic acids in a nucleic acid library derived from a sample to a sequencing assay to generate sequence reads.
- the methods further comprise performing any necessary steps, methods, or techniques described herein to prepare the sample for the sequencing assay.
- the methods may comprise adding process control molecules (or synthetic spike-in molecules) to the sample, ligating adapters to the nucleic acids, or generating a nucleic acid library.
- the methods further comprise analyzing the sequence reads generated from the sequencing assay.
- analyzing the sequence reads comprises performing a bioinformatic analysis described herein.
- a bioinformatic analysis may comprise calculating an abundance of sequence reads, generating fragment length profiles of sequence reads, identifying and mapping sequence reads to known references to identify a source organism from which a sequenced nucleic acid was derived, or any combination thereof.
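One of the analyses listed above, generating a fragment length profile, can be sketched as a normalized histogram of read or fragment lengths. This is a minimal illustration; the function name and input lengths are hypothetical, and the text does not prescribe a particular binning or normalization.

```python
from collections import Counter


def fragment_length_profile(lengths):
    """Return {length: fraction of fragments}, ordered by increasing length."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: counts[length] / total for length in sorted(counts)}


# Hypothetical fragment lengths (in bases) inferred from sequence reads.
profile = fragment_length_profile([50, 50, 75, 100])
print(profile)
```

Profiles built this way for microbial reads and for process control molecules can then be compared length by length.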
- a method can further comprise generating a report of a result of a sequencing assay.
- the methods described here can comprise sequencing short or degraded DNA.
- the methods can comprise sequencing a single-stranded DNA library prepared from the sample comprising cfNA described herein.
- the single-stranded library can be prepared by ligating an adapter oligonucleotide to the 3' ends of heat-denatured DNA.
- the adapter oligonucleotides can comprise an affinity tag for immobilizing onto a solid support.
- the adapter oligonucleotide can comprise a biotin and thus the adapted DNA strands can be immobilized on streptavidin-coated beads.
- the adapted DNA can be copied with a polymerase on the solid support.
- a second adapter can be attached by blunt-end ligation.
- the adapter oligonucleotide can comprise a splint oligonucleotide.
- the splint oligonucleotide can be ligated to the heat-denatured DNA with a T4 ligase.
- the splint oligonucleotide can comprise random bases hybridized to a 3' biotinylated donor oligonucleotide.
- a sequencing assay provided herein can be performed by any sequencing methods suitable for sequencing a nucleic acid provided herein.
- the sequencing assay herein may comprise massively parallel sequencing.
- the massively parallel sequencing may comprise whole genome sequencing.
- a massively parallel sequencing may comprise Next Generation Sequencing or a Next Next Generation sequencing.
- the methods provided herein comprise determining a concentration or quantity of a mcfDNA.
- the methods comprise monitoring a concentration or quantity of a mcfDNA over time.
- the method comprises identifying fragments of mcfDNA that vary during a course of treatment.
- the sequencing assay or the sequencing method comprises sequencing-by-synthesis.
- sequencing methods provided herein comprise Maxam-Gilbert sequencing-based techniques, chain-termination-based techniques, shotgun sequencing, bridge PCR sequencing, single-molecule real-time sequencing, ion semiconductor sequencing, nanopore sequencing, pyrosequencing, sequencing by synthesis, sequencing by ligation, sequencing by electron microscopy, dideoxy sequencing reactions (Sanger method), massively parallel sequencing, polony sequencing, DNA nanoball sequencing and any variation thereof.
- Next Generation Sequencing (NGS) refers to sequencing methods that allow for massively parallel sequencing of nucleic acid molecules during which a plurality, e.g., millions, of nucleic acid fragments from a single sample or from multiple different samples are sequenced simultaneously.
- Non-limiting examples of NGS include sequencing-by-synthesis, sequencing-by-ligation, real-time sequencing, and nanopore sequencing.
- sequencing involves hybridizing a primer to the template to form a template/primer duplex, contacting the duplex with a polymerase enzyme in the presence of detectably labeled or unlabeled nucleotides under conditions that permit the polymerase to add labeled or unlabeled nucleotides to the primer in a template-dependent manner, detecting a signal from the incorporated labeled nucleotide or detecting a signal resulting from the process of incorporating labeled or unlabeled nucleotide (e.g., proton release), and sequentially repeating the contacting and/or detecting steps at least once, wherein sequential detection of incorporated labeled or unlabeled nucleotide determines the sequence of the nucleic acid.
- exemplary detectable labels include radiolabels, fluorescent labels, protein labels, dye labels, enzymatic labels, etc.
- the detectable label can be an optically detectable label, such as a fluorescent label.
- Exemplary fluorescent labels include cyanine, rhodamine, fluorescein, coumarin, BODIPY®, Alexa Fluor®, or conjugated multi-dyes.
- a method disclosed herein may comprise calculating an abundance of sequence reads generated from a nucleic acid present in a sample.
- a method may comprise calculating an abundance of nucleic acids (e.g., cell-free nucleic acids (cfNA)) in a sample.
- an abundance of nucleic acids in a sample can be calculated based on an abundance of sequence reads generated from a nucleic acid in a sample.
- an abundance described herein may comprise an absolute abundance, a relative abundance, or a normalized abundance.
- a relative abundance or a normalized abundance can be calculated based on a reference value.
- a reference value may comprise an abundance of sequence reads generated from any nucleic acids in the sample.
- the reference value may be an abundance of sequence reads from single-stranded cell-free DNA (sscfDNA) or of sequence reads from a synthetic spike-in molecule (or process control molecule).
- the methods further comprise calculating the relative abundance of a sequence read at least in part by comparing the abundance of the sequence read to the abundance of another sequence read.
- the methods further comprise calculating the normalized abundance of a sequence read at least in part by comparing the abundance of the sequence read from single-stranded cfDNA in the sample to the abundance of sequence reads from synthetic spike-in molecules.
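The relative and normalized abundances described above can be sketched as simple ratios of read counts. Here a relative abundance is taken against the total of the listed read counts, and a normalized abundance is taken against the count of reads from a synthetic spike-in (process control) molecule. The organism names, counts, and function names are hypothetical; the text permits other reference values.

```python
def relative_abundance(read_counts):
    """Fraction of reads per source, relative to the total of all listed counts."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to normalize against")
    return {name: count / total for name, count in read_counts.items()}


def spike_in_normalized(read_count, spike_in_count):
    """Read count expressed per spike-in read recovered in the same sample."""
    if spike_in_count <= 0:
        raise ValueError("spike-in reads are required as the reference value")
    return read_count / spike_in_count


# Hypothetical read counts from one sample.
counts = {"organism_a": 300, "organism_b": 100}
spike_in_reads = 200
rel = relative_abundance(counts)
norm = {name: spike_in_normalized(c, spike_in_reads) for name, c in counts.items()}
print(rel, norm)
```

Because the same amount of spike-in is assumed to be added to every sample, the spike-in ratio is comparable across samples even when sequencing depth differs.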
- the methods further comprise mapping a sequence read to a reference sequence.
- the reference sequence may comprise a microbial sequence.
- the reference sequence comprises a non-microbial sequence.
- the reference sequence comprises a genomic sequence.
- the methods further comprise mapping a sequence read to a reference genome.
- a method disclosed herein may comprise identifying a species of microbe described herein by mapping a sequence read to a reference genome.
- the methods further comprise identifying the source of nucleic acids in the sample from which the sequence read originated.
- the methods further comprise performing a bioinformatic analysis.
- performing the bioinformatics analysis comprises assembling sequence data, detecting and quantifying sequence reads, distinguishing populations of nucleic acids, detecting the presence and measuring the abundance of microbial nucleic acids, comparing sequence reads, comparing abundances of sequence reads, identifying contaminant nucleic acids from the sample collection site, identifying target nucleic acids (e.g., cell-free nucleic acids), identifying host nucleic acids, generating fragment length profiles of microbial nucleic acids, generating fragment length profiles of process control molecules, comparing fragment length profiles of the microbial nucleic acids, detecting site of infection, detecting the state of infection, detecting the risk of organ rejection in a transplant patient, determining the eligibility of a subject for a transplant, detecting potential for drug resistance, or any combination thereof.
- a host may comprise a subject described herein.
- sequence reads identified as non-host can then be aligned to a nucleotide database.
- the nucleotide database comprises microbial reference sequences.
- the database can be selected for those microbial sequences known to be associated with the host, e.g., the set of commensal and pathogenic microorganisms of the subject (e.g., animal or human).
- the microbial database can be optimized to mask or remove contaminating sequences.
- sequence reads can be aligned to a reference sequence comprising artifactual sequences.
- regions that show irregularities in read coverage when multiple samples are aligned can be masked or removed as an artifact.
- the detection of such irregular coverage can be done by various metrics, such as the ratio between coverage of a specific nucleotide and the average coverage of the entire contig within which this nucleotide is found.
- a sequence that is represented as greater than about 5×, about 10×, about 25×, about 50×, or about 100× the average coverage of the reference sequence comprising artifactual sequences can be artifactual.
- a binomial test can be applied to provide a per-base likelihood of coverage given the overall coverage of the contig.
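The two coverage checks described above can be sketched as follows: a per-position ratio against the contig's average coverage, and a binomial upper-tail probability for the coverage observed at one position. The binomial model used here (each aligned base lands on any of the L contig positions with probability 1/L) is an assumption made for illustration, since the text does not specify the test's parameters. The coverage vector, the 5× threshold, and the function names are likewise hypothetical.

```python
from math import comb


def coverage_ratios(coverage):
    """Ratio of each position's coverage to the contig's average coverage."""
    mean = sum(coverage) / len(coverage)
    if mean == 0:
        return [0.0] * len(coverage)
    return [c / mean for c in coverage]


def flag_artifacts(coverage, fold=5.0):
    """Indices whose coverage exceeds `fold` times the contig average."""
    return [i for i, r in enumerate(coverage_ratios(coverage)) if r > fold]


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-base coverage over a 10-position contig.
coverage = [10] * 9 + [100]
flagged = flag_artifacts(coverage, fold=5.0)

# Assumed model: the 190 aligned bases are spread uniformly over 10 positions.
p_value = binomial_upper_tail(k=100, n=sum(coverage), p=1 / len(coverage))
print(flagged, p_value)
```

Positions that are both flagged by the ratio and improbable under the binomial model are candidates for masking or removal from the reference.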
- each high confidence read can align to multiple organisms in the given microbial database.
- an algorithm can be used to compute the most likely organism (for example, see Lindner et al. Nucl. Acids Res. (2013) 41 (1): e10, which is referenced herein in its entirety).
- GRAMMy or GASiC algorithms can be used to compute the most likely organism that a given read came from.
- alignments and assignment to a host sequence or to a non-host (e.g., microbial) sequence can be performed in accordance with art-recognized methods. For example, a read of 50 nt. can be assigned as matching a given genome if there is not more than 1 mismatch, not more than 2 mismatches, not more than 3 mismatches, not more than 4 mismatches, not more than 5 mismatches, etc. over the length of the read.
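- in a non-limiting illustration, the mismatch-threshold assignment described above can be sketched in Python as follows. The function names and the representation of each candidate genome as an already-aligned reference window of the same length as the read are assumptions of this sketch; a production aligner would also handle insertions, deletions, and base qualities.

```python
def count_mismatches(read, reference_window):
    """Number of positions at which the read differs from the aligned reference window."""
    if len(read) != len(reference_window):
        raise ValueError("read and reference window must have the same length")
    return sum(1 for a, b in zip(read, reference_window) if a != b)

def assign_read(read, candidates, max_mismatches=2):
    """Return the names of candidate genomes that the read matches.

    candidates: dict mapping a genome name to the reference window the read
    aligned to (hypothetical input format). A read may match several genomes.
    """
    return sorted(
        name for name, window in candidates.items()
        if count_mismatches(read, window) <= max_mismatches
    )
```

For a 50 nt read, calling `assign_read(read, candidates, max_mismatches=3)` would accept any genome whose window differs at no more than three positions.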
- publicly available algorithms can be used for alignments and identification.
- these assignments of reads to an organism can then be totaled and used to compute the estimated number of reads assigned to each organism in a given sample, in a determination of the prevalence of the organism in the sample (for example, a cell-free nucleic acid sample). In some embodiments, this information can be used to determine an origin of a pathogen or contaminant. In some embodiments, the analysis described herein can be used to normalize the counts for the size of the microbial genome to provide a calculation of coverage for a microbe.
- the normalized coverage for each microbe can be compared to the host sequence coverage in the same sample to account for differences in sequencing depth between samples.
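- in a non-limiting illustration, the genome-size normalization and the comparison to host coverage described above can be sketched in Python as follows. The function names, the fixed read length, and the dictionary inputs are assumptions of this sketch.

```python
def genome_coverage(read_count, read_length, genome_size):
    """Average depth of coverage: sequenced bases divided by genome length."""
    return read_count * read_length / genome_size

def host_normalized_coverage(microbe_reads, microbe_genome_sizes,
                             host_reads, host_genome_size, read_length=50):
    """Coverage of each microbe relative to host coverage in the same sample.

    Dividing by host coverage is one way to account for differences in
    sequencing depth between samples (assumed normalization for this sketch).
    """
    host_cov = genome_coverage(host_reads, read_length, host_genome_size)
    return {
        name: genome_coverage(count, read_length, microbe_genome_sizes[name]) / host_cov
        for name, count in microbe_reads.items()
    }
```

Under this sketch, two microbes with equal read counts but genomes differing twofold in size would differ twofold in normalized coverage, which is the effect the genome-size normalization is meant to capture.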
- a dataset of microbial organisms represented by sequences in the sample, and the prevalence of those microorganisms can be optionally aggregated and displayed for ready visualization, e.g., in the form of a report.
- the methods provided herein comprise generating sequence reads (or sequencing reads) from the nucleic acids in the sample. “Sequence read,” “sequenced read,” and “sequencing read” are used interchangeably herein.
- the methods further comprise detecting, identifying, analyzing, processing, comparing, aligning, or mapping the sequence reads generated from the nucleic acids in the sample.
- the sequence reads are generated from the microbial cell-free nucleic acids (mcfNA) in the sample as described herein.
- the methods comprise mapping the sequence reads generated from the nucleic acids in the sample to a reference sequence.
- the reference sequence may comprise a microbial sequence.
- the microbial sequence comprises a region of a microbial genome.
- the reference sequence comprises an artifactual sequence.
- a sequencing can generate at least 100, at least 250, at least 500, at least 750, at least 1000, at least 1500, at least 2500, at least 3000, at least 3500, at least 4000, at least 4500, at least 5000, at least 5500, at least 6000, at least 7000, at least 8000, at least 9000, at least 10,000, at least 12,500, at least 15,000, at least 17,500, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 60,000, at least 70,000, at least 80,000, at least 90,000, at least 100,000, at least 200,000, at least 300,000, at least 400,000, at least 500,000, at least 600,000, at least 700,000, at least 800,000, at least 900,000, at least 1,000,000, at least 2,000,000, at least 3,000,000, at least 4,000,000, at least 5,000,000, at least 6,000,000, at least 7,000,000, at least 8,000,000, at least 9,000,000, or at least 10,000,000 sequence
- kits and systems configured to perform a method disclosed herein.
- a kit or system may comprise a nucleic acid sequencer for generating DNA or RNA sequence information.
- a kit or system can further comprise a computer comprising software that performs a bioinformatics analysis on the DNA or RNA sequence information.
- a bioinformatics analysis can include, without limitation, assembling sequence data, detecting and quantifying sequence reads, distinguishing populations of nucleic acids, detecting the presence and measuring the abundance of microbial nucleic acids, comparing sequence reads, comparing abundances of sequence reads, identifying contaminant nucleic acids from the sample collection site, identifying target nucleic acids (e.g., cell-free nucleic acids), identifying host nucleic acids, generating fragment length profiles of microbial nucleic acids, generating fragment length profiles of control process molecules, comparing fragment length profiles of the microbial nucleic acids, detecting site of infection, detecting the state of infection, detecting the risk of organ rejection in a transplant patient, determining the eligibility of a subject for a transplant, and/or detecting potential for drug resistance.
- the kit or system can also include computer control systems with machine-executable instructions (e.g., software) to implement the methods.
- FIG. 41 shows a computer system 4101 that is programmed or otherwise configured to implement methods of the present disclosure.
- the computer system 4101 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 4105, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
- the computer system 4101 also includes memory or memory location 4110 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 4115 (e.g., hard disk), communication interface 4120 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 4125, such as cache, other memory, data storage and/or electronic display adapters.
- the memory 4110, storage unit 4115, interface 4120, and peripheral devices 4125 are in communication with the CPU 4105 through a communication bus (solid lines), such as a motherboard.
- the storage unit 4115 can be a data storage unit (or data repository) for storing data.
- the computer system 4101 can be operatively coupled to a computer network (“network”) 4130 with the aid of the communication interface 4120.
- the network 4130 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
- the network 4130 in some embodiments is a telecommunication and/or data network.
- the network 4130 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
- the network 4130, in some embodiments with the aid of the computer system 4101, can implement a peer-to-peer network, which can enable devices coupled to the computer system 4101 to behave as a client or a server.
- the CPU 4105 can execute a sequence of machine readable instructions, which can be embodied in a program or software.
- the instructions can be stored in a memory location, such as the memory 4110.
- the instructions can be directed to the CPU 4105, which can subsequently program or otherwise configure the CPU 4105 to implement methods of the present disclosure. Examples of operations performed by the CPU 4105 can include fetch, decode, execute, and writeback.
- the CPU 4105 can be part of a circuit, such as an integrated circuit. One or more other components of the system 4101 can be included in the circuit. In some embodiments, the circuit is an application specific integrated circuit (ASIC).
- the storage unit 4115 can store files, such as drivers, libraries, and saved programs.
- the storage unit 4115 can store user data, e.g., user preferences and user programs.
- the computer system 4101 in some embodiments can include one or more additional data storage units that are external to the computer system 4101, such as located on a remote server that is in communication with the computer system 4101 through an intranet or the Internet.
- the computer system 4101 can communicate with one or more remote computer systems through the network 4130.
- the computer system 4101 can communicate with a remote computer system of a user.
- remote computer systems include personal computers, slate or tablet PCs, telephones, smart phones, or personal digital assistants.
- the user can access the computer system 4101 via the network 4130.
- the kit or system can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 4101, such as, for example, on the memory 4110 or electronic storage unit 4115.
- the machine executable or machine readable code can be provided in the form of software.
- the code can be executed by the processor 4105.
- the code can be retrieved from the storage unit 4115 and stored on the memory 4110 for ready access by the processor 4105.
- the electronic storage unit 4115 can be precluded, and machine-executable instructions are stored on memory 4110.
- the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code or can be compiled during runtime.
- the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
- kits and systems can be embodied in programming.
- Various aspects of the technology can be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
- Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
- “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which can provide non-transitory storage at any time for the software programming.
- All or portions of the software can at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, can enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
- another type of media that can bear the software elements includes optical, electrical, and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
- the physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also can be considered as media bearing the software.
- terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
- a machine readable medium can take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium.
- Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as can be used to implement the databases, etc. shown in the drawings.
- Volatile storage media include dynamic memory, such as main memory of such a computer platform.
- Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
- Carrier-wave transmission media can take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data.
- Many of these forms of computer readable media can be involved in carrying one or more sequences of one or more instructions to a processor for execution.
- the computer system 4101 can include or be in communication with an electronic display 4135 that comprises a user interface (UI) 4140 for providing an output of a report, which can include a diagnosis of a subject or a therapeutic intervention for the subject.
- Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
- the analysis can be provided as a report.
- the report can be provided to a subject, to a health care professional, a lab-worker, or other individual.
- Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
- An algorithm can be implemented by way of software upon execution by the central processing unit 4105. The algorithm can, for example, facilitate the enrichment, sequencing and/or detection of pathogen or microbe or other target nucleic acids.
- Information about a patient or subject can be entered into a computer system, for example, patient background, patient medical history, or medical scans.
- the computer system can be used to analyze results from a method described herein, report results to a patient or doctor, or come up with a treatment plan.
- a classifier may be a machine learning model.
- a classifier may be trained.
- training may be supervised.
- training may be unsupervised.
- training may comprise the use of training data.
- training data may comprise data (such as data obtained from processing of human cell-free DNA and/or data obtained from processing of microbial cell-free DNA).
- training data may comprise target variables (e.g., class labels).
- the classifier's performance may be assessed by comparing the output of the classifier to the target variables.
- the comparison may comprise at least one of AUCROC, AUC, specificity, sensitivity, accuracy, and/or f-measure or any combination thereof.
- training data may be partitioned.
- a classifier may be trained on a subset of the training data, such as a subset of the partitions, with the remaining training data being used to assess the classifier's performance (such as through validation).
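- in a non-limiting illustration, partitioning the training data, training on a subset of the partitions, and assessing performance on the remainder by AUC can be sketched in Python as follows. The single-feature midpoint scorer stands in for a real classifier and, like the function names and the interleaved (unshuffled) fold assignment, is an assumption of this sketch.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_samples, k):
    """Deterministic interleaved partition of sample indices into k folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]

def cross_validate(features, labels, k=3):
    """Train on k-1 partitions and report the AUC on each held-out partition."""
    folds = k_fold_indices(len(labels), k)
    aucs = []
    for held_out in folds:
        train = [i for fold in folds if fold is not held_out for i in fold]
        pos = [features[i] for i in train if labels[i] == 1]
        neg = [features[i] for i in train if labels[i] == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        # Toy "classifier": signed distance of the feature from the class midpoint.
        midpoint = (mean_pos + mean_neg) / 2.0
        direction = 1.0 if mean_pos >= mean_neg else -1.0
        scores = [direction * (features[i] - midpoint) for i in held_out]
        aucs.append(roc_auc([labels[i] for i in held_out], scores))
    return aucs
```

The sketch assumes every partition contains both class labels; a stratified split would be one way to guarantee that on real cohorts.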
- unsupervised learning may provide groupings of samples.
- groupings may be treatment groups.
- groupings may be by disease severity.
- groupings may be indicative of treatment groups.
- groupings may be indicative of disease severity.
- unsupervised learning may comprise a clustering method.
- data may be clustered.
- clustering may be unsupervised.
- clustering may comprise agglomerative clustering.
- clustering may comprise hierarchical clustering.
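- in a non-limiting illustration, agglomerative hierarchical clustering of samples can be sketched in Python as follows. Average linkage, Euclidean distance, and the function name are assumptions of this sketch; the disclosure does not specify a linkage criterion or distance metric.

```python
import math

def agglomerative_clusters(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    points: list of equal-length numeric tuples (e.g., per-sample feature vectors).
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

The resulting groupings could then be inspected for correspondence with, for example, disease severity; that interpretation step is outside the sketch.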
- the methods described herein are for individualized treatment for a subject with an inflammatory disease with an unknown subtype or severity.
- individualized treatment can include treating a disease or disorder identified using the methods provided herein.
- the methods provided herein can guide the determination of what type of endoscopic procedure should be performed on a subject.
- the methods comprise monitoring the efficacy of a therapy in a subject, monitoring a subject’s response to a surgical intervention, and/or modifying a therapeutic regimen depending on the subject's response to the therapy.
- the methods provided herein may also be used to detect an infection in the subject. Since many of the methods provided herein involve detection of microbial cell-free nucleic acids, in some cases, the mcfNA may be used to detect an infection. If an infection is detected, the subject may also receive a treatment for the infection in addition to the treatment for the inflammatory disease.
- a subtype of inflammatory disease e.g., UC or CD
- the methods disclosed herein can further comprise sequencing of the subject's DNA for genetic variations that are associated with therapeutic resistance to therapeutics or to a particular therapeutic.
- samples can be collected serially at various times before or during the course of the disease in order to track the progression of the disease. The course of disease can then determine the selection of any therapies or treatment (e.g., pharmacological treatment, surgery).
- the serially collected samples are compared to each other to determine whether the disease is improving or worsening in the subject.
- the subject can be treated prophylactically to prevent the further development or worsening of disease.
- the subject may be placed on a maintenance therapy.
- the methods described herein are for adjusting a therapeutic regimen.
- the subject can be administered a drug to treat an infection.
- methods provided herein can be used to track or monitor the efficacy of the drug treatment.
- the therapeutic regimen can be adjusted, depending on upward or downward course of the infection. For example, if the methods provided herein indicate that an infection is not improving with drug treatment, the therapeutic regimen can be adjusted by changing the type of drug or treatment, discontinuing the use of the drug, continuing the use of the drug, increasing the dose of the drug, or adding a new drug or treatment to the subject's therapeutic regimen.
- the methods can be used to distinguish populations of nucleic acids or to detect a microbe in a subject.
- the methods provide a more comprehensive view of the state and diversity of the infection or symbiotic microbes in a subject.
- the identification of both RNA and DNA in a sample can be useful to detect RNA and DNA type viruses, or to detect bacterial, protist, parasitic or fungal genomic DNA and/or gene expression products, e.g., mRNA.
- Such processes can also differentiate between latent infection (e.g., which might be indicated by the presence of integrated retroviral DNA) and active infection (e.g., which might be indicated by the presence of viral RNA from intact viral particles).
- Such processes can also detect drug resistance and/or the origin of infection.
- Such processes can also be used to analyze host response.
- Such analyses can include analysis of cell-free, circulating nucleic acids, e.g., for microbial or viral infection identification.
- an mcfNA in a sample obtained from a subject can originate from a microbe living in or on a subject.
- an mcfNA can be detected in a subject that has been exposed to an infectious disease.
- an mcfNA can be detected in a healthy individual.
- detection of mcfNA by the methods disclosed herein can be performed as a biomarker of infection.
- mcfNAs can be associated with communicable and/or non-communicable diseases.
- mcfNAs can be associated with a range of diseases and conditions of a subject described herein, including an infection, an inflammatory bowel disease (IBD), a Kawasaki disease (KD), a human immunodeficiency virus (HIV), a cardiovascular disease (CVD), a cystic fibrosis (CF), a pneumonia, a sepsis, a cancer, a gastric cancer (GC), a hepatocellular carcinoma (HCC), a melanoma, or any combination thereof.
- a second set of classifiers was able to distinguish between endoscopically validated remission, mild, moderate, and severe disease cases (as determined by Mayo Endoscopic score or Crohn’s disease activity index) with AUCs >0.75. These results corresponded to a greater than 37% (UC) and 18% (CD) increase in sensitivity in comparison to standard-of-care measurements via fecal calprotectin and high-sensitivity C-reactive protein for this cohort.
- critical features in classifiers disclosed herein corresponded with well-known signatures from microbes associated with IBD or well-curated from the human microbiome.
- plasma microbial cell-free DNA (mcfDNA) provided a differentiated lens into the microbiomes of the human body that is otherwise unattainable by conventional stool-based analysis.
- the analysis of plasma mcfDNA as disclosed herein served as an additional means to assess inflammatory bowel disease type and severity.
- a classifier was developed based on microbial cell-free DNA, human cell-free DNA, as well as combinations of both analyte types, to predict the disease type (healthy vs UC vs CD), as well as the severity of disease and anatomic location of the disease.
- these classifiers may be used to assess mucosal healing and response to therapy in IBD, leveraging the underlying associated microbes and human signals, based on both single time point and multi-time-points trends.
- the microbial and human signatures, both separately and jointly, which enable the classifiers, are identified through the following methodology. Briefly, a plurality of filtering steps was applied, aimed at removing potential (analytically) false-positive microbial detections. Filters included, but were not limited to, control for reagent contamination risk, control for incomplete and/or inaccurate database influence, control for collection site contamination, control for site-specific signals, and control for broader batch-level effects. Low-frequency microbes that may not support high experimental precision were also filtered out. Differential abundance analysis was applied to identify subsets of microbes highly associated with the studied conditions.
- the classifiers disclosed herein were able to discriminate between Crohn’s disease, ulcerative colitis, or otherwise healthy patient populations with an AUC >0.95. Similarly, a second set of classifiers was able to distinguish between endoscopically validated remission, mild, moderate, and severe disease cases (as determined by Mayo Endoscopic score or Crohn’s disease activity index) with AUCs ranging from 0.75 to 0.86. These results correspond to a greater than 37% (UC) and 18% (CD) increase in sensitivity in comparison to standard-of-care measurements via fecal calprotectin and high-sensitivity C-reactive protein for this cohort. Upon investigation, critical features in the classifiers disclosed herein were found to correspond with well-known signatures from microbes associated with IBD or well-curated from the human microbiome.
- Proteinase K was introduced into spiked samples in order to digest proteins. The samples were then incubated at 95°C in order to denature the nucleic acid fragments and to deactivate the Proteinase K and then immediately placed on ice. While still within the plasma, the single-stranded DNA fragments were subjected to reaction conditions enabling 3’-end adapter attachment and 5’-end adapter attachment. The fragments were then amplified using universal primers to generate the final dual-indexed NGS library. Finally, the samples were purified using a commercial magnetic bead NGS cleanup-and-size-selection kit per the manufacturer’s instructions before running on a sequencer, which enabled capture of NGS library fragments with relatively short cfDNA inserts of <100 bases and >20 bases.
- Deeper sequencing was implemented to maximize the discovery of microbial features.
- the samples were sequenced using a NovaSeq 6000 sequencer by Illumina. Sequencing was conducted following the manufacturer’s instructions.
- the generated libraries were sequenced with approximately 300-400M paired-end high-quality reads per sample.
- Relative abundances were assigned to each taxon in a sample based on the sequencing reads and their alignments. For each combination of read and taxon, a read-sequence probability was defined that accounted for the divergence between the microorganism present in the sample and reference assemblies in the database. A mixture model was used to assign a likelihood to the complete collection of sequencing reads that included the read-sequence probabilities and the (unknown) abundances of each taxon in the sample. An expectation-maximization algorithm was applied to compute the maximum likelihood estimate of each taxon abundance. From these abundances, the number of reads arising from each taxon was aggregated up the taxonomic tree.
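- in a non-limiting illustration, an expectation-maximization estimate of taxon abundances under a mixture model can be sketched in Python as follows. The function name, the matrix of precomputed read-sequence probabilities, the uniform initialization, and the stopping rule are assumptions of this sketch; the divergence model used to derive the read-sequence probabilities is not reproduced here.

```python
def estimate_abundances(read_probs, n_iter=200, tol=1e-10):
    """Maximum-likelihood mixing proportions for a read/taxon mixture model.

    read_probs[r][t] is the (precomputed, hypothetical) probability of read r
    given taxon t. Returns one abundance per taxon, summing to 1.
    Assumes at least one read has nonzero probability under some taxon.
    """
    n_taxa = len(read_probs[0])
    abundances = [1.0 / n_taxa] * n_taxa
    for _ in range(n_iter):
        # E-step: fractionally assign each read to taxa by posterior probability.
        expected = [0.0] * n_taxa
        for probs in read_probs:
            weighted = [a * p for a, p in zip(abundances, probs)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read is unexplained by every taxon in the database
            for t, w in enumerate(weighted):
                expected[t] += w / total
        # M-step: abundances are the normalized expected read counts.
        assigned = sum(expected)
        updated = [e / assigned for e in expected]
        converged = max(abs(u - a) for u, a in zip(updated, abundances)) < tol
        abundances = updated
        if converged:
            break
    return abundances
```

Reads that align equally well to several taxa are shared among them in proportion to the current abundance estimates, so the taxa supported by unambiguous reads absorb most of the ambiguous ones.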
- a classifier was further developed for ulcerative colitis (UC) capable of distinguishing remission from an active disease state with an AUC of 0.8, which is above the preliminary provided guideline of 0.7. It was concluded therefore that the classifier is performing with sound discriminatory function. Further, the established AUC to distinguish between the three disease severity groups and remission is equivalent to or greater than 0.7 for each class. It was noted that the critical microbial features identified by the classifier were those found to be well associated with UC and mucosal barrier function. The second portion of the report reflects the work focused on the analysis of the Crohn’s disease (CD) cohort, which led to the generation of a classifier capable of distinguishing remission from active disease with an AUC of 0.68. It is important to note that the AUC for moderate and severe were 0.68 and 0.8, respectively.
- compositions and methods disclosed herein were designed and optimized for infectious disease diagnostics and are provided as a laboratory-developed test (LDT) service via a CLIA-certified, CAP-accredited, and NYS-approved laboratory located in Redwood Shores, CA.
- the mcfDNA next generation sequencing method is capable of reporting on over 1,500 unique microbes across prokaryotes, eukaryotes, archaea, and DNA viruses, with species-level specificity. Underlying that capability is an alignment against a highly curated genome reference database harboring over 20,000 genomic references, matching over 16,000 species.
- Biomarker Discovery built on enhancements made to both chemistry as well as the data and analytics components.
- the Biomarker Discovery platform was designed to maximize the ability to observe and quantify microbial diversity while maintaining high specificity of detection.
- the use of significantly deeper sequencing where plasma biosamples are processed and sequenced via NovaSeq 6000 S4 PE runs, resulting in approximately 300-400M paired-end reads per sample.
- No-template controls (NTCs) were included across all processed batches to enable the removal of background microbial signal that may have originated from either the environment or the reagents used to process the samples.
- the plasma mcfDNA signal was characterized across the 56 samples to determine the adequate sequencing depth for microbial identification and quantification.
- MPM microbial cell-free DNA molecules per microliter
- FIG. 2A log-normal distribution
- a clear and notable finding of this plot is the separation in MPM distribution amongst the samples, suggesting a subset of samples had higher overall amounts of plasma mcfDNA. Given the absolute quantification produced by the methods disclosed herein, this observation suggests a global variation in microbial abundance, which is further explored below.
- FIG. 3A indicates an approximate linear relation between the down sampling rates and the fraction of microbes detected, in comparison to the full read set.
- a 20x reduction in sequencing depth (down sampling from 300-400M to 15-20M PE reads per sample) resulted in a ⁇ 5x reduction in the number of species detected.
- a virome data set is analyzed.
- the virome data set is compared to 626 viruses detected by mcfDNA next generation sequencing methods.
- the mcfDNA signal may consist of not only those mcfDNA fragments associated with the microbes present in the GI tract, but potentially other organ systems as well.
- the detection of elevated signal in the severe state suggests the mcfDNA next generation sequencing methods may be able to provide additional means to assess presumed barrier breakdown.
- PCoA Principal Coordinate Analysis
- FIG. 7A shows the top microbes associated with mediating the variance associated with endoscopy score. While Propionibacterium acnes appeared to be an overly dominant factor, the values of the other identified features also carry sufficient differentiating value. Interestingly, all of these microbes overlapped with the critical features from the independent Bray-Curtis beta-diversity assessment in FIG. 7A.
- the number of microbes with at least a 1-fold difference in abundance was also determined and further was determined to be statistically significant via the Wilcoxon Rank Sum Test between asymptomatic and the respective disease states (FIG. 7C).
- the differential abundance analysis indicates an important positive correlation between disease severity and an increasing number of differentially abundant microbes detected in plasma mcfDNA. This finding fits well with the notion of barrier breakdown, and follow-up analysis will identify microbes that are unique to a given disease state or function in a continuum with disease severity. While the initial ordinations indicate a few dominant species of relevance to partitioning the data, the differential analysis indicates many other microbes exist in the data as well.
- the identified microbes associated with remission versus asymptomatic could be potentially explained by the two remission samples that clustered with the UC disease states in FIG. 7A. It will be important to determine if the patients who provided these remission samples had subsequent recurrence as this finding could be of high relevance to clinicians.
- D. nishinomiyaensis is similar to Malassezia in that it is commonly found in other parts of the body as a commensal but takes on a harmful role in cases of UC.
- the abundance of such microbes in other parts of the body, such as skin, is found to be intensified due to IBD.
- PERMANOVA analysis indicated statistical significance between endoscopy categories (P < 0.01), which accounted for 19.1% of the variance (FIG. 9B). Top microbes accounting for the PERMANOVA assessment continued to overlap with the critical features from the independent Bray-Curtis beta diversity assessment. In some embodiments, a pairwise PERMANOVA approach is utilized to identify specific features between endoscopic categories. In this example, the number of microbes with at least a 1-fold difference that were deemed to be statistically significant via the Wilcoxon Rank Sum Test (P < 0.05) was counted.
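- in a non-limiting illustration, counting microbes that pass both a fold-difference filter and a Wilcoxon rank-sum test can be sketched in Python as follows. The function names, the interpretation of "1-fold difference" as an absolute log2 fold change of at least 1, the pseudocount, and the normal approximation (without continuity correction) for the p-value are assumptions of this sketch.

```python
import math

def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Uses mid-ranks for ties and a tie-corrected variance.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))

def differential_microbes(group_a, group_b, min_log2_fold=1.0, alpha=0.05,
                          pseudocount=1e-6):
    """Return microbes whose abundance differs between two groups of samples.

    group_a, group_b: dicts mapping a microbe name to per-sample abundances.
    """
    hits = []
    for microbe in group_a:
        a, b = group_a[microbe], group_b[microbe]
        mean_a = sum(a) / len(a) + pseudocount
        mean_b = sum(b) / len(b) + pseudocount
        if (abs(math.log2(mean_b / mean_a)) >= min_log2_fold
                and rank_sum_p_value(a, b) < alpha):
            hits.append(microbe)
    return hits
```

The length of the list returned by `differential_microbes` for each pairwise comparison (for example, asymptomatic versus severe) corresponds to the kind of count described above.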
- FIG. 9C summarizes the number of microbes for each comparison. It was observed that CD exhibited an increase in the number of differentially detected microbes, as disease severity increased in comparison to asymptomatic state.
- Klebsiella pneumoniae was also identified as a top feature.
- the ability of the methods and compositions disclosed herein to detect and identify K. pneumoniae as a feature of importance in partitioning data is significant.
- Both Fusobacterium nucleatum and Haemophilus parainfluenzae were also identified as abundant in CD.
- Another aspect of the present example is the development of machine-learning classifiers capable of predicting remission from diseased state (mild, moderate, severe) by leveraging the analysis across hundreds of well-phenotyped plasma samples.
- Asymptomatic patient samples of the present example were incorporated into the remission grouping, and ML classifiers were developed that were capable of differentiating asymptomatic/remission from mild/moderate/severe disease states for UC and CD. Species-level and genus-level abundance values were used separately to train the ML models of the present disclosure, and classification was performed without additional hyper-parameter tuning, but with significant tree depth.
- machine learning techniques comprising ensemble techniques, conditional VAE, SVM, decision trees, random forest, regressors, neural networks, or any combination thereof can be used in the methods herein.
- compositions and methods disclosed herein were able to differentiate between the asymptomatic/remission grouping and the mild, moderate, severe grouping, resulting in an AUC of 0.67 ± 0.25, an F1 score of 0.80 ± 0.15, and an accuracy of 0.84 ± 0.13.
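For readers unfamiliar with the reported metrics, the sketch below shows how AUC, F1 score, and accuracy can be computed for a binary asymptomatic/remission (0) versus mild/moderate/severe (1) split, and how per-fold values can be summarized as mean ± standard deviation. The labels, scores, 0.5 threshold, and the use of the sample standard deviation are illustrative assumptions; the source does not state how its ± values were derived.

```python
import statistics

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize(per_fold_values):
    """Mean and sample standard deviation across cross-validation folds."""
    return statistics.mean(per_fold_values), statistics.stdev(per_fold_values)

# Hypothetical held-out fold: true labels and classifier scores.
y_true = [0, 0, 1, 1, 1]
scores = [0.1, 0.6, 0.4, 0.8, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```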
- Assessment of feature importance was initially performed via Shapley additive explanations (SHAP) and resulted in a diverse set of microbes, including both known and novel microbes for UC.
- Notable known genera increasing in abundance from asymptomatic/remission to mild/moderate/severe disease include Acinetobacter, Staphylococcus, Blautia, Anoxybacillus, Paracoccus.
- the classifier identified a set of genera more abundant in asymptomatic/remission samples than in mild/moderate/severe disease, including Corynebacterium. It is important to note that a genus can include pathogens, mutualists, and commensals.
- the ML models for Crohn’s disease disclosed herein exhibited an exceptional ability to differentiate between the asymptomatic, remission grouping and the mild, moderate, severe grouping, resulting in an AUC of 0.77 ± 0.23, an F1 score of 0.90 ± 0.09, and an accuracy of 0.84 ± 0.13.
- Feature importance partitioning these groups was initially performed via SHAP and resulted in a diverse set of microbes, including both known and novel microbes for CD, with a variety of expression patterns.
- Notable known microbes increasing in abundance from asymptomatic, remission to mild, moderate, severe disease state included Malassezia restricta, Acinetobacter baumannii, Streptococcus sanguinis.
- Lactobacillus plantarum and L. crispatus, which are viewed as food fermenters, commensals, or as possessing beneficial probiotic properties, were also increasing; however, this could be viewed as evidence for translocation.
- D. nishinomiyaensis, M. restricta, and A. baumannii were found to function as three of the most critical factors (FIG. 11C), all with significantly elevated expression amongst the mild, moderate, severe grouping.
- compositions and methods disclosed herein revealed distinct sets of known and novel microbes capable of partitioning samples in mild, moderate, severe versus asymptomatic, remission states for UC and CD.
- a patient population was characterized based on the information provided through the IBD-Plexus (FIGs. 12A-12C).
- the 168 UC samples from non-overlapping patients were selected across 6 sites, maintaining the disease severity distribution across sites when possible (FIG. 12A).
- Disease severity groupings were in a laddered 5:4:3:2 ratio: 60 remission, 48 mild, 36 moderate, and 24 severe samples.
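The arithmetic behind the laddered design is easy to check: the ratio parts 5 + 4 + 3 + 2 sum to 14, and 168 / 14 = 12 samples per part. The helper below is a hypothetical illustration of that calculation only.

```python
def allocate_by_ratio(total, ratio):
    """Split `total` samples into groups proportional to the integer `ratio` parts."""
    unit, remainder = divmod(total, sum(ratio))
    if remainder:
        raise ValueError("total is not divisible by the sum of the ratio parts")
    return [unit * part for part in ratio]

labels = ["remission", "mild", "moderate", "severe"]
counts = allocate_by_ratio(168, [5, 4, 3, 2])
print(dict(zip(labels, counts)))
```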
- the design was critical for the identification of site-specific signals (viewed as a nuisance independent variable), which often confound microbiome studies. To further minimize confounders, the study design was structured with a balanced population of male and female samples within groups, as well as focused on non-Hispanic, Caucasian populations to minimize additional potential confounders.
- MMES Modified Mayo Endoscopic Score
- FIG. 12B The Mayo score considered additional factors, such as stool bleeding, stool frequency, and physician assessment, allowing for an aggregative diagnosis that is not reliant on endoscopy alone. It is important to note that patients with similar MMES across the four disease severity labels showcase the effect of the stool bleeding, frequency, and physicians’ assessment on the final disease severity label. Finer characterization of the Mayo endoscopic subscore for each section of the colon indicated a UC disease severity label of Mild being grouped by pan, low scoring UC or low scoring UC of the rectum (FIG. 12C).
- Moderate and Severe UC cases were associated with high specific scores to either the rectum and/or sigmoid colon, in addition to other sites, with a few being close to pancolitis.
- the UC population exhibited both diverse and overlapping signals in terms of sites of disease manifestation and disease scoring respectively.
- FIG. 13A summarizes the total number of patients on biologics, chemotherapeutics, corticosteroids, non-steroidal anti-inflammatory drugs (NSAIDs), probiotics, antibiotics, and anti-diarrheals. No correlations between treatment types and disease severity labels were observed (Spearman’s rho < 0.2) outside those between anti-diarrheals/probiotics and antibiotics/NSAIDs (FIG. 13B). The correlation amongst the patient populations based on their binary labels was further assessed across the 7 treatment types, as listed.
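A Spearman correlation between binary treatment flags, or between a treatment flag and an ordinal severity label, can be sketched as the Pearson correlation of tie-averaged ranks. The patient vectors below are invented for illustration and the function names are assumptions; the code does not handle constant inputs, for which the coefficient is undefined.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2.0
        i = j
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-patient flags (1 = on treatment) and severity (0 = remission .. 3 = severe).
probiotics = [1, 1, 0, 0, 0, 0, 1, 0]
anti_diarrheals = [1, 1, 0, 0, 0, 0, 1, 0]
biologics = [1, 0, 1, 0, 1, 0, 1, 0]
severity = [0, 0, 1, 1, 2, 2, 3, 3]
print(spearman_rho(probiotics, anti_diarrheals), spearman_rho(biologics, severity))
```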
- Hierarchical clustering indicated that patients partitioned into 6 populations defined by treatment: anti-diarrheal and probiotic in addition to immunosuppression via a mixture of a biologic, chemotherapeutic, or corticosteroid (group 1); antibiotic and NSAID in addition to a biologic, chemotherapeutic, or corticosteroid (group 2); a biologic alone (group 3); biologics with chemotherapy and/or corticosteroids (group 4); chemotherapy alone (group 5, purple); or no treatment (group 6, light brown) (FIG. 13C).
- IBD Inflammatory Bowel Disease
- hs-CRP high sensitivity C-reactive protein
- It was determined that 4 of the severe labels would qualify as severe colitis (>12 mg/L) and nearly all mild and most moderate cases would be viewed as remission (<3 mg/L) (FIG. 14A). Similarly, for fecal calprotectin, a concentration of 200 µg/g is viewed as the positive cut-off and 50-200 µg/g as borderline.
- FIG. 14B shows that the majority of fecal calprotectin results were found below the positive cut-off and within the borderline range in a non-discriminatory manner.
- the between-sample diversity was assessed as a means to perform feature selection for classification as well as to appreciate large-scale unsupervised partitions in the data.
- a PCoA was performed using the Bray-Curtis distance metric, as well as the weighted UniFrac distance.
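The Bray-Curtis dissimilarity that feeds such an ordination is simple to compute: for two abundance vectors it is the sum of absolute differences divided by the sum of all abundances. The sketch below builds the pairwise matrix only; the sample names and abundances are hypothetical, and the PCoA step itself (classical scaling of this matrix) and the weighted UniFrac distance, which needs a phylogenetic tree, are not shown.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0

def distance_matrix(samples):
    """Pairwise dissimilarities keyed by (sample_name, sample_name)."""
    names = list(samples)
    return {(p, q): bray_curtis(samples[p], samples[q]) for p in names for q in names}

# Hypothetical abundances of three taxa in four plasma samples.
samples = {
    "s1": [10, 0, 5],
    "s2": [10, 0, 5],
    "s3": [0, 8, 0],
    "s4": [5, 0, 10],
}
dist = distance_matrix(samples)
print(dist[("s1", "s2")], dist[("s1", "s3")], dist[("s1", "s4")])
```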
- a supervised approach was then applied to dimensional reduction and feature extraction using linear discriminant analysis (LDA, FIG. 18). Partitions were observed via LDA, indicating a subset of the global signatures had the ability to appropriately partition the data. It was noted that the lack of a continuum observed via LDA does not necessarily suggest unique disease groupings; LDA is inherently designed to identify the most significant signatures that mediate partitions between groups. An inherent additional challenge with LDA is the bias introduced by low prevalent signatures that shape the partitions. Feature prevalence was considered when applying LDA as a means for feature selection.
- FIG. 19 highlights the differential abundance analysis between each disease severity group based on the clustering of phylogenetically similar microbes. In total, there were 68 phylogenetically clustered species encompassing 94 individual species that met criteria for differential abundance. These microbes were then curated for incorporation into classifiers.
- A number of methods were iterated upon to generate a classifier for predicting UC disease severity labels, converging on a gradient boosted decision trees algorithm whose feature space was composed of differentially abundant, phylogenetically clustered microbes.
- The algorithm-based classification also provided an opportunity to interrogate the model and assess feature importance. Training and testing were conducted using a 10-fold cross-validation repeated 5 times (repeated k-fold cross-validation) to gain a robust assessment of performance.
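The splitting scheme named above can be sketched as a generator that reshuffles the sample indices once per repeat and then yields each of the 10 folds as a held-out test set, giving 50 train/test splits in total. This is an illustrative, unstratified version; the function name, the seed, and the absence of stratification by severity label are assumptions, and the classifier that would be fitted on each split is omitted.

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh partition for every repeat
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, train, test

splits = list(repeated_kfold(168))
print(len(splits), len(splits[0][2]), len(splits[0][3]))
```

Averaging a metric such as AUC over all 50 held-out folds yields a mean and spread of the kind reported for the classifiers in this example.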
- the classifier generated an AUC of 0.80 ± 0.1 for the remission versus all grouping, and AUCs at or above 0.7 for the Mild, Moderate, and Severe Grouping (FIG. 20).
- a CD cohort was characterized following a similar analysis conducted as part of the UC study as disclosed herein. Samples were selected across 6 sites, aiming for an even distribution of disease severity for each site (FIG. 22A). The block design enabled a detailed statistical assessment of microbial signatures that were disease-specific versus site-specific. Disease severity groupings were in a laddered 5:4:3:2 ratio: 60 remission, 48 mild, 36 moderate, and 24 severe samples. Ethnicity and balanced biological sex were controlled.
- CDAI Crohn’s disease activity index
- SES-CD simple-endoscopic score
- FIG. 22B SES-CD is determined by the occurrence and size of the ulcers, their proportion of the surface covered and proportion of surface affected by disease, and the severity of stenosis for each of the 5 ileocolon segments available for endoscopic assessment. Clear delineations were observed for the SES-CD and disease severity labels. SES-CD subscores were analyzed to gain an appreciation of where disease manifestation was occurring.
- Correlations were assessed between therapeutics as well as treatments and the endoscopic scores. As shown in FIG. 23A, the majority of individuals (110/168) were on a biologic, while only a small percentage were on some form of corticosteroid (31 of 168) or chemotherapy (44/168). Strong correlations were observed with NSAIDs and antibiotic (9/9) treatments as well as antidiarrheals and probiotics (15/15). A weak correlation with corticosteroid treatment and disease severity label was observed, suggesting corticosteroid treatment was reserved for individuals with more severe disease state (FIG. 23B).
- Unsupervised learning of treatment groupings resulted in 6 treatment groups defined by treatment: biologic alone; no treatment; treatment with a biologic and chemotherapeutic; treatment with biologic and corticosteroid; mixture of biologic, chemotherapeutic, corticosteroid treatment, and a group of individuals on biologics with either NSAIDs/antibiotics or probiotics/anti-diarrheals (FIG. 23C).
- FIG. 29 shows the differential abundance analysis at the species level between each disease severity group. In total, there were 52 species that met criteria for differential abundance. It is essential to note a clear trend of a very large number of microbial signatures being associated with the active (mild, moderate, severe) group in comparison to the remission group.
- A number of methods were iterated upon to generate a classifier for predicting CD disease severity labels, converging on a gradient boosted decision tree classifier based upon the filtered site-specific features of species level microbes. Training and testing were conducted using a 10-fold cross-validation repeated 5 times for each fold. The trained classifier generated an AUC of 0.81 ± 0.1 for the severe disease grouping and an AUC of 0.68 ± 0.01 for moderate and remission (FIG. 30). A decreased discriminatory potential was observed between mild and remission in the confusion matrix, which suggests that the classifier may be picking up incomplete mucosal healing in the remission samples.
- Remission was predominantly associated with low levels of plasma mcfDNA. Common food-associated microbes arose as positive predictive markers for remission, though these were ranked very low.
- the classifier disclosed herein is focused on a smaller subset of grouped signals to enhance discrimination between remission and mild phenotypes.
- a de novo classifier is used to further assess whether and to what extent the first feature sets identified are robust enough to re-emerge.
- Example 2 Microbial and human feature selection
- the methods and compositions disclosed herein were used to analyze and characterize features extracted from data obtained from sequencing microbial cell-free DNA and human cell-free DNA from one or more subjects.
- mcfDNA was prepared and analyzed as described in Exemplary Process 2 of the present disclosure. Samples were obtained from healthy subjects and subjects diagnosed with a disease. Human cell-free DNA and microbial cell-free DNA were analyzed in each sample. No-template control samples (NTC, denoted herein as EC) were used to control for microbial cell-free DNA originating from background rather than from the subjects, by subtracting the background signal of the no-template control samples. NTC-subtracted EDRs were converted to MPM using internal standards. The MPM estimates were aggregated up the taxonomic tree, and low-frequency features (occurring in <10% of training data samples) at any taxonomic rank were removed (filtered) from further analysis.
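Two of the preprocessing steps above, background subtraction and the prevalence filter, can be sketched as follows. This is a minimal illustration under stated assumptions: signals are held in plain dictionaries, subtraction is clipped at zero, a feature counts as present when its value is above zero, and all taxon names and values are hypothetical. The EDR-to-MPM conversion with internal standards and the aggregation up the taxonomic tree are not shown because the source does not give their formulas here.

```python
def subtract_background(sample, ntc):
    """Subtract the no-template-control signal per taxon, clipping at zero (assumption)."""
    return {taxon: max(value - ntc.get(taxon, 0.0), 0.0) for taxon, value in sample.items()}

def filter_low_prevalence(samples, min_prevalence=0.10):
    """Drop features detected in fewer than `min_prevalence` of the training samples."""
    n = len(samples)
    detected = {}
    for sample in samples:
        for taxon, value in sample.items():
            if value > 0:
                detected[taxon] = detected.get(taxon, 0) + 1
    keep = {taxon for taxon, count in detected.items() if count / n >= min_prevalence}
    return [{t: v for t, v in sample.items() if t in keep} for sample in samples]

# Hypothetical data: 20 training samples and one aggregated NTC profile.
ntc = {"common_taxon": 1.0, "reagent_contaminant": 5.0}
raw = [{"common_taxon": 4.0, "reagent_contaminant": 5.0, "rare_taxon": 0.0} for _ in range(20)]
raw[0]["rare_taxon"] = 2.0  # present in 1 of 20 samples (5%), below the 10% cut-off

cleaned = [subtract_background(sample, ntc) for sample in raw]
filtered = filter_low_prevalence(cleaned)
print(filtered[0])
```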
- The selected features were separated into their respective taxonomic ranks (species, genus, family, order, class, phylum, kingdom). For each rank, the log ratio was determined amongst the selected features. This log ratio for feature pairs was then subjected to the KWT and DMRT, and Cliff’s Delta was calculated. Up to the top 50 features from each comparison group that met statistical significance and had a Cliff’s Delta > 0.33 were selected in rank order by strength of the effect size (Cliff’s Delta).
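The effect-size part of this selection can be sketched in a few lines. Cliff's Delta for two groups is the fraction of cross-group pairs in which the first value is larger minus the fraction in which it is smaller. In the sketch below the pseudocount in the log ratio, the use of the absolute value of delta for thresholding and ranking, and all feature names are assumptions; the significance tests (KWT and DMRT) are omitted.

```python
import math

def log_ratio(sample, taxon_a, taxon_b, pseudo=1.0):
    """Log ratio of two features within one sample (pseudocount is an assumption)."""
    return math.log((sample[taxon_a] + pseudo) / (sample[taxon_b] + pseudo))

def cliffs_delta(x, y):
    """Cliff's Delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))

def select_top(effects, threshold=0.33, top_n=50):
    """Keep feature pairs whose |delta| exceeds the threshold, strongest first."""
    passing = [(name, d) for name, d in effects.items() if abs(d) > threshold]
    passing.sort(key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in passing[:top_n]]

# Hypothetical log ratios of one feature pair in two comparison groups.
active = [3.0, 4.0, 5.0]
remission = [1.0, 2.0]
effects = {"taxonA/taxonB": cliffs_delta(active, remission), "taxonC/taxonD": 0.2, "taxonE/taxonF": -0.5}
print(select_top(effects))
```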
- Host-derived biomarkers were computed with a bioinformatic pipeline. Additional analysis was performed on paired-end sequencing data obtained from processing of human cell-free DNA data. Host cfDNA fragments were differentiated from other cfDNA molecular populations such as microbial-derived, synthetic quality control spike-ins, and reagent contaminants via read alignment and downstream stringent analytical filters. Utilizing the resultant paired-end alignments, gene-specific biomarkers were computed using fragmentomic-based algorithms applied to the gene-associated regulatory elements as identified and published by the wider scientific community, e.g. refTSS, ENCODE.
- Gene-level features were additionally aggregated into both cell and pathway level feature spaces using single sample gene set enrichment analysis (ssGSEA) techniques.
- ssGSEA single sample gene set enrichment analysis
- all genes were ranked by the measured peak-to-flank ratio, from which a relative enrichment score for each gene set was computed based on how consistently its genes appear at the top or bottom of the ranked list.
- the relative enrichment score was computed based on the cumulative peak-to-trough distribution of the gene set compared to a randomized background.
- Gene sets were pulled from curated public databases, e.g., CellMarker or Tabula Sapiens for cell types or KEGG or Reactome for pathways.
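The ranking-based enrichment idea described above can be illustrated with a simplified, unweighted running-sum statistic: genes are sorted by their score, the walk steps up at each gene-set member and down otherwise, and the largest deviation from zero is reported. This is a teaching sketch, not the ssGSEA implementation used in the source; the gene names, scores, and the choice of an unweighted maximum-deviation statistic are assumptions.

```python
def enrichment_score(gene_scores, gene_set):
    """Signed maximum deviation of an unweighted running sum over the ranked gene list."""
    ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
    hits = [gene in gene_set for gene in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, of the ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical peak-to-flank ratios for six genes.
scores = {"g1": 6.0, "g2": 5.0, "g3": 4.0, "g4": 3.0, "g5": 2.0, "g6": 1.0}
top_set = {"g1", "g2"}      # members concentrated at the top of the ranking
bottom_set = {"g5", "g6"}   # members concentrated at the bottom
print(enrichment_score(scores, top_set), enrichment_score(scores, bottom_set))
```

A positive score indicates that the set's genes cluster near the top of the ranked list and a negative score that they cluster near the bottom.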
- Table 2 Relatively enriched and relatively depleted genes in healthy subjects, subjects with Crohn’s disease, and subjects with ulcerative colitis.
- Table 3 Relatively enriched and relatively depleted genes in subjects with mild ulcerative colitis, moderate ulcerative colitis, severe ulcerative colitis, and ulcerative colitis in remission
- Table 4 Relatively enriched and relatively depleted genes in subjects with mild Crohn’s disease, moderate Crohn’s disease, severe Crohn’s disease, and Crohn’s disease in remission
- Table 5 Relatively enriched and relatively depleted genes in subjects with Crohn’s disease or ulcerative colitis compared to healthy subjects.
- Table 6 Relatively enriched and relatively depleted genes in subjects with Crohn’s disease or ulcerative colitis compared to healthy subjects.
- Table 7 Relatively enriched and relatively depleted genes in subjects with Crohn’s disease or ulcerative colitis compared to healthy subjects.
- VIF Variance inflation factor
- a machine learning model as disclosed herein may comprise an extreme gradient boosting (XGBoost) model, a linear regressor, a deep learning model, or any decision-tree-based method.
- XGBoost extreme gradient boosting
- FIGs. 35A-35F show disease type classification performance between healthy, ulcerative colitis (UC), and Crohn’s disease (CD) (all classes of disease, remission, mild, moderate, severe). Performance was assessed via 10-fold cross-validation repeated five times across different partitions (FIGs. 35B, 35D, and 35F), and using a leave one clinical site out strategy (LOSO) (FIGs. 35A, 35C, and 35E).
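The leave-one-clinical-site-out (LOSO) strategy holds out every sample from one site as the test set and trains on the remaining sites, once per site. A minimal sketch follows; the site labels and function name are hypothetical, and model fitting is omitted.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices), one split per clinical site."""
    for held_out in sorted(set(sites)):
        test = [i for i, site in enumerate(sites) if site == held_out]
        train = [i for i, site in enumerate(sites) if site != held_out]
        yield held_out, train, test

# Hypothetical site label for each of six samples.
sites = ["site_A", "site_A", "site_B", "site_C", "site_C", "site_C"]
loso_splits = list(leave_one_site_out(sites))
for held_out, train, test in loso_splits:
    print(held_out, train, test)
```

Because no sample from the held-out site is seen during training, this scheme probes whether a classifier has learned disease signal rather than site-specific signal, which complements the repeated 10-fold estimate.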
- UC ulcerative colitis
- CD Crohn’s disease
- FIGs. 36A-36C show disease type classification performance between healthy, active UC, and active CD (mild, moderate, severe disease). Performance was assessed via multiple methods, and the results for the 10-fold cross validation are depicted (comparable performance was noted between methodologies).
- FIG. 36A Microbial Classifier
- FIG. 36B Human Classifier
- FIG. 36C Joint Classifier
- FIGs. 37A-37C show disease activity classification performance between UC disease activity groups (remission, mild, moderate, severe). Performance was assessed via multiple methods, and the results for the 10-fold cross validation are depicted (comparable performance was noted between methodologies).
- FIG. 37A Microbial Classifier
- FIG. 37B Human Classifier
- FIG. 37C Joint Classifier
- FIGs. 38A-38C show disease activity classification between UC disease activity groups (remission, mild) versus (moderate, severe). Performance was assessed via multiple methods, and the results for the 10-fold cross validation are depicted (comparable performance was noted between methodologies).
- FIG. 38A Microbial Classifier
- FIG. 38B Human Classifier
- FIG. 38C Joint Classifier
- FIGs. 39A-39C show disease activity classification performance between CD disease activity groups (remission, mild, moderate, severe). Performance was assessed via multiple methods, and the results for the 10-fold cross validation are depicted (comparable performance was noted between methodologies).
- FIG. 39A Microbial Classifier
- FIG. 39B Human Classifier
- FIGs. 40A-40C show disease activity classification performance between CD disease activity groups (remission, mild) versus (moderate, severe). Performance was assessed via multiple methods, and the results for the 10-fold cross validation are depicted (comparable performance was noted between methodologies).
- FIG. 40A Microbial Classifier
- FIG. 40B Human Classifier
- FIG. 40C Joint Classifier
- Example 3 Provided in this example is the application of the methods described in Example 1 and Example 2 above in facilitating a patient diagnosis and/or informing a treatment regimen.
- a patient receives a diagnosis of inflammatory bowel disease (IBD) of unknown subtype from a physician.
- the diagnosis is based upon a physical examination, stool test, reported symptoms, medications, personal medical history, and family medical history.
- a blood sample is collected from the patient.
- the plasma of the blood sample is processed as described in Examples 1 and 2. It is determined on the basis of the testing of cell-free DNA in the patient’s plasma that the patient has severe ulcerative colitis (severe UC). The determination informs a treatment regimen. Given the detected severity of the UC, the patient is prioritized for an endoscopy procedure that targets the large intestine and rectum. The endoscopy confirms the diagnosis of severe UC.
- the patient avoids obtaining a more invasive endoscopy, which would have been indicated had there been a suspicion of Crohn’s disease.
- the patient is prescribed a biologic (e.g., Infliximab, Adalimumab, Golimumab, or Certolizumab) with or without an immunomodulator, all at dosages recommended to treat UC.
Abstract
In some embodiments, the invention relates to methods, compositions, and systems for distinguishing ulcerative colitis (UC), Crohn's disease (CD), and other inflammatory bowel diseases (IBD) by sequencing cell-free nucleic acids. In some embodiments, sequencing of microbial cell-free nucleic acids can provide data for determining whether UC, CD, or another IBD is asymptomatic, in remission, or active. In some embodiments, sequencing of microbial cell-free nucleic acids can provide data for determining whether an active form of UC, CD, or another IBD is mild, moderate, or severe.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463625258P | 2024-01-25 | 2024-01-25 | |
| US63/625,258 | 2024-01-25 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025160484A1 | 2025-07-31 |
Family
ID=94771526
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/013060 Pending WO2025160484A1 | Microbial and human cell-free DNA biomarkers for diagnosing and assessing the severity of inflammatory bowel disease | 2024-01-25 | 2025-01-25 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025160484A1 (fr) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014138999A1 * | 2013-03-14 | 2014-09-18 | University Of Ottawa | Methods for diagnosing and treating inflammatory bowel disease |
| WO2017025617A1 * | 2015-08-11 | 2017-02-16 | Universitat De Girona | Method for the quantification of members of Faecalibacterium prausnitzii phylogroup I and/or phylogroup II and their use as biomarkers |
| US9976181B2 | 2016-03-25 | 2018-05-22 | Karius, Inc. | Synthetic nucleic acid spike-ins |
| WO2018175913A1 * | 2017-03-23 | 2018-09-27 | Meharry Medical College | Methods for diagnosing and treating inflammatory bowel disease |
| WO2022140302A1 * | 2020-12-21 | 2022-06-30 | Karius, Inc. | Sequencing of microbial cell-free nucleic acids to detect inflammation and secondary infection and to determine disease severity |
| WO2023141347A2 * | 2022-01-24 | 2023-07-27 | Gusto Global, Llc | Targeted single-point amplicon fragment sequencing at single loci and multiple loci |
-
2025
- 2025-01-25 WO PCT/US2025/013060 patent/WO2025160484A1/fr active Pending
Non-Patent Citations (7)
| Title |
|---|
| "Netter's Infectious Disease", 2021, ELSEVIER |
| BLAUWKAMP TIMOTHY A ET AL: "Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease", NATURE MICROBIOLOGY, NATURE PUBLISHING GROUP UK, LONDON, vol. 4, no. 4, 11 February 2019 (2019-02-11), pages 663 - 674, XP036739090, [retrieved on 20190211], DOI: 10.1038/S41564-018-0349-6 * |
| CHUAH CHER SHIONG ET AL: "Circulating Cell-Free DNA in Inflammatory Bowel Disease: Liquid Biopsies with Mechanistic and Translational Implications", FACULTY REVIEWS, vol. 12, 14 June 2023 (2023-06-14), XP093274876, ISSN: 2732-432X, Retrieved from the Internet <URL:https://s3-eu-west-1.amazonaws.com/science-now.reports/f1000reports/files/9002/12/14/article.pdf> DOI: 10.12703/r/12-14 * |
| KUBIRITOVA ZUZANA ET AL: "Cell-Free Nucleic Acids and their Emerging Role in the Pathogenesis and Clinical Management of Inflammatory Bowel Disease", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 20, no. 15, 1 July 2019 (2019-07-01), Basel, CH, pages 3662, XP093274890, ISSN: 1422-0067, DOI: 10.3390/ijms20153662 * |
| LINDNER ET AL., NUCL. ACIDS RES., vol. 41, no. 1, 2013, pages e10 |
| MANDELLDOUGLASBENNETT: "Principles and Practice of Infectious Diseases", 2019, ELSEVIER |
| QIU PENG ET AL: "The Gut Microbiota in Inflammatory Bowel Disease", FRONTIERS IN CELLULAR INFECTION MICROBIOLOGY, vol. 12, 22 February 2022 (2022-02-22), CH, XP093274341, ISSN: 2235-2988, DOI: 10.3389/fcimb.2022.733992 * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 25708043 Country of ref document: EP Kind code of ref document: A1 |