WO2024227011A1 - DNA methylation-based algorithm to diagnose irritable bowel syndrome and other GI conditions

Info

Publication number
WO2024227011A1
Authority
WO
WIPO (PCT)
Prior art keywords
ibs
ibd
ced
disease
dna methylation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/026561
Other languages
French (fr)
Inventor
Lin Chang
Swapna JOSHI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of California Berkeley
University of California San Diego UCSD
Original Assignee
University of California Berkeley
University of California San Diego UCSD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of California Berkeley and University of California San Diego UCSD

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/06 Gastro-intestinal diseases
    • G01N2800/065 Bowel diseases, e.g. Crohn, ulcerative colitis, IBS

Definitions

  • IBS Irritable bowel syndrome
  • IBD inflammatory bowel disease
  • CeD celiac disease
  • GI gastrointestinal
  • IBD and CeD are diagnosed by endoscopy with tissue confirmation
  • IBS is diagnosed by symptom criteria due to a lack of a reliable diagnostic test.
  • Epigenetic modifications including DNA methylation have not been studied well in IBS, IBD and CeD.
  • SUMMARY The tools and methods described herein address these needs and more.
  • the method comprises: (a) measuring the amount of DNA methylation at CpG sites in DNA in a biological sample obtained from the subject to generate a genome-wide DNA CpG methylation profile of the subject, wherein the measuring comprises probing at least 200,000 CpG sites; (b) performing a normalization of technical noise and bias correction on the profile; (c) filtering the profile to retain data from a first set of probed CpG sites that is the same as a first set of CpG sites from a first training data set trained on a first pairwise comparison between biological samples, wherein the first pairwise comparison is selected from: (i) irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD); (ii) IBS and celiac disease (CeD); (iii) IBD and CeD; (iv) IBS and healthy control; (v) IBD and healthy control; and (vi) CeD and healthy control; and
  • the classifying can be implemented by use of a computer. Also provided is a system for performing the methods described herein.
  • the filtering of step (c) is repeated to retain data from a second set of probes that is the same as a second set of probes from a second training data set trained on a second pairwise comparison between biological samples, wherein the second pairwise comparison is selected from (i) to (vi), and wherein the second pairwise comparison differs from the first pairwise comparison.
  • the filtering of step (c) is repeated to retain data from subsequent sets of probes corresponding to subsequent training data sets trained on subsequent pairwise comparisons between biological samples, wherein each of the remaining pairwise comparisons of (i) to (vi) is performed.
  • the probing comprises contacting the biological samples with a methylation microarray.
  • the biological sample comprises blood, plasma, serum, or mucosal tissue.
  • the sample is peripheral blood mononuclear cells (PBMCs), peripheral blood lymphocytes (PBL), or whole blood.
  • the method further comprises measuring the amount of fecal calprotectin (FCP) and/or blood C-reactive protein (CRP) in the biological sample.
  • FCP fecal calprotectin
  • CRP blood C-reactive protein
  • 120 mcg/g FCP is the cut off for determining likelihood of IBD.
  • 150 mcg/g FCP is the cut off for determining likelihood of IBD.
  • the cutoff is 8-10 mg/l CRP for IBD. In some embodiments, the cutoff is 8 mg/l CRP for IBD. In some embodiments, an elevated (>4 U/ml) anti-tissue transglutaminase antibody is indicative of CeD.
  • the training data set is trained using the GLMNET algorithm. In some embodiments, the training data set is trained using a regularization parameter value and an elastic net mixing parameter chosen to minimize the misclassification error. [0011] Also described is a method of screening for, diagnosing, or detecting irritable bowel syndrome (IBS), inflammatory bowel disease (IBD) and/or celiac disease (CeD) in a subject.
  • the method comprises performing the method of classifying described herein on a biological sample obtained from the subject, and identifying the subject as having IBS, IBD, or CeD based on the classifying of step (d). In some embodiments, the method further comprises treating the subject for IBS, IBD, or CeD.
  • Treatment for IBS can include fiber supplements, laxatives, anti-diarrheal medications, such as loperamide, a bile acid binder, such as cholestyramine, colestipol or colesevelam, anticholinergic medications, such as dicyclomine or hyoscyamine, tricyclic antidepressants, such as amitriptyline, imipramine, desipramine or nortriptyline, serotonin and norepinephrine reuptake inhibitor (SNRI) antidepressants, such as duloxetine, selective serotonin reuptake inhibitor (SSRI) antidepressants, such as fluoxetine or paroxetine, tetracyclic antidepressants, such as mirtazapine, and pregabalin or gabapentin.
  • Treatment for IBD can include anti-inflammatory drugs, which include aminosalicylates, such as mesalamine, balsalazide and olsalazine, and immune system suppressors.
  • Time-limited courses of corticosteroids can be both anti-inflammatory and immunosuppressing.
  • immunosuppressant drugs include azathioprine, mercaptopurine and methotrexate.
  • Small molecules can also be used for IBD treatment. These include tofacitinib, upadacitinib and ozanimod.
  • Biologics for treating IBD include infliximab, adalimumab, golimumab, certolizumab, vedolizumab, ustekinumab, and risankizumab.
  • treatment can include antibiotics, anti-diarrheal medications, such as psyllium powder or methylcellulose, and/or loperamide, pain relievers, and vitamins and supplements.
  • Treatment for CeD can include gluten-free diet, vitamin and mineral supplements, steroids to control inflammation, and other drugs, such as azathioprine or budesonide.
  • the normalization and correction comprise background correction and dye bias correction.
  • the resulting signal intensities are normalized using the ‘quantile normalization’ method.
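The quantile normalization step named above can be illustrated with a minimal pure-Python sketch. This is not the ENmix implementation; the function name `quantile_normalize`, the list-of-lists layout (one list of probe intensities per sample), and the tie-breaking rule are assumptions made for illustration only.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Map every sample onto one shared intensity distribution.

    `samples` holds one list of probe intensities per sample (hypothetical layout).
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must cover the same probes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_probes)
    ]

    normalized = []
    for s in samples:
        # Rank probes within the sample; ties are broken by position,
        # which is a simplification of what production tools do.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization every sample contains exactly the same set of values, differing only in which probe carries which value.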
  • CpGs associated with X and Y chromosomes, single nucleotide polymorphisms (SNPs) and repeats, non-specific or cross-reactive probes and probes showing low variability and extreme methylation values are removed.
  • SNPs single nucleotide polymorphisms
  • FIG.1 DNA methylation-based diagnostic algorithm for ruling-in IBS. Dx, diagnosis; IBS, irritable bowel syndrome; IBD, inflammatory bowel disease. [0017] FIG.2.
  • Dx diagnosis; IBS, irritable bowel syndrome; IBD, inflammatory bowel disease.
  • FIG.3. DNA methylation-based diagnostic algorithm for ruling out IBS or ruling-in celiac disease based on celiac serologies.
  • Dx diagnosis; IBS, irritable bowel syndrome; IBD, inflammatory bowel diseases.
  • Double cross validation process includes two nested cross-validation loops referred to as outer and inner loops. The entire data are divided into training and test sets (clear and hashed rectangles of “Outer loop”, respectively).
  • the training set which is further divided into tuning/calibration and validation sets (clear and hashed rectangles at right, respectively), is used as an inner loop.
  • the calibration set is used for model building (tuning hyperparameters) and the validation set is used to estimate the errors.
  • the model with the lowest prediction error within the inner loop is selected as the best model.
  • the test set in the outer loop is used to assess the predictive performance of the model.
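The double cross-validation layout described above can be sketched as a generator of index splits. The fold counts, the function names `k_folds` and `double_cv_splits`, and the shuffling seed are illustrative assumptions; the sketch only produces the outer test sets and the inner calibration/validation sets and does not train any model.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_folds(indices: List[int], k: int) -> List[List[int]]:
    """Split indices into k nearly equal folds."""
    return [indices[i::k] for i in range(k)]


def double_cv_splits(
    n_samples: int, outer_k: int = 5, inner_k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[Split]]]:
    """Yield (test, inner) where inner is a list of (calibration, validation)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    for test in k_folds(indices, outer_k):  # outer loop
        held_out = set(test)
        training = [i for i in indices if i not in held_out]

        inner = []
        for validation in k_folds(training, inner_k):  # inner loop
            val_set = set(validation)
            calibration = [i for i in training if i not in val_set]
            inner.append((calibration, validation))
        yield test, inner
```

The key property is that the outer test samples never appear in any inner calibration or validation set, so the outer-loop performance estimate is not contaminated by hyperparameter tuning.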
  • Receiver operating characteristic (ROC) curves and the area under the ROC curves (AUC) for IBS vs IBD, IBS vs celiac disease and IBD vs celiac disease comparisons. The x-axis represents the sensitivity and the y-axis represents the specificity, and each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold.
  • the AUC values for IBS vs IBD, IBS vs celiac disease and IBD vs celiac disease classifiers were 0.85, 0.82 and 0.78, respectively. [0021] FIG.6.
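As a reminder of what an AUC value measures, the following minimal sketch computes it as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as half). The function name `roc_auc` and the 0/1 label encoding are assumptions; this is not the code used to produce the reported values.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.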
  • FIGS.7A-7F Differentially methylated CpG across chromosomal regions for IBS versus IBD (7A), IBS versus CeD (7B), IBD versus CeD (7C), IBS versus healthy controls (7D), IBD versus healthy controls (7E), and CeD versus healthy controls (7F).
  • FIGS.8A-8C Gene ontology terms associated with classifiers.
  • FIG.8A shows the GO terms enriched in IBS classifier-associated genes in: IBS vs healthy control (first panel), IBS vs IBD (middle panel), and IBS vs Celiac disease (third panel) comparisons.
  • FIG.8B shows the GO terms enriched in IBD classifier-associated genes in: IBD vs healthy control (first panel), IBD vs IBS (middle panel), and IBD vs Celiac disease (third panel) comparisons. Inflammation-related terms were represented in all the comparisons.
  • FIG.8C shows the GO terms enriched in Celiac disease classifier-associated genes in: Celiac disease vs healthy control (first panel), Celiac disease vs IBS (second panel), and Celiac disease vs IBD comparisons (third panel).
  • MHC Major histocompatibility complex
  • FIG.9 ROC curve for testing IBD versus healthy control classifiers on external validation data.
  • FIG.10. Permutation testing of classifiers associated with the pairs of diagnostic groups.
  • DETAILED DESCRIPTION [0026] Described herein are methods of utilizing blood-based genome-wide DNA methylation profiles to develop a reliable test to diagnose irritable bowel syndrome (IBS) versus inflammatory bowel disease (IBD), celiac disease and other gastrointestinal diseases that mimic IBS.
  • control or reference sample means a sample that is representative of normal measures of the respective marker, such as would be obtained from normal, healthy control subjects, or a baseline amount of marker to be used for comparison. Typically, a baseline will be a measurement taken from the same subject or patient.
  • the sample can be an actual sample used for testing, or a reference level or range, based on known normal measurements of the corresponding marker.
  • a “significant difference” means a difference that can be detected in a manner that is considered reliable by one skilled in the art, such as a statistically significant difference, or a difference that is of sufficient magnitude that, under the circumstances, can be detected with a reasonable level of reliability.
  • an increase or decrease of 10% relative to a reference sample is a significant difference.
  • an increase or decrease of 20%, 30%, 40%, or 50% relative to the reference sample is considered a significant difference.
  • an increase of two-fold relative to a reference sample is considered significant.
  • Nucleotide sequence refers to a heteropolymer of deoxyribonucleotides, ribonucleotides, or peptide-nucleic acid sequences that may be assembled from smaller fragments, isolated from larger fragments, or chemically synthesized de novo or partially synthesized by combining shorter oligonucleotide linkers, or from a series of oligonucleotides, to provide a sequence which is capable of expressing the encoded protein.
  • the term “probe” refers to an oligonucleotide, naturally or synthetically produced, via recombinant methods or by PCR amplification, that hybridizes to at least part of another oligonucleotide of interest.
  • a probe can be single-stranded or double-stranded. In the context of an array as described herein, a probe can be methylation-specific.
  • “hybridizes,” “hybridizing,” and “hybridization” means that the oligonucleotide forms a noncovalent interaction with the target DNA molecule under standard conditions.
  • Standard hybridizing conditions are those conditions that allow an oligonucleotide probe or primer to hybridize to a target DNA molecule. Such conditions are readily determined for an oligonucleotide probe or primer and the target DNA molecule using techniques well known to those skilled in the art.
  • the nucleotide sequence of a target polynucleotide is generally a sequence complementary to the oligonucleotide primer or probe.
  • the hybridizing oligonucleotide may contain nonhybridizing nucleotides that do not interfere with forming the noncovalent interaction.
  • the nonhybridizing nucleotides of an oligonucleotide primer or probe may be located at an end of the hybridizing oligonucleotide or within the hybridizing oligonucleotide.
  • an oligonucleotide probe or primer does not have to be complementary to all the nucleotides of the target sequence as long as there is hybridization under standard hybridization conditions.
  • “complement” and “complementary,” as used herein, refer to the ability of two nucleic acid molecules to base pair with each other. For example, in DNA, adenine (A) is complementary to thymine (T). In RNA, adenine (A) is complementary to uracil (U).
  • complementarity refers to an antisense compound that is capable of base pairing with its target nucleic acid. For example, if a nucleobase at a certain position of an antisense compound is capable of hydrogen bonding with a nucleobase at a certain position of a target nucleic acid, then the position of hydrogen bonding between the oligonucleotide and the target nucleic acid is considered to be complementary at that nucleobase pair. Nucleobases comprising certain modifications may maintain the ability to pair with a counterpart nucleobase and thus, are still capable of nucleobase complementarity. Typically, two DNA molecules are complementary if they hybridize under the standard conditions referred to above.
  • two DNA molecules are complementary if they have at least about 80% sequence identity, preferably at least about 90% sequence identity.
  • the term “subject” includes any human or non-human animal.
  • the term “non-human animal” includes all vertebrates, e.g., mammals and non-mammals, such as non-human primates, horses, sheep, dogs, cows, pigs, chickens, and other veterinary subjects. In a typical embodiment, the subject is a human.
  • “a” or “an” means at least one, unless clearly indicated otherwise.
  • sample include, but are not limited to, blood, plasma or serum, saliva, urine, cerebral spinal fluid, milk, cervical secretions, semen, tissue, cell cultures, and other bodily fluids or tissue specimens.
  • Methods [0039] Described herein are methods of classifying biological samples based on gastrointestinal disease.
  • the method comprises: (a) measuring the amount of DNA methylation at CpG sites in DNA in a biological sample obtained from the subject to generate a genome-wide DNA CpG methylation profile of the subject, wherein the measuring comprises probing at least 200,000 CpG sites; (b) performing a normalization of technical noise and bias correction on the profile; (c) filtering the profile to retain data from a first set of probed CpG sites that is the same as a first set of CpG sites from a first training data set trained on a first pairwise comparison between biological samples, wherein the first pairwise comparison is selected from: (i) irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD); (ii) IBS and celiac disease (CeD); (iii) IBD and CeD; (iv) IBS and healthy control; (v) IBD and healthy control; and (vi) CeD and healthy control; and (d) classifying the CpG methylation profile of the biological samples as IBS irritable
  • the classifying can be implemented by use of a computer. Also provided is a system for performing the methods described herein. [0040] In some embodiments, the filtering of step (c) is repeated to retain data from a second set of probes that is the same as a second set of probes from a second training data set trained on a second pairwise comparison between biological samples, wherein the second pairwise comparison is selected from (i) to (vi), and wherein the second pairwise comparison differs from the first pairwise comparison. [0041] In some embodiments, the filtering of step (c) is repeated to retain data from subsequent sets of probes corresponding to subsequent training data sets trained on subsequent pairwise comparisons between biological samples, wherein each of the remaining pairwise comparisons of (i) to (vi) is performed.
  • the probing comprises contacting the biological samples with a methylation microarray.
  • the biological sample comprises blood, plasma, serum, or mucosal tissue.
  • the sample is peripheral blood mononuclear cells (PBMCs), peripheral blood lymphocytes (PBL), or whole blood.
  • the method further comprises measuring the amount of fecal calprotectin and/or blood C-reactive protein (CRP) in the biological sample.
  • PBMCs peripheral blood mononuclear cells
  • PBL peripheral blood lymphocytes
  • 120 mcg/g FCP is the cut off for determining likelihood of IBD.
  • 150 mcg/g FCP is the cut off for determining likelihood of IBD.
  • the cutoff is 8-10 mg/l CRP for IBD. In some embodiments, the cutoff is 8 mg/l CRP for IBD. In some embodiments, an elevated (>4 U/ml) anti-tissue transglutaminase antibody is indicative of CeD.
  • the training data set is trained using the GLMNET algorithm. In some embodiments, the training data set is trained using a regularization parameter value and an elastic net mixing parameter chosen to minimize the misclassification error.
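For orientation, the two parameters named above correspond to λ (regularization strength) and α (elastic net mixing) in the penalized logistic regression objective documented for glmnet. The symbols N, x_i, y_i, β_0 and β below are notation introduced here for illustration and do not appear in the application.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}
\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
\;+\;
\lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
             + \alpha\,\lVert\beta\rVert_1\Bigr]
```

Here x_i would be the vector of retained CpG methylation values for sample i and y_i its binary class label in one pairwise comparison. Setting α = 1 gives the lasso penalty and α = 0 gives the ridge penalty; intermediate values mix the two.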
  • the normalization and correction comprise background correction and dye bias correction.
  • the resulting signal intensities are normalized using the ‘quantile normalization’ method.
  • CpGs associated with X and Y chromosomes, single nucleotide polymorphisms (SNPs) and repeats, non-specific or cross-reactive probes and probes showing low variability and extreme methylation values are removed.
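The probe-removal criteria listed above can be sketched as a simple filter. The application does not state numeric thresholds here, so `min_sd`, `low` and `high` are hypothetical values, as are the `ProbeInfo` fields and the function names; real pipelines would take these flags from an array annotation file.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class ProbeInfo:
    chromosome: str
    overlaps_snp_or_repeat: bool
    cross_reactive: bool


def keep_probe(
    info: ProbeInfo,
    betas: List[float],
    min_sd: float = 0.01,  # hypothetical low-variability threshold
    low: float = 0.05,     # hypothetical bounds for extreme methylation
    high: float = 0.95,
) -> bool:
    """Return True if the probe passes every removal criterion."""
    if info.chromosome in {"X", "Y"}:
        return False
    if info.overlaps_snp_or_repeat or info.cross_reactive:
        return False
    if pstdev(betas) < min_sd:  # low variability across samples
        return False
    return low <= mean(betas) <= high  # drop extreme methylation values


def filter_probes(
    annotation: Dict[str, ProbeInfo], beta_values: Dict[str, List[float]]
) -> List[str]:
    """Return the IDs of probes retained for model training."""
    return [
        probe_id
        for probe_id, betas in beta_values.items()
        if keep_probe(annotation[probe_id], betas)
    ]
```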
  • Also described is a method of screening for, diagnosing, or detecting irritable bowel syndrome (IBS), inflammatory bowel disease (IBD) and/or celiac disease (CeD) in a subject.
  • the method comprises performing the method of classifying described herein on a biological sample obtained from the subject, and identifying the subject as having IBS, IBD, or CeD based on the classifying of step (d). In some embodiments, the method further comprises treating the subject for IBS, IBD, or CeD.
  • Treatment for Gastrointestinal Conditions: treatment for IBS can include fiber supplements, laxatives, anti-diarrheal medications, such as loperamide, a bile acid binder, such as cholestyramine, colestipol or colesevelam, anticholinergic medications, such as dicyclomine or hyoscyamine, tricyclic antidepressants, such as amitriptyline, imipramine, desipramine or nortriptyline, serotonin and norepinephrine reuptake inhibitor (SNRI) antidepressants, such as duloxetine, selective serotonin reuptake inhibitor (SSRI) antidepressants, such as fluoxetine or paroxetine, tetracyclic antidepressants, such as mirtazapine, and pregabalin or gabapentin.
  • Treatment for IBD can include anti-inflammatory drugs, which include aminosalicylates, such as mesalamine, balsalazide and olsalazine, and immune system suppressors.
  • Time-limited courses of corticosteroids can be both anti-inflammatory and immunosuppressing.
  • immunosuppressant drugs include azathioprine, mercaptopurine and methotrexate.
  • Small molecules can also be used for IBD treatment. These include tofacitinib, upadacitinib and ozanimod.
  • Biologics for treating IBD include infliximab, adalimumab, golimumab, certolizumab, vedolizumab, ustekinumab, and risankizumab.
  • treatment can include antibiotics, anti-diarrheal medications, such as psyllium powder or methylcellulose, and/or loperamide, pain relievers, and vitamins and supplements.
  • Treatment for CeD can include gluten-free diet, vitamin and mineral supplements, steroids to control inflammation, and other drugs, such as azathioprine or budesonide.
  • Example 1 Development of Blood-based Genome-wide DNA Methylation Profiles
  • This Example describes the analytic process of utilizing blood-based genome-wide DNA methylation profiles to develop a reliable test to diagnose irritable bowel syndrome (IBS) vs inflammatory bowel disease (IBD), celiac disease and other gastrointestinal diseases that mimic IBS.
  • METHODS [0055] The methods described herein have been applied to the Infinium MethylationEPIC BeadChip, an Illumina™ DNA methylation array, which enables quantitative interrogation of over 850,000 methylation sites at single-nucleotide resolution.
  • probe design type bias was corrected using the Regression on Correlated Probes (RCP) method, followed by filtering on QC metrics. The detection P value threshold used to identify low-quality data points was set to 0.000001. The “Impute” parameter was set to “True” (after removing rows or columns with too many missing values; default threshold of > 5% missing values).
  • Data were analyzed with and without correction for batch effects (technical variability due to pipetting errors between batches of samples that were run together on a 96 well plate). For the batch effect adjusted analyses, the batch effects were removed using ‘removeBatchEffect’ function separately for training and test folds from the “methylSet” object.
  • the PBMC and blood data were further subdivided into the following comparison groups which belonged to one of the two categories including 1. one vs one: comparing one diagnosis group (Dx) vs healthy control (HC) group or one Dx vs other Dx, and 2. one vs all other Dxs combined.
  • Dx diagnosis group
  • HC healthy control
  • although the results presented below do not include HC in the 2nd category, it may potentially be included to improve model performance.
  • the GI disease controls included celiac disease, inflammatory bowel disease (IBD; Crohn’s disease and ulcerative colitis), and colon cancer.
  • Regularization parameter (λ) was selected using 10-fold cross-validation on inner training and test data.
  • Glmnet is one of the >200 available methods to train a model using machine learning. A comprehensive list of methods is compiled here - https://topepo.github.io/caret/available-models.html.
  • the first three tests involve DNA methylation pattern-based classification of incoming sample into IBS vs IBD, IBS vs celiac disease, and IBS vs other GI diseases (e.g., microscopic colitis, bile acid diarrhea, colon cancer) where three positive tests indicate a high likelihood of a positive IBS diagnosis. In the case of three negative tests, this would indicate that the diagnosis is very unlikely to be IBS. In the case of at least two positives out of the three tests, it suggests a probable diagnosis of IBS but may not confidently rule out other GI diseases that present with IBS-like symptoms. Additionally, a negative diagnosis of IBS on at least two of the three tests suggests a low likelihood of IBS.
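The vote-counting logic described above, across the three DNA methylation-based tests, can be written as a small function. The function name `ibs_likelihood` and the exact wording of the returned labels are illustrative assumptions; each vote is True when the corresponding classifier (IBS vs IBD, IBS vs celiac disease, IBS vs other GI diseases) calls the sample IBS.

```python
from typing import Sequence


def ibs_likelihood(votes: Sequence[bool]) -> str:
    """Summarize three pairwise classifier calls into a likelihood statement."""
    if len(votes) != 3:
        raise ValueError("expected exactly three classifier votes")

    positives = sum(bool(v) for v in votes)
    if positives == 3:
        return "high likelihood of IBS"
    if positives == 2:
        return "probable IBS; other GI diseases not confidently ruled out"
    if positives == 1:
        return "low likelihood of IBS"
    return "IBS very unlikely"
```

For example, `ibs_likelihood([True, True, False])` yields the "probable IBS" label, which matches the case of at least two positives out of three tests.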
  • IBD and celiac disease non-invasive markers including fecal calprotectin/C-reactive protein (CRP) and celiac serologies, respectively, can help to rule out IBS and rule-in active IBD (e.g., elevated fecal calprotectin and/or CRP) or celiac disease (positive celiac serologies) or rule-out active IBD and celiac disease (normal fecal calprotectin and CRP, and negative celiac serologies, respectively).
  • Fecal calprotectin and/or C-reactive protein (CRP) and celiac serologies can further help with the diagnosis of IBS.
  • normal fecal calprotectin levels <50 have been shown to be associated with a <1% chance of having IBD (i.e., ruling out IBD) and those between 50 and 150 are still suggestive of IBS (if the DNA methylation-based comparisons of IBS vs IBD, IBS vs celiac, IBS vs other GI diseases are positive), mildly active IBD, gastrointestinal infection or microscopic colitis.
  • a higher fecal calprotectin, e.g. >150 will be more suggestive of an inflammatory or infectious GI disorder, e.g., IBD (5).
  • abnormal celiac serologies would rule-out IBS alone and warrant diagnostic workup for celiac disease.
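The fecal calprotectin bands described above can be expressed as a lookup function. This is a sketch only: the units are assumed to be mcg/g (the unit used for the cutoffs stated elsewhere in this document), the handling of values exactly at 50 and 150 is an assumption, and the function name and return strings are hypothetical.

```python
def interpret_fecal_calprotectin(fcp: float) -> str:
    """Map a fecal calprotectin value (assumed mcg/g) to the bands in the text."""
    if fcp < 0:
        raise ValueError("fecal calprotectin cannot be negative")
    if fcp < 50:
        return "normal: IBD very unlikely"
    if fcp <= 150:  # boundary handling is an assumption
        return (
            "mildly elevated: compatible with IBS if methylation tests are "
            "positive, or with mildly active IBD, GI infection or "
            "microscopic colitis"
        )
    return "elevated: suggestive of an inflammatory or infectious GI disorder"
```

The middle band is deliberately non-committal, because the text makes its interpretation depend on the DNA methylation-based comparisons.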
  • the diagnostic performance of a ‘rule-in’ IBS test can be further improved by developing algorithms to diagnose IBS from symptomatic patients who have a diagnostic test that is negative for IBD, celiac and other GI diseases, e.g., colon cancer.
  • DNA methylation data from a new sample will be generated using the EPIC platform and processed through the preprocessing steps followed by the three DNA methylation-based classification models to output class labels for the most probable prediction.
  • the votes for each class will be counted and information from fecal calprotectin/C-reactive protein (CRP) or celiac serologies will be used to make the final diagnostic calls.
  • the first column (IBS vs GI diseases other than IBD and celiac) represents the diagnosis based on an algorithm trained to classify IBS patients from other GI diagnoses such as non-celiac gluten sensitivity, microscopic colitis, bile acid diarrhea, colon cancer, etc., which may or may not present with red flags (weight loss, blood in stool, etc.). When all the tests are negative, the patient likely does not have IBS. However, they may still have IBD in remission or celiac disease on a gluten-free diet.
  • Figure 2 shows the possible combinations of DNA methylation tests and resulting diagnoses when the Fecal Calprotectin and/or C-reactive protein (CRP) are positive.
  • the first column (IBS vs other GI diseases) represents diagnosis based on an algorithm trained to classify IBS patients from other GI diagnoses such as non-celiac gluten sensitivity, microscopic colitis, colon cancer, bile acid diarrhea, which may or may not present with red flags (weight loss, blood in stool, etc.).
  • the diagnosis is more likely IBS than IBD if the fecal calprotectin and/or CRP levels are only mildly elevated, e.g., fecal calprotectin level >50 but <150. If the fecal calprotectin and/or CRP levels are moderately or greatly increased, the probability of having IBS is very low and the probability of having IBD is high.
  • Example 2 Representative Workflow [0104] This Example illustrates the steps one can take to prepare, train, and analyze DNA methylation data using a machine learning model. Once a model has been trained, test samples can be evaluated using corresponding parameters.
  • the preprocessing steps are aimed at preparing DNA methylation data, obtained as raw scanner output, for use as input to a machine learning model. There are 855,790 CpGs on the Epic v1.0 array. The following filtering and preprocessing steps, which use the ENmix R package, were standardized: 1. Background and dye bias correction performed using ‘preprocessENmix’. 2. Quality control performed using ‘QCinfo’ with a detection P threshold of 1e-6, and data quantile normalized using the ‘norm.quantile’ function. 3.
  • the first one included a larger set of IBS and HC samples that were run on Epic or 450K methylation array. This analysis used the platform type as the batch covariate.
  • the second analysis included IBS and HC samples that were down-sampled to include balanced sample numbers for IBS and HCs to avoid over-fitting.
  • the third analysis included IBS patient and HC samples that were run on Epic platform only. This step allowed for identification of differential methylation on probes that were not represented on 450K platform. All comparisons are shown in Table 3.
  • TN True Negatives
  • FP False Positives
  • FN False Negatives
  • TP True Positives
  • This data can be directly input into “predict” function along with the training model to generate the summary.
  • The class will be predicted based on the probability cutoff, as mentioned in the first row of all the model outputs. For example, if the probability associated with the new sample, when run on the IBS vs HC training model, is predicted as 0.3, which is less than 0.44, it will be classified as a healthy control, and if it is predicted to be 0.5, it will be classified as IBS. [0118] Functional significance of selected genes [0119] To identify IBS-specific GO terms, common pathways between IBS vs HC, IBS vs IBD and IBS vs celiac classifiers with a p < 0.05 were identified.
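The cutoff rule described above for the IBS vs HC model can be sketched in a few lines of Python. This is only an illustration: the function name and the choice to treat a probability exactly equal to the cutoff as IBS are assumptions, and the patent itself applies the rule to the output of R's “predict” function.

```python
def classify_by_cutoff(probability, cutoff=0.44, positive="IBS", negative="HC"):
    """Assign a class label by comparing a predicted probability to a cutoff.

    The 0.44 default is the IBS vs HC cutoff quoted in the text; treating
    probability == cutoff as the positive class is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return positive if probability >= cutoff else negative


# The two worked examples from the text: 0.3 and 0.5.
labels = [classify_by_cutoff(p) for p in (0.3, 0.5)]
```

Each pairwise model would carry its own cutoff, so in practice the `cutoff` argument would be read from the corresponding model output rather than hard-coded.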
  • IBS Irritable bowel syndrome
  • IBD inflammatory bowel disease
  • CeD celiac disease
  • GI gastrointestinal
  • IBD and CeD are diagnosed by endoscopy with histological confirmation
  • IBS is diagnosed by symptom criteria due to a lack of a reliable diagnostic test.
  • DNA methylation-based diagnostic biomarkers have not been investigated in IBS, IBD and CeD. We aimed at building blood-derived DNA methylation-based diagnostic classifiers using machine learning to differentiate IBS, IBD, and CeD, subsequently identifying gene ontology (GO) terms associated with disease-specific classifiers.
  • GO gene ontology
  • Genome-wide DNA methylation of peripheral blood mononuclear cells from patients with IBS, IBD, CeD, and healthy controls was measured using Illumina’s 450K or EPIC arrays.
  • Differential DNA methylation between IBS, IBD, CeD and HCs was measured using general linear models (GLMs) in “limma”.
  • GLMs general linear models
  • Classifiers were developed using machine learning (ML) by training penalized generalized linear models using double cross-validation with diagnosis as an outcome, methylation as a predictor and age and other confounders as covariates.
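A minimal sketch of double (nested) cross-validation, written in plain Python, may help make the procedure concrete. Everything below is illustrative: the patent trains penalized generalized linear models on methylation data, whereas this sketch substitutes a toy one-feature threshold classifier with a single hypothetical hyperparameter (`shrink`) and synthetic data, purely to show how the inner loop tunes and the outer loop evaluates.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_threshold(xs, ys, idx, shrink):
    """Toy 'model': class-mean midpoint, pulled toward the overall mean by `shrink`."""
    overall = sum(xs[i] for i in idx) / len(idx)
    low = [xs[i] for i in idx if ys[i] == 0]
    high = [xs[i] for i in idx if ys[i] == 1]
    if not low or not high:
        return overall
    midpoint = (sum(low) / len(low) + sum(high) / len(high)) / 2
    return (1 - shrink) * midpoint + shrink * overall


def error_rate(threshold, xs, ys, idx):
    """Fraction of samples in idx misclassified by the rule x >= threshold."""
    return sum((xs[i] >= threshold) != (ys[i] == 1) for i in idx) / len(idx)


def double_cross_validation(xs, ys, grid, k_outer=5, k_inner=4, seed=0):
    rng = random.Random(seed)
    all_idx = list(range(len(xs)))
    outer_errors = []
    for test_idx in k_folds(all_idx, k_outer, rng):
        held_out = set(test_idx)
        train_idx = [i for i in all_idx if i not in held_out]
        inner_folds = k_folds(train_idx, k_inner, rng)

        def inner_error(param):
            # Inner loop: fit on the calibration part, score on the validation part.
            errors = []
            for val_idx in inner_folds:
                val = set(val_idx)
                cal_idx = [i for i in train_idx if i not in val]
                thr = fit_threshold(xs, ys, cal_idx, param)
                errors.append(error_rate(thr, xs, ys, val_idx))
            return sum(errors) / len(errors)

        best = min(grid, key=inner_error)
        # Outer loop: refit with the selected hyperparameter, score on the test fold.
        thr = fit_threshold(xs, ys, train_idx, best)
        outer_errors.append(error_rate(thr, xs, ys, test_idx))
    return sum(outer_errors) / len(outer_errors)


# Synthetic, well-separated data (hypothetical values, not study data).
data_rng = random.Random(1)
xs = [data_rng.uniform(0.1, 0.4) for _ in range(20)] + [data_rng.uniform(0.6, 0.9) for _ in range(20)]
ys = [0] * 20 + [1] * 20
estimated_error = double_cross_validation(xs, ys, grid=[0.0, 0.5, 1.0])
```

The key design point is that the test fold of the outer loop never influences hyperparameter selection, so the outer error is an estimate of performance on unseen samples.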
  • An external dataset of 304 IBD patients and HCs was used as an independent validation cohort. Methylation sites selected using GLMs and those selected by the prediction models were used to identify GO enrichment.
  • Results were based on 315 participants (148 IBS, 47 IBD, 34 CeD and 86 healthy controls) who had DNA methylation data.
  • IBS vs IBD and IBD vs CeD showed the highest number of differentially methylated CpG sites followed by IBD vs HC, CeD vs HC and IBS vs HC.
  • the performance of IBD vs HCs was successfully validated in the external validation dataset.
  • IBS-associated differentially methylated genes and the ML-based classifiers were enriched in cell adhesion and neuronal pathways, while IBD and CeD classifiers were enriched in inflammation and MHC class II pathways, respectively (p < 0.05).
  • IBS has a worldwide prevalence of 4.5-11% (1) and is associated with a significant healthcare and economic burden (2).
  • Most IBS patients have seen at least three physicians and undergo multiple expensive and invasive tests before a diagnosis of IBS, as IBS is often considered a diagnosis of exclusion (3,4).
  • Other GI conditions, such as inflammatory bowel disease (IBD), celiac disease (CeD) and colon cancer, can present with similar GI symptoms but are diagnosed by confirmatory tissue histopathology and treated differently.
  • IBD, comprising Crohn’s disease and ulcerative colitis (UC), is a group of chronic, immune-mediated conditions manifesting as intestinal mucosal inflammation. IBD affects approximately 3 million Americans, evenly divided between Crohn’s disease and UC (5,6).
  • diagnosis of IBD is made by gastrointestinal disease specialists and requires endoscopic and histological confirmation. The time between onset of symptoms and referral to a specialist can vary substantially.
  • One of the major sources of delay in diagnosis is treatment of the patient under the assumption that they have IBS rather than IBD. (7) About 10% of IBD patients are misdiagnosed as IBS.
  • CeD is a gluten-induced, immune-mediated enteropathy, with an estimated worldwide prevalence of approximately 1% (11-14). CeD can present with variable manifestations and can be misdiagnosed as IBS (15). GI society guidelines recommend screening for CeD with serologies in patients with IBS, chronic diarrhea, and anemia, for instance. However, up to 5% of patients are seronegative (14). Additionally, some patients self-initiate a gluten-free diet, which is not uncommon in clinical practice (16,17). Substantial delays in diagnosing CeD even up to 13 years can occur in clinical practice (13,18,19).
  • DNA methylation changes in blood have been associated with inflammation in patients with IBD (28); however, there is a lack of blood-based DNA methylation studies in CeD. Additionally, there are no studies investigating DNA methylation as a biomarker for the differential diagnosis of IBS from other diseases that mimic IBS. [0130] Therefore, we hypothesized that DNA methylation marks in PBMCs can serve as a biomarker for diagnosing IBS and for distinguishing IBS from IBD and CeD. Similarly, DNA methylation-based biomarkers may be used for non-invasive diagnoses of IBD and CeD.
  • IBS-SSS Irritable Bowel Syndrome Severity Scoring System
  • IBD inflammatory bowel disease
  • treatments such as inflammation-reducing drugs used in IBD have been associated with changes in DNA methylation (35)
  • we recruited patients who were treatment-naïve or currently on no IBD treatment, including biologic agents (anti-TNF therapy, ustekinumab, risankizumab or vedolizumab) or other agents (tofacitinib, upadacitinib, ozanimod, 6-mercaptopurine, azathioprine, methotrexate, or 5-aminosalicylic acid [5-ASA]).
  • Additional exclusion criteria included Crohn’s disease treated surgically without evidence of subsequent disease and history of coexistent IBS or CeD.
  • Patients with IBD reported active GI symptoms, e.g., abdominal pain, bloating, diarrhea, and/or blood in stool. Disease activity was assessed with the following instruments: Simple Endoscopic Score for Crohn's Disease (36) (SES-CD, UCLA and SPARC IBD cohorts), and Crohn's Disease Activity Index (37) (CDAI, UCLA cohort), Short-CDAI (38) (SCDAI, SPARC IBD cohort), Simple Clinical Colitis Activity Index (39) (SCCAI, UCLA cohort), and Ulcerative Colitis Disease Activity Index (40) (UCDAI, SPARC IBD cohort) as described in the Supplementary Methods. [0135] Patients with CeD reported active GI symptoms, e.g.
  • DNA methylation data processing [0141] The methods used to process DNA methylation data and implement the machine learning algorithms are presented in detail in Supplementary Methods. In short, raw Illumina DNA methylation array data (IDAT) files generated by the Illumina iScan scanner were processed using the ENmix DNA methylation analysis pipeline (44). [0142] DNA methylation data pre-processing: Details of the analysis are presented in Supplementary Methods.
  • QC Quality Control
  • Background correction and dye bias correction were applied to the data and the resulting signal intensities were normalized using the ‘quantile normalization’ method.
  • Cell types were estimated as described previously (45).
  • QC information-based filtering was implemented to filter unwanted probes out of the 855,790 CpGs on the array.
  • CpGs on X and Y chromosomes, single nucleotide polymorphisms (SNPs) and repeats, non-specific or cross-reactive probes and probes showing low variability and extreme methylation values were removed. The remaining sites were used as input for classifier development.
  • Performance metrics, including accuracy (mean and 95% confidence interval [CI]), sensitivity, specificity, F1, and p values, were generated based on the 2x2 confusion matrix constructed using the predicted and true test sample labels.
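As a rough illustration of how these metrics follow from a 2x2 confusion matrix, the sketch below computes accuracy, sensitivity, specificity and F1 from the four cell counts. The counts in the usage lines are hypothetical, F1 is computed as the harmonic mean of precision and recall, and the confidence interval and p value mentioned in the text are not covered here.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, fp, fn, tn):
    """Summary metrics from true/false positive and negative counts."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts, not taken from the study.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
```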
  • Gene ontology (GO) analysis Enrichment of GO terms and/or pathways associated with differentially methylated CpGs and ML-based classifiers for each comparison was assessed using the ‘missMethyl’ package in R (48) as described in Supplementary Methods.
  • Sensitivity analyses Figure 6 shows correlation between DNA methylation principal components (PCs) and potential confounders.
  • The processed dataset includes 460,398 probes on 304 samples (204 IBD and 100 healthy controls).
  • Permutation testing Permutation-based testing to assess classifier performance has been used extensively in classification problems in computational biology to test if there is a real class structure in the data (49). We generated 100 random permutations to generate AUCs from classifiers generated from shuffled labels to create null distributions.
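The idea of a label-permutation null distribution for the AUC can be sketched as follows. This is a simplification under stated assumptions: the patent refits classifiers on shuffled labels, whereas this sketch keeps one fixed set of predicted scores and only shuffles the labels they are compared against. The scores and labels are hypothetical, and the function names are illustrative.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_null(scores, labels, n_perm=100, seed=0):
    """Observed AUC, null AUCs from shuffled labels, and an empirical p value."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(auc(scores, shuffled))
    # Add-one correction keeps the empirical p value strictly above zero.
    p_value = (1 + sum(a >= observed for a in null)) / (n_perm + 1)
    return observed, null, p_value


# Hypothetical predicted probabilities and true labels.
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
observed_auc, null_aucs, p_value = permutation_null(scores, labels)
```

If the observed AUC sits far in the upper tail of the null AUCs, the class structure is unlikely to be an artifact of chance.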
  • Severity of IBS symptoms was moderate, with a mean (standard deviation [SD]) overall severity score of 9.45 (4.18; range 0-20) and IBS-SSS of 237 (89.70; range 0-500), which represents moderately severe symptoms.
  • The mean (SD) ACE score in IBS patients was 1.91 (1.89) (Table 9A).
  • GFD gluten-free diet
  • DNA methylation-based classifiers for IBS, IBD and CeD [0164] The selected classifiers showed high accuracy for discrimination between the various patient groups.
  • Table 6 shows the performance metrics for IBS compared to healthy controls, IBD and CeD.
  • For the IBS vs healthy control comparison, DNA methylation data from 450K and EPIC arrays were used; for all other comparisons, only subjects with EPIC array data were used. # Covariates included age, cell-type proportions, and technical batch effects for all columns; however, the IBD vs healthy control comparison included site of IBD sample collection as an additional covariate.
  • F1 denotes weighted average of precision and recall; AUC, area under the receiver operating characteristic (ROC) curve; IBS, irritable bowel syndrome; IBD, inflammatory bowel diseases; CeD, celiac disease.
  • IBS vs HCs, IBD and CeD GO terms associated with IBS-related classifiers, i.e., overlapping terms between IBS vs HCs, IBS vs IBD and IBS vs CeD classifiers, were enriched in cell adhesion and neuronal pathway-associated terms including “cell-adhesion”, “neurotransmitter uptake”, “forebrain neuron development” and “sensory perception of pain”, suggesting a role for IBS-associated genes in permeability and visceral pain (50) (Figure 8A, Table 14A, p < 0.05).
  • IBD vs HCs, IBS and CeD
  • The IBD classifiers were primarily enriched in immune and inflammation-related pathways including “innate immune response”, “T-helper 1 type immune response” and “T-helper 17 cell chemotaxis” (p < 0.05, Figure 8B).
  • Analysis of GO terms associated with the differentially methylated genes as well as classifiers suggested that IBD was associated with epigenetic changes in inflammation and immune response pathways.
  • CeD vs HCs, IBS and IBD CeD classifiers were enriched in major histocompatibility complex (MHC) pathway-related terms such as “MHC class II protein complex” and “antigen processing and presentation of exogenous peptide antigen” (p < 0.05, Figure 8C).
  • MHC major histocompatibility complex
  • Analysis of GO terms associated with the differentially methylated genes as well as classifiers suggested that CeD was associated with epigenetic changes in the MHC pathway.
  • Sensitivity analyses [0184] DNA methylation is known to be sensitive to confounders. Figure 6 shows correlation between DNA methylation principal components (PCs) and potential confounders.
  • IBS-associated classifiers were associated with cell adhesion, neuronal signaling and pain pathways, which are the most widely studied pathways in IBS (52-55).
  • Inflammation and immune response pathways were associated with the IBD classifiers, further supporting the functional significance of the associated classifiers (56).
  • The CeD classifier was associated with MHC class II receptor activity.
  • HLA human leukocyte antigen
  • Biomarkers such as C-reactive protein (CRP), fecal calprotectin, and lactoferrin have limitations including statistical heterogeneity between studies, variable cut-off points, and overlapping values between active inflammatory disease and non-inflammatory disease such as IBS (59).
  • A commercial blood test for IBS-D is currently available. Studies have demonstrated increased circulating antibodies to cytolethal distending toxin B and vinculin (anti-CdtB, anti-vinculin) in patients with IBS-D.
  • A normal fecal calprotectin level (<40 micrograms (ug)/gram (g)) has been shown to be associated with a <1% chance of having IBD (63) (i.e., ruling out IBD), and levels between 40 ug/g and 150 ug/g may be seen in IBS, mildly active IBD, GI infection, GI neoplasm, CeD, or microscopic colitis.
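The fecal calprotectin ranges described above can be written as a small lookup, sketched here in Python. The band names, the function name, and the handling of values exactly at 40 or 150 ug/g are assumptions made for illustration; the text only states what values below 40 and between 40 and 150 ug/g may indicate.

```python
def fecal_calprotectin_band(level_ug_per_g):
    """Map a fecal calprotectin level (ug/g) to an illustrative band name.

    Boundary handling (40 and 150 counted as 'indeterminate') is an assumption.
    """
    if level_ug_per_g < 0:
        raise ValueError("level must be non-negative")
    if level_ug_per_g < 40:
        return "normal"          # associated with a <1% chance of IBD in the cited study
    if level_ug_per_g <= 150:
        return "indeterminate"   # may be seen in IBS, mildly active IBD, and other conditions
    return "elevated"            # above the 40-150 ug/g range discussed in the text


bands = [fecal_calprotectin_band(v) for v in (20, 100, 300)]
```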
  • A non-invasive diagnostic test for IBS can be valuable in different ways depending on the type of clinical practice and patient population. For gastroenterologists, limited or more comprehensive diagnostic testing including upper and lower endoscopies may be done in the presence of no or positive alarm features, respectively. However, in the common scenario of a negative diagnostic evaluation, a rule-in test for IBS will be beneficial, particularly for patients who want confirmation that they have IBS before starting treatment, and for health care providers, who can avoid ordering additional or repeated costly diagnostic tests and can instead institute appropriate treatment to relieve IBS symptoms.
  • a diagnostic test for IBS would allow them to make a positive diagnosis, recommend IBS treatment and avoid ordering more diagnostic testing and referring to a gastroenterologist for an endoscopic procedure.
  • Other clinical scenarios where a DNA methylation-based test for IBS could be helpful include the diagnosis of coexistent symptomatic IBS in patients with CeD, IBD or other GI disease that has been treated or is in remission, which occurs in 32-40% of patients (64), and in patients with non-celiac gluten or wheat sensitivity, but further studies are needed.
  • BSQ Bowel Symptom Questionnaire
  • IBS irritable bowel syndrome
  • IBD inflammatory bowel disease
  • CeD celiac disease
  • It is a self-report measure of GI symptoms. It includes multiple questions including the Rome diagnostic questions for IBS and bowel habit subtypes, overall GI symptom severity, and individual GI symptoms.
  • IBS, IBD and CeD participants rated the current overall intensity of their GI symptoms over the past week on a 20-point ordinal scale from 0 meaning none to 20 meaning most severe overall GI symptoms.
  • IBS-SSS Irritable Bowel Syndrome Severity Scoring System
  • HAD Hospital Anxiety and Depression Scale
  • the HAD (5) is a validated self-assessment mood scale specifically designed for use in non-psychiatric settings for assessment of symptoms of anxiety and depression.
  • the HAD provides two 7-item subscales: anxiety and depression, with scores ranging from 0-21, with 0-7 representing ‘non-cases’, 8-10 ‘doubtful cases’, and 11-21 representing ‘cases’.
  • ACE Adverse Childhood Experiences
  • the ACE questionnaire assesses the presence of early adverse life events before age 18 with 18 questions in eight ACE domains (number of questions) of physical (1), emotional (2), and sexual abuse (4), and includes household substance abuse (2), parental separation or divorce (1), mental illness in household (2), incarcerated household member (1), and parent treated violently (2) (6).
  • CDAI Crohn’s Disease Activity Index
  • The CDAI was used to assess disease activity in the UCLA Crohn’s disease (CD) patients (7).
  • the CDAI instrument consists of eight variables, two of which are subjective, and each weighted according to its ability to be predictive of disease activity. The total score ranges from 0 to over 600.
  • CDAI scores <150 represent clinical remission, 150-220 mild disease, 220-450 moderate disease, and >450 severe disease.
  • SCDAI Short Crohn’s Disease Activity Index
  • SCDAI is a short version of CDAI and can be used to reliably replace full CDAI score for assessment of disease activity in mildly to moderately active CD patients (8).
  • The score range is the same as for the CDAI.
  • SES-CD Simple endoscopic score for Crohn's disease
  • SES-CD scores were available for both UCLA and CCF patients. This is a simple endoscopic scoring system for Crohn's disease which assesses the size of mucosal ulcers, the ulcerated surface, affected surface, and luminal narrowing (9). The calculated score is interpreted as remission (0-2), mild (3-6), moderate (7-15), severe (>15).
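The interpretation bands quoted for the CDAI and the SES-CD can be expressed as a table-driven lookup, sketched below. The function names are illustrative, and because the CDAI ranges in the text share the boundary value 220, assigning exactly 220 to "mild" is an assumption.

```python
from bisect import bisect_left

LABELS = ["remission", "mild", "moderate", "severe"]

# Inclusive upper bounds of the first three SES-CD bands: 0-2, 3-6, 7-15; >15 is severe.
SES_CD_UPPER_BOUNDS = [2, 6, 15]
# Inclusive upper bounds of the CDAI mild and moderate bands; <150 is handled separately.
CDAI_UPPER_BOUNDS = [220, 450]


def ses_cd_category(score):
    """Interpret a Simple Endoscopic Score for Crohn's Disease."""
    return LABELS[bisect_left(SES_CD_UPPER_BOUNDS, score)]


def cdai_category(score):
    """Interpret a Crohn's Disease Activity Index score."""
    if score < 150:
        return "remission"
    return LABELS[1 + bisect_left(CDAI_UPPER_BOUNDS, score)]


ses_examples = [ses_cd_category(s) for s in (2, 3, 7, 16)]
cdai_examples = [cdai_category(s) for s in (149, 150, 300, 451)]
```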
  • SCCAI Simple Clinical Colitis Activity Index
  • UC ulcerative colitis
  • UCDAI Ulcerative Colitis Disease Activity Score
  • CDAT Score Coeliac Disease Adherence Test
  • GFD gluten-free diet
  • CSI Celiac Symptom Index Score
  • The CSI score (13) ranges from 16–80; <30 indicates high quality of life (QoL) and excellent gluten-free diet adherence, and >45 indicates relatively poor QoL and worse gluten-free diet adherence.
  • DNA Methylation Arrays [0289] Illumina’s Human Methylation 450K array or Infinium MethylationEPIC BeadChip (Illumina) was used to assess DNA methylation levels in PBMCs. The analysis was performed at three levels, including data preprocessing, classifier development, and classifier evaluation, using the R statistical language (14). [0290] Data pre-processing [0291] Data import and normalization of EPIC array data: As outlined in Figure 6, this stage of analysis includes steps to preprocess the raw Illumina DNA methylation array (IDAT) files generated by the Illumina iScan scanner and output normalized and filtered data. Data preprocessing was performed using the R package ENmix (15).
  • IDAT Illumina DNA methylation array data
  • Quality Control (QC) metrics were generated using the ‘QCinfo’ function. Samples that did not pass the QC threshold were excluded from analysis. Using the ‘preprocessENmix’ function, we used the “oob” method for background correction and “RELIC” for dye bias correction. This resulted in signal intensity data on methylated and unmethylated channels of a “MethylSet” class object. The data were quantile normalized using ‘norm.quantile’, separately for Methylated and Unmethylated intensities for Infinium I or Infinium II probes (“quantile1” method). Probe design type bias was corrected using Regression on Correlated Probes (RCP), followed by filtering on QC metrics. Detection P value threshold to identify low quality data points was set to 0.000001.
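To illustrate two of the steps named above, the sketch below shows plain quantile normalization across samples and masking of data points by detection P value. It is not the ENmix implementation: ENmix is an R package and its “quantile1” method normalizes methylated and unmethylated intensities separately by probe type, whereas this Python sketch normalizes a single hypothetical intensity matrix, breaks ties by position, and treats a detection P value above the threshold as low quality (an assumption about the boundary).

```python
def quantile_normalize(columns):
    """Give every sample (column) the same distribution: the mean of sorted values by rank."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n)]
    normalized_cols = []
    for col in columns:
        order = sorted(range(n), key=col.__getitem__)  # probe indices from lowest to highest
        normalized = [0.0] * n
        for rank, probe_index in enumerate(order):
            normalized[probe_index] = rank_means[rank]
        normalized_cols.append(normalized)
    return normalized_cols


def mask_low_quality(values, detection_p, threshold=1e-6):
    """Replace values whose detection P exceeds the threshold with None."""
    return [v if p <= threshold else None for v, p in zip(values, detection_p)]


# Two hypothetical samples measured on three probes.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
masked = mask_low_quality([0.8, 0.1, 0.5], [1e-9, 0.02, 1e-7])
```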
  • IBS vs HCs analysis was performed using three separate analyses. The first included a larger set of IBS and HCs samples that were run on either the EPIC or 450K methylation arrays. This analysis used the platform type as the batch covariate. The second analysis included a cohort of IBS and HCs from the first analysis that was down-sampled to an equal number of IBS and HCs for a balanced sample to avoid over-fitting. The third analysis included IBS patient and HC samples that were run on the EPIC platform only.
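The down-sampling step in the second analysis can be sketched as random selection of an equal number of samples per class. The group sizes in the usage lines are borrowed from the overall cohort summary purely for illustration; the actual sample sizes of the down-sampled analysis, the random seed, and the function name are assumptions.

```python
import random


def downsample_balanced(labels, seed=0):
    """Return sorted sample indices with every class reduced to the smallest class size."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    target = min(len(indices) for indices in by_class.values())
    rng = random.Random(seed)
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    return sorted(kept)


# Illustrative group sizes: 148 IBS and 86 healthy controls (HC).
labels = ["IBS"] * 148 + ["HC"] * 86
kept = downsample_balanced(labels)
kept_counts = {name: sum(labels[i] == name for i in kept) for name in ("IBS", "HC")}
```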
  • IBS irritable bowel syndrome
  • IBD inflammatory bowel diseases
  • CeD celiac disease
  • HCs healthy controls
  • N number of patients
  • Classifier development and evaluation [0299]
  • Covariates [0300] All analyses included age, cell-type proportions (six cell types including CD8-T cells, CD4-T cells, natural killer cells, B-cells, monocytes, neutrophils) and DNA methylation batch as covariates in the models.
  • IBS vs HC analysis for the larger dataset included platform (450K or EPIC) as the batch covariate.
  • IBS irritable bowel syndrome
  • IBD inflammatory bowel disease
  • CeD celiac disease
  • HCs Healthy controls
  • SSRIs Selective serotonin reuptake inhibitors
  • NSAIDs non-steroidal anti-inflammatory drugs
  • IBS irritable bowel syndrome
  • IBS-C constipation-predominant IBS
  • IBS-D diarrhea-predominant IBS
  • IBS-M Mixed IBS
  • IBS-U Un-subtyped IBS
  • UC ulcerative colitis
  • HCs healthy controls
  • GI gastrointestinal
  • Table 9B
  • Table 9C Symptom severity scores in celiac disease patients
  • CDAT celiac disease adherence test score, <30, remission; 30-45 moderate disease activity; ≥45 indicates relatively poor QoL and worse gluten-free diet adherence
  • CSI celiac symptom index, 16–80; ≤30 indicates high QoL and excellent gluten-free diet adherence, while ≥45 indicates relatively poor QoL and worse gluten-free diet adherence
  • Overall Severity of GI Symptoms 0-20; GI, gastrointestinal; SD: standard deviation
  • Table 10A Differentially Methylated CpG Sites Associated with Celiac vs Healthy
  • [0339] Table 10B Differentially Methylated CpG Sites Associated with IBD vs Celiac Disease
  • Table 10C Differentially Methylated CpG Sites Associated with IBD vs Healthy (FDR < 0.005)
  • Table 10D Differentially Methylated CpG Sites Associated with IBS vs Celiac
  • Table 10E Differentially Methylated CpG Sites Associated with IBS vs Healthy
  • Table 10F Differentially Methylated CpG Sites Associated with IBS vs IBD [0344] Tables 11A-11F.
  • GO terms associated with differentially methylated genes [0345] “#” refers to number of Differentially Methylated (D. M.) Genes [0346] Table 11A. IBD vs Healthy p < 0.05 [0347] Table 11B. IBS vs Healthy p < 0.05 [0348] Table 11C. IBS vs IBD p < 0.05 [0349] Table 11D. GO terms associated with differentially methylated genes [0350] Table 11E. GO terms associated with differentially methylated genes [0351] Table 11F. GO terms associated with differentially methylated genes [0352] Table 12.
  • IBS irritable bowel syndrome
  • IBD inflammatory bowel diseases
  • HCs healthy controls
  • AUC area under receiver operating characteristic (ROC) curve
  • CI confidence interval
  • Table 14A Classifier-Associated CpGs for IBS vs IBD
  • Table 14B Classifier-Associated CpGs for IBS vs Celiac Disease [0358] Table 14C.
  • Classifier-Associated CpGs for IBD vs Celiac Disease [0359] Table 14D. Classifier-Associated CpGs for IBS vs Healthy Control [0360] Table 14E. Classifier-Associated CpGs for IBD vs Healthy Control [0361] Table 14F. Classifier-Associated CpGs for Celiac Disease vs Healthy Control [0362] The IlluminaHumanMethylationEPICanno.ilm10b2.hg19 annotation package can be used to obtain exact information on probe sequences from the CpG ID. [0363] Throughout this application various publications are referenced.

Abstract

Blood-based genome-wide DNA methylation profiles are utilized to develop a reliable test to diagnose irritable bowel syndrome (IBS) versus inflammatory bowel disease (IBD), celiac disease and other gastrointestinal diseases that mimic IBS. These methods provide a means to rule out IBS and rule-in active IBD or celiac disease, as well as to rule-out active IBD and celiac disease, and to rule-in IBS.

Description

DNA METHYLATION-BASED ALGORITHM TO DIAGNOSE IRRITABLE BOWEL SYNDROME AND OTHER GI CONDITIONS [0001] This application claims benefit of United States provisional patent application number 63/499,017, filed April 28, 2023, the entire contents of which are incorporated by reference into this application. STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH [0002] This invention was made with government support under Grant Numbers DK104078 and DK064539, awarded by the National Institutes of Health. The government has certain rights in the invention. BACKGROUND [0003] Irritable bowel syndrome (IBS), inflammatory bowel disease (IBD) and celiac disease (CeD) present with similar gastrointestinal (GI) symptoms. While IBD and CeD are diagnosed by endoscopy with tissue confirmation, IBS is diagnosed by symptom criteria due to a lack of a reliable diagnostic test. Epigenetic modifications including DNA methylation have not been studied well in IBS, IBD and CeD. [0004] There remains a need for blood-based diagnostic classifiers that can differentiate IBS, IBD, and CeD, as well as other GI diseases that mimic IBS. SUMMARY [0005] The tools and methods described herein address these needs and more. Described herein are methods of classifying biological samples based on gastrointestinal disease. 
In some embodiments, the method comprises: (a) measuring the amount of DNA methylation at CpG sites in DNA in a biological sample obtained from the subject to generate a genome-wide DNA CpG methylation profile of the subject, wherein the measuring comprises probing at least 200,000 CpG sites; (b) performing a normalization of technical noise and bias correction on the profile; (c) filtering the profile to retain data from a first set of probed CpG sites that is the same as a first set of CpG sites from a first training data set trained on a first pairwise comparison between biological samples, wherein the first pairwise comparison is selected from: (i) irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD); (ii) IBS and celiac disease (CeD); (iii) IBD and CeD; (iv) IBS and healthy control; (v) IBD and healthy control; and (vi) CeD and healthy control; and (d) classifying the CpG methylation profile of the biological samples as IBS, IBD, CeD, or healthy based on a probability cutoff derived from the first training data set. The classifying can be implemented by use of a computer. Also provided is a system for performing the methods described herein. [0006] In some embodiments, the filtering of step (c) is repeated to retain data from a second set of probes that is the same as a second set of probes from a second training data set trained on a second pairwise comparison between biological samples, wherein the second pairwise comparison is selected from (i) to (vi), and wherein the second pairwise comparison differs from the first pairwise comparison. [0007] In some embodiments, the filtering of step (c) is repeated to retain data from subsequent sets of probes corresponding to subsequent training data sets trained on subsequent pairwise comparisons between biological samples, wherein each of the remaining pairwise comparisons of (i) to (vi) is performed. 
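Steps (c) and (d) above can be pictured with a short Python sketch: the profile is reduced to the CpG sites used by one pairwise model, and a probability is then computed and compared with that model's cutoff. All identifiers and numbers here are hypothetical (the CpG IDs, coefficients, intercept and the 0.5 cutoff are invented for illustration), and the logistic form is an assumption consistent with a penalized generalized linear model for a two-class outcome.

```python
import math


def filter_profile(profile, model_cpgs):
    """Keep only the CpG sites the model was trained on, in the model's own order."""
    missing = [cpg for cpg in model_cpgs if cpg not in profile]
    if missing:
        raise KeyError(f"profile lacks CpG sites required by the model: {missing}")
    return [profile[cpg] for cpg in model_cpgs]


def logistic_probability(features, coefficients, intercept):
    """Probability of the positive class from a linear predictor."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical normalized methylation values keyed by placeholder CpG IDs.
profile = {"cg_a": 0.8, "cg_b": 0.2, "cg_c": 0.5}
# Hypothetical pairwise model (for example IBS vs IBD) with two retained sites.
model = {"cpgs": ["cg_a", "cg_b"], "coefficients": [2.0, -1.0], "intercept": -1.0, "cutoff": 0.5}

features = filter_profile(profile, model["cpgs"])
probability = logistic_probability(features, model["coefficients"], model["intercept"])
call = "IBS" if probability >= model["cutoff"] else "IBD"
```

Repeating the filtering with a second model, as in paragraphs [0006] and [0007], would amount to calling `filter_profile` again with that model's own CpG list.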
[0008] In some embodiments, the probing comprises contacting the biological samples with a methylation microarray. In some embodiments, the biological sample comprises blood, plasma, serum, or mucosal tissue. In some embodiments, the sample is peripheral blood mononuclear cells (PBMCs), peripheral blood lymphocytes (PBL), or whole blood. [0009] In some embodiments, the method further comprises measuring the amount of fecal calprotectin (FCP) and/or blood C-reactive protein (CRP) in the biological sample. In some embodiments, 120 mcg/g FCP is the cutoff for determining likelihood of IBD. In some embodiments, 150 mcg/g FCP is the cutoff for determining likelihood of IBD. In some embodiments, the cutoff is 8-10 mg/l CRP for IBD. In some embodiments, the cutoff is 8 mg/l CRP for IBD. In some embodiments, an elevated (>4 U/ml) anti-tissue transglutaminase antibody is indicative of CeD. [0010] In some embodiments, the training data set is trained using the GLMNET algorithm. In some embodiments, the training data set is trained using a regularization parameter value and an elastic net mixing parameter. These parameters are selected to minimize the misclassification error. [0011] Also described is a method of screening for, diagnosing, or detecting irritable bowel syndrome (IBS), inflammatory bowel disease (IBD) and/or celiac disease (CeD) in a subject. In some embodiments, the method comprises performing the method of classifying described herein on a biological sample obtained from the subject, and identifying the subject as having IBS, IBD, or CeD based on the classifying of step (d). In some embodiments, the method further comprises treating the subject for IBS, IBD, or CeD. 
[0012] Treatment for IBS can include fiber supplements, laxatives, anti-diarrheal medications, such as loperamide, a bile acid binder, such as cholestyramine, colestipol or colesevelam, anticholinergic medications, such as dicyclomine or hyoscyamine, tricyclic antidepressants, such as amitriptyline, imipramine, desipramine or nortriptyline, serotonin and norepinephrine reuptake inhibitor (SNRI) antidepressants, such as duloxetine, SSRI antidepressants, such as fluoxetine or paroxetine, tetracyclic antidepressant, such as mirtazapine, pregabalin or gabapentin. Medications approved for certain people with IBS include: Alosetron, Eluxadoline, Rifaximin, Lubiprostone, Plecanatide, Tenapanor and Linaclotide. [0013] Treatment for IBD can include anti-inflammatory drugs, which include aminosalicylates, such as mesalamine, balsalazide and olsalazine, and immune system suppressors. Time-limited courses of corticosteroids can be both anti-inflammatory and immunosuppressing. Some examples of immunosuppressant drugs include azathioprine, mercaptopurine and methotrexate. Small molecules can also be used for IBD treatment. These include tofacitinib, upadacitinib and ozanimod. Biologics for treating IBD include infliximab, adalimumab, golimumab, certolizumab, vedolizumab, ustekinumab, and risankizumab. In some cases, treatment can include antibiotics, anti-diarrheal medications, such as psyllium powder or methylcellulose, and/or loperamide, pain relievers, and vitamins and supplements. [0014] Treatment for CeD can include gluten-free diet, vitamin and mineral supplements, steroids to control inflammation, and other drugs, such as azathioprine or budesonide. [0015] In some embodiments, the normalization and correction comprise background correction and dye bias correction. In some embodiments, the resulting signal intensities are normalized using the ‘quantile normalization’ method. 
In some embodiments, CpGs associated with X and Y chromosomes, single nucleotide polymorphisms (SNPs) and repeats, non-specific or cross-reactive probes and probes showing low variability and extreme methylation values are removed. BRIEF DESCRIPTION OF THE DRAWINGS [0016] FIG.1. DNA methylation-based diagnostic algorithm for ruling-in IBS. Dx, diagnosis; IBS, irritable bowel syndrome; IBD, inflammatory bowel disease. [0017] FIG.2. DNA methylation-based diagnostic algorithm for ruling out IBS and/or ruling-in IBD based on fecal calprotectin/C-reactive protein (CRP) levels. Dx, diagnosis; IBS, irritable bowel syndrome; IBD, inflammatory bowel disease. [0018] FIG.3. DNA methylation-based diagnostic algorithm for ruling out IBS or ruling-in celiac disease based on celiac serologies. Dx, diagnosis; IBS, irritable bowel syndrome; IBD, inflammatory bowel diseases. [0019] FIG.4. Double cross validation process includes two nested cross-validation loops referred to as outer and inner loops. The entire data are divided into training and test sets (clear and hashed rectangles of “Outer loop”, respectively). The training set, which is further divided into tuning/calibration and validation sets (clear and hashed rectangles at right, respectively), is used as an inner loop. The calibration set is used for model building (tuning hyperparameters) and the validation set is used to estimate the errors. The model with the lowest prediction error within the inner loop is selected as the best model. Then the test set in the outer loop is used to assess the predictive performance of the model. Multiple splits (N=10) of inner and outer datasets avoid the bias with respect to variable selection resulting from use of a single training set. [0020] FIG.5. Receiver operating characteristic (ROC) curves and the area under the ROC curves (AUC) for IBS vs IBD, IBS vs celiac disease and IBD vs celiac disease comparisons. 
The x-axis represents the sensitivity and y-axis represents the specificity, and each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The AUC values for IBS vs IBD, IBS vs celiac disease and IBD vs celiac disease classifiers were 0.85, 0.82 and 0.78. [0021] FIG.6. Correlation between DNA methylation principal components and covariates shows correlation values (r) for correlations between six principal components of DNA methylation data in IBS and healthy control subjects and potential confounders. Age and chip/batch were used as covariates for all differential methylation analyses and cell-types were used as additional covariates for machine-learning models. [0022] FIGS.7A-7F. Differentially methylated CpG across chromosomal regions for IBS versus IBD (7A), IBS versus CeD (7B), IBD versus CeD (7C), IBS versus healthy controls (7D), IBD versus healthy controls (7E), and CeD versus healthy controls (7F). Manhattan plots are depicted to visualize distribution of differentially methylated CpG site-associated genes across chromosomal regions. Hashed circles are CpGs with a p-value corresponding to FDR<0.05 in IBS vs IBD, IBD vs healthy controls, IBD vs CeD and CeD vs healthy controls. For IBS vs healthy controls, hashed dots represent p<0.0005. [0023] FIGS.8A-8C. Gene ontology terms associated with classifiers. FIG.8A shows the GO terms enriched in IBS classifier-associated genes in: IBS vs healthy control (first panel), IBS vs IBD (middle panel), and IBS vs Celiac disease (third panel) comparisons. Cell adhesion, Neuronal pathways and transport-associated genes were among the enriched GO terms represented in all the three comparisons. FIG.8B shows the GO terms enriched in IBD classifier-associated genes in: IBD vs healthy control (first panel), IBD vs IBS (middle panel), and IBD vs Celiac disease (third panel) comparisons. Inflammation-related terms were represented in all the comparisons. 
FIG.8C shows the GO terms enriched in Celiac disease classifier-associated genes in: Celiac disease vs healthy control (first panel), Celiac disease vs IBS (second panel), and Celiac disease vs IBD comparisons (third panel). Major histocompatibility complex (MHC) and antigen presentation pathways were among the top terms in all the three celiac-associated classifiers. [0024] FIG.9. ROC curve for testing IBD versus healthy control classifiers on external validation data. [0025] FIG.10. Permutation testing of classifiers associated with the pairs of diagnostic groups. DETAILED DESCRIPTION [0026] Described herein are methods of utilizing blood-based genome-wide DNA methylation profiles to develop a reliable test to diagnose irritable bowel syndrome (IBS) versus inflammatory bowel disease (IBD), celiac disease and other gastrointestinal diseases that mimic IBS. These methods provide a means to rule out IBS and rule-in active IBD or celiac disease, as well as to rule-out active IBD and celiac disease, and to rule-in IBS. [0027] Definitions [0028] All scientific and technical terms used in this application have meanings commonly used in the art unless otherwise specified. As used in this application, the following words or phrases have the meanings specified. [0029] As used herein, a “control” or “reference” sample means a sample that is representative of normal measures of the respective marker, such as would be obtained from normal, healthy control subjects, or a baseline amount of marker to be used for comparison. Typically, a baseline will be a measurement taken from the same subject or patient. The sample can be an actual sample used for testing, or a reference level or range, based on known normal measurements of the corresponding marker. 
[0030] As used herein, a “significant difference” means a difference that can be detected in a manner that is considered reliable by one skilled in the art, such as a statistically significant difference, or a difference that is of sufficient magnitude that, under the circumstances, can be detected with a reasonable level of reliability. In one example, an increase or decrease of 10% relative to a reference sample is a significant difference. In other examples, an increase or decrease of 20%, 30%, 40%, or 50% relative to the reference sample is considered a significant difference. In yet another example, an increase of two-fold relative to a reference sample is considered significant. [0031] “Nucleotide sequence” refers to a heteropolymer of deoxyribonucleotides, ribonucleotides, or peptide-nucleic acid sequences that may be assembled from smaller fragments, isolated from larger fragments, or chemically synthesized de novo or partially synthesized by combining shorter oligonucleotide linkers, or from a series of oligonucleotides, to provide a sequence which is capable of expressing the encoded protein. [0032] As used herein, the term “probe” refers to an oligonucleotide, naturally or synthetically produced, via recombinant methods or by PCR amplification, that hybridizes to at least part of another oligonucleotide of interest. A probe can be single-stranded or double-stranded. In the context of an array as described herein, a probe can be methylation-specific. [0033] As used herein, "hybridizes," "hybridizing," and "hybridization" means that the oligonucleotide forms a noncovalent interaction with the target DNA molecule under standard conditions. Standard hybridizing conditions are those conditions that allow an oligonucleotide probe or primer to hybridize to a target DNA molecule. Such conditions are readily determined for an oligonucleotide probe or primer and the target DNA molecule using techniques well known to those skilled in the art. 
The nucleotide sequence of a target polynucleotide is generally a sequence complementary to the oligonucleotide primer or probe. The hybridizing oligonucleotide may contain nonhybridizing nucleotides that do not interfere with forming the noncovalent interaction. The nonhybridizing nucleotides of an oligonucleotide primer or probe may be located at an end of the hybridizing oligonucleotide or within the hybridizing oligonucleotide. Thus, an oligonucleotide probe or primer does not have to be complementary to all the nucleotides of the target sequence as long as there is hybridization under standard hybridization conditions. [0034] The term "complement" and "complementary" as used herein, refers to the ability of two nucleic acid molecules to base pair with each other. For example, in DNA, adenine (A) is complementary to thymine (T). In RNA, adenine (A) is complementary to uracil (U). In some embodiments, complementarity refers to an antisense compound that is capable of base pairing with its target nucleic acid. For example, if a nucleobase at a certain position of an antisense compound is capable of hydrogen bonding with a nucleobase at a certain position of a target nucleic acid, then the position of hydrogen bonding between the oligonucleotide and the target nucleic acid is considered to be complementary at that nucleobase pair. Nucleobases comprising certain modifications may maintain the ability to pair with a counterpart nucleobase and thus, are still capable of nucleobase complementarity. Typically, two DNA molecules are complementary if they hybridize under the standard conditions referred to above. Typically, two DNA molecules are complementary if they have at least about 80% sequence identity, preferably at least about 90% sequence identity. [0035] As used herein, the term "subject" includes any human or non-human animal. 
The term "non-human animal" includes all vertebrates, e.g., mammals and non-mammals, such as non-human primates, horses, sheep, dogs, cows, pigs, chickens, and other veterinary subjects. In a typical embodiment, the subject is a human. [0036] As used herein, “a” or “an” means at least one, unless clearly indicated otherwise. [0037] For use in the methods described herein, representative examples of the sample include, but are not limited to, blood, plasma or serum, saliva, urine, cerebral spinal fluid, milk, cervical secretions, semen, tissue, cell cultures, and other bodily fluids or tissue specimens. [0038] Methods [0039] Described herein are methods of classifying biological samples based on gastrointestinal disease. In some embodiments, the method comprises: (a) measuring the amount of DNA methylation at CpG sites in DNA in a biological sample obtained from the subject to generate a genome-wide DNA CpG methylation profile of the subject, wherein the measuring comprises probing at least 200,000 CpG sites; (b) performing a normalization of technical noise and bias correction on the profile; (c) filtering the profile to retain data from a first set of probed CpG sites that is the same as a first set of CpG sites from a first training data set trained on a first pairwise comparison between biological samples, wherein the first pairwise comparison is selected from: (i) irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD); (ii) IBS and celiac disease (CeD); (iii) IBD and CeD; (iv) IBS and healthy control; (v) IBD and healthy control; and (vi) CeD and healthy control; and (d) classifying the CpG methylation profile of the biological samples as IBS, IBD, CeD, or healthy based on a probability cutoff derived from the first training data set. The classifying can be implemented by use of a computer. Also provided is a system for performing the methods described herein. 
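Steps (c) and (d) above can be illustrated with a minimal Python sketch. It assumes the trained pairwise model can be summarized as a logistic model with an intercept and one coefficient per retained CpG site; the function name, CpG identifiers, coefficient values, and label order are hypothetical and are not taken from the described models.

```python
import math


def classify_profile(betas, model_coefs, intercept, cutoff, labels=("IBD", "IBS")):
    """Classify one sample for a single pairwise comparison.

    betas       -- dict mapping CpG ID to normalized beta value for the sample
    model_coefs -- dict mapping CpG ID to coefficient (the training CpG set)
    intercept   -- model intercept (assumed logistic model)
    cutoff      -- probability cutoff derived from the training data
    labels      -- (label below cutoff, label at or above cutoff); assumed order
    """
    # Step (c): retain only the CpG sites present in the training set.
    missing = [cpg for cpg in model_coefs if cpg not in betas]
    if missing:
        raise ValueError(f"sample lacks {len(missing)} model CpG sites")

    # Linear predictor over the retained CpG sites.
    score = intercept + sum(coef * betas[cpg] for cpg, coef in model_coefs.items())

    # Step (d): convert to a probability and apply the cutoff.
    prob = 1.0 / (1.0 + math.exp(-score))
    label = labels[1] if prob >= cutoff else labels[0]
    return label, prob


# Hypothetical two-CpG model and a sample with one extra, unused CpG.
example_model = {"cg_a": 2.0, "cg_b": -1.0}
example_sample = {"cg_a": 0.5, "cg_b": 0.5, "cg_c": 0.9}
example_label, example_prob = classify_profile(example_sample, example_model, 0.0, 0.5)
```

Repeating the call with a different coefficient set and cutoff corresponds to the second and subsequent pairwise comparisons.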
[0040] In some embodiments, the filtering of step (c) is repeated to retain data from a second set of probes that is the same as a second set of probes from a second training data set trained on a second pairwise comparison between biological samples, wherein the second pairwise comparison is selected from (i) to (vi), and wherein the second pairwise comparison differs from the first pairwise comparison. [0041] In some embodiments, the filtering of step (c) is repeated to retain data from subsequent sets of probes corresponding to subsequent training data sets trained on subsequent pairwise comparisons between biological samples, wherein each of the remaining pairwise comparisons of (i) to (vi) is performed. [0042] In some embodiments, the probing comprises contacting the biological samples with a methylation microarray. In some embodiments, the biological sample comprises blood, plasma, serum, or mucosal tissue. In some embodiments, the sample is peripheral blood mononuclear cells (PBMCs), peripheral blood lymphocytes (PBL), or whole blood. [0043] In some embodiments, the method further comprises measuring the amount of fecal calprotectin and/or blood C-reactive protein (CRP) in the biological sample. In some embodiments, 120 mcg/g FCP is the cut off for determining likelihood of IBD. In some embodiments, 150 mcg/g FCP is the cut off for determining likelihood of IBD. Values significantly higher, especially those over 250 μg/g, are strongly indicative of active IBD. In some embodiments, the cutoff is 8-10 mg/l CRP for IBD. In some embodiments, the cutoff is 8 mg/l CRP for IBD. In some embodiments, an elevated (>4 U/ml) anti-tissue transglutaminase antibody is indicative of CeD. [0044] In some embodiments, the training data set is trained using GLMNET algorithm. In some embodiments, the training data set is trained using regularization parameter value and elastic net mixing parameter. This parameter minimizes the misclassification error. 
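The roles of the regularization parameter and the elastic net mixing parameter can be made concrete with a short sketch. It follows the penalty form documented for the glmnet package, λ[(1 − α)/2 · Σβ² + α · Σ|β|]; the function name and the example coefficients are illustrative assumptions, and the sketch covers only the penalty term, not the model fit.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic net penalty added to the negative log-likelihood.

    coefs -- iterable of model coefficients (intercept excluded)
    lam   -- regularization parameter (lambda), controls overall shrinkage
    alpha -- mixing parameter: 0 gives ridge, 1 gives lasso
    """
    coefs = list(coefs)
    l1 = sum(abs(b) for b in coefs)   # lasso part, drives coefficients to zero
    l2 = sum(b * b for b in coefs)    # ridge part, shrinks correlated features together
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


# Hypothetical coefficients evaluated at the two extremes and one mixture.
example_coefs = [1.0, -2.0]
lasso_penalty = elastic_net_penalty(example_coefs, lam=0.1, alpha=1.0)
ridge_penalty = elastic_net_penalty(example_coefs, lam=0.1, alpha=0.0)
mixed_penalty = elastic_net_penalty(example_coefs, lam=0.1, alpha=0.5)
```

Larger λ shrinks more coefficients, so fewer CpG sites remain in the classifier; the mixing parameter decides how much of that shrinkage acts as feature selection.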
[0045] In some embodiments, the normalization and correction comprise background correction and dye bias correction. In some embodiments, the resulting signal intensities are normalized using the ‘quantile normalization’ method. In some embodiments, CpGs associated with X and Y chromosomes, single nucleotide polymorphisms (SNPs) and repeats, non-specific or cross-reactive probes and probes showing low variability and extreme methylation values are removed. [0046] Also described is a method of screening for, diagnosing, or detecting irritable bowel syndrome (IBS), inflammatory bowel disease (IBD) and/or celiac disease (CeD) in a subject. In some embodiments, the method comprises performing the method of classifying described herein on a biological sample obtained from the subject, and identifying the subject as having IBS, IBD, or CeD based on the classifying of step (d). In some embodiments, the method further comprises treating the subject for IBS, IBD, or CeD. [0047] Treatment for Gastrointestinal Conditions [0048] Treatment for IBS can include fiber supplements, laxatives, anti-diarrheal medications, such as loperamide, a bile acid binder, such as cholestyramine, colestipol or colesevelam, anticholinergic medications, such as dicyclomine or hyoscyamine, tricyclic antidepressants, such as amitriptyline, imipramine, desipramine or nortriptyline, serotonin and norepinephrine reuptake inhibitor (SNRI) antidepressants, such as duloxetine, selective serotonin reuptake inhibitor (SSRI) antidepressants, such as fluoxetine or paroxetine, a tetracyclic antidepressant, such as mirtazapine, and pregabalin or gabapentin. Medications approved for certain people with IBS include: Alosetron, Eluxadoline, Rifaximin, Lubiprostone, Plecanatide, Tenapanor and Linaclotide. [0049] Treatment for IBD can include anti-inflammatory drugs, which include aminosalicylates, such as mesalamine, balsalazide and olsalazine, and immune system suppressors. 
Time-limited courses of corticosteroids can be both anti-inflammatory and immunosuppressing. Some examples of immunosuppressant drugs include azathioprine, mercaptopurine and methotrexate. Small molecules can also be used for IBD treatment. These include tofacitinib, upadacitinib and ozanimod. Biologics for treating IBD include infliximab, adalimumab, golimumab, certolizumab, vedolizumab, ustekinumab, and risankizumab. In some cases, treatment can include antibiotics, anti-diarrheal medications, such as psyllium powder or methylcellulose, and/or loperamide, pain relievers, and vitamins and supplements. [0050] Treatment for CeD can include gluten-free diet, vitamin and mineral supplements, steroids to control inflammation, and other drugs, such as azathioprine or budesonide. EXAMPLES [0051] The following examples are presented to illustrate the present invention and to assist one of ordinary skill in making and using the same. The examples are not intended in any way to otherwise limit the scope of the invention. [0052] Example 1: Development of Blood-based Genome-wide DNA Methylation Profiles [0053] This Example describes the analytic process of utilizing blood-based genome-wide DNA methylation profiles to develop a reliable test to diagnose irritable bowel syndrome (IBS) vs inflammatory bowel disease (IBD), celiac disease and other gastrointestinal diseases that mimic IBS. [0054] METHODS [0055] The methods described herein have been applied to the Infinium MethylationEPIC BeadChip, an Illumina™ DNA methylation array, which enables interrogation of over 850,000 methylation sites quantitatively at single nucleotide resolution. The analysis is performed in three steps: data preprocessing, classifier development and evaluation, and diagnostic algorithm development. Each step involves implementation of a set of codes written in R statistical language (1). [0056] 1. 
Data preprocessing (package names Italicized with no quotes, functions Italicized with single quotes and methods or object classes normal font with double quotes): [0057] 1.1. Data import and normalization: As outlined in Figure 1, this part of analysis includes steps to preprocess the raw Illumina DNA methylation array (IDAT) files generated by the Illumina iScan scanner and output normalized and filtered data. Data preprocessing was performed using ENmix package (2). Quality Control (QC) metrics were generated using ‘QCinfo’ function. Samples that did not pass the QC threshold were eliminated. Using ‘preprocessENmix’ function, we used “oob” method for background correction and “RELIC” for dye bias correction. This resulted in signal intensity data on methylated and unmethylated channels of “MethylSet” class object. The data were quantile normalized using ‘norm.quantile’, separately for Methylated and Unmethylated intensities for Infinium I or Infinium II probes (“quantile1” method). Probe design type bias was corrected using Regression on Correlated Probes (RCP) method, followed by filtering on QC metrics. Detection P value threshold to identify low quality data point was set to 0.000001. “Impute” parameter was set to “True” (after removing rows or columns with too many missing values, default threshold of > 5% missing values). All the steps were combined into a single function which can be applied to either single or multiple IDAT files. Cell types were estimated using ‘estimateCellCounts2’ function and “preprocessNoob” method for Epic platform in FlowSorted.Blood.EPIC package (3). Methylation age was predicted using “methyAge” function. [0058] 1.2. Probe filtration: Of 855,790 CpGs on the array, QC information-based filtering resulted in 846,790 probes. 
Additionally, a file with non-informative probes was created with unique probe IDs that were excluded (X and Y chromosome derived using IlluminaHumanMethylationEPICanno.ilm10b2.hg19 package, n=18,880; probes with SNPs and repeats, n=25,329; and non-specific or cross-reactive probes, n=29,233; a total of 778,660 unique probes). Data were analyzed with and without correction for batch effects (technical variability due to pipetting errors between batches of samples that were run together on a 96 well plate). For the batch effect adjusted analyses, the batch effects were removed using ‘removeBatchEffect’ function separately for training and test folds from the “methylSet” object. [0059] 1.3. Sample
Figure imgf000011_0001
[0060] Epic methylation data were available on 384 subjects. After normalization and initial filtering, data were split into PBMC and Blood. The breakdown of the number of samples per group for PBMCs and whole blood is shown in Table 1.
[0061] PBMC and blood data were further filtered separately to exclude probes with low variability (standard deviation < 0.02) and those with very high or very low average beta values (beta < 0.1 or > 0.9), which are likely to be skewed (4). This resulted in 282,250 probes which were used as an input for various analyses comparing groups. This number can change based on the nature and the number of samples included as well as the cutoff used for each of the above filtering steps. The PBMC and blood data were further subdivided into comparison groups belonging to one of two categories: 1. one vs one, comparing one diagnosis group (Dx) vs the healthy control (HC) group or one Dx vs another Dx, and 2. one vs all other Dxs combined. Although the results presented below do not include HC in the 2nd category, it may be potentially included to improve model performance. The GI disease controls included celiac disease, inflammatory bowel disease (IBD; Crohn’s disease and ulcerative colitis), and colon cancer. The resulting comparison groups included:
[0062] Comparisons of IBS vs Single Other GI Disease Control
[0063] IBS vs HC, IBS as disease and healthy control (HC) as control group
[0064] IBS vs IBD, IBS as disease and IBD as control group
[0065] IBS vs Celiac, IBS as disease and celiac as control group
[0066] IBD vs HC, IBD as disease and HC as control group
[0067] Celiac vs HC, Celiac as disease and HC as control group
[0068] IBD vs Celiac, IBD as disease and Celiac as control group
One Dx vs all other Dxs
[0069] IBS vs Celiac + IBD + Colon cancer
[0070] IBD vs IBS + Celiac + Colon cancer
[0071] Celiac vs IBS + IBD + Colon cancer
[0072] 2. Classifier development and evaluation [0073] 2.1. 
Generating training and test fold data [0074] We implemented nested folds to split the data into 10 folds of outer training set and outer test set (90:10). The outer training set was further split into inner training and inner test set (90:10), for choosing model hyperparameters. To account for age, age was either added as a variable in both training and test datasets or the effect of age was regressed out using linear regression for training and test data separately. [0075] 2.2. Selection of model hyperparameters [0076] For each dataset, models on training data were generated using generalized linear models (GLM) via penalized maximum likelihood using glmnet package. Regularization parameter (λ) was selected using 10-fold cross-validation on inner training and test data. A model corresponding to λ that minimizes the error of cross-validation (λmin) was chosen and prediction was made for each of the 10 outer training-test data splits. Separate models were generated for ridge, elastic net, and lasso penalties (α = 0, 0.1, 0.2, …, 1), and the model associated with the α resulting in the best performance was chosen as the final model. [0077] Glmnet is one of the >200 available methods to train a model using machine learning. A comprehensive list of methods is compiled here - https://topepo.github.io/caret/available-models.html. Additionally, newer methods are constantly being developed which can fine-tune the hyperparameters to enhance model performance. [0078] Additionally, an ensemble of methods can be utilized to test multiple models, which may perform better than individual models. For example, in our analysis, individual models were not successful at predicting patterns in the whole blood samples with a limited sample size, whereas the ensemble method performed better and resulted in more significant predictions. [0079] 2.3. 
Evaluation of model performance [0080] Various performance metrics were generated based on the 2x2 confusion matrix for each of the comparisons, created using caret package and GLMs. These included p-values from linear regression of predicted and true values, accuracy (confidence intervals), accuracy p values, kappa, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), precision, recall, F1, prevalence and balanced accuracy. In these analyses, the model with a low p value (<0.05) and the highest sensitivity and specificity was chosen as the best model. However, all other metrics mentioned above help in addressing different aspects of model performance. [0081] RESULTS [0082] Accuracy of prediction models: Table 2 shows the performance metrics based on the number of correct and incorrect predictions on new (test) samples. The accuracy of discrimination between IBS vs HC was 0.6 (p<0.017); however, the specificity was low (0.43). Models developed for IBS vs IBD and IBS vs Celiac disease performed better with an accuracy of 0.76 and 0.77, respectively. The performance of IBD vs HC and Celiac vs HC models was better still, with accuracies of 0.82 and 0.84, respectively. [0083] Diagnostic algorithm: Based on the probabilistic models described above, which identify linear and non-linear relationships between DNA methylation and GI diseases, we designed an algorithm that can be used to diagnose IBS. The first three tests involve DNA methylation pattern-based classification of incoming sample into IBS vs IBD, IBS vs celiac disease, and IBS vs other GI diseases (e.g., microscopic colitis, bile acid diarrhea, colon cancer) where three positive tests indicate a high likelihood of a positive IBS diagnosis. In the case of three negative tests, this would indicate that the diagnosis is very unlikely to be IBS. 
At least two positives out of the three tests suggest a probable diagnosis of IBS but may not confidently rule out other GI diseases that present with IBS-like symptoms. Additionally, a negative diagnosis of IBS on at least two of the three tests suggests a low likelihood of IBS. In these situations, IBD and celiac disease non-invasive markers including fecal calprotectin/C-reactive protein (CRP) and celiac serologies, respectively, can help to rule out IBS and rule-in active IBD (e.g., elevated fecal calprotectin and/or CRP) or celiac disease (positive celiac serologies) or rule-out active IBD and celiac disease (normal fecal calprotectin and CRP, and negative celiac serologies, respectively). Fecal calprotectin and/or C-reactive protein (CRP) and celiac serologies can further help with the diagnosis of IBS. For example, normal fecal calprotectin levels <50 have been shown to be associated with a <1% chance of having IBD (i.e., ruling out IBD) and those between 50 and 150 are still suggestive of IBS (if the DNA methylation-based comparisons of IBS vs IBD, IBS vs celiac, IBS vs other GI diseases are positive), mildly active IBD, gastrointestinal infection or microscopic colitis. A higher fecal calprotectin, e.g., >150, will be more suggestive of an inflammatory or infectious GI disorder, e.g., IBD (5). Similarly, abnormal celiac serologies would rule-out IBS alone and warrant diagnostic workup for celiac disease. In general, abnormal fecal calprotectin/C-reactive protein (CRP) or celiac serologies would require additional diagnostic workup for IBD, celiac disease and other organic GI diseases such as microscopic colitis. The specific steps involved in making diagnostic decisions are presented in Table 2 and a workflow is shown in Figure 1. 
Some patients will have already undergone an upper endoscopy and colonoscopy with biopsies and small bowel imaging tests which show no evidence of celiac disease, IBD, GI cancer, or microscopic colitis, which would increase the probability that the patient has IBS as long as they meet diagnostic symptom criteria for IBS. This would be valuable in clinical practice because the blood test would confirm the diagnosis of IBS, i.e., be a rule-in test for IBS. Additionally, similar algorithms can be implemented to diagnose pediatric IBS, IBD and celiac disease using PBMCs and whole blood samples (Figures 2 and 3). [0084] The diagnostic performance of a ‘rule-in’ IBS test can be further improved by developing algorithms to diagnose IBS from symptomatic patients who have a diagnostic test that is negative for IBD, celiac and other GI diseases, e.g., colon cancer. [0085] To make the diagnostic call, DNA methylation data from a new sample will be generated using the EPIC platform and processed through the preprocessing steps followed by the three DNA methylation-based classification models to output class labels for the most probable prediction. The votes for each class will be counted and information from fecal calprotectin/C-reactive protein (CRP) or celiac serologies will be used to make the final diagnostic calls. [0086] CONCLUSIONS [0087] Although no single set of DNA methylation-based classifiers was able to diagnose IBS with a high sensitivity and specificity, we developed an algorithm which utilizes the information based on models developed using GI disease controls in diagnosing IBS with a greater confidence. [0088] References [0089] R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2020. [0090] Xu Z, Niu L, Li L, Taylor JA. Nucleic Acids Res. 2016;44(3):e20. [0091] Salas LA, Koestler DC. Illumina EPIC data on immunomagnetic sorted peripheral adult blood cells. 2020. [0092] Mansell G, et al. 
BMC Genomics. 2019;20(1):366. [0093] Menees SB, et al. Am J Gastroenterol. 2015;110(3):444-54. [0094] American Gastroenterological Association. Gastroenterology. 2019;157(3):856-7. [0095] Lacy BE, et al. Am J Gastroenterol. 2021;116(1):17-44. [0096] Table 1: Patient population
[Table 1 appears as an image (imgf000015_0001) in the original publication and is not reproduced here.]
[0097] Table 2: Performance of model predictions
[Table 2 appears as images (imgf000015_0002, imgf000016_0001) in the original publication and is not reproduced here.]
[0098] This table shows the performance metrics for classification models developed for each comparison mentioned in columns. IBD, inflammatory bowel diseases; IBS, irritable bowel syndrome. [0099] Figure 1 shows the possible combinations of DNA methylation tests and resulting diagnoses when the Fecal Calprotectin, C-reactive protein (CRP) and celiac serologies are negative. These non-invasive tests are currently recommended in gastroenterology society guidelines for IBS even in the setting without red flags (6, 7). The first column (IBS vs GI diseases other than IBD and celiac) represents the diagnosis based on an algorithm trained to classify IBS patients from other GI diagnoses such as non-celiac gluten sensitivity, microscopic colitis, bile acid diarrhea, colon cancer, etc., which may or may not present with red flags (weight loss, blood in stool, etc.). When all the tests are negative the patient likely does not have IBS. However, they may still have IBD in remission or celiac disease on a gluten free diet. [0100] Figure 2 shows the possible combinations of DNA methylation tests and resulting diagnoses when the Fecal Calprotectin and/or C-reactive protein (CRP) are positive. The first column (IBS vs other GI diseases) represents diagnosis based on an algorithm trained to classify IBS patients from other GI diagnoses such as non-celiac gluten sensitivity, microscopic colitis, colon cancer, bile acid diarrhea, which may or may not present with red flags (weight loss, blood in stool, etc.). #The diagnosis is more likely IBS than IBD if the fecal calprotectin and/or CRP levels are only mildly elevated, e.g., fecal calprotectin level >50 but <150. If the fecal calprotectin and/or CRP levels are moderately or greatly increased, the probability of having IBS is very low and the probability of having IBD is high. [0101] Additionally, more than one positive DNA methylation test for IBS diagnosis suggests a greater likelihood for IBS compared to IBD. 
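The vote counting across the three DNA methylation tests, together with the fecal calprotectin bands mentioned in this Example, can be sketched in Python as follows. This is an illustrative reading of the text and not the decision tables of Figures 1 to 3: the function names, the returned wording, and the handling of values exactly at 50 or 150 are assumptions.

```python
def ibs_vote_summary(ibs_vs_ibd, ibs_vs_celiac, ibs_vs_other):
    """Summarize three pairwise methylation tests (True means the call was IBS)."""
    votes = sum([bool(ibs_vs_ibd), bool(ibs_vs_celiac), bool(ibs_vs_other)])
    if votes == 3:
        return "high likelihood of IBS"
    if votes == 2:
        return "probable IBS"
    if votes == 1:
        return "low likelihood of IBS"
    return "IBS very unlikely"


def fcp_band(fcp):
    """Place a fecal calprotectin value into the bands described in the text.

    Boundary handling (50 and 150 counted as mildly elevated) is an assumption.
    """
    if fcp < 50:
        return "normal"           # described as associated with <1% chance of IBD
    if fcp <= 150:
        return "mildly elevated"  # still compatible with IBS if the votes are positive
    return "elevated"             # more suggestive of inflammatory or infectious disease


# Hypothetical patient: two of three tests call IBS, calprotectin of 90.
example_summary = ibs_vote_summary(True, True, False)
example_band = fcp_band(90)
```

In the described workflow an elevated band, or positive celiac serologies, would prompt further diagnostic workup regardless of the vote count.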
[0102] Figure 3 shows the possible combinations of DNA methylation tests and resulting diagnoses when the celiac serologies are positive. The first column (IBS vs other GI diseases) represents diagnosis based on an algorithm trained to classify IBS patients from other GI diagnoses such as non-celiac gluten sensitivity, microscopic colitis, bile acid diarrhea, colon cancer, etc.
[0103] Example 2: Representative Workflow
[0104] This Example illustrates the steps one can take to prepare, train, and analyze DNA methylation data using a machine learning model. Once a model has been trained, test samples can be evaluated using corresponding parameters. This approach can be used to provide a non-invasive system for guiding detection and diagnosis of gastrointestinal disorders and facilitate effective treatment.
[0105] Input Data Preparation
[0106] The preprocessing steps are aimed at preparing DNA methylation data obtained as a raw scanner output to be used as an input into machine learning models. There are 855,790 CpGs on the Epic v1.0 array. The following filtering and preprocessing steps, which use the ENmix R package, were standardized.
1. Background and dye bias correction performed using ‘preprocessENmix’
2. Quality control performed using ‘QCinfo’ with a detection P threshold of 1e-6 and data quantile normalized using ‘norm.quantile’ function
3. Probe design type bias correction using Regression on Correlated Probes (rcp function) and the resulting betas [Methylated/(Methylated+Unmethylated probes)] filtered based on QC metrics resulting in 846,790 probes.
4. Additional filtering by exclusion of X and Y chromosomal probes (n=18,880), probes with SNPs and repeats (n=25,329), and non-specific or cross-reactive probes (n=29,233) resulting in 778,660 unique probes.
5. Annotation of probes using IlluminaHumanMethylationEPICanno.ilm10b2.hg19 package.
6. Sample selection to include disease pairs.
7. 
Exclusion of probes with a low variance (standard deviation < 0.02) and extreme beta values (beta < 0.1 or > 0.9) resulting in 180,000 to 250,000 probes depending on the diseases in question. 8. Preparation of metadata file with patient details including age, DNA methylation run or batch number, and cell-type composition. [0107] Training Process [0108] After normalization and initial filtering, we prepared the data for input into the machine learning models. Table 3 shows all the comparison groups tested and the methylation platforms used to generate the data. While comparisons involving IBD and celiac disease including IBS vs IBD, IBS vs celiac and IBD vs celiac were performed on Epic array data, IBS vs HC was performed using three separate analyses. The first one included a larger set of IBS and HC samples that were run on Epic or 450K methylation array. This analysis used the platform type as the batch covariate. The second analysis included IBS and HC samples that were down-sampled to include balanced sample numbers for IBS and HCs to avoid over-fitting. The third analysis included IBS patient and HC samples that were run on Epic platform only. This step allowed for identification of differential methylation on probes that were not represented on 450K platform. All comparisons are shown in Table 3. [0109] 3: Comparison groups and DNA methylation platforms
Figure imgf000018_0001
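The beta-value definition in step 3 and the probe filter in step 7 of the Input Data Preparation can be illustrated with a minimal Python sketch. This is not the ENmix implementation. The probe names and intensities are hypothetical, the sample standard deviation is assumed, and "extreme" is assumed to be judged on the mean beta across samples, which the text does not specify.

```python
from statistics import mean, stdev


def beta_value(methylated: float, unmethylated: float) -> float:
    """Beta = Methylated / (Methylated + Unmethylated), as defined in step 3."""
    return methylated / (methylated + unmethylated)


def keep_probe(betas, min_sd=0.02, low=0.1, high=0.9):
    """Return True if a probe passes the step 7 filters.

    Assumptions (not stated in the text): the sample standard deviation is
    used, and a probe is 'extreme' when its mean beta is < low or > high.
    """
    if stdev(betas) < min_sd:
        return False  # low-variance probe
    return low <= mean(betas) <= high


# Hypothetical (methylated, unmethylated) intensities, four samples per probe.
intensities = {
    "probe_a": [(500, 500), (600, 400), (300, 700), (450, 550)],
    "probe_b": [(500, 500), (505, 495), (495, 505), (500, 500)],
    "probe_c": [(950, 50), (990, 10), (900, 100), (970, 30)],
}

betas = {
    probe: [beta_value(m, u) for m, u in pairs]
    for probe, pairs in intensities.items()
}
kept = [probe for probe, values in betas.items() if keep_probe(values)]
print(kept)
```

In this toy input, probe_b is dropped for low variance and probe_c for an extreme mean beta, so only probe_a is retained.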
[0110] Covariates
[0111] Age, cell type proportions (six cell types: CD8-T cells, CD4-T cells, natural killer cells, B-cells, monocytes, and neutrophils) and DNA methylation batch were included as covariates in the model along with the set of pre-filtered and normalized DNA methylation probes. The IBS vs HC analysis for the larger dataset included platform (450K or EPIC) as the batch covariate. For analyses involving IBD samples, sample collection site was used as an additional batch covariate.
[0112] Structure of the model
[0113] The unique features of the models include the use of the GLMNET algorithm and a combination of the regularization parameter (λ) value and the elastic net mixing parameter (α, range 0 [ridge] to 1 [lasso]). To identify the best fitting model, we implemented a 10-fold cross validation with 10 repeats. Specifically:
1. Split the dataset into 10 parts using the ‘createFolds’ function from the ‘caret’ package.
2. For each set:
a. One in ten samples was used as a hold-out sample.
b. The model was trained on the remaining samples.
c. Within the training data set, generalized linear models (GLM) via penalized maximum likelihood were fitted using the glmnet package, with model hyperparameters tuned to obtain the regularization parameter (λ) that minimizes the error (λmin) using the ‘cv.glmnet’ function. Specific arguments included type.measure="deviance", family="binomial", parallel=TRUE, nfolds=10.
d. One model was fitted for each elastic net mixing parameter (α, range 0 [ridge] to 1 [lasso]).
3. Step 2 resulted in one best performing model for every α. After repeating this on various training:test splits (from step 1), a summarized output for all repeats was generated, which included the following evaluation scores: a. Probability cutoff b. Alpha (α) c. AUC d. Accuracy e. Kappa f. Accuracy Lower Confidence Interval (CI) g. Accuracy Upper CI h. Accuracy Null i. Accuracy P value j. McNemar P value k. Sensitivity l. Specificity m. Pos Pred Value n. Neg Pred Value o. Precision p. Recall q. F1 r. Probability cutoff s. Prevalence t. Detection Rate u. Detection Prevalence v. Balanced Accuracy w. True Negatives (TN) x. False Positives (FP) y. False Negatives (FN) z. True Positives (TP)
4. An accuracy p-value of <0.05 was considered significant, and the best performing model was defined as the model with the lowest accuracy p-value, with a high balanced accuracy value used to choose among models when multiple models had a low p-value. The model output a set of classifiers (one for each repeat) for each pair of GI diseases in question.
[0114] Table 4A
Figure imgf000020_0001
Figure imgf000021_0001
[0115] Table 4B
Figure imgf000021_0002
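Several of the evaluation scores listed under "Structure of the model" (sensitivity, specificity, precision, accuracy, balanced accuracy, F1) follow directly from the confusion matrix once a probability cutoff is applied. The Python sketch below illustrates that arithmetic only and is not the caret or glmnet implementation. The probabilities, labels, and cutoff of 0.44 are illustrative values, and treating a probability equal to the cutoff as a case is an assumption.

```python
def classify(probabilities, cutoff):
    """Label a sample 1 (case) if its probability is >= cutoff, else 0 (control).

    Assumption: ties at the cutoff are assigned to the case class.
    """
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion_counts(predicted, actual):
    """Return (TP, FP, TN, FN) from predicted and true 0/1 labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Compute common metrics; assumes both classes and both predictions occur."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Illustrative predicted probabilities and true labels (1 = case, 0 = control).
probabilities = [0.9, 0.7, 0.5, 0.3, 0.6, 0.2, 0.1, 0.45]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

counts = confusion_counts(classify(probabilities, cutoff=0.44), labels)
metrics = summarize(*counts)
print(counts, metrics)
```

The same `classify` step is what assigns a class to a new sample once the trained model has produced its probability.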
[0116] Preparing and running a new sample outside of this study
[0117] Preprocessing of DNA methylation data generated using the Illumina EPIC array will include normalization of technical noise as described in the “Input Data Preparation” section and filtering of the data to retain the same set of probes as the training data for a particular disease pair. These data can be input directly into the “predict” function along with the trained model to generate the summary. The class will be predicted based on the probability cutoff, as given in the first row of all the model outputs. For example, if the probability predicted for the new sample by the IBS vs HC model is 0.3, which is less than the cutoff of 0.44, the sample will be classified as a healthy control; if the predicted probability is 0.5, it will be classified as IBS.
[0118] Functional significance of selected genes
[0119] To identify IBS-specific GO terms, common pathways between the IBS vs HC, IBS vs IBD and IBS vs celiac classifiers with a p<0.05 were identified. To identify pathways that are well represented across the repeats, we performed gene ontology analyses on the classifiers identified in all repeats and selected terms that were overrepresented (enrichment p<0.05) in >30% or >50% of repeats for each comparison. These analyses were repeated for IBD and celiac-associated classifiers to identify disease-specific pathways.
Example 3: Genome-Wide DNA Methylation Analysis of Irritable Bowel
Figure imgf000022_0001
Figure imgf000022_0002
[0120] This Example demonstrates that blood-based DNA methylation biomarkers provide a non-invasive test to distinguish IBS, IBD and CeD. Gene ontology analysis suggested functional significance of the classifiers in disease-specific pathology.
[0121] Irritable bowel syndrome (IBS), inflammatory bowel disease (IBD) and celiac disease (CeD) present with similar gastrointestinal (GI) symptoms. While IBD and CeD are diagnosed by endoscopy with histological confirmation, IBS is diagnosed by symptom criteria due to the lack of a reliable diagnostic test. DNA methylation-based diagnostic biomarkers have not been investigated in IBS, IBD and CeD. We aimed to build blood-derived DNA methylation-based diagnostic classifiers using machine learning to differentiate IBS, IBD, and CeD, and subsequently to identify gene ontology (GO) terms associated with disease-specific classifiers.
[0122] Genome-wide DNA methylation of peripheral blood mononuclear cells from patients with IBS, IBD, CeD, and healthy controls was measured using Illumina’s 450K or EPIC arrays. Differential DNA methylation between IBS, IBD, CeD and HCs was measured using general linear models (GLMs) in “limma”. Classifiers were developed using machine learning (ML) by training penalized generalized linear models using double cross-validation, with diagnosis as the outcome, methylation as the predictor, and age and other confounders as covariates. An external dataset of 304 samples (IBD and HCs) was used as an independent validation cohort. Methylation sites selected using GLMs and those selected by the prediction models were used to identify GO enrichment.
[0123] Results were based on 315 participants (148 IBS, 47 IBD, 34 CeD and 86 healthy controls) who had DNA methylation data. IBS vs IBD and IBD vs CeD showed the highest number of differentially methylated CpG sites followed by IBD vs HC, CeD vs HC and IBS vs HC.
Classification accuracies based on the area under the receiver operating characteristic curves for IBS vs IBD, IBS vs CeD and IBD vs CeD were 0.80 (95%CI=0.7-0.87, p=6.75E-10), 0.78 (95%CI=0.68-0.86, p=4.57E-10) and 0.73 (95%CI=0.62-0.83, p=0.03), respectively. The performance of the IBD vs HC classifier was successfully validated in the external validation dataset. IBS-associated differentially methylated genes and the ML-based classifiers were enriched in cell adhesion and neuronal pathways, while IBD and CeD classifiers were enriched in inflammation and MHC class II pathways, respectively (p<0.05).
[0124] Irritable bowel syndrome (IBS) is a highly prevalent, stress-sensitive, chronic gastrointestinal (GI) disorder characterized by chronic abdominal pain associated with diarrhea and/or constipation. IBS has a worldwide prevalence of 4.5-11% (1) and is associated with a significant healthcare and economic burden (2). There is currently no valid diagnostic test that can reliably diagnose IBS, and therefore symptom-based criteria are used. Most IBS patients have seen at least three physicians and undergone multiple expensive and invasive tests before receiving a diagnosis of IBS, as IBS is often considered a diagnosis of exclusion (3,4). Other GI conditions, such as inflammatory bowel disease (IBD), celiac disease (CeD) and colon cancer, can present with similar GI symptoms but are diagnosed by confirmatory tissue histopathology and treated differently. Expensive and invasive diagnostic evaluations are performed to exclude these other conditions before a diagnosis of IBS is made.
[0125] IBD, comprising Crohn’s disease and ulcerative colitis (UC), is a group of chronic, immune-mediated conditions manifesting as intestinal mucosal inflammation. IBD affects approximately 3 million Americans, evenly divided between Crohn’s disease and UC (5,6). In general, the diagnosis of IBD is made by gastrointestinal disease specialists and requires endoscopic and histological confirmation.
The time between onset of symptoms and referral to a specialist can vary substantially. One of the major sources of delay in diagnosis is treatment of the patient under the assumption that they have IBS rather than IBD (7). About 10% of IBD patients are misdiagnosed as having IBS (7). Delays in diagnosis are associated with worse outcomes, including increased incidence of complications and need for early surgery (8-10).
[0126] CeD is a gluten-induced, immune-mediated enteropathy, with an estimated worldwide prevalence of approximately 1% (11-14). CeD can present with variable manifestations and can be misdiagnosed as IBS (15). GI society guidelines recommend screening for CeD with serologies in patients with, for instance, IBS, chronic diarrhea, or anemia. However, up to 5% of patients are seronegative (14). Additionally, some patients self-initiate a gluten-free diet, which is not uncommon in clinical practice (16,17). Substantial delays in diagnosing CeD, even up to 13 years, can occur in clinical practice (13,18,19). Delays in diagnosis and treatment of CeD can lead to long-term complications such as osteoporosis, infertility, anemia, and lymphoma, and to increased health care utilization and pharmacotherapy (20,21).
[0127] Thus, there is a significant unmet need for an objective biomarker-based test that can distinguish IBS, IBD and CeD, allow an earlier diagnosis and appropriate treatment plan, and reduce unnecessary medical costs and tests (e.g., computed tomography [CT] scans, abdominal ultrasounds, stool studies, breath tests, repeat endoscopies). The importance of a diagnostic test for patients was supported by a recent large-scale survey of individuals with IBS, who reported that a fast and accurate diagnostic test for IBS is one of the top ten research priorities (22). Such a discovery has the transformative potential of shifting the paradigm of diagnosing IBS, IBD and CeD.
[0128] Epigenetic modifications including DNA methylation are mechanisms that regulate gene expression and higher-order DNA structure. DNA methylation has emerged as a leading mechanism linking gene-environment interactions to long-term behavioral development, particularly in complex disorders (23,24). In normal mammalian somatic genomes, DNA methylation mainly occurs at cytosines in a CpG dinucleotide context. CpG methylation is generally absent from short stretches of CpG-rich sequences known as CpG islands (CGIs), which typically occur at or near the transcription start site of genes (25). Hypermethylation of CGI promoters is tightly linked with transcriptional repression of the affected gene and has therefore been viewed as an epimutation causing the silencing of a gene. In contrast, recent studies show that gene body methylation is positively correlated with gene expression and can be a potential therapeutic target (26).
[0129] Aberrant DNA methylation has been associated with a variety of cancers and non-cancerous disorders, including psychiatric and neurodegenerative disorders. Our previous study on DNA methylation and targeted bisulphite sequencing in peripheral blood mononuclear cells (PBMCs) identified changes in DNA methylation in IBS patients compared to healthy controls (27). This study highlighted a role for neuronal and oxidative stress pathways in the pathophysiology of IBS. DNA methylation changes in blood have been associated with inflammation in patients with IBD (28); however, there is a lack of blood-based DNA methylation studies in CeD. Additionally, there are no studies investigating DNA methylation as a biomarker for the differential diagnosis of IBS from other diseases that mimic IBS.
[0130] Therefore, we hypothesized that DNA methylation marks in PBMCs can serve as a biomarker for diagnosing IBS and for distinguishing IBS from IBD and CeD. Similarly, DNA methylation-based biomarkers may be used for non-invasive diagnoses of IBD and CeD.
This study sought to develop DNA methylation-based classifiers to discriminate IBS, IBD, CeD, and healthy controls, and also to investigate gene ontology (GO) terms and pathways associated with the classifiers.
[0131] MATERIALS AND METHODS
[0132] Study population
[0133] Male and female participants with IBS, IBD, and CeD, and healthy controls, ages 18-55, were recruited by community advertisement or at GI clinics at the University of California, Los Angeles (UCLA). IBS and healthy control (HC) samples were collected between 2009 and 2020 (analyzed on 450K or EPIC platforms). IBD and CeD samples were collected between 2012 and 2020. In addition to the UCLA cohort, we included banked PBMC samples from patients with IBD obtained from the Crohn’s and Colitis Foundation (CCF) Study of a Prospective Adult Research Cohort with IBD (SPARC IBD) with identical inclusion/exclusion criteria. The SPARC IBD cohort has been previously described (29). The diagnosis of IBS and bowel habit subtypes were based on Rome III (30) or IV (31) criteria depending on time of recruitment and confirmed by a clinician with expertise in IBS. Healthy controls had no personal or family history of IBS or other chronic pain conditions. Additional exclusion criteria for IBS and healthy controls included infectious or inflammatory disorders, active psychiatric illness over the past six months assessed by structured clinical interview for the DSM-IV (MINI) (30), use of corticosteroids in the past six months, use of narcotics in the past two months, and alcohol abuse. Questionnaires administered to the participants are described in detail in the Supplementary Methods. The Bowel Symptom Questionnaire (BSQ), Hospital Anxiety and Depression Scale (HAD) (32), and Adverse Childhood Experiences (ACE) (33) questionnaires were administered to all UCLA patients in this study including IBS, IBD and CeD.
Additional questionnaires administered to IBS patients included the Irritable Bowel Syndrome Severity Scoring System (IBS-SSS) (34).
[0134] The diagnosis of IBD was confirmed by endoscopy with pathologic tissue confirmation. Since treatments such as inflammation-reducing drugs used in IBD have been associated with changes in DNA methylation (35), we recruited patients who were treatment naïve or currently on no IBD treatment, including biologic agents (anti-TNF therapy, ustekinumab, risankizumab or vedolizumab) or other agents (tofacitinib, upadacitinib, ozanimod, 6-mercaptopurine, azathioprine, methotrexate, or 5-aminosalicylic acid [5-ASA]). Additional exclusion criteria included Crohn’s disease treated surgically without evidence of subsequent disease and a history of coexistent IBS or CeD. Patients with IBD reported active GI symptoms, e.g., abdominal pain, bloating, diarrhea, and/or blood in stool. Disease activity was assessed with the following instruments: Simple Endoscopic Score for Crohn's Disease (36) (SES-CD, UCLA and SPARC IBD cohorts), Crohn's Disease Activity Index (37) (CDAI, UCLA cohort), Short-CDAI (38) (SCDAI, SPARC IBD cohort), Simple Clinical Colitis Activity Index (39) (SCCAI, UCLA cohort), and Ulcerative Colitis Disease Activity Index (40) (UCDAI, SPARC IBD cohort), as described in the Supplementary Methods.
[0135] Patients with CeD reported active GI symptoms, e.g., abdominal pain, bloating, and diarrhea, and had their diagnosis confirmed by positive anti-tissue transglutaminase antibodies (anti-TTG) and/or anti-endomysial antibodies (anti-EMA) and the presence of at least Marsh II lesions on duodenal biopsies. Patients who were currently adherent to a gluten-free diet for > 2 weeks, or who had a history of a co-existent diagnosis of IBD, IBS, other causes of malabsorption, or any other medical conditions that could explain their GI symptoms, were excluded.
Questionnaires administered to CeD patients included the validated Celiac Symptom Index (CSI) (41) and Celiac Dietary Adherence Test (CDAT) (42).
[0136] Study participants with a history of smoking > 1/2 pack of cigarettes per day were excluded. One Crohn’s disease patient (4%) and three CeD patients (9%) reported being former smokers. A small percentage of participants reported the use of drugs including selective serotonin reuptake inhibitors (SSRIs) and serotonin-norepinephrine reuptake inhibitors (SNRIs, 7-9%), tricyclic anti-depressants (TCAs) or benzodiazepines (3-9%), statins (1-9%), and beta blockers (0-3%) (Supplementary Table 1), which can affect DNA methylation (43). None of the healthy controls used these medications. Participants recruited at UCLA were compensated. The study was approved by the UCLA Institutional Review Board, and all subjects signed a written informed consent prior to the study.
[0137] Statistical and bioinformatic analyses
[0138] Clinical and demographic data
[0139] Group differences in demographic characteristics including age, sex, body mass index (BMI), race, ethnicity and smoking status were assessed using t-tests, ANOVA or Fisher’s tests. Summary statistics were created for disease activity scores for IBS, IBD and CeD.
[0140] DNA methylation data processing
[0141] The methods used to process DNA methylation data and the implementation of machine learning algorithms are presented in detail in the Supplementary Methods. In short, raw Illumina DNA methylation array data (IDAT) files generated by the Illumina iScan scanner were processed using the ENmix DNA methylation analysis pipeline (44).
[0142] DNA methylation data pre-processing: Details of the analysis are presented in the Supplementary Methods. Briefly, Quality Control (QC) metrics were generated and samples that did not pass the QC threshold were eliminated.
Background correction and dye bias correction were applied to the data and the resulting signal intensities were normalized using the ‘quantile normalization’ method. Cell types were estimated as described previously (45). QC information-based filtering was implemented to filter unwanted probes out of the 855,790 CpGs on the array.
[0143] Additionally, CpGs on X and Y chromosomes, probes with single nucleotide polymorphisms (SNPs) and repeats, non-specific or cross-reactive probes, and probes showing low variability and extreme methylation values were removed. The remaining sites were used as input for classifier development. All the analyses were performed using the R programming language (cran.r-project.org). To select the covariates that affected DNA methylation, we correlated the principal components derived from the methylation data, for example on IBS and HC data, with potential confounders (Figure 6).
[0144] Differential Methylation: Group differences between IBS, IBD, CeD and HC were analyzed using general linear models in the ‘limma’ package, with age and batch as covariates. P-values were adjusted for multiple testing using the false discovery rate (FDR).
[0145] Classifier development and evaluation: Using the genome-wide DNA methylation data, we developed machine learning (ML)-based classifiers to test their performance as diagnostic biomarkers in classifying the GI diseases and HCs. Generalized linear models (GLM) were fit via penalized maximum likelihood using the glmnet package, considering diagnosis as the outcome, normalized and filtered DNA methylation probes (CpG sites) as predictors, and age, cell-type proportions, and technical batch as covariates. We used a double cross-validation method (46) to evaluate and test machine learning models, which is a preferred method since the models are trained and tested on independent datasets. The double cross-validation process includes two nested cross-validation loops, referred to as the outer and inner loops.
The training dataset was further sub-divided into tuning/calibration and validation sets, which form the inner loop. The calibration set was used for model building (hyperparameter tuning) and the validation set was used to estimate the errors. The model with the lowest prediction error within the inner loop was selected as the best model. This model was then applied to the test dataset and the class labels were predicted. Multiple splits (N=10) of inner and outer datasets were run in order to avoid the bias with respect to variable selection resulting from use of a single training set (Figure 4). Lasso, ridge or elastic net regression models (penalty term representing shrinkage, α=0 to 1) were fitted and the model associated with the α resulting in the best performance was chosen as the final model. Optimal cutoffs were chosen to minimize the difference between sensitivity and specificity while keeping both values close to the area under the receiver operating characteristic curve (AUC) (47). Performance metrics including accuracy (mean and 95% confidence interval [CI]), sensitivity, specificity, F1, and p values were generated based on the 2×2 confusion matrix constructed using the predicted and true test sample labels.
[0146] Gene ontology (GO) analysis: Enrichment of GO terms and/or pathways associated with differentially methylated CpGs and ML-based classifiers for each comparison was assessed using the ‘missMethyl’ package in R (48) as described in the Supplementary Methods.
[0147] Sensitivity analyses: Figure 6 shows the correlation between DNA methylation principal components (PCs) and potential confounders. We performed a sensitivity analysis to assess the performance of the classifiers excluding patients who consumed medications that can potentially alter DNA methylation (43), including SSRIs, benzodiazepines, statins or beta blockers (Table 8).
Since IBD PBMC samples were derived from two sites (UCLA and CCF), we repeated the IBS vs IBD comparison and included the “site” variable as an additional batch covariate.
[0148] Assessment of Classifier Performance
[0149] External validation set: We downloaded whole blood HM450K DNA methylation data on IBD vs healthy controls from the Gene Expression Omnibus (GEO) database (accession# GSE87648) for validating the performance of our IBD vs HC model. The processed dataset includes 460,398 probes on 304 samples (204 IBD and 100 healthy controls). We filtered the external data to include probes that overlapped with the probes on the EPIC array to match the internal IBD and HC dataset. We trained our algorithm on our IBD and HC samples using GLMNET with 10-fold CV as described in the previous section and predicted the classification of external samples. The results were evaluated using AUC and CI.
[0150] Permutation testing: Permutation-based testing to assess classifier performance has been used extensively in classification problems in computational biology to test if there is a real class structure in the data (49). We generated 100 random permutations of the class labels and used the AUCs of classifiers trained on the shuffled labels to create null distributions. We compared the IBS vs IBD, IBS vs CeD, IBD vs CeD, and IBS, IBD, and CeD vs HC models generated on the actual class labels to the corresponding null distributions and calculated the p-values for differences in the distributions.
[0151] RESULTS
[0152] We analyzed DNA methylation profiles of 315 participants including IBS (N=148; 65% women, 45 constipation-predominant IBS [IBS-C], 54 IBS-D, 49 mixed or unsubtyped IBS [IBS-M or IBS-U]), CeD (N=34; 68% women), IBD (N=47; 22 UC and 25 Crohn’s disease, 49% women), and healthy controls (N=86; 55% women). Table 5 shows demographic characteristics of study participants. There were significant overall group differences in mean age, race and ethnicity (p=5.97e-06, 5.0e-4, 1.0e-3, respectively).
There were no significant group differences in the percentage of women or BMI (Fisher p = .31 and ANOVA p=.26, respectively). [0153] Table 5. Demographic Characteristics of the Study Population
Figure imgf000029_0001
[0154] #, ANOVA p value; $, Fisher test p value; IBS, irritable bowel syndrome; UC, ulcerative colitis; CeD, celiac disease; BMI, body mass index; SD, standard deviation
[0155] About 35% of IBS, 28% of IBD and 31% of CeD patients used medications such as antidepressants, statins or NSAIDs (Table 8). Severity of IBS symptoms was moderate, with a mean (standard deviation [SD]) overall severity score of 9.45 (4.18; range 0-20) and IBS-SSS of 237 (89.70; range 0-500), which represents moderately severe symptoms. The mean (SD) ACE score in IBS patients was 1.91 (1.89) (Table 9A). UC disease activity assessed by UCDAI and SCCAI (mean [SD] = 4.29 [1.25] and 6.17 [4.84], respectively) indicated mild to moderate disease. Crohn’s disease activity assessed by CDAI, SCDAI and SES-CD indicated mild to moderate disease (mean [SD] = 146.36 [78.10], 243.82 [73.77] and 4.40 [5.30], respectively; Table 9B). CeD activity assessed by CSI scores indicated moderate disease (mean [SD] = 38 [10.98]), and CDAT scores (mean [SD] = 14.94 [3.88]) indicated poor adherence to a gluten-free diet (GFD) by the CeD patients in this study (>13 indicates non-adherence, Table 9C).
[0156] Based on the medical records, a history of anxiety and/or depression was the most common co-morbidity (28%) associated with IBD, followed by thyroid disease (8%) and gastroesophageal reflux disease (GERD, 4%) in the UCLA cohort, and IBD-associated arthropathy amongst SPARC IBD patients (5%). Patients with CeD also reported diagnoses of thyroid disease (14%), anemia and GERD (9%) and anxiety or depression (6%).
[0157] Differentially Methylated CpG Sites Associated with Diagnostic Groups
[0158] There were significant differences in CpG methylation between IBD vs HCs (FDR<0.05, N=4130), IBD vs CeD (FDR<0.05, N=655), CeD vs HCs (FDR<0.05, N=311), IBS vs IBD (FDR<0.05, N=86) and IBS vs CeD (FDR<0.05, N=6).
Between IBS and HCs, 48 CpGs were differentially methylated at p<0.0005, but none at FDR<0.05 (Figure 7, Tables 10A – 10F).
[0159] Gene ontology terms associated with differentially methylated genes
[0160] GO analyses on genes with differentially methylated CpG sites between disease groups suggested an association of terms including “humoral immune response” and “central nervous system maturation” with IBS vs IBD; “MHC class II protein complex binding” and “immune system process” with IBS vs CeD; and “adaptive immune response” and “cytokine production” with IBD vs CeD. Disease group vs HC comparisons resulted in terms including “ion transport activity” and “calcium-release channel” with IBS vs HC; “immune response” and “leukocyte activation” with IBD vs HC; and “regulation of immune system process” and “T-cell activation” with CeD vs HC (Tables 11A – 11F).
[0161] DNA methylation-based classifiers to discriminate IBS, IBD, CeD and HCs
[0162] To identify disease-associated classifiers, penalized regression models were trained independently on pairs of diagnoses among IBS, IBD, CeD and healthy controls (Table 7) and tested on a holdout test dataset, resulting in disease-specific classifiers.
[0163] DNA methylation-based classifiers for IBS, IBD and CeD
[0164] The selected classifiers discriminated between the various patient groups with high accuracy. Table 6 shows the performance metrics for IBS compared to healthy controls, IBD and CeD. The AUC and accuracy for the IBS vs IBD classifier were 0.85 and 0.80, respectively (p=6.75E-10), and those for IBS vs CeD were 0.82 and 0.78, respectively (p=4.57E-10). Figure 4 shows the ROC curves for these comparisons. For IBD vs CeD, AUC and accuracy were 0.78 and 0.73 (p=0.002), respectively.
When comparing IBS vs Crohn’s disease and UC separately, the classifiers for IBS performed slightly better in discriminating against Crohn’s disease compared to UC (Crohn’s: Accuracy [95% CI] = 0.80 [0.66 – 0.9]; UC: Accuracy [95% CI] = 0.75 [0.60 – 0.87]). [0165] Table 6. Performance Metrics for IBS, IBD and CeD Associated Classifiers
Figure imgf000031_0001
[0166] * For the IBS vs healthy control comparison, DNA methylation data from 450K and EPIC arrays were used; for all other comparisons, only subjects with EPIC array data were used. # Covariates included age, cell-type proportions, and technical batch effects for all columns; however, the IBD vs healthy control comparison included site of IBD sample collection as an additional covariate. F1 denotes the harmonic mean of precision and recall; AUC, area under the receiver operating characteristic (ROC) curve; IBS, irritable bowel syndrome; IBD, inflammatory bowel diseases; CeD, celiac disease.
[0167] When comparing GI diseases to healthy controls, the AUC and accuracy for IBS vs healthy controls were 0.69 and 0.69, respectively (p=3.06E-05). The classification accuracies for the different bowel habit subtypes within IBS, including IBS-D, IBS-C and IBS-M, vs healthy controls were similar to that of the overall IBS group (Table 12). AUC and accuracy of IBD vs healthy controls were 0.92 and 0.82 (p=1.33E-08), and those of CeD vs healthy controls were 0.85 and 0.80 (p=7.11E-09).
[0168] Assessment of Classifier Performance
[0169] External validation set
[0170] Classifiers developed based on IBD vs HC data from this study were used to predict the classes of samples from an independent IBD vs HC DNA methylation dataset as described in the Methods section. The classifier assigned a majority of subjects to the correct classes (Figure 9, AUC=0.74, p<0.0001).
[0171] Permutation testing
[0172] Using permutation testing, we demonstrated that the true labels for IBD vs HC, IBS vs HC, IBS vs IBD, IBS vs Celiac, and IBS vs HC performed significantly better than the permuted labels (P<0.05, Figure 10).
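The logic of the permutation test can be illustrated with a minimal Python sketch, under simplifying assumptions. The study retrained classifiers on shuffled labels; the sketch instead keeps a fixed set of hypothetical classifier scores and shuffles only the labels, which is a cheaper stand-in. The empirical p-value formula (exceedances + 1) / (permutations + 1) is also an assumption, since the text does not state how the p-values were computed.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: the chance a random case outscores a random control.

    Ties count as half a win. Labels are 1 (case) or 0 (control).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def permutation_test(scores, labels, n_permutations=100, seed=0):
    """Compare the observed AUC with AUCs obtained from shuffled labels."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    null_aucs = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between score and class
        null_aucs.append(auc(scores, shuffled))
    exceed = sum(1 for a in null_aucs if a >= observed)
    p_value = (exceed + 1) / (n_permutations + 1)  # assumed empirical formula
    return observed, null_aucs, p_value


# Hypothetical predicted probabilities and true labels for ten test samples.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

observed, null_aucs, p_value = permutation_test(scores, labels)
print(observed, p_value)
```

A small p-value here indicates that the observed AUC is unlikely to arise when the labels carry no information, which is the sense in which the true labels perform better than permuted labels.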
[0173] GO terms and pathways associated with classifiers
[0174] IBS vs HCs, IBD and CeD
[0175] GO terms associated with IBS-related classifiers, i.e., overlapping terms between the IBS vs HCs, IBS vs IBD and IBS vs CeD classifiers, were enriched in cell adhesion and neuronal pathway-associated terms including “cell-adhesion”, “neurotransmitter uptake”, “forebrain neuron development” and “sensory perception of pain”, suggesting a role for IBS-associated genes in permeability and visceral pain (50) (Figure 8A, Table 14A, p<0.05).
[0176] Analysis of GO terms associated with the differentially methylated genes as well as the classifiers suggested that IBS was associated with epigenetic changes in neuronal and immune system pathways.
[0177] IBD vs HCs, IBS and CeD
[0178] The IBD classifiers were primarily enriched in immune and inflammation-related pathways including “innate immune response”, “T-helper 1 type immune response” and “T-helper 17 cell chemotaxis” (p<0.05, Figure 7B).
[0179] Analysis of GO terms associated with the differentially methylated genes as well as the classifiers suggested that IBD was associated with epigenetic changes in inflammation and immune response pathways.
[0180] CeD vs HCs, IBS and IBD
[0181] CeD classifiers were enriched in major histocompatibility complex (MHC) pathway-related terms such as “MHC class II protein complex” and “antigen processing and presentation of exogenous peptide antigen” (p<0.05, Figure 7C).
[0182] Analysis of GO terms associated with the differentially methylated genes as well as the classifiers suggested that CeD was associated with epigenetic changes in the MHC pathway.
[0183] Sensitivity analyses
[0184] DNA methylation is known to be sensitive to confounders. Figure 6 shows the correlation between DNA methylation principal components (PCs) and potential confounders.
[0185] We repeated the IBS vs IBD and IBS vs CeD comparisons after excluding patients who took medications reported to affect DNA methylation (43), including anti-depressants, NSAIDs and statins, and recalculated the performance metrics. The AUC for the new model was comparable to that of the original model in both comparisons (Table 13), suggesting that the chosen models were robust to the effects of these confounders. [0186] DISCUSSION [0187] We report, for the first time, a comprehensive analysis of genome-wide DNA methylation data in the most common GI disorders, including IBS, IBD and CeD, and in HCs. Here, we identify differentially methylated CpG sites and potential blood-based diagnostic biomarkers based on DNA methylation profiles that can accurately distinguish IBS, IBD and CeD. The genes associated with the differentially methylated CpG sites were consistent with the pathophysiologic mechanisms of each disease, and the biomarkers we report here showed a good to excellent (51) AUC for differentiating IBS, IBD and CeD. These biomarkers can potentially allow an earlier diagnosis and appropriate treatment plan and reduce unnecessary medical costs. [0188] Smaller studies with limited genomic coverage exist in IBD (23) and CeD (24) populations; these involved multiple univariate tests, which can have limited power to detect smaller linear and non-linear changes. Using machine learning frameworks enabled us not only to develop predictive models but also to select multiple relevant CpG sites without having to conduct repetitive association tests and the consequent stringent multiple hypothesis adjustment. Moreover, our thorough preprocessing steps (including filtering specific sites and individuals, and accounting for confounding) as well as the use of a double cross-validation methodology lay the framework for reduced risk of overfitting and therefore improved reproducibility (and out-of-sample prediction). 
Finally, our stringent inclusion criteria of symptomatic IBD and CeD patients diagnosed with standard diagnostic criteria but not taking treatment for their diseases helped make our model more amenable to replicability and applicability to the target population of undiagnosed patients with chronic bowel symptoms. [0189] The GO analyses suggested that the disease-specific classifiers identified in this study were associated with pathways that are relevant to the pathophysiology of the corresponding diseases. For example, IBS-associated classifiers were associated with cell adhesion, neuronal signaling and pain pathways, which are the most widely studied pathways in IBS (52-55). Similarly, inflammatory and immune pathways and related terms were associated with the IBD classifiers, further supporting the functional significance of the associated classifiers (56). Additionally, the CeD classifier was associated with MHC class II receptor activity. MHC class II, encoded by human leukocyte antigen (HLA), is a chief genetic determinant of CeD, and certain HLA-DQ allotypes like DQ2 and/or DQ8, which are allelic variants within the constant region of HLA genes, are known to predispose to the disease by presenting posttranslationally modified gluten peptides to CD4+ T cells (57). [0190] Although endoscopies with biopsies remain the gold standard for diagnosing IBD and CeD, these are invasive, costly procedures which are associated with a risk of complications (58,59). Existing noninvasive biomarkers such as C-reactive protein (CRP), fecal calprotectin and lactoferrin have limitations including statistical heterogeneity between studies, variable cut-off points, and overlapping values between active inflammatory disease and non-inflammatory disease such as IBS (59). [0191] There is currently no reliable diagnostic test for all IBS bowel habit subtypes with acceptable test characteristics that are needed for widespread clinical use. 
However, a commercial blood test for IBS-D is currently available. Studies have demonstrated increased circulating antibodies to cytolethal distending toxin B and vinculin (anti-CdtB, anti-vinculin) in patients with IBS-D. (60) The development of these biomarkers is based on studies in animal models showing cross-reactivity of anti-CdtB, which is produced by bacteria that cause gastroenteritis, with host vinculin and the association of these antibodies with an IBS phenotype. A subsequent follow up study comparing test characteristics in IBS-D patients with a smaller number of patients with IBS-M (n=25) and IBS-C (n=30) showed highest plasma levels of anti-CdtB and anti-vinculin in patients with IBS-D, lowest levels in IBS-C and levels in between these two groups in IBS-M. (61) However, this anti-CdtB or anti- vinculin test, which utilized blood samples from IBS-D patients who participated in a clinical trial for rifaximin, had a low sensitivity (<50%) for diagnosing IBS but a high specificity (60). Thus, at least half of patients with IBS have a negative test. A study conducted in Australia using population and outpatient-based samples found that anti-CdtB and anti-vinculin levels had a poor ability to discriminate IBS from organic GI disease, including IBD. (62) Recent GI society guidelines for IBS did not issue recommendations for or against the use of this serologic test. (16,17) [0192] The diagnostic performance of a blood-based DNA methylation test could potentially be further improved by developing algorithms utilizing other non-invasive tests that aid in distinguishing gut inflammation and CeD, e.g., CRP, fecal calprotectin, and celiac serologies. However, while these tests help to rule out IBD and CeD if normal, they are not helpful in ruling in IBS. There are significant overlaps in elevated CRP and mildly elevated fecal calprotectin in IBS and IBD. 
For example, a normal fecal calprotectin level of <40 microgram (ug)/gram (g) has been shown to be associated with a <1% chance of having IBD (63) (i.e., ruling out IBD), whereas levels between 40 ug/g and 150 ug/g may be seen in IBS, mildly active IBD, GI infection, GI neoplasm, CeD, or microscopic colitis. A higher fecal calprotectin, e.g., >150 ug/g, is more suggestive of an active inflammatory or infectious GI disorder, e.g., IBD (63). Abnormal fecal calprotectin or CRP would prompt additional diagnostic workup such as endoscopy with biopsies for IBD, CeD, and other organic GI diseases. Similarly, elevated celiac serologies would suggest the diagnosis of CeD, but endoscopy with tissue confirmation is required in North America. However, celiac serologies and intestinal histology can be normal in CeD patients adherent to a prolonged gluten-free diet (GFD), which limits their use in these patients (12). A gluten challenge followed by repeat serologies and/or small intestinal biopsy would typically be needed in those with a self-initiated GFD, which can be unappealing and difficult for symptomatic patients. An accurate non-invasive diagnostic test for CeD would be helpful in this patient population. Use of a DNA methylation-based diagnostic test in this patient population requires further study, since we only evaluated patients with active CeD who were not on a GFD or had been consuming one for <2 weeks. [0193] A non-invasive diagnostic test for IBS can be valuable in different ways depending on the type of clinical practice and patient population. For gastroenterologists, limited or more comprehensive diagnostic testing including upper and lower endoscopies may be done in the presence of no or positive alarm features, respectively. 
However, in the common scenario of a negative diagnostic evaluation, a rule-in test for IBS will be beneficial, particularly for patients who want confirmation that they have IBS before starting treatment, and for health care providers, who can avoid ordering additional or repeated costly diagnostic tests and can instead institute appropriate treatment to relieve IBS symptoms. The sensitivity and specificity of a diagnostic test for IBS such as the one we propose in this paper would arguably be quite acceptable in this scenario. For primary care physicians, a diagnostic test for IBS would allow them to make a positive diagnosis, recommend IBS treatment and avoid ordering more diagnostic testing and referring to a gastroenterologist for an endoscopic procedure. Other clinical scenarios where a DNA methylation-based test for IBS could be helpful are the diagnosis of coexistent symptomatic IBS in patients with CeD, IBD or other GI disease that has been treated or is in remission, which occurs in 32-40% of patients (64), or in patients with non-celiac gluten or wheat sensitivity, but further studies are needed. [0194] Although machine learning is a powerful tool that learns latent patterns in the data to make predictions, it is susceptible to overfitting by learning patterns that arise from known or unknown confounding variables (65,66). To avoid overfitting, we nested the hyperparameter optimization using double (nested) cross-validation (Figure 4). Since this method involves testing on a dataset that is independent of the training dataset, it provides almost unbiased estimates of the true errors (46). Although the performance of our model was slightly better on the internal dataset than on the external data, external validations are known to generally perform more poorly than the development model (67). 
[0195] We accounted for confounders by including covariates such as batch effects, cell counts and age in our models, by applying strict exclusionary criteria such as smoking and medication intake or by performing additional sensitivity analyses such as excluding patients taking anti-depressants and accounting for differences by collection sites. Although IBS vs HC classifiers showed a low overall accuracy, the utility of a diagnostic test in discriminating against HCs is unclear since HCs lack GI symptoms. Additionally, a positive test for IBS vs HCs is likely to diagnose IBS with 70% sensitivity, providing a rule-in test. Additionally, our GO analysis suggested that the biomarkers associated with IBS vs HC comparison were associated with pathways relevant to IBS pathophysiology. [0196] In conclusion, using comprehensive data on DNA methylation in various GI conditions and healthy controls, this Example shows that blood-based DNA methylation changes provide non-invasive biomarkers to distinguish IBS, IBD and CeD, leading to a faster diagnosis and a rule-in test for IBS. GO analysis supports the functional significance of the classifiers in disease-specific pathology. Future analysis will be aimed at testing these markers further and assessing their utility in predicting response to treatment. [0197] References [0198] 1. Lovell RM, Ford AC. Clin Gastroenterol Hepatol 2012;10:712-721 e4. [0199] 2. Cash B, et al. Am J Manag Care 2005;11:S7-16. [0200] 3. Ladabaum U, et al. Clin Gastroenterol Hepatol 2012;10:37-45. [0201] 4. Spiegel BM, et al. Am J Gastroenterol 2010;105:848-58. [0202] 5. Dahlhamer JM, et al. MMWR Morb Mortal Wkly Rep 2016;65:1166-1169. [0203] 6. Kappelman MD, et al. Clin Gastroenterol Hepatol 2007;5:1424-9. [0204] 7. Card TR, et al. United European Gastroenterol J 2014;2:505-12. [0205] 8. Nahon S, et al. Dig Dis Sci 2016;61:3278-3284. [0206] 9. Nguyen VQ, et al. Inflamm Bowel Dis 2017;23:1825-1831. [0207] 10. Kang HS, et al. 
World J Gastroenterol 2019;25:989-1001. [0208] 11. Mustalahti K, et al. Ann Med 2010;42:587-95. [0209] 12. Lebwohl B, Rubio-Tapia A. Gastroenterology 2021;160:63-75. [0210] 13. Therrien A, et al. J Clin Gastroenterol 2020;54:8-21. [0211] 14. Diagnosis and Management of Gluten-Associated Disorders: Springer Cham, 2021. [0212] 15. Irvine AJ, et al. Am J Gastroenterol 2017;112:65-76. [0213] 16. Lacy BE, et al. Am J Gastroenterol 2021;116:17-44. [0214] 17. Smalley W, et al. Gastroenterology 2019;157:851-854. [0215] 18. Fuchs V, et al. United European Gastroenterol J 2018;6:567-575. [0216] 19. Rubio-Tapia A, et al. Am J Gastroenterol 2012;107:1538-44; quiz 1537, 1545. [0217] 20. Ukkola A, et al. BMC Gastroenterol 2012;12:136. [0218] 21. Card TR, et al. Scand J Gastroenterol 2013;48:801-7. [0219] 22. Black CJ, et al. Lancet Gastroenterol Hepatol 2023;8:499-501. [0220] 23. Roth TL. Dev Psychopathol 2013;25:1279-91. [0221] 24. Bjornsson HT, et al. Trends Genet 2004;20:350-8. [0222] 25. Deaton AM, Bird A. Genes Dev 2011;25:1010-22. [0223] 26. Yang X, et al. Cancer Cell 2014;26:577-90. [0224] 27. Mahurkar S, et al. Neurogastroenterol Motil 2016;28:410-22. [0225] 28. Kalla R, et al. J Crohns Colitis 2022. [0226] 29. Raffals LE, et al. Inflamm Bowel Dis 2022;28:192-199. [0227] 30. Longstreth GF, et al. Gastroenterology 2006;130:1480-91. [0228] 31. Mearin F, et al. Gastroenterology 2016;150:1393-1407. [0229] 32. Zigmond AS, Snaith RP. Acta Psychiatr Scand 1983;67:361-70. [0230] 33. Park SH, et al. Neurogastroenterol Motil 2016;28:1252-60. [0231] 34. Francis CY, et al. Aliment Pharmacol Ther 1997;11:395-402. [0232] 35. Lin S, et al. J Crohns Colitis.2023 Aug 8:jjad133. [0233] 36. Daperno M, et al. Gastrointest Endosc 2004;60:505-12. [0234] 37. Best WR, et al. Gastroenterology 1976;70:439-44. [0235] 38. Thia K, et al. Inflamm Bowel Dis 2011;17:105-11. [0236] 39. Walmsley RS, et al. Gut 1998;43:29-32. [0237] 40. Sutherland LR, et al. Gastroenterology 1987;92:1894-8. [0238] 41. 
Leffler DA, et al. Clin Gastroenterol Hepatol 2009;7:530-6, 536 e1-2. [0239] 42. Leffler DA, et al. Clin Gastroenterol Hepatol 2009;7:1328-34, 1334 e1-3. [0240] 43. Thompson M, et al. NPJ Genom Med 2022;7:50. [0241] 44. Xu Z, et al. Nucleic Acids Res 2016;44:e20. [0242] 45. Salas LA, Koestler DC. Illumina EPIC data on immunomagnetic sorted peripheral adult blood cells, 2020. github.com/immunomethylomics/FlowSorted.Blood.EPIC [0243] 46. Varma S, Simon R. BMC Bioinformatics 2006;7:91. [0244] 47. Unal I. Comput Math Methods Med 2017;2017:3762651. [0245] 48. Phipson B, et al. Bioinformatics 2016;32:286-8. [0246] 49. Ojala M. GGC.2009 Ninth IEEE International Conference on Data Mining, Miami Beach, FL, USA, 2009, pp.908-9132009. [0247] 50. Zhou Q, et al. Pain 2009;146:41-6. [0248] 51. Nahm FS. Korean J Anesthesiol 2022;75:25-36. [0249] 52. Verne GN, et al. Pain 2003;103:99-110. [0250] 53. Videlock EJ, et al. Am J Physiol Gastrointest Liver Physiol 2018;315:G140-G157. [0251] 54. Moloney RD, et al. Front Psychiatry 2015;6:15. [0252] 55. Mahurkar-Joshi S, Chang L. Front Psychiatry 2020;11:805. [0253] 56. Chen ML, Sundrud MS. Inflamm Bowel Dis 2016;22:1157-67. [0254] 57. Sollid LM. Immunogenetics 2017;69:605-616. [0255] 58. Cobb WS, et al. Am Surg 2004;70:750-7; discussion 757-8. [0256] 59. Mosli MH, et al. Am J Gastroenterol 2015;110:802-19; quiz 820. [0257] 60. Pimentel M, et al. PLoS One 2015;10:e0126438. [0258] 61. Rezaie A, et al. Dig Dis Sci 2017;62:1480-1485. [0259] 62. Talley NJ, et al. Clin Transl Gastroenterol 2019;10:e00064. [0260] 63. Menees SB, et al. Am J Gastroenterol 2015;110:444-54. [0261] 64. Fairbrass KM, et al. Lancet Gastroenterol Hepatol 2020;5:1053-1062. [0262] 65. Maksimovic J, et al. Nucleic Acids Res 2015;43:e106. [0263] 66. Thompson M, et al. Genome Biol 2019;20:138. [0264] 67. Ramspek CL, et al. Clin Kidney J 2021;14:49-58. 
SUPPLEMENTARY METHODS [0265] Questionnaires [0266] Bowel Symptom Questionnaire (BSQ) [0267] The BSQ was administered to all irritable bowel syndrome (IBS) patients, inflammatory bowel disease (IBD) patients from UCLA cohort and all celiac disease (CeD) patients and the overall disease severity scores were reported. It is a self-report measure of GI symptoms. It includes multiple questions including the Rome diagnostic questions for IBS and bowel habit subtypes, overall GI symptom severity, and individual GI symptoms. IBS, IBD and CeD participants rated the current overall intensity of their GI symptoms over the past week on a 20-point ordinal scale from 0 meaning none to 20 meaning most severe overall GI symptoms. (1-3) [0268] Irritable Bowel Syndrome Severity Scoring System (IBS-SSS) [0269] IBS-SSS (4) was administered to IBS patients. This validated instrument measures the frequency and severity of abdominal pain, severity of abdominal distension, dissatisfaction with bowel habits, and interference of IBS with daily life. It is scored on a scale from 0-100 in each of the 5 categories, and a total IBS-SSS score is the sum of these five with a range of 0-500 with remission <75, mild 75-175, moderate 175-300, severe >300. [0270] Hospital Anxiety and Depression Scale (HAD) [0271] The HAD (5) is a validated self-assessment mood scale specifically designed for use in non-psychiatric settings for assessment of symptoms of anxiety and depression. The HAD provides two 7-item subscales: anxiety and depression, with scores ranging from 0-21, with 0-7 representing ‘non-cases’, 8-10 ‘doubtful cases’, and 11-21 representing ‘cases’. 
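The IBS-SSS and HAD scoring bands described above can be written out as a short sketch. The function names are hypothetical and are not taken from any published scoring tool. One detail is an assumption: the source lists the IBS-SSS bands as "mild 75-175, moderate 175-300", so a total of exactly 175 is ambiguous, and the sketch assigns it to the mild band.

```python
def ibs_sss_total(item_scores):
    """Sum the five IBS-SSS category scores (each 0-100) into a 0-500 total."""
    if len(item_scores) != 5:
        raise ValueError("IBS-SSS expects exactly five category scores")
    for score in item_scores:
        if not 0 <= score <= 100:
            raise ValueError("each IBS-SSS category score must be in 0-100")
    return sum(item_scores)

def ibs_sss_category(total):
    """Map a total IBS-SSS score to the severity bands given in the text.

    Assumption: a total of exactly 175 is treated as mild, because the
    source bands overlap at that value.
    """
    if not 0 <= total <= 500:
        raise ValueError("IBS-SSS total must be in 0-500")
    if total < 75:
        return "remission"
    if total <= 175:
        return "mild"
    if total <= 300:
        return "moderate"
    return "severe"

def had_category(subscale_score):
    """Map a HAD anxiety or depression subscale score (0-21) to its band."""
    if not 0 <= subscale_score <= 21:
        raise ValueError("HAD subscale score must be in 0-21")
    if subscale_score <= 7:
        return "non-case"
    if subscale_score <= 10:
        return "doubtful case"
    return "case"

# Illustrative, made-up responses.
total = ibs_sss_total([60, 40, 50, 70, 30])
print(total, ibs_sss_category(total), had_category(9))
```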
[0272] Adverse Childhood Experiences (ACE) questionnaire [0273] The ACE questionnaire assesses the presence of early adverse life events before age 18 with 18 questions in eight ACE domains (number of questions): physical (1), emotional (2), and sexual abuse (4), household substance abuse (2), parental separation or divorce (1), mental illness in household (2), incarcerated household member (1), and parent treated violently (2) (6). The ACE score is calculated by assigning one point for each domain (‘Yes’ = 1, ‘No’ = 0), giving a range of 0-8 (Physical Abuse + Emotional Abuse + Sexual Abuse + Household Substance Abuse + Parent Separation/Divorce + Household Mental Illness + Incarcerated Household Member + Parent Treated Violently). ACE can also be scored as presence (ACE total score ≥1) or absence (ACE total score = 0) of an ACE. [0274] Crohn’s Disease Activity Index (CDAI) [0275] The CDAI was used to assess disease activity in the UCLA Crohn’s disease (CD) patients (7). The CDAI instrument consists of eight variables, two of which are subjective, each weighted according to its ability to predict disease activity. The total score ranges from 0 to over 600. CDAI scores <150 represent clinical remission, 150-220 mild disease, 220-450 moderate disease, and >450 severe disease. [0276] Short Crohn’s Disease Activity Index (SCDAI) [0277] The SCDAI disease activity scores were available for the CCF CD patients. The SCDAI is a short version of the CDAI and can reliably replace the full CDAI score for assessment of disease activity in mildly to moderately active CD patients (8). The score range is the same as that of the CDAI. [0278] Simple endoscopic score for Crohn's disease (SES-CD) [0279] SES-CD scores were available for both UCLA and CCF patients. This is a simple endoscopic scoring system for Crohn's disease which assesses the size of mucosal ulcers, the ulcerated surface, affected surface, and luminal narrowing (9). 
The calculated score is interpreted as remission (0-2), mild (3-6), moderate (7-15), severe (>15). [0280] Simple Clinical Colitis Activity Index (SCCAI) [0281] SCCAI is used in the initial assessment of patients with ulcerative colitis (UC) and correlates well with more complex scoring systems (10). The score ranges from 0 to 19, where a score of >5 indicates active disease. [0282] Ulcerative Colitis Disease Activity Score (UCDAI) [0283] UCDAI provides an assessment of the disease severity in UC patients (11). The score can range from 0-9, with a score <2 indicating remission, 2-4 mild, 5-7 moderate and >7 severe disease. [0284] Coeliac Disease Adherence Test (CDAT Score) [0285] CDAT is a gluten-free diet (GFD) adherence assessment method which consists of seven questions; the score ranges from 7–35, and a score >13 indicates non-adherence (12). [0286] Celiac Symptom Index (CSI) Score [0287] The CSI score (13) ranges from 16–80; a score <30 indicates high quality of life (QoL) and excellent gluten-free diet adherence, and a score >45 indicates relatively poor QoL and worse gluten-free diet adherence. [0288] DNA Methylation Arrays [0289] Illumina’s Human Methylation 450K array or Infinium MethylationEPIC BeadChip (Illumina) was used to assess DNA methylation levels in PBMCs. The analysis was performed in three stages, namely data preprocessing, classifier development, and classifier evaluation, using the R statistical language (14). [0290] Data pre-processing [0291] Data import and normalization of EPIC array data: As outlined in Figure 6, this stage of analysis includes steps to preprocess the raw Illumina DNA methylation array (IDAT) files generated by the Illumina iScan scanner and output normalized and filtered data. Data preprocessing was performed using the R package ENmix (15). Quality Control (QC) metrics were generated using the ‘QCinfo’ function. Samples that did not pass the QC threshold were excluded from analysis. 
Using the ‘preprocessENmix’ function, we used the “oob” method for background correction and “RELIC” for dye bias correction. This resulted in signal intensity data on methylated and unmethylated channels in a “MethylSet” class object. The data were quantile normalized using ‘norm.quantile’, separately for Methylated and Unmethylated intensities for Infinium I or Infinium II probes (“quantile1” method). Probe design type bias was corrected using Regression on Correlated Probes (RCP), followed by filtering on QC metrics. The detection P value threshold to identify low quality data points was set to 0.000001. The “Impute” parameter was set to “True” (after removing rows or columns with too many missing values, default threshold of > 5% missing values). All the steps were combined into a single function which can be applied to either single or multiple IDAT files. Cell types were estimated using the ‘estimateCellCounts2’ function and the “preprocessNoob” method for the EPIC platform in the FlowSorted.Blood.EPIC package (16). Methylation age was predicted using the “methyAge” function. [0292] Data import and combined normalization of 450K and EPIC array data: IBS patients and healthy controls were measured on the 450K (N=100 and 31, respectively) and EPIC platforms (IBS N=48 and HCs N=55). Data from the two platforms were preprocessed using the exact same steps and normalized using the ‘preprocessNoob’ function. The arrays were combined using the ‘combineArrays’ (17) function in the ‘minfi’ package. [0293] IBS vs HCs analysis was performed using three separate analyses. The first included a larger set of IBS and HCs samples that were run on either the EPIC or 450K methylation arrays. This analysis used the platform type as the batch covariate. The second analysis included a cohort of IBS and HCs from the first analysis that was down-sampled to an equal number of IBS and HCs for a balanced sample to avoid over-fitting. The third analysis included IBS patient and HC samples that were run on the EPIC platform only. 
This allowed for the identification of differential methylation on probes that were not represented on the 450K platform. All the comparisons discussed in the paper are shown in Table 7. [0294] Probe filtration: From the original 855,790 CpGs measured on the EPIC array, QC-based filtering resulted in 846,790 probes. We further excluded X and Y chromosomal probes (n=18,880), probes with SNPs and repeats (n=25,329), and non-specific or cross-reactive probes (n=29,233) following annotations from the IlluminaHumanMethylationEPICanno.ilm10b2.hg19 package. After initial filtering, samples were separated into comparison group pairs (described below) and further filtering was applied to exclude probes with low variance (standard deviation < 0.02) and probes with consistently high or low methylation beta values (beta < 0.1 or > 0.9) (18). This resulted in 180,000 to 250,000 probes per pair depending upon the variability between samples. [0295] Making Comparison groups: After normalization and initial filtering, we prepared the data for input into the machine learning models. Table 7 shows all the comparison groups tested and the methylation platforms used to generate the data. [0296] Table 7. Comparison groups and DNA methylation platforms
Figure imgf000042_0001
[0297] IBS, irritable bowel syndrome; IBD, inflammatory bowel diseases; CeD, celiac disease; HCs, healthy controls; N=number of patients [0298] Classifier development and evaluation [0299] Covariates [0300] All analyses included age, cell-type proportions (six cell types including CD8-T cells, CD4-T cells, natural killer cells, B-cells, monocytes, neutrophils) and DNA methylation batch as covariates in the models. IBS vs HC analysis for the larger dataset included platform (450K or EPIC) as the batch covariate. [0301] Generating training and test fold data [0302] We developed a double cross-validation framework in which we split the data into 10 folds of an outer training set and an outer test set (90:10). We next split the outer training set into inner training and inner test sets (90:10) and performed an internal 10-fold cross-validation on these inner training and test sets for choosing model hyperparameters. After the hyperparameters were selected from the inner cross-validation, we trained a model on the entire outer training set, and used it to predict the outer test set. We performed these steps 10 times (once per outer test set) to generate an out-of-sample predictor. [0303] Selection of model hyperparameters [0304] For each dataset, models on training data were generated using generalized linear models (GLM) via penalized maximum likelihood using the glmnet package. The regularization parameter (λ) was selected using 10-fold cross-validation on inner training and test data. A model corresponding to the λ that minimizes the cross-validation error (λmin) was chosen and prediction was made for each of the 10 outer training-test data splits. Separate models were generated for lasso or elastic net penalties (α = 0, 0.1, 0.2, …, 1), and the model associated with the α resulting in the best performance (based on cross-validated accuracy) was chosen as the final model. 
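The double cross-validation described in [0302] to [0304] can be sketched in a few lines. This is an illustration of the fold structure only, under stated assumptions: a toy k-nearest-neighbour classifier on one-dimensional data stands in for the penalized GLM fitted with glmnet, the neighbour count `k` stands in for the hyperparameters (λ and α), and the helper names `k_folds`, `knn_predict`, `cv_accuracy` and `nested_cv` are hypothetical. The point shown is that hyperparameters are chosen using the outer training set alone, so each outer test fold is predicted by a model that never saw it.

```python
import random

def k_folds(indices, n_folds, rng):
    """Shuffle the indices and split them into n_folds near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]

def knn_predict(train, query, k):
    """Toy stand-in model: majority label among the k nearest 1-D points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)

def cv_accuracy(data, folds, k):
    """Cross-validated accuracy of the toy model over the given folds."""
    correct = 0
    total = 0
    for test_fold in folds:
        train = [data[i] for fold in folds if fold is not test_fold for i in fold]
        for i in test_fold:
            value, label = data[i]
            correct += knn_predict(train, value, k) == label
            total += 1
    return correct / total

def nested_cv(data, candidates, n_outer=10, n_inner=10, seed=0):
    """Return out-of-sample predictions, one per sample, via nested CV."""
    rng = random.Random(seed)
    all_indices = range(len(data))
    predictions = {}
    for outer_test in k_folds(all_indices, n_outer, rng):
        held_out = set(outer_test)
        outer_train = [i for i in all_indices if i not in held_out]
        # Inner loop: pick the hyperparameter using the outer training set only.
        inner_folds = k_folds(outer_train, n_inner, rng)
        best_k = max(candidates, key=lambda k: cv_accuracy(data, inner_folds, k))
        # Refit on the entire outer training set, then predict the outer test fold.
        train = [data[i] for i in outer_train]
        for i in outer_test:
            predictions[i] = knn_predict(train, data[i][0], best_k)
    return predictions

# Illustrative, made-up data: (value, label) pairs for two groups.
data = [(0.10 + 0.01 * i, "HC") for i in range(20)]
data += [(0.60 + 0.01 * i, "IBS") for i in range(20)]
predictions = nested_cv(data, candidates=[1, 3, 5])
accuracy = sum(predictions[i] == data[i][1] for i in range(len(data))) / len(data)
print(accuracy)
```

Pooling the predictions from all outer test folds gives one out-of-sample prediction per subject, from which a single confusion matrix and the associated performance metrics can be computed.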
[0305] Evaluation of model performance [0306] Various performance metrics were generated based on the 2×2 confusion matrix for each of the comparisons, created using the caret package and GLMs. These included p-values from linear regression of predicted and true values, accuracy (with confidence intervals), accuracy p values, kappa, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), precision, recall, F1, prevalence and balanced accuracy. In these analyses, models with a low p value (<0.05) and the highest sensitivity and specificity were chosen as the best models. However, all other metrics mentioned above help in addressing different aspects of model performance. Performance metrics were calculated based on the ‘confusion matrix’, which gives the number of true and false positive and negative predictions on test samples. [0307] References [0308] 1. Labus JS, et al. Aliment Pharmacol Ther 2004;20:89-97. [0309] 2. Talley NJ, et al. Aust N Z J Med 1995;25:302-8. [0310] 3. Schmulson M, et al. Am J Gastroenterol 1999;94:2929-35. [0311] 4. Francis CY, et al. Aliment Pharmacol Ther 1997;11:395-402. [0312] 5. Zigmond AS, Snaith RP. Acta Psychiatr Scand 1983;67:361-70. [0313] 6. Park SH, et al. Neurogastroenterol Motil 2016;28:1252-60. [0314] 7. Best WR, et al. Gastroenterology 1976;70:439-44. [0315] 8. Thia K, et al. Inflamm Bowel Dis 2011;17:105-11. [0316] 9. Daperno M, et al. Gastrointest Endosc 2004;60:505-12. [0317] 10. Walmsley RS, et al. Gut 1998;43:29-32. [0318] 11. Sutherland LR, et al. Gastroenterology 1987;92:1894-8. [0319] 12. Leffler DA, et al. Clin Gastroenterol Hepatol 2009;7:530-6, 536 e1-2. [0320] 13. Leffler DA, et al. Clin Gastroenterol Hepatol 2009;7:1328-34, 1334 e1-3. [0321] 14. Team RC. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2020. [0322] 15. Xu Z, et al. Nucleic Acids Res 2016;44:e20. [0323] 16. Salas LA, Koestler DC. 
Illumina EPIC data on immunomagnetic sorted peripheral adult blood cells, 2020. [0324] 17. Fortin JP, et al. Bioinformatics 2017;33:558-560. [0325] 18. Mansell G, et al. BMC Genomics 2019;20:366. SUPPLEMENTARY RESULTS [0326] Table 8. Medications used by patients and healthy controls (HCs)
Figure imgf000044_0001
[0327] IBS: irritable bowel syndrome; IBD: inflammatory bowel disease; CeD, celiac disease; HCs, Healthy controls; SSRIs: Selective serotonin reuptake inhibitors; *Analgesics included acetaminophen and non-steroidal anti-inflammatory drugs (NSAIDs) and were taken on an as needed basis. [0328] Disease activity scores for IBS, IBD and CeD patients [0329] Table 9A. Symptom severity scores in IBS patients
Figure imgf000044_0002
Figure imgf000045_0001
[0330] IBS: irritable bowel syndrome; IBS-C: constipation-predominant IBS; IBS-D, diarrhea- predominant IBS; IBS-M: Mixed IBS; IBS-U: Un-subtyped IBS; UC: ulcerative colitis; HCs, healthy controls; GI, gastrointestinal; Overall Severity of GI Symptoms, 0-20; Abdominal pain score, 0-20; IBS-SSS: IBS symptom severity score, 0-500; ACE: adverse childhood experiences, 0-8; HAD: hospital anxiety and depression scale; HAD Anxiety, 0-21; HAD Depression 0-21; SD: standard deviation. [0331] Table 9B. Symptom severity scores in IBD patients [0332] Ulcerative colitis (UC) patients
Figure imgf000045_0002
[0333] SCCAI: simple clinical colitis activity index, 0-21, Flare >=5 (past 3 days); UC: ulcerative colitis; UCDAI: ulcerative colitis disease activity index, score=0-9: remission (<2), mild (2-4), moderate (5-7), severe disease (>7); GI, gastrointestinal; Overall Severity of GI Symptoms, 0-20; SD: standard deviation; UCLA: patients recruited at University of California Los Angeles; CCF, Crohn’s and Colitis Foundation. [0334] Crohn’s disease patients
Figure imgf000045_0003
Figure imgf000046_0001
[0335] SES-CD: simple endoscopic score for Crohn’s disease, 0-2 remission, 3-6 mild, 7-15 moderate, >15 severe; CDAI: Crohn’s disease activity index, 0-149 asymptomatic remission; 150-220 mild to moderate; 221-450 moderate to severe; 451-1100 severe to fulminant disease; SCDAI: short CDAI score, 44-150 remission, 150–219 mild, 220–450 moderate, >450 severe; Overall Severity of GI Symptoms, 0-20; GI, gastrointestinal; SD: standard deviation; Samples recruited at UCLA: University of California, Los Angeles and CCF: Crohn’s and Colitis Foundation. [0336] Table 9C. Symptom severity scores in celiac disease patients
Figure imgf000046_0002
[0337] CDAT: celiac disease adherence test score, ≤30, remission; 30-45 moderate disease activity; ≥45 indicates relatively poor QoL and worse gluten-free diet adherence; CSI: celiac symptom index,16–80 and ≤30 indicates high QoL and excellent gluten-free diet adherence, while ≥45 indicates relatively poor QoL and worse gluten-free diet adherence; Overall Severity of GI Symptoms, 0-20; GI, gastrointestinal; SD: standard deviation [0338] Table 10A. Differentially Methylated CpG Sites Associated with Celiac vs Healthy
Figure imgf000046_0003
Figure imgf000047_0001
Figure imgf000048_0001
Figure imgf000049_0001
Figure imgf000050_0001
Figure imgf000051_0001
Figure imgf000052_0001
Figure imgf000053_0001
[0339] Table 10B. Differentially Methylated CpG Sites Associated with IBD vs Celiac Disease
Figure imgf000053_0002
Figure imgf000054_0001
Figure imgf000055_0001
Figure imgf000056_0001
Figure imgf000057_0001
Figure imgf000058_0001
Figure imgf000059_0001
Figure imgf000060_0001
Figure imgf000061_0001
Figure imgf000062_0001
Figure imgf000063_0001
Figure imgf000064_0001
Figure imgf000065_0001
Figure imgf000066_0001
Figure imgf000067_0001
Figure imgf000068_0001
Figure imgf000069_0001
Figure imgf000070_0001
[0340] Table 10C. Differentially Methylated CpG Sites Associated with IBD vs Healthy (FDR < 0.005)
Figure imgf000070_0002
Figure imgf000071_0001
Figure imgf000072_0001
Figure imgf000073_0001
Figure imgf000074_0001
Figure imgf000075_0001
Figure imgf000076_0001
Figure imgf000077_0001
Figure imgf000078_0001
Figure imgf000079_0001
Figure imgf000080_0001
[0341] Table 10D. Differentially Methylated CpG Sites Associated with IBS vs Celiac
Figure imgf000080_0002
[0342] Table 10E. Differentially Methylated CpG Sites Associated with IBS vs Healthy
Figure imgf000080_0003
Figure imgf000081_0001
[0343] Table 10F. Differentially Methylated CpG Sites Associated with IBS vs IBD
Figures imgf000081_0002 to imgf000089_0001
[0344] Tables 11A-11F. GO terms associated with differentially methylated genes
[0345] “#” refers to number of Differentially Methylated (D. M.) Genes
[0346] Table 11A. IBD vs Healthy p<0.05
Figures imgf000089_0002 to imgf000097_0001
[0347] Table 11B. IBS vs Healthy p<0.05
Figures imgf000098_0001 to imgf000101_0001
[0348] Table 11C. IBS vs IBD p<0.05
Figures imgf000101_0002 to imgf000112_0001
[0349] Table 11D. GO terms associated with differentially methylated genes
Figures imgf000112_0002 to imgf000118_0001
[0350] Table 11E. GO terms associated with differentially methylated genes
Figure imgf000118_0002
Figure imgf000119_0001
[0351] Table 11F. GO terms associated with differentially methylated genes
Figure imgf000119_0002
Figure imgf000120_0001
[0352] Table 12. Classifier performance for IBS bowel habit subtypes
Figure imgf000121_0001
[0353] IBS: irritable bowel syndrome; IBD: inflammatory bowel diseases; HCs: healthy controls; AUC: area under receiver operating characteristic (ROC) curve; CI: confidence interval
[0354] Table 13. Sensitivity analysis for assessing classifier performance after excluding patients using medications
Figure imgf000121_0002
[0355] IBS: irritable bowel syndrome; IBD: inflammatory bowel diseases; HCs: healthy controls; AUC: area under receiver operating characteristic (ROC) curve; CI: confidence interval
[0356] Table 14A. Classifier-Associated CpGs for IBS vs IBD
Figure imgf000121_0003
Figure imgf000122_0001
[0357] Table 14B. Classifier-Associated CpGs for IBS vs Celiac Disease
Figure imgf000122_0002
[0358] Table 14C. Classifier-Associated CpGs for IBD vs Celiac Disease
Figure imgf000122_0003
Figure imgf000123_0001
[0359] Table 14D. Classifier-Associated CpGs for IBS vs Healthy Control
Figures imgf000123_0002 to imgf000127_0001
[0360] Table 14E. Classifier-Associated CpGs for IBD vs Healthy Control
Figures imgf000127_0002 to imgf000130_0001
[0361] Table 14F. Classifier-Associated CpGs for Celiac Disease vs Healthy Control
Figure imgf000130_0002
Figure imgf000131_0001
[0362] The IlluminaHumanMethylationEPICanno.ilm10b2.hg19 annotation package can be used to obtain exact information on probe sequences from the CpG ID.
[0363] Throughout this application various publications are referenced. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to describe more fully the state of the art to which this invention pertains.
[0364] Those skilled in the art will appreciate that the conceptions and specific embodiments disclosed in the foregoing description may be readily utilized as a basis for modifying or designing other embodiments for carrying out the same purposes of the present invention. Those skilled in the art will also appreciate that such equivalent embodiments do not depart from the spirit and scope of the invention as set forth in the appended claims.
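The probe lookup by CpG ID mentioned in paragraph [0362] can be illustrated with a minimal sketch. It assumes the annotation has been exported to a CSV table; the column names (Name, chr, pos, strand, ProbeSeqA) and every value shown are placeholders chosen for illustration, not real annotation data, and the helper names load_annotation and annotate are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of a probe annotation table. Column names and all
# values are placeholders for illustration only, not real EPIC annotation.
ANNOTATION_CSV = """Name,chr,pos,strand,ProbeSeqA
cg_demo_0001,chr1,1000,+,NNNNNNNNNN
cg_demo_0002,chr2,2000,-,NNNNNNNNNN
"""


def load_annotation(handle):
    """Index annotation rows by CpG ID (the 'Name' column in this sketch)."""
    return {row["Name"]: row for row in csv.DictReader(handle)}


def annotate(cpg_ids, annotation):
    """Return (rows found, IDs missing) for the requested CpG IDs."""
    found = [annotation[cpg] for cpg in cpg_ids if cpg in annotation]
    missing = [cpg for cpg in cpg_ids if cpg not in annotation]
    return found, missing


annotation = load_annotation(io.StringIO(ANNOTATION_CSV))
found, missing = annotate(["cg_demo_0002", "cg_demo_9999"], annotation)
for row in found:
    print(row["Name"], row["chr"], row["pos"], row["strand"])
print("not in annotation:", missing)
```

Reporting missing IDs separately, rather than dropping them silently, makes it easier to notice when a CpG list and an annotation version do not match.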

Claims

What is claimed is:
1. A method of classifying biological samples based on gastrointestinal disease, the method comprising:
(a) measuring the amount of DNA methylation at CpG sites in DNA in a biological sample obtained from the subject to generate a genome-wide DNA CpG methylation profile of the subject, wherein the measuring comprises probing at least 200,000 CpG sites;
(b) performing a normalization of technical noise and bias correction on the profile;
(c) filtering the profile to retain data from a first set of probed CpG sites that is the same as a first set of CpG sites from a first training data set trained on a first pairwise comparison between biological samples, wherein the first pairwise comparison is selected from:
(i) irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD);
(ii) IBS and celiac disease (CeD);
(iii) IBD and CeD;
(iv) IBS and healthy control;
(v) IBD and healthy control; and
(vi) CeD and healthy control;
(d) classifying the CpG methylation profile of the biological samples as IBS, IBD, CeD, or healthy based on a probability cutoff derived from the first training data set.
2. The method of claim 1, wherein the filtering of step (c) is repeated to retain data from a second set of probes that is the same as a second set of probes from a second training data set trained on a second pairwise comparison between biological samples, wherein the second pairwise comparison is selected from (i) to (vi), and wherein the second pairwise comparison differs from the first pairwise comparison.
3. The method of claim 2, wherein the filtering of step (c) is repeated to retain data from subsequent sets of probes corresponding to subsequent training data sets trained on subsequent pairwise comparisons between biological samples, wherein each of the remaining pairwise comparisons of (i) to (vi) is performed.
4. The method of claim 1, wherein the probing comprises contacting the biological samples with a methylation microarray.
5. The method of any of the preceding claims, wherein the biological sample comprises blood, plasma, serum, or mucosal tissue.
6. The method of claim 5, wherein the sample is peripheral blood mononuclear cells (PBMCs), peripheral blood lymphocytes (PBL), or whole blood.
7. The method of any of the preceding claims, further comprising measuring the amount of fecal calprotectin and/or blood C-reactive protein (CRP) in the biological sample.
8. The method of any of the preceding claims, wherein the training data set is trained using the GLMNET algorithm.
9. The method of any of the preceding claims, wherein the training data set is trained using a regularization parameter value and an elastic net mixing parameter.
10. The method of any of the preceding claims, wherein the classifying is implemented by a computer.
11. A method of screening for irritable bowel syndrome (IBS), inflammatory bowel disease (IBD) and celiac disease (CeD) in a subject, the method comprising performing the method of classifying of claim 1 on a biological sample obtained from the subject, and identifying the subject as having IBS, IBD, or CeD based on the classifying of step (d).
12. The method of claim 11, further comprising treating the subject for IBS, IBD, or CeD.
13. The method of claim 12, wherein the treatment for IBS comprises administration of one or more of fiber supplements, laxatives, anti-diarrheal medication, loperamide, a bile acid binder, cholestyramine, colestipol, colesevelam, anticholinergic medications, dicyclomine or hyoscyamine, tricyclic antidepressants, imipramine, desipramine, nortriptyline, selective serotonin reuptake inhibitors (SSRI) such as fluoxetine, paroxetine, serotonin and norepinephrine reuptake inhibitor (SNRI), such as duloxetine, tetracyclic antidepressant, such as mirtazapine, pregabalin, gabapentin, alosetron, eluxadoline, rifaximin, lubiprostone, plecanatide, tenapanor or linaclotide.
14. The method of claim 12, wherein the treatment for IBD comprises administration of one or more of anti-inflammatory drugs, aminosalicylates, mesalamine, balsalazide, olsalazine, immune system suppressors, corticosteroids, azathioprine, mercaptopurine, methotrexate, tofacitinib, upadacitinib, ozanimod, infliximab, adalimumab, golimumab, certolizumab, vedolizumab, ustekinumab, risankizumab, antibiotics, anti-diarrheal medications, psyllium powder, methylcellulose, or loperamide.
15. The method of claim 12, wherein the treatment for CeD comprises one or more of gluten-free diet, vitamin and mineral supplements, steroids, azathioprine, or budesonide.
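Steps (c) and (d) of claim 1, together with the elastic-net training recited in claims 8 and 9, can be illustrated with a minimal sketch for a single pairwise comparison. Several things here are assumptions for illustration and are not taken from the specification: the CpG IDs and methylation values are synthetic, the cutoff of 0.5 is arbitrary, and the function names are hypothetical. The penalty follows the usual elastic-net form, lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2), but the fit uses a plain proximal-gradient loop rather than the coordinate descent used by glmnet, so it is a stand-in, not the GLMNET algorithm itself. The normalization of step (b) is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_logistic(rows, labels, lam, alpha, step=0.1, n_iter=2000):
    """Fit logistic regression with an elastic-net penalty by proximal gradient.

    lam is the regularization strength and alpha the mixing parameter
    (alpha=1 is lasso, alpha=0 is ridge). The intercept is not penalized.
    """
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - y
            for row, y in zip(rows, labels)
        ]
        grad_intercept = sum(residuals) / n
        # Gradient of the smooth part: mean logistic loss plus the ridge term.
        grads = [
            sum(r * row[j] for r, row in zip(residuals, rows)) / n
            + lam * (1.0 - alpha) * coefs[j]
            for j in range(p)
        ]
        intercept -= step * grad_intercept
        # Gradient step followed by soft-thresholding for the L1 term.
        coefs = [
            soft_threshold(c - step * g, step * lam * alpha)
            for c, g in zip(coefs, grads)
        ]
    return intercept, coefs


def classify(profile, cpg_ids, intercept, coefs, cutoff, negative, positive):
    """Keep only the trained CpG set, then apply the probability cutoff."""
    kept = [profile[cpg] for cpg in cpg_ids]  # filtering, as in step (c)
    prob = sigmoid(intercept + sum(c * v for c, v in zip(coefs, kept)))
    return (positive if prob >= cutoff else negative), prob  # step (d)


# Synthetic training data for one hypothetical pairwise model (1 = IBS).
CPG_IDS = ["cg_demo_A", "cg_demo_B"]
TRAIN_ROWS = [[0.9, 0.4], [0.8, 0.6], [0.1, 0.6], [0.2, 0.4]]
TRAIN_LABELS = [1, 1, 0, 0]
INTERCEPT, COEFS = fit_elastic_net_logistic(
    TRAIN_ROWS, TRAIN_LABELS, lam=0.05, alpha=0.5
)

# A new profile may carry many more probes; those outside CPG_IDS are ignored.
new_profile = {"cg_demo_A": 0.85, "cg_demo_B": 0.5, "cg_demo_unused": 0.3}
label, prob = classify(
    new_profile, CPG_IDS, INTERCEPT, COEFS, 0.5, "healthy", "IBS"
)
print(label, round(prob, 3))
```

For claims 2 and 3, the same classify call would be repeated with the CpG set, coefficients, and cutoff belonging to each further pairwise model; how the pairwise results are combined is not specified here and is left out of the sketch.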
PCT/US2024/026561 2023-04-28 2024-04-26 Dna methylation-based algorithm to diagnose irritable bowel syndrome and other gi conditions Pending WO2024227011A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363499017P 2023-04-28 2023-04-28
US63/499,017 2023-04-28

Publications (1)

Publication Number Publication Date
WO2024227011A1 true WO2024227011A1 (en) 2024-10-31

Family

ID=93257141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/026561 Pending WO2024227011A1 (en) 2023-04-28 2024-04-26 Dna methylation-based algorithm to diagnose irritable bowel syndrome and other gi conditions

Country Status (1)

Country Link
WO (1) WO2024227011A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090275923A1 (en) * 2006-06-20 2009-11-05 Koninklijke Philips Electronics N.V. Electronic capsule for treating gastrointestinal disease
US20210207217A1 (en) * 2018-05-31 2021-07-08 The Regents Of The University Of California Dna methylation based biomarkers for irritable bowel syndrome and inflammatory bowel disease

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KRISTYN GALBRAITH;MATIJA SNUDERL: "DNA methylation as a diagnostic tool", ACTA NEUROPATHOLOGICA COMMUNICATIONS, BIOMED CENTRAL LTD, LONDON, UK, vol. 10, no. 1, 8 May 2022 (2022-05-08), London, UK, pages 1 - 7, XP021302758, DOI: 10.1186/s40478-022-01371-2 *
MAHURKAR-JOSHI SWAPNA, CHANG LIN: "Epigenetic Mechanisms in Irritable Bowel Syndrome", FRONTIERS IN PSYCHIATRY, FRONTIERS MEDIA SA, FRONTIERS IN PSYCHIATRY SWITZERLAND 2020, vol. 11, Frontiers in psychiatry Switzerland 2020, XP093231329, ISSN: 1664-0640, DOI: 10.3389/fpsyt.2020.00805 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24798075

Country of ref document: EP

Kind code of ref document: A1