WO2025179073A1 - Methods and systems for the analysis of tissue-dependent differentially methylated regions
- Publication number
- WO2025179073A1 (PCT/US2025/016672)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- subject
- dmrs
- regions
- methylation
- specific
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
Definitions
- Circulating tumor deoxyribonucleic acid (ctDNA) may be used as a non-invasive, tumor-specific biomarker for clinical use.
- ctDNA may be derived from tumor cells undergoing cell death and released into the circulation of various bodily fluids, including blood.
- the majority of blood-derived cell-free DNA may originate from healthy (e.g., non-cancerous) tissues.
- the fraction of ctDNA observed may range from <0.1% to 90% of the total cell-free DNA depending on factors including the primary site of the tumor and disease burden.
- ctDNA provides non-invasive access to the tumor’s molecular landscape and disease burden.
- the present disclosure provides a method for analyzing a sample derived from a subject, comprising: (a) obtaining a sample comprising nucleic acid molecules obtained or derived from the subject; (b) assaying the nucleic acid molecules to generate a data set comprising methylation states of one or more genomic regions comprising differentially methylated regions (DMRs); and (c) processing at least a portion of the data set to generate an output indicative of presence or absence of cancer in the subject, wherein the at least a portion of the data set pertains to a set of DMRs specific to the subject.
- DMRs: differentially methylated regions
- the method further comprises providing a universal panel of genomic regions. In some embodiments, the method further comprises using the universal panel of genomic regions to enrich for the one or more genomic regions comprising the DMRs.
- the assaying further comprises sequencing the nucleic acid molecules at a depth of at most 50 Million (M) single reads. In some embodiments, the assaying further comprises sequencing the nucleic acid molecules at a depth of at most 10 M single reads.
- the processing of (c) further comprises comparing the one or more genomic regions to a set of DMRs specific to one or more reference subjects to generate the set of DMRs specific to the subject. In some embodiments, the processing of (c) further comprises comparing the one or more genomic regions to a set of anti-DMRs to generate a set of anti-DMRs specific to the subject.
- the comparing further comprises generating one or more counts of the set of DMRs specific to the subject. In some embodiments, the comparing further comprises generating one or more counts of the set of anti-DMRs specific to the subject. In some embodiments, the method further comprises normalizing the one or more counts of the set of DMRs specific to the subject to the one or more counts of the set of anti-DMRs specific to the subject. In some embodiments, the normalizing further comprises generating a methylation score. In some embodiments, the method further comprises comparing the methylation score to a threshold score, thereby generating the output indicative of the presence or absence of the cancer in the subject.
- the method further comprises obtaining one or more control samples comprising control nucleic acid molecules.
- the one or more control samples comprise one or more non-tissue samples and/or one or more tissue samples.
- the one or more control nucleic acid molecules are derived from one or more tissue samples.
- the one or more control nucleic acid molecules comprise one or more cell-free nucleic acid molecules.
- the one or more control samples are derived from one or more control subjects without cancer.
- the method further comprises assaying the control nucleic acid molecules to generate a control data set comprising methylation states of one or more control genomic regions.
- the assaying the control nucleic acid molecules further comprises conducting one or more methylation reactions. In some embodiments, the assaying the control nucleic acid molecules further comprises sequencing the control nucleic acids, or derivatives thereof. In some embodiments, the methylation states of the one or more control genomic regions comprise hypermethylated states, methylated states, non-methylated states, or hypomethylated states, or any combinations thereof.
- the method further comprises processing at least a portion of the control data set to identify one or more hypomethylated regions, regions of non-methylation, or regions that are amenable to methylation enrichment, or any combinations thereof, thereby generating the universal panel of genomic regions.
- the universal panel of genomic regions comprises the regions of non-methylation and the regions that are amenable to methylation enrichment.
- the method further comprises obtaining one or more reference samples from one or more reference subjects.
- the one or more reference subjects have cancer.
- the one or more reference samples comprise one or more reference nucleic acid molecules.
- the one or more reference nucleic acid molecules comprise cell-free nucleic acid molecules.
- the one or more reference nucleic acid molecules are derived from one or more non-tissue samples and/or one or more tissue samples.
- the method further comprises obtaining another one or more control samples comprising another one or more control nucleic acid molecules.
- the method further comprises assaying the one or more reference nucleic acid molecules and the another one or more control nucleic acid molecules to generate a reference data set comprising methylation states of one or more regions.
- the method further comprises processing the reference data set with the universal panel of genomic regions to identify the set of DMRs specific to one or more reference subjects and the set of anti-DMRs.
- the processing of (c) further comprises comparing the one or more genomic regions to a set of DMRs specific to a reference subject to generate the set of DMRs specific to the subject. In some embodiments, the processing of (c) further comprises comparing the one or more genomic regions to a set of anti-DMRs to generate a set of anti-DMRs specific to the subject. In some embodiments, the comparing further comprises generating one or more counts of the set of DMRs specific to the subject. In some embodiments, the comparing further comprises generating one or more counts of the set of anti-DMRs specific to the subject.
- the method further comprises normalizing the one or more counts of the set of DMRs specific to the subject to the one or more counts of the set of anti-DMRs specific to the subject. In some embodiments, the normalizing generates another methylation score. In some embodiments, the method further comprises comparing the another methylation score to a threshold score, thereby generating the output indicative of the cancer in the subject.
- the method further comprises obtaining from the reference subject, a reference sample comprising one or more another reference nucleic acid molecules.
- the one or more another reference nucleic acid molecules are derived from one or more non-tissue samples and/or one or more tissue samples.
- the one or more another reference nucleic acid molecules comprise cell-free nucleic acid molecules.
- the reference subject is the same subject as the subject.
- the reference sample is obtained prior to obtaining the sample.
- the reference sample is obtained or derived from the subject subsequent to diagnosis with the cancer.
- the reference sample is obtained or derived from the subject prior to treatment with a therapy.
- the method further comprises assaying the one or more another reference nucleic acid molecules to generate a reference data set comprising methylation states of one or more reference genomic regions.
- the assaying comprises enriching for the one or more reference genomic regions using the universal panel of genomic regions.
- the assaying comprises enriching for the one or more reference genomic regions without using the universal panel of genomic regions.
- the method further comprises processing the reference data set to identify the set of DMRs specific to the reference subject.
- the method further comprises integrating the methylation score and the additional methylation score to generate a single score.
- the single score is indicative of the cancer.
- the integrating comprises a support vector machine, logistic regression, a Bayesian inference model, a weighted average, decision trees, and/or random forests.
- the nucleic acid molecules are derived from one or more non-tissue samples. In some embodiments, the nucleic acid molecules are derived from one or more tissue samples. In some embodiments, the nucleic acid molecules comprise cell-free deoxyribonucleic acid (DNA) molecules. In some embodiments, the sample comprises a tissue sample, a blood sample, and/or a plasma sample. In some embodiments, the cancer is a late-stage cancer. In some embodiments, the cancer is an early-stage cancer.
- the method further comprises generating an output indicative of presence or absence of minimal residual disease in the subject.
- the method further comprises, based at least on the processing, treating the subject with a therapy capable of treating the cancer.
- the therapy comprises a chemotherapy, a radiation therapy, an immunotherapy, a targeted therapy, a surgical resection, or a combination thereof.
- the method further comprises, based at least on the processing, recommending a therapy regimen for the subject or changing a therapy regimen for the subject.
- the assaying further comprises mixing the nucleic acid molecules with filler nucleic acid molecules.
- the assaying does not comprise mixing the nucleic acid molecules with filler nucleic acid molecules.
- the assaying further comprises enriching methylated nucleic acids.
- the enriching further comprises using a binder that binds to one or more methylated nucleotides.
- the binder comprises a protein comprising a methyl-CpG-binding domain.
- the protein is an MBD2 protein.
- the binder comprises an antibody.
- the antibody is an anti-5-mC antibody.
- the antibody is an anti-5-hydroxymethylcytosine antibody.
- the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- the assaying further comprises sequencing the nucleic acids, or derivatives thereof. In some embodiments, the sequencing does not comprise bisulfite sequencing. In some embodiments, the sequencing further comprises bisulfite sequencing with methylation specific PCR. In some embodiments, the sequencing further comprises targeted sequencing.
- the sequencing further comprises using a plurality of capture probes.
- the plurality of capture probes comprises probes that are homologous or complementary to the one or more genomic regions.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with a known amount of methylation.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with no CpG methylation.
- the sequencing generates sequencing reads corresponding to the one or more genomic regions.
- the processing of (c) comprises counting a number of sequencing reads corresponding to a region of the one or more genomic regions.
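The counting, normalizing, integrating, and thresholding steps described in this aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the score is assumed to be a plain ratio of reads in subject-specific DMRs to reads in subject-specific anti-DMRs, a higher score is assumed to point toward cancer, and the names `RegionCounts`, `methylation_score`, `integrate_scores`, and `indicates_cancer`, the pseudocount, the weight, and the threshold are all hypothetical. The weighted average is only one of the integration options the text lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionCounts:
    """Sequencing reads summed over subject-specific DMRs and anti-DMRs."""
    dmr_reads: int
    anti_dmr_reads: int


def methylation_score(counts: RegionCounts, pseudocount: float = 1.0) -> float:
    """Normalize DMR read counts to anti-DMR read counts.

    Assumption: the normalization is a plain ratio. The pseudocount
    (hypothetical) only guards against division by zero.
    """
    if counts.dmr_reads < 0 or counts.anti_dmr_reads < 0:
        raise ValueError("read counts must be non-negative")
    return (counts.dmr_reads + pseudocount) / (counts.anti_dmr_reads + pseudocount)


def integrate_scores(score_a: float, score_b: float, weight_a: float = 0.5) -> float:
    """Combine two methylation scores into a single score by weighted average."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return weight_a * score_a + (1.0 - weight_a) * score_b


def indicates_cancer(score: float, threshold: float) -> bool:
    """Compare a score to a threshold (assumed: higher means more tumor signal)."""
    return score > threshold


# Illustrative numbers only, not data from the disclosure.
cohort_score = methylation_score(RegionCounts(dmr_reads=199, anti_dmr_reads=99))
personal_score = methylation_score(RegionCounts(dmr_reads=399, anti_dmr_reads=99))
single_score = integrate_scores(cohort_score, personal_score)
print(indicates_cancer(single_score, threshold=1.5))
```

Here `cohort_score` stands for a score built from DMRs found in reference subjects and `personal_score` for one built from the subject's own reference sample. In practice the threshold would have to be calibrated on control samples rather than chosen by hand.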
- the present disclosure provides a method for monitoring a subject for regression or progression of a disease or condition comprising: assaying a biological sample of the subject for one or more markers specific to the subject, using a universal panel of genomic regions.
- the assaying is without use of a primer or bait set specific to the one or more markers. In some embodiments, the assaying further comprises sequencing nucleic acid molecules from the biological sample at a depth of at most 50 Million (M) single reads. In some embodiments, the assaying further comprises sequencing the nucleic acid molecules at a depth of at most 10 M single reads.
- the biological sample is a blood sample or a plasma sample. In some embodiments, the biological sample is a tissue sample and/or a non-tissue sample.
- the universal panel of genomic regions is derived from one or more control samples derived from one or more control subjects without cancer.
- the universal panel of genomic regions comprises non-methylated regions and regions that are amenable to methylation enrichment.
- the one or more markers comprise differentially methylated regions (DMRs) specific to the subject.
- the method further comprises comparing one or more genomic regions of the biological sample to a set of anti-DMRs specific to the one or more reference samples to generate anti-DMRs specific to the subject.
- the method further comprises comparing the universal panel of genomic regions to one or more reference genomic regions of one or more reference samples to generate a set of DMRs specific to the one or more reference samples and/or a set of anti-DMRs specific to the one or more reference samples.
- the one or more reference samples are derived from one or more reference subjects with cancer. In some embodiments, the one or more reference samples comprise non-tissue samples or tissue samples.
- the method further comprises comparing the DMRs specific to the subject to the set of DMRs specific to the one or more reference samples to generate one or more counts of the DMRs specific to the subject. In some embodiments, the method further comprises comparing the anti-DMRs specific to the subject to the set of anti-DMRs specific to the one or more reference samples to generate one or more counts of the anti-DMRs specific to the subject. In some embodiments, the method further comprises normalizing the one or more counts of the DMRs specific to the subject to the one or more counts of the anti-DMRs specific to the subject, thereby generating a first methylation score.
- the method further comprises obtaining a reference sample.
- the reference sample is derived from the subject prior to obtaining the biological sample.
- the reference sample is a non-tissue sample and/or a tissue sample.
- the method further comprises comparing the universal panel of genomic regions to one or more genomic regions of the reference sample to generate a set of DMRs specific to the reference sample and/or a set of anti-DMRs specific to the reference sample.
- the method further comprises comparing the DMRs specific to the subject to the set of DMRs specific to the reference sample to generate one or more counts of the DMRs specific to the subject.
- the method further comprises comparing the anti-DMRs specific to the subject to the set of anti-DMRs specific to the one or more reference samples to generate one or more counts of the anti-DMRs specific to the subject. In some embodiments, the method further comprises normalizing the one or more counts of the DMRs specific to the subject to the one or more counts of the anti-DMRs specific to the subject, thereby generating a second methylation score.
- another biological sample is obtained at a time before or after obtaining the biological sample.
- the another biological sample comprises another one or more markers specific to the subject.
- the another one or more markers comprise another DMRs specific to the subject.
- the method further comprises comparing the another biological sample to the set of anti-DMRs specific to the one or more reference samples to generate another anti-DMRs specific to the subject. In some embodiments, the method further comprises comparing the another DMRs specific to the subject to the set of DMRs specific to the reference sample to generate one or more counts of the another DMRs specific to the subject. In some embodiments, the method further comprises comparing the another anti-DMRs specific to the subject to the set of anti-DMRs specific to the one or more reference samples to generate one or more counts of the another anti-DMRs specific to the subject.
- the method further comprises normalizing the one or more DMR counts of the another DMRs specific to the subject to the one or more counts of the another anti-DMRs specific to the subject, thereby generating a third methylation score. In some embodiments, the method further comprises comparing the second methylation score and the third methylation score, thereby generating an output indicative of the regression or the progression of the disease or the condition.
- the method further comprises integrating the first methylation score with the second methylation score to generate a single methylation score.
- the single methylation score is an output indicative of the regression or the progression of the disease or the condition.
- the disease or condition comprises a cancer.
- the cancer is a late-stage cancer.
- the cancer is an early-stage cancer.
- the disease or condition is a pre-cancer.
- the biological sample comprises nucleic acid molecules.
- the nucleic acid molecules are derived from one or more non-tissue samples.
- the nucleic acid molecules are derived from one or more tissue samples.
- the nucleic acid molecules comprise cell-free nucleic acid molecules.
- the biological sample comprises a blood sample and/or a plasma sample.
- the assaying further comprises sequencing the nucleic acid molecules. In some embodiments, the sequencing does not comprise bisulfite sequencing. In some embodiments, the sequencing further comprises bisulfite sequencing with methylation specific PCR. In some embodiments, the sequencing further comprises generating sequencing reads corresponding to the one or more markers. In some embodiments, the sequencing further comprises targeted sequencing.
- the sequencing further comprises using a plurality of capture probes.
- the plurality of capture probes comprises probes that are homologous or complementary to the plurality of regions.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with a known amount of methylation.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with no CpG methylation.
- the assaying further comprises counting a number of sequencing reads corresponding to a marker of the one or more markers.
- the assaying further comprises enriching methylated nucleic acids.
- the enriching further comprises using a binder that binds to one or more methylated nucleotides.
- the binder comprises a protein comprising a methyl-CpG-binding domain.
- the protein is an MBD2 protein.
- the binder comprises an antibody.
- the antibody is an anti-5-mC antibody.
- the antibody is an anti-5-hydroxymethylcytosine antibody.
- the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- the method further comprises, based at least on the assaying, treating the subject with a therapy capable of treating the cancer.
- the therapy comprises a chemotherapy, a radiation therapy, an immunotherapy, a targeted therapy, a surgical resection, or a combination thereof.
- the method further comprises, based at least on the assaying, recommending a therapy regimen for the subject or changing a therapy regimen for the subject.
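The comparison of methylation scores from two time points, described above as yielding an output indicative of regression or progression, could look like the following sketch. The text does not say how the two scores are compared, so the relative-change rule, the 10% tolerance, the extra "stable" outcome, and the function name `classify_trend` are all assumptions made for illustration.

```python
def classify_trend(earlier_score: float, later_score: float,
                   rel_tolerance: float = 0.10) -> str:
    """Label the change between two methylation scores from the same subject.

    Assumption: a rising score reflects more tumor-derived signal. Changes
    smaller than rel_tolerance (hypothetical value) are reported as "stable".
    """
    if earlier_score <= 0:
        raise ValueError("earlier_score must be positive")
    relative_change = (later_score - earlier_score) / earlier_score
    if relative_change > rel_tolerance:
        return "progression"
    if relative_change < -rel_tolerance:
        return "regression"
    return "stable"


# Illustrative values only: a score that halves between two blood draws.
print(classify_trend(earlier_score=2.0, later_score=1.0))
```

A fixed tolerance keeps the example short. A real assay would more plausibly derive it from the measured run-to-run variability of the score.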
- a method for analyzing a sample derived from a subject comprising: assaying the sample for at least a portion of a set of differentially methylated regions (DMRs) specific to the subject to generate an output indicative of presence or absence of cancer, wherein the assaying comprises sequencing, wherein the sequencing has a depth of at most 50 Million (M) single reads.
- the sequencing has a depth of at most 10 M single reads.
- the method further comprises assaying the sample to generate a data set comprising methylation states of one or more genomic regions.
- the method further comprises providing a universal panel of genomic regions. In some embodiments, the method further comprises comparing the universal panel of genomic regions to one or more reference genomic regions to generate a set of DMRs specific to a reference sample and/or a set of anti-DMRs specific to the reference sample. In some embodiments, the method further comprises comparing the universal panel of genomic regions to another one or more reference genomic regions to generate a set of DMRs specific to one or more reference samples and/or a set of anti-DMRs specific to the one or more reference samples.
- the sample comprises nucleic acid molecules.
- the nucleic acid molecules are derived from one or more tissue samples and/or one or more non-tissue samples.
- the nucleic acid molecules comprise cell-free nucleic acid molecules.
- the sample comprises a tissue sample.
- the sample does not comprise a tissue sample.
- the sample comprises a blood sample or a plasma sample.
- the method further comprises comparing the one or more genomic regions to the set of DMRs specific to the one or more reference samples to generate the set of DMRs specific to the subject. In some embodiments, the method further comprises comparing the one or more genomic regions to the set of anti-DMRs specific to the one or more reference samples to generate the set of anti-DMRs specific to the subject. In some embodiments, the comparing further comprises generating one or more counts of the set of DMRs specific to the subject. In some embodiments, the comparing further comprises generating one or more counts of the set of anti-DMRs specific to the subject.
- the method further comprises normalizing the one or more counts of the set of DMRs specific to the subject to the one or more counts of the set of anti-DMRs specific to the subject. In some embodiments, the normalizing further comprises generating a methylation score.
- the method further comprises comparing the one or more genomic regions to the set of DMRs specific to the reference sample to generate the set of DMRs specific to the subject. In some embodiments, the method further comprises comparing the one or more genomic regions to the set of anti-DMRs specific to the one or more reference samples to generate the set of anti-DMRs specific to the subject. In some embodiments, the comparing further comprises generating one or more counts of the set of DMRs specific to the subject. In some embodiments, the comparing further comprises generating one or more counts of the set of anti-DMRs specific to the subject.
- the method further comprises normalizing the one or more counts of the set of DMRs specific to the subject to the one or more counts of the set of anti-DMRs specific to the subject. In some embodiments, the normalizing further comprises generating another methylation score.
- the method further comprises using the output indicative of the presence or absence of cancer to determine progression of the cancer. In some embodiments, the method further comprises using the output indicative of the presence or absence of cancer to determine regression of the cancer. In some embodiments, the method further comprises using the output indicative of the presence or absence of cancer to determine therapy of the cancer.
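The step of comparing a sample's genomic regions to a reference set of DMRs (or anti-DMRs) to obtain the subject-specific set and its counts can be pictured as an interval-overlap filter. The sketch below is one assumed reading of "comparing": a sample region is kept when it overlaps any reference region on the same chromosome. The `Region` type, the half-open coordinate convention, and the function names are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Region, b: Region) -> bool:
    """True when two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def subject_specific(sample_counts: dict[Region, int],
                     reference_set: list[Region]) -> dict[Region, int]:
    """Keep sample regions (with their read counts) that overlap the reference set."""
    return {
        region: reads
        for region, reads in sample_counts.items()
        if any(overlaps(region, ref) for ref in reference_set)
    }


# Illustrative regions and read counts only.
sample = {
    Region("chr1", 100, 200): 12,
    Region("chr1", 500, 600): 3,
    Region("chr2", 100, 200): 7,
}
reference_dmrs = [Region("chr1", 150, 250)]
reference_anti_dmrs = [Region("chr2", 50, 120)]

subject_dmrs = subject_specific(sample, reference_dmrs)
subject_anti_dmrs = subject_specific(sample, reference_anti_dmrs)
dmr_count = sum(subject_dmrs.values())
anti_dmr_count = sum(subject_anti_dmrs.values())
```

The two totals are the kind of counts that the normalizing step then turns into a methylation score. The linear scan is fine for a sketch, whereas a panel with many regions would call for a sorted or tree-based interval lookup.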
- the present disclosure provides a method for detecting Minimal Residual Disease (MRD), comprising: assaying a biological sample from a subject, wherein the assaying does not comprise analyzing a solid tumor sample of the subject, wherein the assaying comprises sequencing one or more genomic regions in the biological sample, wherein the sequencing has a depth of at most 50 million single reads.
- MRD: Minimal Residual Disease
- the present disclosure provides a method comprising: assaying the sample for at least a portion of a set of differentially methylated regions (DMRs) specific to the subject to generate an output indicative of presence or absence of cancer at a specificity of at least 90%.
- the present disclosure provides a method comprising: assaying a sample for at least a portion of a set of differentially methylated regions (DMRs) specific to a subject to generate an output indicative of presence or absence of cancer at a specificity of at least 90%.
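Specificity here is the fraction of cancer-free samples that the assay calls negative. The sketch below shows one way a score threshold could be picked from control scores so that specificity reaches a target such as 90%. It assumes that a sample is called negative when its score is at or below the threshold. The function names and the control scores are hypothetical and are not taken from the disclosure.

```python
def specificity(control_scores: list[float], threshold: float) -> float:
    """Fraction of cancer-free controls called negative (score <= threshold)."""
    if not control_scores:
        raise ValueError("at least one control score is required")
    true_negatives = sum(1 for score in control_scores if score <= threshold)
    return true_negatives / len(control_scores)


def threshold_for_specificity(control_scores: list[float],
                              target: float = 0.90) -> float:
    """Smallest observed control score that meets the target specificity."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must lie in (0, 1]")
    for candidate in sorted(control_scores):
        if specificity(control_scores, candidate) >= target:
            return candidate
    raise ValueError("no threshold meets the target")  # not reached for valid input


# Made-up scores from ten cancer-free control samples.
controls = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cutoff = threshold_for_specificity(controls, target=0.90)
```

Specificity estimated on the same controls that set the threshold is optimistic, so a held-out control set would be needed to support a claim of at least 90%.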
- the present disclosure provides a method for classifying a sample derived from a subject, the method comprising: (a) obtaining a sample comprising cell-free nucleic acids from a subject; (b) assaying the cell-free nucleic acids to generate a data set comprising methylation states of one or more genomic regions comprising differentially methylated regions (DMRs); and (c) processing at least a portion of the data set to generate an output indicative of cancer in the subject, wherein the portion of the data set pertains to a set of DMRs specific to the subject.
- the cell-free nucleic acid molecule is a cell-free deoxyribonucleic acid (DNA) molecule.
- the sample comprises a tissue sample, a blood sample, or a plasma sample.
- the assaying comprises sequencing the cell-free nucleic acids, or derivatives thereof.
- the sequencing does not comprise bisulfite sequencing.
- the sequencing comprises targeted sequencing.
- the sequencing comprises using a plurality of capture probes.
- the plurality of capture probes comprises probes that are homologous or complementary to the one or more genomic regions.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with a known amount of methylation. In some embodiments, the plurality of capture probes comprises probes that are homologous or complementary to regions with no CpG methylation. In some embodiments, the sequencing generates sequencing reads corresponding to one or more genomic regions. In some embodiments, the assaying comprises counting a number of sequencing reads corresponding to a region of a plurality of regions. In some embodiments, the assaying comprises enriching methylated nucleic acids. In some embodiments, the enriching comprises using a binder that binds to one or more methylated nucleotides.
- the binder comprises a protein comprising a methyl-CpG-binding domain.
- the protein is an MBD2 protein.
- the binder comprises an antibody.
- the antibody is an anti-5-mC antibody.
- the antibody is an anti-5-hydroxymethylcytosine antibody.
- the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- the sample obtained in (a) is a test sample.
- the method further comprises obtaining a control sample.
- the one or more genomic regions are identified by differential methylation analysis of a test sample and a control sample.
- the control sample is derived from a subject without cancer.
- the one or more regions exhibit hypermethylation in the test sample compared to the control sample.
- the test sample is obtained from the subject at a time prior to obtaining the sample.
- the test sample is a tissue sample.
- the test sample is a blood sample.
- the second test sample is a tissue sample.
- the second test sample is a blood sample.
- the method further comprises assaying cell-free nucleic acids of the second sample to generate a second data set comprising methylation states of one or more genomic regions comprising differentially methylated regions. In some embodiments, the method further comprises, based at least on the processing, treating the subject with a therapy. In some embodiments, the method further comprises, based at least on the processing, recommending a therapy regimen for the subject. In some embodiments, the method further comprises, based at least on the processing, changing a therapy regimen for the subject.
- the cancer is breast cancer, bladder cancer, colorectal cancer, endometrial cancer, prostate cancer, renal cancer, pancreatic cancer, or lung cancer.
- the cancer is a late-stage cancer.
- the cancer is an early-stage cancer.
- the one or more genomic regions comprise sites that are amenable to methylation enrichment.
- sites that are amenable to methylation enrichment can comprise sites that can be enriched after in vitro methylation of the sites.
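The differential methylation analysis of a test sample against a control sample, used above to find regions that are hypermethylated in the test sample, is not specified further in the text. As a rough sketch only, the example below assumes methylation-enriched read counts per region, scales each sample to counts per million, and flags regions whose log2 fold change meets a cutoff. The function names, the pseudocount, and the cutoff are hypothetical, and a real analysis would use replicates and a statistical test rather than a single fold change.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale per-region read counts by the sample's total read count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {region: 1e6 * reads / total for region, reads in counts.items()}


def hypermethylated_regions(test_counts: dict[str, int],
                            control_counts: dict[str, int],
                            min_log2_fc: float = 1.0,
                            pseudocount: float = 1.0) -> list[str]:
    """Regions with higher enriched-read signal in test than in control."""
    test_cpm = counts_per_million(test_counts)
    control_cpm = counts_per_million(control_counts)
    hits = []
    # Only regions measured in both samples can be compared.
    for region in sorted(test_cpm.keys() & control_cpm.keys()):
        ratio = (test_cpm[region] + pseudocount) / (control_cpm[region] + pseudocount)
        if math.log2(ratio) >= min_log2_fc:
            hits.append(region)
    return hits


# Illustrative counts only; region names are placeholders.
test_sample = {"region_1": 800, "region_2": 100, "region_3": 100}
control_sample = {"region_1": 100, "region_2": 450, "region_3": 450}
candidate_dmrs = hypermethylated_regions(test_sample, control_sample)
```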
- the present disclosure provides a method for monitoring a subject for regression or progression of a disease or condition comprising: assaying a biological sample of the subject for one or more markers specific to the subject, without use of a primer or bait set specific to the one or more markers.
- the disease comprises a cancer.
- the cancer is breast cancer, bladder cancer, colorectal cancer, endometrial cancer, prostate cancer, renal cancer, pancreatic cancer, or lung cancer.
- the cancer is a late-stage cancer.
- the cancer is an early-stage cancer.
- the disease or condition is a pre-cancer.
- the one or more markers comprises DMRs.
- the biological sample comprises cell-free nucleic acid molecules.
- the cell-free nucleic acid molecule is a cell-free deoxyribonucleic acid (DNA) molecule.
- the biological sample comprises a blood sample and/or a plasma sample.
- the assaying comprises sequencing the cell-free nucleic acids, or derivatives thereof.
- the sequencing does not comprise bisulfite sequencing.
- the sequencing generates sequencing reads corresponding to the one or more markers.
- the sequencing comprises targeted sequencing.
- the sequencing comprises using a plurality of capture probes.
- the plurality of capture probes comprises probes that are homologous or complementary to the one or more markers. In some embodiments, the plurality of capture probes comprises probes that are homologous or complementary to regions with a known amount of methylation. In some embodiments, the plurality of capture probes comprises probes that are homologous or complementary to regions with no CpG methylation. In some embodiments, the assaying comprises counting a number of sequencing reads corresponding to a marker of the one or more markers. In some embodiments, the assaying comprises enriching methylated nucleic acids. In some embodiments, the enriching comprises using a binder that binds to one or more methylated nucleotides.
- the binder comprises a protein comprising a methyl-CpG-binding domain.
- the protein is an MBD2 protein.
- the binder comprises an antibody.
- the antibody is an anti-5-mC antibody.
- the antibody is an anti-5-hydroxymethylcytosine antibody.
- the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- the one or more markers is identified by differential methylation analysis of a test sample and control sample.
- the control sample is derived from a subject without cancer.
- the one or more markers exhibits hypermethylation in the test sample compared to the control sample.
- the test sample is obtained from the subject at a time prior to obtaining the sample.
- the test sample is a tissue sample.
- the test sample is a blood sample.
- the sample is a tissue sample.
- the second sample is a blood sample.
- the method further comprises assaying the second sample of the subject for one or more markers specific to the subject, without use of a primer or bait set specific to the one or more markers.
- the assaying comprises (i) generating a data set comprising data pertaining to additional markers and the one or more markers, and (ii) processing a portion of the data set corresponding to the one or more markers. In some embodiments, the processing does not comprise processing the additional markers. In some embodiments, the method further comprises, based at least on the assaying, treating the subject with a therapy. In some embodiments, the method further comprises, based at least on the assaying, recommending a therapy regimen for the subject. In some embodiments, the method further comprises, based at least on the assaying, changing a therapy regimen for the subject.
- the present disclosure provides a method of nucleic acids processing comprising: (a) assaying methylation levels of a plurality of regions in a biological sample comprising cell-free nucleic acids, wherein the plurality of regions have been identified as regions of a genome that (i) comprises one or more sites that are amenable to methylation enrichment, and (ii) comprises methylation at below a threshold in a non-diseased control; (b) processing methylation levels of the plurality of regions to identify a methylation background; (c) processing the methylation levels for a subset of the plurality of regions, wherein the subset comprises differentially methylated regions (DMRs), to identify a DMR specific methylation level; (d) generating a normalized DMR methylation level by normalizing the DMR specific level against the methylation background.
- the cell-free nucleic acid molecule is a cell-free deoxyribonucleic acid (DNA) molecule.
- the biological sample comprises a blood sample and/or a plasma sample.
- the region of the plurality of regions has a length of 300 base pairs (bp).
- (a) comprises sequencing the cell-free nucleic acids, and/or derivatives thereof.
- the sequencing does not comprise bisulfite sequencing.
- the sequencing comprises targeted sequencing.
- the sequencing comprises using a plurality of capture probes.
- the plurality of capture probes comprises probes that are homologous or complementary to the plurality of regions.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with a known amount of methylation.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with no CpG methylation.
- the sequencing generates sequencing reads corresponding to the plurality of regions.
- the assaying comprises counting a number of sequencing reads corresponding to a region of a plurality of regions.
- (b) comprises counting a number of sequencing reads for each region of the plurality of regions.
- the method further comprises generating an average of sequencing read counts for all regions of the plurality of regions, thereby generating the methylation background.
- (c) comprises counting a number of sequencing reads for each region of the subset of the plurality of regions.
- the method further comprises generating an average of sequencing read counts for all regions of the subset of the plurality of regions, thereby generating the DMR specific methylation level.
- generating the normalized DMR methylation level comprises dividing an average number of sequencing reads associated with the subset by an average number of sequencing reads associated with the plurality of regions.
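The normalization described above (average read count over the DMR subset divided by the average read count over all panel regions) can be sketched as follows. The function name, region labels, and counts are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean
from typing import Dict, Iterable


def normalized_dmr_level(read_counts: Dict[str, int],
                         dmr_regions: Iterable[str]) -> float:
    """Divide the mean read count of the DMR subset by the mean read count
    of every region in the panel (the methylation background)."""
    background = mean(list(read_counts.values()))
    dmr_level = mean([read_counts[r] for r in dmr_regions])
    if background == 0:
        raise ValueError("methylation background is zero; cannot normalize")
    return dmr_level / background


# Hypothetical per-region read counts; "r3" is the only DMR in the subset.
counts = {"r1": 2, "r2": 4, "r3": 30, "r4": 4}
level = normalized_dmr_level(counts, ["r3"])
```

Here the background is 10 reads per region and the DMR specific level is 30, so the normalized level is 3.0. A value near 1 would indicate that the DMR subset carries no more signal than the panel as a whole.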
- (a) comprises enriching methylated nucleic acids.
- the enriching comprises using a binder that binds to one or more methylated nucleotides.
- the binder comprises a protein comprising a methyl-CpG-binding domain.
- the protein is an MBD2 protein.
- the binder comprises an antibody.
- the antibody is an anti-5-mC antibody.
- the antibody is an anti-5-hydroxymethylcytosine antibody.
- the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- the DMR subset is identified by differential methylation analysis of a test sample and control sample.
- the test sample is derived from a subject with cancer.
- the control sample is derived from a subject without cancer.
- the DMR subset comprises one or more regions that exhibits hypermethylation in the test sample compared to the control sample.
- the DMR subset comprises one or more regions that comprises DMRs specific to a particular cancer type.
- the DMR subset comprises one or more regions that comprises DMRs that are not specific to a particular cancer type.
- the method further comprises identifying the subject as having a disease, based at least on the normalized DMR methylation level. In some embodiments, the identifying comprises comparing the normalized DMR methylation level against a control level. In some embodiments, the control level corresponds to an expected value for a non-cancerous sample. In some embodiments, the control level corresponds to an expected value for a cancerous sample. In some embodiments, the disease or condition is a cancer or a tumor. In some embodiments, the disease or condition is a pre-cancer. In some embodiments, the cancer is breast cancer, bladder cancer, colorectal cancer, endometrial cancer, prostate cancer, renal cancer, pancreatic cancer, or lung cancer. In some embodiments, the cancer is a late-stage cancer. In some embodiments, the cancer is an early-stage cancer.
- the one or more sites that are amenable to methylation enrichment are determined by determining methylation levels in a fully methylated control sample.
- the fully methylated control sample comprises nucleic acids subjected to in vitro methylation (e.g., in vitro enzymatic methylation).
- the determining methylation levels in the fully methylated control sample comprises enriching methylated nucleic acids.
- the enriching comprises using a binder that binds to one or more methylated nucleotides.
- the binder comprises a protein comprising a methyl-CpG-binding domain.
- the protein is an MBD2 protein.
- the binder comprises an antibody. In some embodiments, the antibody is an anti-5-mC antibody. In some embodiments, the antibody is an anti-5-hydroxymethylcytosine antibody. In some embodiments, the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule. In some embodiments, the methylation control sample further comprises a filler deoxyribonucleic acid (DNA) molecule. In some embodiments, the filler DNA has a length of about 50 bp to about 800 bp. In some embodiments, the methylation control sample further comprises genomic DNA. In some embodiments, the genomic DNA is subjected to shearing.
- the methylation control sample further comprises cell-free DNA.
- the method comprises a reduction in a noise level compared to a noise level of a corresponding sample that has a normalized DMR methylation level generated by normalizing the DMR specific level against a background derived from a whole genome. In some embodiments, the method comprises a reduction in a noise level compared to a noise level of a corresponding sample that has a normalized DMR methylation level generated by normalizing the DMR specific level against a background derived from all genomic regions that are amenable to methylation enrichment.
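The two region criteria recited in this aspect, (i) amenability to methylation enrichment as judged from a fully methylated control and (ii) methylation below a threshold in a non-diseased control, can be sketched as a simple filter. Both cutoffs (`min_enriched`, `max_control`) and the signal values are assumptions chosen for illustration only.

```python
from typing import Dict, List


def select_panel_regions(
    full_meth_signal: Dict[str, float],
    control_signal: Dict[str, float],
    min_enriched: float = 10.0,  # assumed cutoff for "amenable to enrichment"
    max_control: float = 1.0,    # assumed cutoff for "below a threshold"
) -> List[str]:
    """Keep regions that enrich well in a fully methylated control and show
    little to no signal in a non-diseased control."""
    return sorted(
        region
        for region, signal in full_meth_signal.items()
        if signal >= min_enriched
        and region in control_signal
        and control_signal[region] < max_control
    )


# Hypothetical enrichment signal per region.
fully_methylated = {"r1": 50.0, "r2": 2.0, "r3": 40.0}
non_diseased = {"r1": 0.2, "r2": 0.1, "r3": 8.0}
panel = select_panel_regions(fully_methylated, non_diseased)
```

Region `r2` is dropped because it does not enrich even when fully methylated, and `r3` is dropped because it already carries signal in the non-diseased control.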
- the present disclosure provides a method of identifying a subject as having cancer, the method comprising: (a) obtaining a sample comprising cell-free nucleic acids from a subject; (b) assaying the cell-free nucleic acids to identify a methylation level of a subset of the cell-free nucleic acids, wherein the subset of cell-free nucleic acids corresponds to (e.g., is complementary to) regions of a genome that (i) comprise one or more sites that are amenable to methylation enrichment and (ii) comprise substantially no methylation in a healthy control; and (c) based at least on the methylation level of the subset of the cell-free nucleic acids, identifying the subject as having cancer.
- the subset of cell-free nucleic acids can correspond to regions of a genome by having one or more genomic sequences that are complementary to the sequences of the genome.
- the assaying comprises sequencing. In some embodiments, the method further comprises, based at least on (b), identifying a plurality of differentially methylated regions (DMRs). In some embodiments, the method further comprises processing the plurality of DMRs using a classifier to identify the subject as having cancer.
- the present disclosure provides a method of generating a trained classifier, the method comprising (a) determining the presence of differentially methylated regions (DMRs) in a plurality of regions in a set of biological samples comprising cell-free nucleic acids to generate a training data set, wherein the plurality of regions have been identified as regions of a genome that (i) comprise one or more sites that are amenable to methylation enrichment, and (ii) comprise methylation at below a threshold in a non-diseased control; and (b) computer processing the training data set using machine learning to train an untrained classifier, thereby generating the trained classifier.
- the training data set comprises sample parameter data corresponding to characteristics of the subject from which a sample is derived.
- the characteristics comprise a cancer stage of the subject.
- the characteristics comprise a cancer organ origin.
- the characteristics comprise an age or sex of a subject.
- the characteristics comprise one or more co-morbidities.
- a subset of the set of biological samples are derived from subjects having cancer.
- a subset of the set of biological samples are derived from subjects that do not have cancer.
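As a rough illustration of training a classifier on such a data set, the sketch below fits a logistic regression by batch gradient descent on a single feature (a normalized DMR methylation level per sample). The choice of logistic regression, the learning rate, the epoch count, and all values are assumptions; the disclosure does not name a specific model or training procedure.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that a sample is derived from a subject with cancer."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical training set: one normalized DMR level per sample;
# label 1 = subject with cancer, label 0 = subject without cancer.
X = [[0.9], [1.1], [1.0], [3.2], [2.8], [3.5]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(X, y)
```

Sample parameter data such as age, sex, or cancer stage could be appended as further feature columns, although unscaled features may call for a smaller learning rate.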
- the present disclosure provides a method of nucleic acids processing comprising: (a) assaying methylation levels of a plurality of regions in a biological sample comprising cell-free nucleic acids, wherein the plurality of regions have been identified as regions of a genome that (i) comprises one or more sites that are amenable to methylation enrichment, and (ii) comprises methylation at below a threshold in a healthy control, wherein the assaying comprises sequencing the cell-free nucleic acids to generate sequencing reads;
- the cell-free nucleic acid molecule is a cell-free deoxyribonucleic acid (DNA) molecule.
- the biological sample comprises a blood sample and/or a plasma sample.
- the region of the plurality of regions has a length of 300 base pairs (bp).
- the sequencing does not comprise bisulfite sequencing.
- the sequencing comprises targeted sequencing.
- the sequencing comprises using a plurality of capture probes.
- the plurality of capture probes comprises probes that are homologous or complementary to the plurality of regions.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with a known amount of methylation.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with no CpG methylation.
- the sequencing generates sequencing reads corresponding and/or complementary to the plurality of regions.
- (a) comprises enriching methylated nucleic acids.
- enriching comprises using a binder that binds to one or more methylated nucleotides.
- the binder comprises a protein comprising a methyl-CpG-binding domain.
- the protein is an MBD2 protein.
- the binder comprises an antibody.
- the antibody can be an anti-5-mC antibody.
- the antibody can be an anti-5-hydroxymethylcytosine antibody.
- the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- the subset of the plurality of regions is identified by differential methylation analysis of a test sample and control sample.
- the test sample is derived from a subject with cancer.
- the control sample is derived from a subject without cancer.
- the subset of the plurality of regions comprises one or more regions that exhibits hypermethylation in the test sample compared to the control sample.
- the subset of the plurality of regions comprises one or more regions that comprises DMRs specific to a particular cancer type.
- the subset of the plurality of regions comprises one or more regions that comprises DMRs that are not specific to a particular cancer type.
- the method further comprises identifying the subject as having a disease, based at least on the normalized DMR methylation level. In some embodiments, identifying comprises comparing the normalized DMR methylation level against a control level. In some embodiments, the control level corresponds to an expected value for a non-cancerous sample. In some embodiments, the control level corresponds to an expected value for a cancerous sample. In some embodiments, the disease or condition is a cancer or a tumor. In some embodiments, the cancer is breast cancer, bladder cancer, colorectal cancer, endometrial cancer, prostate cancer, renal cancer, pancreatic cancer, or lung cancer. In some embodiments, the cancer is a late-stage cancer. In some embodiments, the cancer is an early-stage cancer.
- the one or more sites that are amenable to methylation enrichment are determined by determining methylation levels in a fully methylated control sample.
- the fully methylated control sample comprises nucleic acids subjected to in vitro methylation (e.g., in vitro enzymatic methylation).
- determining methylation levels in the fully methylated control sample comprises enriching methylated nucleic acids.
- the enriching comprises using a binder that binds to one or more methylated nucleotides.
- the binder comprises a protein comprising a methyl-CpG-binding domain.
- the protein is an MBD2 protein.
- the binder comprises an antibody.
- the antibody is an anti-5-mC antibody.
- the antibody is an anti-5-hydroxymethylcytosine antibody.
- the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- the fully methylated control sample further comprises a filler deoxyribonucleic acid (DNA) molecule.
- the filler DNA has a length of about 50 bp to about 800 bp.
- the fully methylated control sample further comprises cell-free DNA.
- the method comprises a reduction in a noise level compared to a noise level of a corresponding sample that has a normalized DMR methylation level generated by normalizing the DMR specific level against a background derived from all regions of a whole genome. In some embodiments, the method comprises a reduction in a noise level compared to a noise level of a corresponding sample that has a normalized DMR methylation level generated by normalizing the DMR specific level against a background derived from all genomic regions that are amenable to methylation enrichment.
- the present disclosure provides a method, comprising: (a) obtaining a sample comprising cell-free nucleic acids from a first set of one or more subjects; (b) obtaining a control sample comprising cell-free nucleic acids from a second set of one or more subjects; (c) assaying the cell-free nucleic acids from the first set of one or more subjects and the second set of one or more subjects to generate a training data set comprising methylation states of one or more genomic regions comprising differentially methylated regions (DMRs); and (d) generating a classifier using the training data set to generate an output indicative of cancer.
- Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto.
- the computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
- FIG. 1 shows a diagram illustrating a process for collecting flow-through of unmethylated/hypomethylated deoxyribonucleic acid (DNA) fragments.
- FIG. 2 shows a computer system that is programmed or otherwise configured to implement methods provided herein.
- FIG. 3 shows a diagram illustrating how blood quiet regions are selected.
- FIGs. 4A-4B illustrate using blood quiet regions to assess blood quiet circulating tumor DNA (ctDNA) specific differentially methylated regions (DMRs) and ctDNA specific methylation level.
- FIG. 4A shows a diagram illustrating how to identify blood quiet ctDNA specific DMRs.
- FIG. 4B shows a diagram illustrating how to utilize the signal-to-noise (SNR) normalization method to calculate ctDNA specific methylation level.
- FIGs. 5A-5B illustrate a schematic of an example assay.
- FIG. 6 shows data relating to limits of detection for a targeted panel workflow versus a non-targeted panel workflow.
- FIG. 7 shows a diagram illustrating a process for developing a head and neck cancer (HNC) signature using an algorithm.
- FIG. 8 shows methylation levels of signature DMRs in peripheral blood leukocytes (PBL), normal solid tissue, and primary solid tumor.
- FIG. 9 shows correlation between tumor purity and mean signature signal within The Cancer Genome Atlas (TCGA) HNC tumor tissue.
- FIG. 10 shows enrichment of signature DMRs in differential HNC CpG sites in
- FIG. 11 shows identified signature DMRs mapped to top 15 genes.
- FIG. 12 shows pathway enrichment in signature DMRs.
- FIG. 13 shows methylation levels of signature DMRs in peripheral blood leukocytes (PBL), normal tissue, head and neck squamous cell carcinoma (HNSC), lung squamous cell carcinoma (LUSC), and cervical squamous cell carcinoma (CEC).
- FIG. 14 illustrates a workflow for selecting proto-DMRs.
- FIG. 15 illustrates a workflow for the baseline/tissue-informed approach.
- FIG. 16 illustrates a workflow for selecting cancer specific anti-DMRs.
- FIG. 17 illustrates a workflow for selecting cancer specific DMRs.
- FIG. 18 illustrates a workflow for the tissue/baseline agnostic approach.
- FIG. 19 illustrates a workflow for the joint model approach.
- FIG. 20 shows the baseline/tissue agnostic scores generated upon subjecting various dilutions of the FaDu cell line to the baseline/tissue agnostic (whole methylome) approach.
- the FaDu cell line was diluted in non-cancer control cell-free DNA (cfDNA).
- FIG. 21 shows the baseline-informed scores generated upon subjecting various dilutions of the FaDu cell line to the baseline-informed (whole methylome) approach.
- the FaDu cell line was diluted in non-cancer control cfDNA.
- FIG. 22 shows the baseline/tissue agnostic scores generated upon subjecting various dilutions of the FaDu cell line to the baseline/tissue agnostic (proto-DMR panel) approach.
- the FaDu cell line was diluted in non-cancer control cfDNA.
- FIG. 23 shows the baseline-informed scores generated upon subjecting various dilutions of the FaDu cell line to the baseline-informed (proto-DMR panel) approach.
- the FaDu cell line was diluted in non-cancer control cfDNA.
- FIG. 24 shows the baseline-informed scores generated upon subjecting various dilutions of cfDNA from stage III colorectal cancer subjects to the baseline-informed (whole methylome) approach.
- the cfDNA from stage III colorectal cancer subjects was diluted in non-cancer control cfDNA.
- FIG. 25 shows the baseline/tissue agnostic scores generated upon subjecting various dilutions of cfDNA from stage III colorectal cancer subjects to the baseline/tissue agnostic (whole methylome) approach.
- the cfDNA from stage III colorectal cancer subjects was diluted in non-cancer control cfDNA.
- FIG. 26 shows the baseline-informed scores generated upon subjecting various dilutions of cfDNA from stage IV colorectal cancer subjects to the baseline-informed (proto-DMR panel) approach.
- the cfDNA from stage IV colorectal cancer subjects was diluted in non-cancer control cfDNA.
- FIG. 27 shows the baseline/tissue agnostic scores generated upon subjecting various dilutions of cfDNA from stage IV colorectal cancer subjects to the baseline/tissue agnostic (proto-DMR panel) approach.
- the cfDNA from stage IV colorectal cancer subjects was diluted in non-cancer control cfDNA.
- FIG. 28A shows the sensitivity and the specificity for detecting minimal residual disease using the whole methylome in the baseline-informed approach, the baseline/tissue agnostic approach, and the joint model approach.
- FIG. 28B shows the sensitivity and the specificity for detecting minimal residual disease using the proto-DMR panel in the baseline-informed approach, the baseline/tissue agnostic approach, and the joint model approach.
- FIG. 29 shows an example schematic of evaluating and thresholding joint predictive models.
- the present disclosure provides methods and/or systems for the processing and analysis of nucleic acids present in biological samples through the generation of libraries of methylated genomic regions, which can be useful in determining a risk or likelihood of a subject having cancer or a tumor with high sensitivity and/or high specificity.
- the methods and/or systems disclosed herein can process and/or analyze nucleic acids present in biological samples through different approaches.
- the one or more approaches can comprise a baseline-informed approach, a baseline-agnostic approach, and/or a joint approach.
- the different approaches disclosed herein can utilize the same panel comprising genomic regions (e.g., proto-DMRs, control genomic regions) with little to no methylation signal in non-cancer controls (e.g., a pool of non-cancer subjects) to identify and/or select one or more differentially methylated regions (DMRs) that can be used for monitoring a cancer or disease.
- Methylation patterns of nucleic acid molecules derived from a tissue sample or a non-tissue sample (e.g., cell free nucleic acid molecules, circulating tumor DNA) of a subject can be useful for predicting, screening, diagnosing and/or monitoring for a cancer.
- Utilizing the panel from non-cancer controls can offer various advantages compared to panels that require customization for different cancer indications.
- the methods and/or systems disclosed herein may not need large cancer cohorts to determine one or more DMRs for monitoring a subject, and/or may not need prior knowledge of cancer-specific DMRs.
- the panel can be applicable across cancer types without modification and/or expensive panel redesign.
- because the panel comprises genomic regions (e.g., proto-DMRs, control genomic regions) with little to no methylation signal in non-cancer controls, a low sequencing depth may suffice to identify methylated regions in a subject, reducing sequencing cost. This contrasts with bisulfite sequencing, which requires sequencing of the non-methylated regions, so the required sequencing depth can increase as the panel expands.
- Methods and systems provided herein can comprise assaying the cell-free nucleic acids to identify a methylation level of a subset of the cell-free nucleic acids, which can be processed to monitor a subject.
- the methylation states of various nucleic acids can allow for identification of differentially methylated regions (DMRs) in a subject as compared to another sample (e.g., a control).
- the presence of specific DMRs may be used to differentiate between, for example, cancerous and non-cancerous tissue.
- A given DMR may be specific to a type of tissue (e.g., a tumor or cancer cell) and may be used to monitor the presence or absence of methylated states in the tissue sample.
- the term “subject,” as used herein, generally refers to any member of the animal kingdom.
- the subject may be a human.
- the subject may be an individual exhibiting a disease (e.g., cancer) or an individual not exhibiting the disease.
- the subject may be considered to have a risk of developing the disease, such as cancer.
- the subject may be symptomatic or asymptomatic for a disease.
- the subject may be a patient.
- the subject may be a patient receiving medical care for a disease or condition (e.g., cancer).
- the term “genome,” as used herein, generally refers to genomic information from a subject, which may be, for example, at least a portion or an entirety of a subject’s hereditary information.
- a genome can be encoded either in DNA or in RNA.
- a genome can comprise coding regions (e.g., that code for proteins) as well as non-coding regions.
- a genome can include the sequence of all chromosomes together in an organism.
- the human genome ordinarily has a total of 46 chromosomes. The sequence of all of these together may constitute a human genome.
- the term “methylome,” as used herein, generally refers to a measure of an amount of DNA methylation and/or a DNA methylation level at a plurality of sites or loci in a genome.
- DNA methylation is a process by which methyl groups are added to a DNA molecule.
- DNA methylation can act to modulate (e.g., repress) gene transcription.
- the methylome may correspond to (e.g., complementary to) all of the genome (whole genome methylation), a substantial part of the genome, or relatively small portion(s) of the genome.
- the term “methylome” as used herein can also refer to the set of methylation modifications (e.g., on a nucleic acid) in an organism, in a cell, or in a sample.
- a methylome can depend on the method of methylation measurement. For example, when using an anti-5mC antibody, a methylome can represent all the information of DNA methylation on the cytosines of a genome.
- the term “nucleic acid,” as used herein, generally refers to a polynucleotide comprising two or more nucleotides, i.e., a polymeric form of nucleotides of various lengths (e.g., at least 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 1000, 10000, or more nucleotides in length), either deoxyribonucleotides (dNTPs) or ribonucleotides (rNTPs), or analogs thereof.
- Nonlimiting examples of nucleic acids include deoxyribonucleic acid (DNA), ribonucleic acid (RNA), coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant nucleic acids, branched nucleic acids, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
- a nucleic acid may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be made before or after assembly of the nucleic acid.
- the sequence of nucleotides of a nucleic acid may be interrupted by non-nucleotide components.
- a nucleic acid may be further modified after polymerization, such as by conjugation or binding with a reporter agent.
- a “variant” nucleic acid is a polynucleotide having a nucleotide sequence identical to that of its original nucleic acid except for at least one nucleotide that is modified, for example, deleted, inserted, or replaced. The variant may have a nucleotide sequence with at least about 80%, 90%, 95%, or 99% identity to the nucleotide sequence of the original nucleic acid.
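The percent identity between a variant and its original nucleic acid can be illustrated with a small sketch. It assumes the two sequences have already been aligned to equal length with `-` marking gaps, and it counts a gap opposite a base as a mismatch; both conventions are assumptions, since the disclosure does not prescribe how identity is computed.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns at which two pre-aligned sequences carry
    the same base. Columns where both sequences have a gap are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


original = "ACGTACGTAC"
substituted = "ACGTTCGTAC"  # one replaced nucleotide
deleted = "ACGT-CGTAC"      # one deleted nucleotide, shown as a gap
identity_sub = percent_identity(original, substituted)
identity_del = percent_identity(original, deleted)
```

Both hypothetical variants differ from the original at one of ten aligned positions, so each has 90% identity and would satisfy the 80% and 90% thresholds but not the 95% or 99% thresholds.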
- Cell-free methylated DNA generally includes DNA that can be one or more nucleic acid molecules circulating freely in the blood stream. In some cases, cell-free methylated DNA can be methylated at various regions of the DNA. Samples, for example, plasma samples, may be taken to analyze cell-free methylated DNA. Studies reveal that much of the circulating nucleic acid in blood arises from necrotic or apoptotic cells, and greatly elevated levels of nucleic acids from apoptosis are observed in diseases such as cancer.
- because circulating DNA bears hallmark signs of the disease, including mutations in oncogenes, microsatellite alterations, and, for certain cancers, viral genomic sequences, DNA or RNA in plasma has become increasingly studied as a potential biomarker for disease.
- a quantitative assay for low levels of circulating tumor DNA in total circulating DNA may serve as a better marker for detecting the relapse of colorectal cancer compared with carcinoembryonic antigen, the biomarker used clinically.
- the term “sequencing,” also referred to as “genomic sequencing,” generally refers to a process for determining the order of the chemical building blocks (e.g., adenine, cytosine, guanine, thymine, uracil) that make up a nucleic acid molecule (e.g., DNA, RNA, cDNA).
- library preparation generally includes end-repair, A-tailing, adapter ligation, or any other preparation performed on the cell-free DNA to permit subsequent sequencing of the DNA.
- Library preparation can allow for a nucleic acid sample (e.g., DNA, cDNA) to adhere to the sequencing apparatus (e.g., a flow cell, a bead).
- Nonlimiting examples of library preparation include ligation-based library preparation and tagmentation-based library preparation.
- Library preparation can result in the creation of a sequencing library, a pool of nucleic acid (e.g., DNA) fragments with adapters attached. The type of adapter attached during library preparation can depend on the sequencing platform/apparatus used.
- the output of sequencing can be a “sequencing read.”
- a “sequencing read” is an inferred sequence of base pairs or base pair probabilities corresponding to all or part of a nucleic acid fragment (e.g., a DNA fragment).
- the length of a sequencing read can depend on the sequencing platform/apparatus used.
- the length of a sequencing read can be about 10, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 150, about 200, about 250, about 300, about 350, about 400, about 450, about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 850, about 900, about 950, about 1000, about 1100, about 1200, about 1300, about 1400, about 1500, about 1600, about 1700, about 1800, about 1900, about 2000, or more base pairs in length.
- sequencing depth refers to the ratio of the total number of bases obtained by sequencing to the size of the genome. Sequencing depth can also refer to the average number of times each base is measured in a genome during sequencing. Factors that can determine sequencing depth can include the error rate of the sequencing methods, the assembly algorithm used during sequencing, the repeat complexity of the nucleic acid molecule, region, or genome that is being sequenced, and the length of the sequencing read.
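The first definition of sequencing depth above (total sequenced bases divided by genome size) can be illustrated with a minimal sketch. The function name and the toy numbers are assumptions for illustration only and do not come from the disclosure.

```python
def sequencing_depth(read_lengths, genome_size):
    """Return average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Hypothetical toy numbers: 3,000 reads of 100 bp over a 10,000 bp target.
depth = sequencing_depth([100] * 3000, 10_000)
print(depth)  # 30.0
```

The same ratio can be read as the average number of times each base is measured, which is the second definition given above.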
- supplemental processed DNA (e.g., “filler DNA”) generally may be noncoding DNA or may consist of amplicons.
- “proto-DMR,” “proto-differentially methylated region,” or “quiet region” can refer to a genomic region that is able to be methylated by a methylation enrichment assay (e.g., in vitro methylation assay) but does not necessarily show methylation in a given sample type.
- the proto-DMRs refer to one or more genomic regions that can be 1) pulled down upon in vitro methylation (e.g., enzymatic methylation) of the genomic regions, and 2) confirmed to be non-methylated in non-cancer subjects.
- proto-DMRs can be areas of interest in differential methylation analysis as they are shown to be capable of having methylation, however in a particular sample type (e.g., a non-diseased or healthy control), no methylation is observed.
- a cancer sample, or other sample of interest, may comprise a methylation state that is different from that of a healthy or non-diseased control, thus giving rise to a differentially methylated region (DMR), which can be analyzed.
- the one or more proto-DMRs can be captured (e.g., via hybrid capture and/or multiplex PCR) and/or analyzed in a subject to identify DMRs and anti-DMRs.
- “anti-DMR” or “anti-differentially methylated region” can refer to a genomic region that is able to be methylated by a methylation enrichment assay (e.g., in vitro methylation assay) but does not show a change in methylation state when comparing a sample without a particular condition to a sample with the particular condition.
- the anti-DMR can be identified from within a proto-DMR. These regions may be used for normalization or as a reference, for example, in the methods disclosed herein. As these regions may be methylated, these regions, under certain conditions, can be analyzed using methylation specific methods, while representing a background level for samples relating to a particular condition.
- “DMR” or “differentially methylated region” can refer to a genomic region that is methylated and/or hypermethylated in a sample with a particular condition, compared to a sample without the particular condition.
- the DMR can be identified from within a proto-DMR.
- “non-diseased control” or “non-diseased sample” can refer to a sample that is derived from a subject that does not have a particular disease.
- the non-diseased control or sample may be substantially free of a particular disease (e.g., cancer), and can be used as a control or reference for detection of the particular disease.
- the non-diseased control or non-diseased sample may comprise aberrations, genetic variants, or be infected with other diseases that are not the particular disease of interest.
- the fragment length metric can be fragment length.
- the subject cell-free methylated DNA can be limited to fragments having a length of ≤170 bp, ≤165 bp, ≤160 bp, ≤155 bp, ≤150 bp, ≤145 bp, ≤140 bp, ≤135 bp, ≤130 bp, ≤125 bp, ≤120 bp, ≤115 bp, ≤110 bp, ≤105 bp, or ≤100 bp.
- the subject cell-free methylated DNA can be limited to fragments having a length of between about 100 - about 150 bp, 110 - 140 bp, or 120 - 130 bp.
- the fragment length metric can be the fragment length distribution of the subject cell-free methylated DNA.
- the subject cell-free methylated DNA can be limited to fragments within the bottom 50th, 45th, 40th, 35th, 30th, 25th, 20th, 15th, or 10th percentile based on length.
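The two fragment length metrics above (an absolute length limit and a position within the length distribution) can be sketched as simple filters. This is only an illustration: the function names, the inclusive upper bound, the nearest-rank percentile rule, and the example lengths are all assumptions, not details taken from the disclosure.

```python
import math


def filter_by_max_length(fragment_lengths, max_bp):
    """Keep fragments whose length is at most max_bp (inclusive bound assumed)."""
    return [n for n in fragment_lengths if n <= max_bp]


def filter_by_bottom_percentile(fragment_lengths, percentile):
    """Keep fragments in the bottom `percentile` of the length distribution.

    Uses a nearest-rank cutoff; fragments tied with the cutoff length are kept.
    """
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    if not fragment_lengths:
        return []
    ordered = sorted(fragment_lengths)
    rank = math.ceil(len(ordered) * percentile / 100)
    cutoff = ordered[rank - 1]
    return [n for n in fragment_lengths if n <= cutoff]


# Hypothetical fragment lengths in base pairs.
lengths = [95, 120, 130, 150, 166, 170, 180, 200, 320, 340]
print(filter_by_max_length(lengths, 150))         # [95, 120, 130, 150]
print(filter_by_bottom_percentile(lengths, 20))   # [95, 120]
```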
- sheared genomic nucleic acid molecule or “sheared DNA,” also referred to as “cfDNA mimic” can comprise a subset of whole-genome DNA.
- sheared DNA comprises randomly cleaved DNA.
- sheared DNA comprises DNA cleaved at specified locations along the genome.
- sheared DNA comprises DNA that can be fragmented to a predetermined fragment range. Physical shearing can be performed using, for example, probe sonication or nebulization. Enzymatic shearing or fragmentation can also be performed to generate sheared DNA.
- Sheared genomic nucleic acid molecules can be about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 150, about 200, about 250, about 300, about 350, about 400, about 450, about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 850, about 900, about 950, about 1000, or more base pairs in length.
- control may comprise both positive and negative control, or at least a positive control.
- Cell-free nucleic acids such as cell-free DNA (cfDNA)
- cfDNA cell-free DNA
- samples that can be collected noninvasively can be blood, urine, saliva, or CSF.
- Cancer development can be associated with focal gain of 5-methylcytosine (5mC), for instance, at cytosine-phosphate-guanine (CpG) islands and CpG island shores. Cancer development can also be associated with global cytosine demethylation. Global cytosine demethylation can be a genome-wide loss of 5mC.
- ctDNA can be distinguished from cfDNA molecules derived from healthy tissue (e.g., non-tumor and/or non-cancer tissue) by the methylation level (e.g., the percentage of nucleotide residues that are methylated) of the nucleic acid molecules.
- nucleic acid molecules of or derived from tumor tissue and/or cancer tissue can be hypomethylated (e.g., can comprise a lower level of methylation, for instance, wherein there are fewer methylated nucleotide residues and/or a lower percentage of methylated nucleotide residues) compared to nucleic acid molecules of or derived from healthy tissue, or non-diseased (e.g., nucleic acid molecules of or derived from healthy tissue that consist of or comprise nucleotide sequences corresponding to the same region(s) of the genome of the subject).
- tumor-derived nucleic acid molecules e.g., ctDNA molecules
- nucleic acid molecules of or derived from tumor tissue and/or cancer tissue can be hypermethylated (e.g., can comprise a higher level of methylation, for instance, wherein there are more methylated nucleotide residues and/or a greater percentage of methylated nucleotide residues) compared to nucleic acid molecules of or derived from healthy tissue (e.g., nucleic acid molecules of or derived from healthy tissue that consist of or comprise nucleotide sequences corresponding to the same region(s) of the genome of the subject).
- tumor-derived nucleic acid molecules can comprise one or more regions having more methylated nucleotide residues than nucleic acid molecules (e.g., cfDNA molecules) derived from healthy tissues (e.g., non-tumor and/or non-cancer tissues) in the same biological sample.
- all or a portion of a tumor-derived fraction of a plurality of cell-free DNA molecules e.g., ctDNA
- biophysical properties e.g., the length of the cfDNA molecules or the presence of stereotypical 5’ and 3’ end sequence motifs
- ctDNA molecules can have shorter nucleic acid lengths than cfDNA molecules derived from healthy tissues.
- ctDNA molecules may comprise stereotypical 5’ and 3’ end motifs.
- one or more of these distinguishing features may be used to deplete a population of nucleic acid molecules of cfDNA derived from healthy tissue and/or to enrich a population of nucleic acid molecules for ctDNA.
- ctDNA can have shorter fragment length compared to cfDNA derived from a healthy tissue.
- Nucleic acid molecules derived from tumor or cancer cells or tissue may be present in a biological sample (and/or a population of nucleic acids derived from the biological sample) in substantially lower quantities than nucleic acid molecules (e.g., cfDNA) derived from healthy tissue.
- ctDNA present in a plurality of nucleic acid molecules (e.g., cfDNA) in or derived from a biological sample can be difficult to detect, for instance, because it is present in the sample in lower quantities relative to cfDNA derived from healthy tissue (e.g., which may require using a greater amount of potentially scarce biological sample and/or which may require significantly higher sequencing depth).
- Various methods and systems disclosed herein may alleviate potential issues relating to the low quantities of ctDNA (e.g., via enrichment of methylated nucleic acids).
- nucleic acids e.g., cfDNA molecules or amplicons thereof derived from a biological sample
- a plurality of nucleic acids may be subjected to genome-wide depletion of nucleic acid molecules methylated in one or more specific regions of the genomic sequence of the nucleic acid molecules (e.g., CpG islands, CpG island shores, or repetitive sequences of the genome, such as long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), or LTRs (long terminal repeats)) to achieve increased sensitivity and/or increased specificity in assays for determining the presence or absence or the sequence identity of ctDNA molecules in the plurality.
- a whole genome comprises all genomic regions.
- a whole genome can comprise all hypomethylated and/or all hypermethylated genomic regions.
- a whole genome file comprises all the genomic regions of all the chromosomes of a sample (e.g., all human chromosomes).
- a whole genome file can comprise all the genomic regions of the autosomes (e.g., human chromosomes 1-22).
- a subset of the global or whole genome can be used to provide specific information about methylation at specified regions or a plurality of regions of a genome.
- specified regions or a plurality of regions can comprise one or more sites that are amenable to methylation enrichment.
- one or more sites that are amenable to methylation enrichment can comprise one or more sites that can be enriched for one or more methylated site after in vitro methylation.
- specified regions or a plurality of regions can comprise substantially no methylation, or methylation below a threshold, in a healthy control.
- specified regions or a plurality of regions can comprise (i) one or more sites that are amenable to methylation enrichment and (ii) substantially no methylation, or methylation below a threshold, in a healthy control.
- Such specified regions or a plurality of regions can be experimentally validated and can provide additional information for using or distinguishing one or more biomarkers.
- the one or more biomarkers can be differentially methylated regions (DMRs) for the purposes of distinguishing cancer from control samples.
- Using a subset of the whole genome, as opposed to the whole genome, may provide advantages such as reducing the overall noise of the method.
- the background may comprise regions of a genome that comprise (i) one or more sites that are amenable to methylation enrichment, or (ii) methylation below a threshold in a healthy control.
- the background may comprise a selection of specific DMRs.
- the use of a smaller region may decrease the sequencing footprint or may allow for increased sequencing depth for areas that may be relevant for a given subject (e.g., subject-specific DMR).
- Specific DMRs in a sample may be used to monitor a sample for the presence of a tumor or cancer.
- the change in methylation (e.g., hypermethylation) in a DMR may be detected in a sample derived from a subject having cancer. This increase or decrease in methylation may be used as a marker for the subject’s cancer.
- the sample may comprise a plurality of DMRs that are indicative of the subject’s cancer.
- the plurality of DMRs may be specific to a subject’s cancer. In this way, the presence of one or more of the plurality of DMRs may indicate that the cancer is present in the sample. By observing one or more of the plurality of DMRs, the cancer may be monitored.
- a cancer may be monitored subsequent to a therapy to determine an efficacy of a therapy.
- the cancer can be monitored between two time points.
- the cancer can be monitored for regression and/or progression.
- by observing one or more of the plurality of DMRs, recurrence can be determined.
- Different subjects may comprise different sets of DMRs.
- Monitoring a subject may comprise monitoring DMRs that are specific to the given subject.
- the DMRs that are specific to a given subject can be used to detect cancer, detect cancer progression and/or regression, or detect minimal residual disease, or any combinations thereof.
- a first subject may comprise a DMR A, DMR B, and DMR C.
- a second subject may comprise a DMR X, DMR Y, and DMR Z.
- data relating to DMR A, DMR B, and DMR C may be used to detect the presence or absence of a tumor in the first subject.
- data relating to DMR X, DMR Y, and DMR Z may be used to detect the presence or absence of a tumor in the second subject.
- Although the first and second subjects comprise different DMRs, data pertaining to regions of other possible DMRs can still be collected. For example, data pertaining to regions corresponding to DMR A, DMR B, and DMR C can still be collected for the second subject.
- Downstream data analysis may selectively analyze specific DMRs.
- data pertaining to regions corresponding to DMR A, DMR B, and DMR C may be collected and then omitted or ignored when monitoring the second subject.
- Collecting data for multiple DMRs while analyzing a subset of the DMRs may allow for fewer sample-specific reactions and may eliminate the need to generate custom panels for each individual and/or for detecting cancer, while still providing data relevant for a given subject.
- the sequencing reactions may be the same for libraries derived from different subjects, with the downstream analysis customized or personalized (e.g., algorithmically) for a given subject. This may reduce variability in the data, or reduce or eliminate the need for custom-built probes, primers, or other nucleic acid tools, while still allowing for personalization for a given subject.
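The idea of collecting data with one shared panel and then personalizing only the downstream analysis can be sketched as follows. The DMR labels follow the example above (DMR A to C for the first subject, DMR X to Z for the second); the function name, the dictionary layout, and all numeric values are hypothetical.

```python
# Hypothetical methylation levels measured with one shared panel (values are made up).
panel_levels = {
    "DMR A": 0.82, "DMR B": 0.75, "DMR C": 0.64,
    "DMR X": 0.03, "DMR Y": 0.05, "DMR Z": 0.02,
}

# Hypothetical per-subject marker lists, determined separately for each subject.
SUBJECT_DMRS = {
    "subject_1": ["DMR A", "DMR B", "DMR C"],
    "subject_2": ["DMR X", "DMR Y", "DMR Z"],
}


def subject_specific_levels(panel_levels, subject_dmrs):
    """Keep only the regions relevant to one subject; other panel data is ignored."""
    missing = [r for r in subject_dmrs if r not in panel_levels]
    if missing:
        raise KeyError(f"regions not covered by the panel: {missing}")
    return {r: panel_levels[r] for r in subject_dmrs}


first = subject_specific_levels(panel_levels, SUBJECT_DMRS["subject_1"])
print(sorted(first))  # ['DMR A', 'DMR B', 'DMR C']
```

In this sketch the wet-lab step is identical for both subjects; only the list passed to the filter changes.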
- the subject-specific DMRs may comprise DMRs that are specific to a cancer subtype or cancer tissue of origin.
- the subject-specific DMRs may comprise a set of DMRs specific to a patient’s tumor.
- the subject-specific DMRs may comprise DMRs specific to a non-cancerous cell of a subject (e.g., patient).
- the methods of the disclosure allow for monitoring of a subject using one or more markers (e.g., DMRs) specific to the subject.
- the one or more markers can be DMRs.
- the methods may be performed without the use of a primer set or bait set specific to the one or more markers specific to the subject.
- the primer set can be nucleic acid sequences designed to anneal one or more regions corresponding to one or more markers.
- the bait set can be labeled probes that can capture one or more regions corresponding to one or more markers.
- the method may comprise the use of a primer set or bait set that was generated prior to determination of the one or more markers as being indicative of a tissue in a subject.
- the primer set or bait set may comprise primers or baits that can anneal to the regions corresponding to the one or more markers; however, the primer or bait sets may also anneal to regions that do not correspond to the one or more markers. For example, the primer or bait sets may anneal to regions that are not complementary to those of the one or more markers.
- the methods described in this disclosure may filter (e.g., computationally filter) or reduce the data set to data pertaining to the one or more markers. For example, the methods may comprise obtaining data from a targeted sequencing reaction and then filtering the data to analyze the portion that is deemed relevant for a given subject.
- the method disclosed herein may specifically analyze regions that are amenable to methylation, as opposed to a whole genome.
- Whole genome sequencing may be agnostic to regions that are able to or are otherwise amenable to methylation in a biologically suitable manner (e.g., via enzymatic methylation).
- Generating panels specific to methylatable regions e.g., proto-DMRs
- Generating panels for targeting one or more regions that have little to no methylation signals in non-cancer controls e.g., proto-DMRs
- a method of nucleic acid processing can comprise assaying methylation levels of a plurality of regions in a biological sample comprising cell-free nucleic acids.
- the plurality of regions may have been identified as regions of a genome that comprise (i) one or more sites that are amenable to methylation enrichment, and/or (ii) methylation below a threshold in a healthy control.
- the method can further comprise processing methylation levels of the plurality of regions to identify a methylation background.
- the methylation levels can be processed for a subset of the plurality of regions.
- the subset can comprise differentially methylated regions (DMRs), to identify a DMR specific methylation level.
- a normalized DMR methylation level can be generated by normalizing the DMR specific level against the methylation background (e.g., anti-DMRs).
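One way to picture normalizing a DMR-specific level against a methylation background is a simple ratio of mean signals. The disclosure does not specify the normalization formula here, so the ratio, the use of anti-DMRs as the background, the function name, and the values below are all assumptions made for illustration.

```python
from statistics import mean


def normalized_dmr_level(region_levels, dmr_ids, background_ids):
    """Return mean DMR signal divided by mean background (e.g., anti-DMR) signal.

    The ratio is an assumed normalization, not one stated in the source.
    """
    dmr_level = mean(region_levels[r] for r in dmr_ids)
    background = mean(region_levels[r] for r in background_ids)
    if background == 0:
        raise ValueError("background methylation level is zero")
    return dmr_level / background


# Hypothetical per-region methylation signals.
levels = {"dmr_a": 12.0, "dmr_b": 8.0, "anti_1": 2.0, "anti_2": 3.0}
print(normalized_dmr_level(levels, ["dmr_a", "dmr_b"], ["anti_1", "anti_2"]))  # 4.0
```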
- the sample can be obtained from the subject.
- the sample can comprise nucleic acid molecules (e.g., cell-free nucleic acid molecules) from the subject.
- the data set can comprise methylation states of one or more genomic regions.
- the one or more genomic regions can comprise differentially methylated regions (DMRs).
- At least a portion of the data set can be processed to generate an output indicative of cancer in the subject.
- the portion of the data set can pertain to a set of DMRs specific to the subject.
- DMRs specific to the subject can comprise DMRs that can be unique to the subject.
- DMRs specific to the subject can comprise DMRs that may be present in the subject but not in one or more other subjects.
- DMRs specific to the subject can comprise DMRs that can be identified from the subject.
- DMRs specific to the subject can comprise DMRs that can be personal to the subject.
- DMRs specific to the subject can comprise DMRs that have different magnitude of methylation states that can be unique to the subject.
- the portion of the data set can be processed by different approaches.
- the portion of the data set can be processed by a baseline-informed approach and/or a baseline-agnostic approach.
- the baseline-informed approach can be used when a baseline sample or a reference sample can be available.
- “Baseline sample” and “reference sample” can be used interchangeably.
- the reference sample can be obtained at a time prior to obtaining the sample from the subject.
- the reference sample can be obtained from the subject at a time subsequent to diagnosis and/or prior to treatment with a therapy.
- the reference sample can be a tissue sample (e.g., cancer tissue sample) and/or a non-tissue sample (e.g., plasma sample).
- both or either of the samples can be used in the baseline-informed approach.
- the baseline-informed approach can be performed alone.
- the baseline-informed approach and the baseline-agnostic approach can both be performed.
- when the reference sample is not available, the baseline-agnostic approach can be performed alone.
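The choice between the two approaches described above depends only on whether a reference (baseline) sample is available. A minimal sketch of that decision follows; the function name and the `also_run_agnostic` flag are hypothetical.

```python
def select_approaches(reference_available, also_run_agnostic=False):
    """Return the analysis approaches to run for one subject.

    Without a reference sample, only the baseline-agnostic approach applies.
    With one, the baseline-informed approach runs alone or alongside the
    baseline-agnostic approach.
    """
    if not reference_available:
        return ["baseline-agnostic"]
    approaches = ["baseline-informed"]
    if also_run_agnostic:
        approaches.append("baseline-agnostic")
    return approaches


print(select_approaches(False))       # ['baseline-agnostic']
print(select_approaches(True, True))  # ['baseline-informed', 'baseline-agnostic']
```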
- the baseline-informed approach and/or the baseline-agnostic approach disclosed herein can utilize a universal panel of genomic regions.
- a “universal panel of genomic regions” can be used interchangeably with “a panel of control genomic regions,” and “proto-DMR panel” herein.
- the universal panel of genomic regions can comprise regions of non-methylation and/or regions that are amenable to methylation enrichment in non-cancer controls (e.g., a pool of non-cancer subjects).
- the universal panel of genomic regions can comprise both regions of non-methylation and regions that are amenable to methylation enrichment in non-cancer controls (e.g., a pool of non-cancer subjects).
- the universal panel of genomic regions can comprise regions of little to no methylation signals in non-cancer controls (e.g., a pool of non-cancer subjects).
- the universal panel of genomic regions can be universally adaptable across any cancer type without modification to the panel.
- expensive, hard-to-source cancer subjects may not be needed to generate a panel that can capture DMRs that may be associated with cancer.
- the universal panel of genomic regions e.g., proto-DMRs panel
- the universal panel of genomic regions can be used (e.g., may be sensitive) to identify cancer-associated hypermethylation events (e.g., DMRs) across one or more types of cancer, since the universal panel of genomic regions can comprise regions that may have little to no methylation signal in non-cancer controls.
- the universal panel of genomic regions can be used to identify cancer-associated DMRs across various cancer types, which may eliminate the need to generate custom panels for each cancer type, and/or reduce the turnaround time to identify cancer-associated DMRs across various cancer types.
- the universal panel of genomic regions can be optimized for use following a methylation enrichment assay to target specific genomic regions in the subject and/or one or more reference subjects (e.g., a pool of cancer subjects).
- a methylation enrichment assay e.g., cfMeDIP
- the universal panel of genomic regions can be used to capture specific genomic regions of the methylated nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) from the subject and/or one or more reference subjects (e.g., a pool of cancer subjects) for sequencing.
- cfMeDIP-seq + proto-DMR panel targeted enrichment can yield minimal sequencing signal in non-cancer samples, dramatically reducing unnecessary sequencing costs.
- high sequencing depth may not be required to identify genomic regions comprising DMRs from the subject and/or one or more reference subjects (e.g., a pool of cancer subjects).
- the turn around time and/or the cost to identify DMRs from the subjects and/or one or more reference subjects e.g., a pool of cancer subjects
- using the universal panel of genomic regions can ensure that non-cancer samples generate almost no sequencing signal, while in cancer samples, nearly all detected reads can come from ctDNA.
- sequencing depth requirements for the universal panel of genomic regions can be determined by the number of molecules present rather than the number of regions sequenced. Expansion of the universal panel of genomic regions to a larger size (e.g., to 1 Mb) may not increase sequencing depth requirements.
- the universal panel of genomic regions can be generated from one or more control samples derived from one or more control subjects without cancer.
- the one or more control samples are one or more non-tissue samples and/or one or more tissue samples.
- the one or more control samples (e.g., from a pool of non-cancer subjects) can comprise control nucleic acid molecules.
- the one or more control samples can comprise control cell-free nucleic acid molecules.
- the control nucleic acid molecules e.g., cell-free nucleic acid molecules
- the methylation states of one or more control genomic regions can comprise hypermethylated states, methylated states, non-methylated states, or hypomethylated states, or any combinations thereof.
- the control nucleic acid molecules e.g., cell-free nucleic acid molecules
- the one or more methylation reactions can comprise in vitro methylation reactions.
- the one or more methylation reactions can result in methylation and/or hypermethylation of the one or more control genomic regions.
- the one or more methylation reactions can result in one or more fully methylated control genomic regions.
- One or more control genomic regions that are fully methylated can have methylation and/or hypermethylation at every nucleobase within the one or more control genomic regions. For example, all of the nucleobases within the control genomic regions can be methylated. In some cases, the one or more methylation reactions can result in one or more partially methylated control genomic regions. One or more control genomic regions that are partially methylated can have methylation and/or hypermethylation at a portion of nucleobases within the one or more control genomic regions. For example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 99% of nucleobases can be methylated and/or hypermethylated within the control genomic regions.
- the control nucleic acid molecules can be assayed by sequencing.
- the control nucleic acid molecules can be sequenced after conducting the one or more methylation reactions (e.g., in vitro enzymatic methylation) and/or after conducting the one or more methylation enrichment reactions.
- the one or more methylation reactions can be conducted, followed by the one or more methylation enrichment reactions.
- control nucleic acid molecules can be subjected to in vitro methylation, and then can be enriched for one or more methylated regions by performing a methylation enrichment reaction.
- Sequencing the control nucleic acid molecules (e.g., cell-free nucleic acid molecules) after conducting the one or more methylation reactions and/or one or more methylation enrichment reactions can generate a control data set comprising one or more control regions that can be amenable to methylation enrichment (e.g., enzymatic methylation enrichment).
- Sequencing the control nucleic acid molecules (e.g., cell-free nucleic acid molecules) without conducting the one or more methylation reactions (e.g., in vitro methylation) can generate another control data set comprising one or more control regions that can be hypomethylated and/or non-methylated.
- the other control data set can comprise one or more control regions that can have little to no methylation signals in non-cancer subjects.
- a set of the control nucleic acid molecules can be sequenced after conducting the one or more methylation reactions, and another set of the control nucleic acid molecules (e.g., cell-free nucleic acid molecules) can be sequenced without conducting the one or more methylation reactions.
- the data set comprising one or more control regions that can be amenable to methylation enrichment and the other data set comprising one or more control regions that are hypomethylated and/or non-methylated can be processed.
- one or more control regions that are amenable to methylation enrichment can comprise one or more control regions that can be enriched for one or more methylated control regions after in vitro methylation.
- Processing the control data set and the other control data set can comprise intersecting the two control data sets to identify one or more regions that can be common between them, thereby generating the universal panel of genomic regions (e.g., proto-DMRs).
- the one or more regions that are common between the two data sets can comprise one or more control regions that can be amenable to methylation enrichment and regions that can be hypomethylated and/or non-methylated in non-cancer controls (e.g., a pool of non-cancer subjects).
- the universal panel of genomic regions can comprise hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation enrichment, or any combinations thereof.
- the universal panel of genomic regions can comprise regions of non-methylation and regions that are amenable to methylation enrichment in non-cancer controls (e.g., a pool of non-cancer subjects).
- Another aspect disclosed herein is a method of processing a nucleic acid sample of a subject.
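The intersection step that yields the universal panel can be sketched as an interval intersection between two region lists: regions recovered after in vitro methylation and enrichment, and regions with little to no methylation in untreated non-cancer controls. The sketch below assumes half-open `(chrom, start, end)` coordinates and that each input list contains no self-overlapping regions; the function name and coordinates are hypothetical, and a production pipeline would more likely use an established interval tool.

```python
def intersect_regions(regions_a, regions_b):
    """Intersect two lists of (chrom, start, end) half-open intervals.

    Assumes intervals within each list do not overlap one another.
    """
    a, b = sorted(regions_a), sorted(regions_b)
    shared = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in chromosome order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            shared.append((chrom_a, start, end))
        # Advance the interval that ends first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical coordinates for illustration only.
enrichable = [("chr1", 100, 500), ("chr1", 800, 1200), ("chr2", 50, 300)]
unmethylated_in_controls = [("chr1", 300, 900), ("chr2", 0, 100), ("chr3", 10, 20)]
print(intersect_regions(enrichable, unmethylated_in_controls))
# [('chr1', 300, 500), ('chr1', 800, 900), ('chr2', 50, 100)]
```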
- Nucleic acid molecules can be obtained from the nucleic acid sample of the subject.
- One or more methylated regions of the nucleic acid molecules can be enriched.
- a universal panel of genomic regions can be used to target one or more genomic regions of the one or more methylated regions.
- the universal panel of genomic regions can be used following methylation enrichment to target specific genomic regions of the subject and/or one or more reference subjects (e.g., a pool of cancer subjects). For example, after subjecting nucleic acid molecules to methylation enrichment assay (e.g., cfMeDIP), the universal panel of genomic regions can be used to capture specific genomic regions of the methylated nucleic acid molecules from the subject for sequencing.
- the sequencing depth can be a depth of at most 10 million (M) single reads, at most 20 M single reads, at most 30 M single reads, at most 40 M single reads, at most 50 M single reads, at most 60 M single reads, at most 70 M single reads, at most 80 M single reads, at most 90M single reads, or at most 100 M reads.
- the sequencing depth can be a depth from 1 M single reads to 10 M single reads, from 10 M single reads to 20 M single reads, from 20 M single reads to 30 M single reads, from 30 M single reads to 40 M single reads, from 40 M single reads to 50 M single reads, from 50 M single reads to 60 M single reads, from 60 M single reads to 70M single reads, from 70 M single reads to 80 M single reads, from 80 M single reads to 90 M single reads, or from 90 M single reads to 100 M single reads.
- the sequencing depth can be a depth of at least 1 M single reads, at least 10 M single reads, at least 20 M single reads, at least 30 M single reads, at least 40 M single reads, at least 50 M single reads, at least 60 M single reads, at least 70 M single reads, at least 80 M single reads, at least 90 M single reads, at least 100 M single reads, or at least 200 M single reads.
- An example workflow of the method 1400 for generating the universal panel of genomic regions (e.g., proto-DMRs) is shown in FIG. 14.
- One or more control samples can be obtained.
- the one or more control samples can be derived from one or more non-cancer controls (e.g., a pool of non-cancer subjects) 1402.
- the one or more control samples can be derived from one or more blood samples or plasma samples.
- the one or more control samples can be derived from one or more tissue samples.
- the one or more control samples can comprise nucleic acid molecules derived from one or more non-tissue samples or one or more tissue samples (e.g., cell free nucleic acid molecules or nucleic acid molecules derived from a tissue sample).
- the nucleic acid molecules (e.g., cell free methylated nucleic acid molecules) from the one or more control samples can be further assayed for methylation enrichment.
- methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP) 1403.
- cfMeDIP can pull down cell free methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- MeDIP methylated DNA immunoprecipitation
- cfMBD cell-free methyl-CpG binding domain
- MBD methyl-CpG binding domain
- MDIP methylation-dependent immunoprecipitation
- MSRE methylation-sensitive restriction enzyme
- TAPS TET-assisted pyridine borane sequencing
- Sequencing can be performed with next generation sequencing (NGS) 1404 or with any sequencing methods disclosed herein.
- Sequencing can generate one or more data sets comprising sequencing reads corresponding to methylated regions, hypermethylated regions, hypomethylated regions, or non-methylated regions, or combination thereof that can be mapped along a genome (e.g., human genome).
- the one or more data sets can be analyzed to identify one or more regions with low number of sequencing reads 1405.
- the sequencing reads from the two or more data sets can be averaged to determine one or more regions with low number of average sequencing reads.
- the average can be a weighted average.
- Low number of sequencing reads, or low average sequencing reads can be at most 10, at most 9, at most 8, at most 7, at most 6, at most 5, at most 4, at most 3, at most 2, or at most 1 sequencing reads.
- the one or more regions with low number of sequencing reads can represent one or more regions that have low methylation signals 1408. In some cases, the one or more regions with low methylation signals can be hypomethylated regions and/or non-methylated regions.
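The low-signal step can be sketched as follows. This is a minimal illustration rather than the method of the disclosure: it assumes each control data set has already been summarized as read counts per genomic region, and the function names, region keys, weights, and the `max_reads` cutoff are all hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_average_counts(
    samples: Sequence[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Weighted average read count per region across control data sets.

    Assumption for this sketch: a region missing from a data set has zero reads.
    """
    if not samples or len(samples) != len(weights):
        raise ValueError("need at least one data set and one weight per data set")
    total_weight = sum(weights)
    regions = set().union(*samples)  # union of all region keys
    return {
        region: sum(w * s.get(region, 0.0) for s, w in zip(samples, weights))
        / total_weight
        for region in regions
    }


def low_signal_regions(avg_counts: Dict[str, float], max_reads: float = 1.0) -> List[str]:
    """Regions whose average read count is at most `max_reads`."""
    return sorted(r for r, c in avg_counts.items() if c <= max_reads)


# Two hypothetical non-cancer control data sets, equally weighted.
controls = [
    {"chr1:0-300": 0, "chr1:300-600": 12},
    {"chr1:0-300": 2, "chr1:300-600": 8},
]
avg = weighted_average_counts(controls, weights=[1.0, 1.0])
print(low_signal_regions(avg, max_reads=1.0))
```

The cutoff corresponds to the "at most N sequencing reads" language above; any of the listed values could be passed as `max_reads`.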
- the one or more control samples derived from non-cancer controls can also be subjected to one or more methylation reactions.
- the one or more methylation reactions can be an in vitro methylation.
- the one or more methylation reactions can generate one or more in-vitro fully methylated samples 1401.
- the one or more in vitro fully methylated samples can have genomic regions with nucleobases that are all methylated.
- the one or more in vitro fully methylated samples can comprise methylated control nucleic acid molecules (e.g., cell free nucleic acid molecules).
- the methylated control nucleic acid molecules (e.g., cell free nucleic acid molecules) can be further assayed for methylation enrichment.
- methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP) 1403.
- cfMeDIP can pull down methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- Sequencing can be performed with next generation sequencing (NGS) 1404 or with any sequencing methods disclosed herein.
- Sequencing can generate one or more data sets comprising sequencing reads corresponding to methylated regions, hypermethylated regions, hypomethylated regions, or non-methylated regions, or combination thereof that can be mapped along a genome (e.g., human genome).
- the one or more data sets can be analyzed to identify one or more regions with high number of sequencing reads 1406.
- One or more regions with high number of sequencing reads can be regions with high binding affinity, where the regions can be pulled down for sequencing upon methylation enrichment.
- the sequencing reads from the two or more data sets can be averaged to determine one or more regions with high number of average sequencing reads.
- the average can be a weighted average.
- a computational method can be applied to identify one or more regions with high number of sequencing reads.
- a model-based analysis of ChIP-Seq (MACS) peak calling method can be used to identify peaks associated with high sequencing reads.
- the one or more regions with high number of sequencing reads can represent one or more regions with high methylation enrichment 1407.
- the one or more regions with high methylation enrichment can be regions that are amenable to methylation enrichment (e.g., enzymatic methylation enrichment).
- the identified one or more high methylation enrichment regions and the one or more low signal regions in non-cancer controls can be intersected 1409 to generate the proto-DMRs 1410.
- the intersecting can comprise finding one or more regions that can be both regions of high methylation enrichment and low signals in non-cancer controls (e.g., a pool of non-cancer subjects).
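The intersection that yields the proto-DMRs can be sketched as a set operation. This is an illustrative sketch under stated assumptions: both inputs are average read counts keyed by a shared region identifier, and a fixed read cutoff stands in for the MACS peak calling mentioned above. The function name, region identifiers, and both cutoffs are hypothetical.

```python
from typing import Dict, List


def proto_dmrs(
    control_avg: Dict[str, float],
    methylated_avg: Dict[str, float],
    max_control_reads: float = 1.0,
    min_methylated_reads: float = 20.0,
) -> List[str]:
    """Regions with low signal in non-cancer controls AND high enrichment
    in the in vitro fully methylated samples."""
    # Low methylation signal in non-cancer controls (steps 1405/1408).
    low_signal = {r for r, c in control_avg.items() if c <= max_control_reads}
    # High enrichment in fully methylated samples (steps 1406/1407);
    # a simple cutoff replaces peak calling in this sketch.
    high_enrichment = {
        r for r, c in methylated_avg.items() if c >= min_methylated_reads
    }
    # Intersection (step 1409) gives candidate proto-DMRs (1410).
    return sorted(low_signal & high_enrichment)


control_avg = {"r1": 0.5, "r2": 0.0, "r3": 15.0}
methylated_avg = {"r1": 45.0, "r2": 3.0, "r3": 60.0}
print(proto_dmrs(control_avg, methylated_avg))
```

In this toy input, `r2` is dropped because it is not pulled down even when fully methylated, and `r3` is dropped because it already carries signal in non-cancer controls.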
- the baseline-informed approach can identify one or more subject-specific DMRs derived from using a reference sample obtained from the subject.
- the reference sample can be obtained prior to obtaining the sample.
- the one or more subject-specific DMRs can be personalized to the subject.
- the one or more subject-specific DMRs can be unique to the subject.
- the baseline-informed approach can generate personalized ctDNA quantification and/or classification while maintaining high specificity.
- a customized DMR signature (e.g., DMRs/anti-DMRs specific to a reference subject) can be generated with the baseline-informed approach, which can help avoid reliance on a generic indication-specific signature and can make the identification of DMRs relevant to the subject more precise.
- the baseline-informed approach comprises DMR selection that can be restricted to regions that can also be present in a known positive control cohort via prevalence filtering, and/or can ensure statistical significance by applying strict outlier detection techniques.
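One way to picture prevalence filtering combined with outlier detection is sketched below. The source does not specify either procedure, so everything here is an assumption made for illustration: prevalence is taken as the fraction of positive control cohort samples in which a region was called, outliers are judged with a z-score against non-cancer control signals, and the names and cutoffs (`min_prevalence`, `min_z`) are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Set


def prevalence(region: str, cohort_calls: Sequence[Set[str]]) -> float:
    """Fraction of positive-cohort samples in which `region` was called."""
    return sum(region in calls for calls in cohort_calls) / len(cohort_calls)


def select_dmrs(
    candidates: Sequence[str],
    cohort_calls: Sequence[Set[str]],
    subject_signal: Dict[str, float],
    control_signals: Dict[str, Sequence[float]],
    min_prevalence: float = 0.5,
    min_z: float = 3.0,
) -> List[str]:
    """Keep candidates that are prevalent in the positive cohort and whose
    subject signal is a high outlier relative to non-cancer controls."""
    selected = []
    for region in candidates:
        if prevalence(region, cohort_calls) < min_prevalence:
            continue  # too rare in the known positive control cohort
        controls = control_signals[region]
        spread = stdev(controls)
        if spread == 0:
            continue  # z-score undefined; skipped in this sketch
        z = (subject_signal[region] - mean(controls)) / spread
        if z >= min_z:
            selected.append(region)
    return selected


cohort_calls = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}, {"A"}]
subject_signal = {"A": 10.0, "B": 10.0, "C": 10.0}
control_signals = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 3.0], "C": [1.0, 2.0, 3.0]}
print(select_dmrs(["A", "B", "C"], cohort_calls, subject_signal, control_signals))
```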
- the one or more subject-specific DMRs can be used for tracking a subject’s progression and/or regression of a cancer.
- the reference sample can be used in the baseline-informed approach to generate a personalized set of one or more subject-specific DMRs that may be tracked in the subject over time.
- the reference sample can be a tissue sample.
- the tissue sample can be a resected tumor tissue.
- the baseline-informed approach can be a tissue-informed and/or a tissue-naive approach.
- the one or more subject-specific DMRs generated from using the tissue sample and/or non-tissue sample in the baseline-informed approach can have enhanced specificity.
- the baseline-informed approach can use a single universal panel of genomic regions (e.g., proto-DMRs), eliminating the need for patient-specific panels, reducing cost and complexity.
- the baseline-informed approach can generate unique patient specific DMRs and anti-DMRs based on the reference sample (e.g., tissue sample and/or non-tissue sample).
- the sample used in the baseline-informed approach can be a non-tissue sample (e.g., a blood sample and/or a plasma sample) and/or a tissue sample.
- the sample can comprise nucleic acid molecules derived from a tissue sample or a non-tissue sample.
- the nucleic acid molecules (e.g., cell-free nucleic acid molecules) obtained from the sample can be assayed for methylation enrichment.
- the methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP).
- cfMeDIP can pull down cell free methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- Methylation enrichment can generate methylated nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules).
- the methylated nucleic acid molecules can be subjected to proto-DMR enrichment by using a universal panel of genomic regions disclosed herein.
- performing proto-DMR enrichment can target and isolate genomic regions that have complementarity (e.g., sequence complementarity) to the one or more proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions that correspond to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- corresponding can mean that one or more genomic regions can have genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by hybrid capture.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by multiplex PCR.
- enriching can be performed by targeting the one or more genomic regions that can have complementarity (e.g., sequence complementarity) to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- the methylated nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) after proto-DMR enrichment can be subjected to sequencing to generate a data set comprising methylation states of one or more genomic regions.
- the one or more methylated nucleic acid molecules can be subjected to whole methylome sequencing to generate the data set comprising methylation states of the one or more genomic regions.
- Whole methylome sequencing can be sequencing performed on one or more methylated nucleic acid molecules without further enrichment with the universal panel of genomic regions.
- the methylation states of the one or more genomic regions can comprise hypermethylated states, methylated states, non-methylated states, or hypomethylated states, or any combinations thereof.
- the baseline-informed approach can comprise identifying a set of DMRs specific to the reference subject and/or a set of anti-DMRs specific to the reference subject.
- a set of anti-DMRs and/or a set of DMRs specific to one or more reference subjects can refer to anti-DMRs and/or a set of DMRs that can be identified by assaying the one or more reference samples.
- the set of DMRs specific to the reference subject and/or the set of anti-DMRs specific to the reference subject can be identified with a reference sample by using the universal panel of genomic regions (e.g., proto-DMRs).
- the set of DMRs specific to the reference subject and/or the set of anti-DMRs specific to the reference subject can be selected from the universal panel of genomic regions.
- the set of DMRs specific to the reference subject and/or the set of anti-DMRs specific to the reference subject can be a subset of the universal panel of genomic regions.
- the reference subject can be the subject.
- the method can comprise obtaining the reference sample from the subject.
- the reference sample can be obtained at a time prior to obtaining the sample.
- the reference sample can be obtained from the subject subsequent to diagnosis with the cancer.
- the reference sample can be obtained from the subject prior to treatment with a therapy.
- the reference sample can be obtained from the subject subsequent to diagnosis with the cancer and prior to treatment with a therapy.
- the reference sample can be a blood sample and/or a plasma sample.
- the reference sample can be a tissue sample.
- the tissue sample can be a cancer tissue sample from the subject.
- the tissue sample can be a resected tumor tissue.
- the reference sample can comprise reference nucleic acid molecules derived from a tissue sample.
- the reference sample can comprise reference nucleic acid molecules (e.g., cell-free nucleic acid molecules) derived from a non-tissue sample.
- the reference nucleic acid molecules (e.g., cell-free nucleic acid molecules) can be assayed for methylation enrichment.
- the methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP).
- cfMeDIP can pull down cell free methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- Methylation enrichment can generate methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules).
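- the methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) can be subjected to proto-DMR enrichment by using the universal panel of genomic regions disclosed herein.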
- proto-DMR enrichment can mean to target and isolate genomic regions that have complementarity (e.g., sequence complementarity) to the one or more proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more reference genomic regions that correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation enrichment, or any combinations thereof in control subjects without cancer.
- corresponding can mean that one or more genomic regions can have genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by hybrid capture.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by multiplex PCR.
- enriching can be performed by targeting the one or more genomic regions that can have complementarity (e.g., sequence complementarity) to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- the methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) after proto-DMR enrichment can be subjected to sequencing to generate a reference data set comprising methylation states of one or more reference genomic regions.
- the methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) can be subjected to whole methylome sequencing to generate a reference data set comprising methylation states of one or more reference genomic regions.
- Whole methylome sequencing can be sequencing performed on one or more methylated nucleic acid molecules without further enrichment with the universal panel of genomic regions.
- the methylation states of the one or more reference genomic regions can comprise hypermethylated states, methylated states, non-methylated states, or hypomethylated states, or any combinations thereof.
- the reference data set can be processed to identify a set of DMRs specific to the reference subject. For example, the reference data set can be analyzed to determine regions with high sequencing reads.
- the set of DMRs specific to the reference subject can comprise reference genomic regions that are hypermethylated and/or methylated.
- the reference data set can be processed to identify a set of anti-DMRs specific to the reference subject. For example, the reference data set can be analyzed to determine regions with little to no sequencing reads.
- the set of anti-DMRs specific to the reference subject can comprise reference genomic regions that are non-methylated and/or hypomethylated.
- the set of anti-DMRs specific to the reference subject can comprise genomic regions that are non-methylated and/or hypomethylated in non-cancer subjects.
- the baseline-informed approach can comprise identifying a set of anti-DMRs specific to one or more reference subjects.
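Splitting the universal panel into reference-specific DMRs and anti-DMRs can be sketched as below. This is a minimal sketch, assuming the reference data set has been reduced to a read count per panel region. The source only says "high sequencing reads" and "little to no sequencing reads", so the cutoffs `dmr_min_reads` and `anti_max_reads`, like the function and region names, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def split_reference_regions(
    panel: Sequence[str],
    ref_counts: Dict[str, int],
    dmr_min_reads: int = 10,
    anti_max_reads: int = 1,
) -> Tuple[List[str], List[str]]:
    """Return (DMRs, anti-DMRs) as subsets of the universal panel.

    Panel regions absent from `ref_counts` are treated as having zero reads.
    Regions between the two cutoffs are assigned to neither set.
    """
    dmrs: List[str] = []
    anti_dmrs: List[str] = []
    for region in panel:
        reads = ref_counts.get(region, 0)
        if reads >= dmr_min_reads:
            dmrs.append(region)  # methylated or hypermethylated in the reference
        elif reads <= anti_max_reads:
            anti_dmrs.append(region)  # non-methylated or hypomethylated
    return dmrs, anti_dmrs


panel = ["r1", "r2", "r3", "r4"]
ref_counts = {"r1": 40, "r2": 0, "r3": 5}
dmrs, anti_dmrs = split_reference_regions(panel, ref_counts)
print(dmrs, anti_dmrs)
```

Because both outputs are drawn from `panel`, each is by construction a subset of the universal panel of genomic regions.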
- a set of anti-DMRs specific to one or more reference subjects can refer to the anti-DMRs that can be identified by assaying the one or more reference samples.
- a set of anti-DMRs specific to one or more reference subjects and/or a set of DMRs specific to one or more reference subjects can be identified with one or more reference samples by using the universal panel of genomic regions (e.g., proto-DMRs).
- the set of anti-DMRs specific to the one or more reference subjects and/or a set of DMRs specific to one or more reference subjects can be selected and/or targeted from the universal panel of genomic regions.
- the set of anti-DMRs specific to the one or more reference subjects can be a subset of the universal panel of genomic regions.
- the method can comprise obtaining one or more reference samples from one or more reference subjects.
- the one or more reference subjects can have cancer.
- the one or more reference samples can be one or more blood samples or one or more plasma samples.
- one or more reference samples can be one or more tissue samples.
- one or more reference samples can comprise reference nucleic acid molecules derived from one or more tissue samples.
- one or more reference samples can comprise reference nucleic acid molecules derived from one or more non-tissue samples (e.g., cell-free nucleic acid molecules).
- the one or more reference nucleic acid molecules (e.g., cell-free nucleic acid molecules) can be assayed for methylation enrichment.
- the methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP).
- cfMeDIP can pull down cell free methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- Methylation enrichment can generate one or more methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules).
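- the one or more methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) can be subjected to proto-DMR enrichment by using the universal panel of genomic regions disclosed herein.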
- performing proto-DMR enrichment can target and isolate genomic regions that have complementarity (e.g., sequence complementarity) to the one or more proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more reference genomic regions that correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation enrichment, or any combinations thereof in control subjects without cancer.
- corresponding can mean that one or more genomic regions can have genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by hybrid capture.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by multiplex PCR.
- enriching can be performed by targeting the one or more genomic regions that can have complementarity (e.g., sequence complementarity) to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- the one or more methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) after proto-DMR enrichment can be subjected to sequencing to generate a reference data set comprising methylation states of one or more reference genomic regions.
- the one or more methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) can be subjected to whole methylome sequencing to generate a reference data set comprising methylation states of one or more reference genomic regions.
- Whole methylome sequencing can be sequencing performed on one or more methylated nucleic acid molecules without further enrichment with the universal panel of genomic regions.
- the methylation states of the one or more reference genomic regions can comprise hypermethylated states, methylated states, non-methylated states, or hypomethylated states, or any combinations thereof.
- the reference data set can be processed to identify a set of DMRs specific to one or more reference subjects. For example, the reference data set can be analyzed to determine regions with high sequencing reads.
- the set of DMRs specific to the one or more reference subjects can comprise reference genomic regions that are hypermethylated and/or methylated.
- the reference data set can be processed to identify a set of anti-DMRs specific to the one or more reference subjects. For example, the reference data set can be analyzed to determine regions with little to no sequencing reads.
- the set of anti-DMRs specific to the one or more reference subjects can comprise reference genomic regions that are non-methylated and/or hypomethylated.
- the set of anti-DMRs specific to the one or more reference subjects can comprise genomic regions that are non-methylated and/or hypomethylated in non-cancer subjects.
- the set of anti-DMRs specific to one or more reference subjects can be identified using the one or more reference samples and one or more control samples.
- the one or more control samples can be obtained from control subjects without cancer.
- the one or more control samples can be one or more tissue samples and/or one or more non-tissue samples.
- the one or more control samples can comprise one or more control nucleic acid molecules from one or more tissue samples and/or one or more non-tissue samples.
- the one or more control samples can comprise one or more control nucleic acid molecules (e.g., cell-free nucleic acid molecules).
- the one or more control nucleic acid molecules (e.g., cell-free nucleic acid molecules) and the one or more reference nucleic acid molecules (e.g., cell-free nucleic acid molecules) can be assayed for methylation enrichment.
- the methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP).
- cfMeDIP can pull down methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- Methylation enrichment can generate one or more methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) and/or one or more methylated control nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules).
- the one or more methylated reference nucleic acid molecules and/or the one or more methylated control nucleic acid molecules can be subjected to sequencing to generate another reference data set comprising i) methylation states of one or more reference genomic regions and/or ii) methylation states of one or more control genomic regions.
- the methylation states of one or more reference genomic regions and one or more control genomic regions can be compared to identify one or more non-methylated regions and/or hypomethylated regions in both reference genomic regions and control genomic regions.
- the identified one or more non-methylated regions and/or hypomethylated regions can be compared with the universal panel of genomic regions (e.g., proto-DMRs) to identify anti-DMRs specific to the one or more reference subjects.
- the identified one or more non-methylated regions and/or hypomethylated regions can be intersected with the universal panel of genomic regions (e.g., proto-DMRs) to identify one or more genomic regions that overlap (e.g., have the same genomic coordinates).
- the set of anti-DMRs specific to the one or more reference subjects can comprise reference genomic regions that are non-methylated and/or hypomethylated.
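The coordinate-based intersection with the universal panel can be sketched with plain interval arithmetic. This is an illustrative sketch only: regions are assumed to be `(chromosome, start, end)` tuples with half-open coordinates, any shared base counts as overlap, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def overlaps(a: Region, b: Region) -> bool:
    """True when two regions on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def anti_dmrs_from_panel(
    hypo_regions: Sequence[Region], panel: Sequence[Region]
) -> List[Region]:
    """Panel regions that overlap a region found non-methylated or
    hypomethylated in both reference and control data sets."""
    return [p for p in panel if any(overlaps(p, h) for h in hypo_regions)]


hypo_regions = [("chr1", 100, 400), ("chr2", 0, 50)]
panel = [("chr1", 300, 600), ("chr1", 400, 700), ("chr2", 10, 20)]
print(anti_dmrs_from_panel(hypo_regions, panel))
```

The quadratic scan is fine for a small example; a sorted sweep or an interval tree would be the usual choice for genome-scale inputs.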
- the one or more genomic regions of the subject can be compared to the set of DMRs specific to the reference subject. In some cases, comparing can generate one or more counts of the set of DMRs specific to the subject. In some cases, counts can be methylation signal obtained after analyzing the sequencing data set. In some cases, counts can be sequencing reads that map to the region of interest and/or average of the sequencing reads that map to the region of interest. For example, counts can be one or more sequencing reads in one or more genomic regions that map to the set of DMRs specific to the reference subject to generate one or more counts of the set of DMRs specific to the subject.
- mapping can mean finding one or more genomic regions that share the same genomic coordinates as the set of DMRs specific to the reference subject.
- the one or more genomic regions of the subject can be compared to the set of anti-DMRs specific to the reference subject and/or anti-DMRs specific to the one or more reference subjects.
- comparing can generate one or more counts of the set of anti-DMRs specific to the subject.
- counts can be methylation signal obtained after analyzing the sequencing data set.
- counts can be sequencing reads that map to the region of interest and/or average of the sequencing reads that map to the region of interest.
- comparing can comprise counting one or more sequencing reads in one or more genomic regions that map to the set of anti-DMRs specific to the reference subject and/or anti-DMRs specific to the one or more reference subjects to generate one or more counts of the set of anti-DMRs specific to the subject.
- mapping can mean finding one or more genomic regions that share the same genomic coordinates as the set of DMRs and/or the set of anti-DMRs specific to the reference subject.
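Counting reads that map to a set of DMRs or anti-DMRs can be sketched as follows. This is a minimal sketch under assumptions not stated in the source: each aligned read is reduced to a `(chromosome, start)` pair, a read maps to a region when its start falls inside the region's half-open interval, and all names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)
Read = Tuple[str, int]  # (chromosome, 0-based start position)


def count_reads_in_regions(
    reads: Sequence[Read], regions: Sequence[Region]
) -> Dict[Region, int]:
    """Number of reads whose start position falls inside each region."""
    counts = {region: 0 for region in regions}
    for chrom, start in reads:
        for region in regions:
            r_chrom, r_start, r_end = region
            if chrom == r_chrom and r_start <= start < r_end:
                counts[region] += 1
    return counts


dmr_regions = [("chr1", 100, 400), ("chr2", 0, 300)]
reads = [("chr1", 150), ("chr1", 399), ("chr1", 400), ("chr2", 10), ("chr3", 10)]
print(count_reads_in_regions(reads, dmr_regions))
```

The same function applies unchanged to anti-DMRs, which gives the two count sets used for normalization.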
- the one or more counts of the set of DMRs specific to the subject can be normalized to the one or more counts of the set of anti-DMRs specific to the subject. Normalizing can be performed to ensure data consistency across different data sets and reduce noise that may exist within the data sets.
- normalizing can generate a methylation score. In some cases, normalizing can be performed by computing the ratio of the one or more counts of the set of DMRs specific to the subject to the one or more counts of the set of anti-DMRs specific to the subject. In some cases, the one or more counts of the set of DMRs specific to the subject can be averaged to generate an average count of the set of DMRs specific to the subject. In some cases, the one or more counts of the set of anti-DMRs specific to the subject can be averaged to generate an average count of the set of anti-DMRs specific to the subject.
- a ratio of the average count of the set of DMRs specific to the subject to the average count of the set of anti-DMRs specific to the subject can be computed.
- the ratio computed can be a methylation score.
- the methylation score can be a signal-to-noise ratio.
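The ratio form of the methylation score can be written out directly. This is a sketch of the averaged-count ratio described above, not of the Bayesian variants; the function name is hypothetical, and raising an error on a zero denominator is a choice made for this example rather than something the source specifies.

```python
from statistics import mean
from typing import Sequence


def methylation_score(
    dmr_counts: Sequence[float], anti_dmr_counts: Sequence[float]
) -> float:
    """Average DMR count divided by average anti-DMR count (signal / noise)."""
    if not dmr_counts or not anti_dmr_counts:
        raise ValueError("both count sets must be non-empty")
    noise = mean(anti_dmr_counts)
    if noise == 0:
        # Assumption for this sketch: an all-zero background is reported
        # as an error instead of being patched with a pseudocount.
        raise ValueError("average anti-DMR count is zero; the ratio is undefined")
    return mean(dmr_counts) / noise


print(methylation_score([30, 50], [1, 3]))  # (30 + 50) / 2 divided by (1 + 3) / 2
```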
- the methylation score can be generated by using one or more Bayesian-based statistical inference methods.
- the Bayesian-based statistical inference methods can be for estimating parameters and/or performing regression to generate the methylation score.
- the methylation score can be a circulating tumor DNA quantification.
- the methylation score can be compared to a threshold score, thereby generating the output indicative of the cancer in the subject.
- the methylation score above the threshold score can be indicative of the presence of cancer.
- the methylation score above the threshold score can be indicative of the presence of circulating tumor DNAs.
- the methylation score below the threshold score can be indicative of the absence of circulating tumor DNAs.
- the threshold score can be generated with a plurality of control samples obtained from a plurality of control subjects. For example, a plurality of control samples can be subjected to the baseline-informed approach disclosed herein to compute a plurality of methylation scores.
- the threshold score can be set where at least 5% of the control subjects had a score defined to be false positives. In some cases, the threshold score can be set where at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, or more of the control subjects had a score defined to be false positives.
- the threshold score can be set where at most 1%, at most 2%, at most 3%, at most 4%, at most 5%, at most 6%, at most 7%, at most 8%, at most 9%, at most 10%, or more of the control subjects had a score defined to be false positives.
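One way to derive such a threshold from control methylation scores can be sketched as follows. The sketch assumes the threshold is chosen as an order statistic of the control scores so that no more than a target fraction of controls score strictly above it; the function names and the strict "greater than" convention are assumptions made for illustration.

```python
import math


def threshold_from_controls(control_scores, max_false_positive_rate=0.05):
    """Pick a threshold so that at most the given fraction of controls exceed it."""
    if not control_scores:
        raise ValueError("at least one control score is required")
    if not 0 <= max_false_positive_rate < 1:
        raise ValueError("rate must be in [0, 1)")
    ordered = sorted(control_scores)
    n = len(ordered)
    # Number of control subjects allowed to score above the threshold.
    allowed = math.floor(max_false_positive_rate * n)
    return ordered[n - allowed - 1]


def false_positive_rate(control_scores, threshold):
    """Fraction of control scores strictly above the threshold."""
    return sum(s > threshold for s in control_scores) / len(control_scores)


# Example with a small, made-up control cohort and a 25% target rate.
controls = [1, 2, 3, 4, 5, 6, 7, 8]
threshold = threshold_from_controls(controls, max_false_positive_rate=0.25)
print(threshold, false_positive_rate(controls, threshold))
```

With tied control scores the realized false positive rate can fall below the target, which stays within an "at most" criterion.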
- a baseline sample can be obtained from a subject.
- the term “baseline sample” can be used interchangeably with the “reference sample” herein.
- the term “subject” can be used interchangeably with “patient” herein.
- the baseline sample can be obtained from the subject at a time prior to obtaining the sample of interest.
- the baseline sample can be obtained from the subject subsequent to diagnosis and/or prior to treatment with a therapy.
- the baseline sample can be a tissue sample 1501.
- the baseline sample can be a blood sample and/or a plasma sample.
- nucleic acid molecules can be derived from a tissue sample and/or a non-tissue sample (e.g., blood sample, plasma sample).
- the baseline sample can comprise nucleic acid molecules (e.g., cell-free nucleic acid molecules) 1501.
- the baseline sample can be subjected to a methylation enrichment assay 1502 to enrich for one or more methylation regions.
- the nucleic acid molecules can be subjected to cell-free methylated DNA immunoprecipitation (cfMeDIP) or a similar methylation enrichment method disclosed herein to enrich for one or more methylated nucleic acids (e.g., cell-free methylated nucleic acid molecules) for subsequent sequencing.
- the baseline sample can be subjected to further proto-DMR enrichment by using the universal panel of genomic regions disclosed herein (e.g., Proto-DMRs) 1503.
- performing proto-DMR enrichment can target and isolate genomic regions that have complementarity (e.g., sequence complementarity) to one or more proto-DMRs.
- the one or more reference genomic regions of interest can correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation enrichment, or any combinations thereof in control subjects without cancer (e.g., correspond to proto-DMRs).
- corresponding can mean that one or more genomic regions (e.g., reference genomic regions) can have genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by hybrid capture.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by multiplex PCR. For example, with the proto-DMR panel, enriching can be performed by targeting the one or more genomic regions that correspond to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- Sequencing can be performed with next generation sequencing (NGS) 1504 or with any sequencing methods disclosed herein.
- Sequencing can generate a dataset comprising methylated regions, hypermethylated regions, hypomethylated regions, or non-methylated regions, or combination thereof that can be mapped along a genome (e.g., human genome).
- the dataset can be processed to identify one or more patient specific hypermethylated DMRs 1505.
- the sequencing reads can be analyzed to identify one or more regions with a high number of reads, thereby identifying methylated and/or hypermethylated regions (e.g., DMRs).
- the identified patient specific hypermethylated DMRs can be analyzed for prevalence.
- the patient specific hypermethylated DMRs can be compared to DMRs of a non-cancer subject cohort to determine whether the identified patient specific hypermethylated DMRs are statistical outliers.
- DMRs can be statistical outliers when compared to one or more non-cancer cohorts.
- the identified patient specific methylated DMRs and/or hypermethylated DMRs can be compared to DMRs of a cancer subject cohort to ensure the patient specific methylated DMRs and/or hypermethylated DMRs are present in both the subject’s sample and the cancer subject cohort.
- false positives in identifying the patient specific methylated DMRs and/or hypermethylated DMRs can be reduced.
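The outlier check against a non-cancer cohort can be illustrated with a simple z-score filter. The disclosure does not specify the statistical test, so the z-score, the cutoff of 3.0, the region-keyed dictionaries, and the function name below are all assumptions chosen only to make the idea concrete.

```python
from statistics import mean, stdev


def outlier_dmrs(patient_signal, cohort_signal, z_cutoff=3.0):
    """Keep candidate DMRs whose patient signal is an outlier versus a non-cancer cohort.

    patient_signal: {region: signal in the patient's baseline sample}
    cohort_signal:  {region: [signal in each non-cancer subject]} (two or more values)
    """
    kept = []
    for region, value in patient_signal.items():
        background = cohort_signal[region]
        mu, sigma = mean(background), stdev(background)
        if sigma == 0:
            # Assumption: with a constant background, any excess counts as an outlier.
            is_outlier = value > mu
        else:
            is_outlier = (value - mu) / sigma > z_cutoff
        if is_outlier:
            kept.append(region)
    return kept


# Made-up signals for two candidate regions.
patient = {"chr1:1000-1300": 40.0, "chr2:500-800": 3.0}
cohort = {
    "chr1:1000-1300": [2.0, 3.0, 4.0, 3.0],
    "chr2:500-800": [2.0, 3.0, 4.0, 3.0],
}
print(outlier_dmrs(patient, cohort))
```

A complementary prevalence check against a cancer cohort could be layered on top of this filter in the same way.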
- Another sample can be obtained from the subject at a time point subsequent to obtaining the baseline sample.
- the other sample can be a tissue sample and/or a non-tissue sample (e.g., a blood sample and/or a plasma sample) from the subject.
- the other sample can comprise nucleic acid molecules derived from a tissue sample and/or a non-tissue sample from the subject.
- the other sample can comprise cell-free nucleic acid molecules 1506.
- the other sample can be subjected to a methylation enrichment assay to enrich for one or more methylation regions 1507.
- the nucleic acid molecules can be subjected to cell-free methylated DNA immunoprecipitation (cfMeDIP) or a similar methylation enrichment method disclosed herein to enrich for one or more methylated nucleic acids (e.g., cell-free methylated nucleic acid molecules) for subsequent sequencing.
- the sample can be subjected to further proto-DMR enrichment by using the universal panel of genomic regions disclosed herein (e.g., Proto-DMRs) 1508.
- proto-DMR enrichment can mean to target and isolate genomic regions that have complementarity (e.g., sequence complementarity) to the one or more proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more reference genomic regions of interest.
- the one or more reference genomic regions of interest can correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation, or any combinations thereof in control subjects without cancer (e.g., correspond to proto-DMRs).
- corresponding can mean that one or more genomic regions can have genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by hybrid capture.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by multiplex PCR.
- enriching can be performed by targeting the one or more genomic regions that have complementarity (e.g., sequence complementarity) to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- Sequencing can be performed with next generation sequencing (NGS) 1509 or with any sequencing methods disclosed herein.
- Sequencing can generate a dataset comprising methylated regions, hypermethylated regions, hypomethylated regions, or non-methylated regions, or combination thereof that can be mapped along a genome (e.g., human genome) and/or the genomic coordinates of the patient specific anti-DMRs and/or DMRs.
- the dataset can be processed to identify DMR counts and anti-DMR counts 1510.
- the sequencing reads can be counted in genomic regions that correspond or map to the identified patient specific hypermethylated DMRs 1512 to generate DMR counts.
- the DMR counts can be averaged.
- the sequencing reads can be counted in genomic regions that correspond or map to the identified cancer specific anti-DMRs 1511 to generate anti-DMR counts.
- the anti-DMR counts can be averaged. In some cases, the average of the DMR counts can be normalized to the average of the anti-DMR counts 1510. In some cases, normalizing the average of the DMR counts to the average of the anti-DMR counts can generate a patient specific methylation score 1513.
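The counting and normalizing steps above can be sketched in Python as follows. This is only an illustration: reads and regions are assumed to be `(chromosome, start, end)` tuples with half-open coordinates, a read is counted when it overlaps a region by at least one base, and the helper names are hypothetical.

```python
from statistics import mean


def count_reads(reads, regions):
    """Count, for each region, the reads that overlap it (half-open intervals)."""
    counts = []
    for chrom, start, end in regions:
        n = sum(
            1
            for r_chrom, r_start, r_end in reads
            if r_chrom == chrom and r_start < end and r_end > start
        )
        counts.append(n)
    return counts


def patient_specific_score(reads, dmrs, anti_dmrs):
    """Average DMR count normalized to the average anti-DMR count."""
    anti_avg = mean(count_reads(reads, anti_dmrs))
    if anti_avg == 0:
        raise ValueError("no reads in anti-DMRs; score is undefined")
    return mean(count_reads(reads, dmrs)) / anti_avg


# Made-up aligned reads and regions.
reads = [
    ("chr1", 100, 250), ("chr1", 180, 330), ("chr1", 900, 1050),
    ("chr2", 50, 200), ("chr2", 60, 210), ("chr2", 5000, 5150),
]
dmrs = [("chr1", 150, 300), ("chr2", 0, 300)]
anti_dmrs = [("chr1", 800, 1000), ("chr2", 4900, 5200)]
print(count_reads(reads, dmrs), patient_specific_score(reads, dmrs, anti_dmrs))
```

The nested loop is quadratic in the worst case; an interval index would be a natural substitute for large read sets.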
- the baseline-agnostic approach can be used when a reference sample from the subject may not be available.
- the baseline-agnostic approach can be used when a sample obtained from the subject subsequent to diagnosis and prior to treatment may not be available.
- the baseline-agnostic approach can be used to identify DMRs and/or anti-DMRs across multiple cancer types using a single universal panel of genomic regions (e.g., proto-DMRs), eliminating the need for patient-specific panels, reducing cost and complexity.
- the baseline-agnostic approach can be used with a predefined subset of DMRs and/or anti-DMRs derived from a cohort of cancer subjects using the universal panel of genomic regions.
- the predefined subset of DMRs and/or anti-DMRs can be identified through a refined selection process as described herein for improved accuracy.
- the baseline-agnostic approach can be used in addition to the baseline-informed approach disclosed herein.
- the baseline-agnostic approach can be tumor-naive and/or tumor-agnostic.
- tumor-naive and/or tumor-agnostic approaches can be approaches that use non-tissue samples. This can be in contrast to a tumor-informed approach, where tumor samples can be used for assaying and/or analyzing.
- the baseline-agnostic approach can provide robust classification of cancer.
- the sample used in the baseline-agnostic approach can be a non-tissue sample (e.g., a blood sample and/or a plasma sample).
- the sample can be a tissue sample.
- the sample can comprise nucleic acid molecules derived from a tissue sample and/or a non-tissue sample.
- the nucleic acid molecules (e.g., cell-free nucleic acid molecules) obtained from the sample can be assayed for methylation enrichment.
- the methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP).
- cfMeDIP can pull down cell free methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- Methylation enrichment can generate methylated nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules).
- performing proto-DMR enrichment can target and isolate genomic regions that have complementarity (e.g., sequence complementarity) to the one or more proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions that correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, or regions that are amenable to methylation, or any combinations thereof in control subjects without cancer (e.g., proto-DMRs).
- the universal panel of genomic regions can be used to enrich for one or more genomic regions that correspond to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- the one or more genomic regions can have one or more genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by hybrid capture.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the proto-DMRs by multiplex PCR.
- enriching can result in targeting the one or more genomic regions that can have complementarity (e.g., sequence complementarity) to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- the methylated nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) after proto-DMR enrichment can be subjected to sequencing to generate a data set comprising methylation states of one or more genomic regions.
- the one or more methylated nucleic acid molecules can be subjected to whole methylome sequencing to generate the data set comprising methylation states of the one or more genomic regions.
- Whole methylome sequencing can be sequencing performed on one or more methylated nucleic acid molecules without further enrichment with the universal panel of genomic regions.
- the methylation states of the one or more genomic regions can comprise hypermethylated states, methylated states, non-methylated states, or hypomethylated states, or any combinations thereof.
- the baseline-agnostic approach can comprise identifying a set of anti-DMRs specific to one or more reference subjects.
- a set of anti-DMRs specific to one or more reference subjects can refer to anti-DMRs that were identified by assaying the one or more reference samples.
- a set of anti-DMRs specific to one or more reference subjects and/or a set of DMRs specific to one or more reference subjects can be identified with one or more reference samples by using the universal panel of genomic regions (e.g., proto-DMRs).
- the set of DMRs specific to the one or more reference subjects and/or the set of anti-DMRs specific to the one or more reference subjects can be selected from the universal panel of genomic regions.
- the set of anti-DMRs specific to the one or more reference subjects can be a subset of the universal panel of genomic regions.
- the method can comprise obtaining one or more reference samples from one or more reference subjects.
- the one or more reference subjects can have cancer.
- the one or more reference samples can be one or more blood samples or one or more plasma samples.
- one or more reference samples can be one or more tissue samples.
- one or more reference samples can comprise reference nucleic acid molecules derived from one or more tissue samples.
- one or more reference samples can comprise reference nucleic acid molecules derived from one or more non-tissue samples (e.g., cell-free nucleic acid molecules).
- the methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP).
- cfMeDIP can pull down cell free methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- the one or more methylated reference nucleic acid molecules can be subjected to proto-DMR enrichment by using a panel of one or more control regions disclosed herein.
- the universal panel of genomic regions can be used to enrich for one or more reference genomic regions that correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation enrichment, or any combinations thereof in control subjects without cancer.
- the universal panel of genomic regions can be used to enrich for one or more reference genomic regions that correspond to one or more control regions that are regions of non-methylation and regions that are amenable to methylation enrichment in control subjects without cancer.
- the one or more methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) after proto-DMR enrichment can be subjected to sequencing to generate a reference data set comprising methylation states of the one or more reference genomic regions.
- the one or more methylated reference nucleic acid molecules can be subjected to whole methylome sequencing to generate a reference data set comprising methylation states of one or more reference genomic regions.
- Whole methylome sequencing can be sequencing performed on one or more methylated nucleic acid molecules without further enrichment with the universal panel of genomic regions.
- the methylation states of the one or more reference genomic regions can comprise hypermethylated states, methylated states, non-methylated states, or hypomethylated states, or any combinations thereof.
- the reference data set can be processed to identify a set of DMRs specific to one or more reference subjects.
- the reference data set can be analyzed to determine regions with high sequencing reads.
- the set of DMRs specific to the one or more reference subjects can comprise reference genomic regions that are hypermethylated and/or methylated.
- the reference data set can be processed to identify a set of anti-DMRs specific to the one or more reference subjects.
- the reference data set can be analyzed to determine regions with little to no sequencing reads.
- the set of anti-DMRs specific to the one or more reference subjects can comprise reference genomic regions that are non-methylated and/or hypomethylated.
- the set of anti-DMRs specific to the one or more reference subjects can comprise genomic regions that are non-methylated and/or hypomethylated in non-cancer subjects.
- the set of anti-DMRs specific to the one or more reference subjects can comprise genomic regions that are non-methylated in non-cancer subjects.
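Calling reference DMRs from regions with high read counts and anti-DMRs from regions with little to no reads can be sketched as a pair of cutoffs. The cutoff values `high_min` and `low_max`, the region-keyed dictionary, and the function name are hypothetical; the disclosure does not state numeric thresholds.

```python
def call_regions(region_counts, high_min=20, low_max=1):
    """Split regions into candidate DMRs (high reads) and anti-DMRs (little to no reads).

    region_counts: {region: sequencing read count} for a reference data set.
    """
    dmrs = [region for region, count in region_counts.items() if count >= high_min]
    anti_dmrs = [region for region, count in region_counts.items() if count <= low_max]
    return dmrs, anti_dmrs


# Made-up read counts per reference genomic region.
counts = {
    "chr1:100-400": 35,
    "chr1:900-1200": 0,
    "chr2:50-350": 7,
    "chr3:10-310": 1,
}
dmrs, anti_dmrs = call_regions(counts)
print(dmrs, anti_dmrs)
```

Regions with intermediate counts fall into neither set under this scheme.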
- the set of DMRs specific to one or more reference subjects and/or the set of anti-DMRs specific to one or more reference subjects can be identified using the one or more reference samples and one or more control samples.
- the one or more control samples can be obtained from control subjects without cancer.
- the one or more control samples can be one or more tissue samples and/or one or more non-tissue samples.
- the one or more control samples can comprise one or more control nucleic acid molecules derived from one or more tissue samples and/or one or more non-tissue samples.
- the one or more control samples can comprise one or more control nucleic acid molecules (e.g., cell-free nucleic acid molecules).
- the one or more control nucleic acid molecules (e.g., cell-free nucleic acid molecules) and the one or more reference nucleic acid molecules (e.g., cell-free nucleic acid molecules) can be assayed for methylation enrichment.
- the methylation enrichment can be performed with cell free methylated DNA immunoprecipitation (cfMeDIP).
- cfMeDIP can pull down cell free methylated nucleic acids (e.g., cell free nucleic acid molecules) for subsequent sequencing.
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation specific PCR, bisulfite conversion with methylation specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- Methylation enrichment can generate one or more methylated reference nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules) and/or one or more methylated control nucleic acid molecules (e.g., cell-free methylated nucleic acid molecules).
- the one or more methylated reference nucleic acid molecules and/or the one or more methylated control nucleic acid molecules can be subjected to sequencing to generate another reference data set comprising i) methylation states of one or more reference genomic regions and/or ii) one or more control genomic regions.
- the methylation states of one or more reference genomic regions and one or more control genomic regions can be compared to identify one or more non-methylated regions and/or hypomethylated regions in both reference genomic regions and control genomic regions.
- the identified one or more non-methylated regions and/or hypomethylated regions can be compared with the universal panel of genomic regions (e.g., proto-DMRs) to identify anti-DMRs specific to the one or more reference subjects.
- the identified one or more non-methylated regions and/or hypomethylated regions can be intersected with the panel of one or more control genomic regions (e.g., proto-DMRs) to identify regions that overlap.
- one or more reference genomic regions that overlap with (e.g., have the same genomic coordinates as) non-methylated regions, hypomethylated regions, and/or regions that are amenable to methylation enrichment in the panel can be identified.
- the set of anti-DMRs specific to the one or more reference subjects can comprise reference genomic regions that are in non-methylated states and/or hypomethylated states.
- the methylation states of one or more reference genomic regions and one or more control genomic regions can be compared. Comparing can identify one or more regions that are hypermethylated and/or methylated in the one or more reference genomic regions as opposed to in the one or more control genomic regions.
- the identified one or more methylated regions and/or hypermethylated regions can be compared with the panel of one or more control genomic regions (e.g., proto-DMRs) to identify DMRs specific to the one or more reference subjects.
- the identified one or more methylated regions and/or hypermethylated regions can be intersected with the panel of one or more control genomic regions (e.g., proto-DMRs) to identify regions that overlap.
- one or more reference genomic regions that overlap (e.g., have the same genomic coordinates) can be identified.
- the set of DMRs specific to the one or more reference subjects can comprise reference genomic regions that are in methylated states and/or hypermethylated states.
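Intersecting candidate regions with the panel can be sketched as an interval overlap on genomic coordinates. The sketch assumes `(chromosome, start, end)` tuples with half-open coordinates and reports the shared sub-interval of every overlapping pair; requiring identical coordinates instead would be a stricter variant. The function name is hypothetical.

```python
def intersect_regions(candidates, panel):
    """Return the sub-intervals shared by candidate regions and panel regions."""
    shared = []
    for chrom, start, end in candidates:
        for p_chrom, p_start, p_end in panel:
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == p_chrom and start < p_end and p_start < end:
                shared.append((chrom, max(start, p_start), min(end, p_end)))
    return shared


# Made-up hypermethylated candidates and a made-up proto-DMR panel.
candidates = [("chr1", 100, 400), ("chr2", 0, 300)]
panel = [("chr1", 250, 550), ("chr3", 0, 300)]
print(intersect_regions(candidates, panel))
```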
- the one or more genomic regions of the subject can be compared to the set of DMRs specific to the reference subject. In some cases, comparing can generate one or more counts of the set of DMRs specific to the subject. In some cases, counts can be methylation signal obtained after analyzing the sequencing data set. In some cases, counts can be sequencing reads that map to the region of interest and/or average of the sequencing reads that map to the region of interest. For example, comparing can comprise counting for one or more sequencing reads in one or more genomic regions that map to the set of DMRs specific to the one or more reference subjects to generate one or more counts of the set of DMRs specific to the subject.
- mapping can mean finding one or more genomic regions that share the same genomic coordinates as the set of DMRs specific to the one or more reference subjects.
- the one or more genomic regions of the subject can be compared to the set of anti-DMRs specific to the one or more reference subjects.
- comparing can generate one or more counts of the set of anti-DMRs specific to the subject.
- counts can be methylation signal obtained after analyzing the sequencing data set.
- counts can be sequencing reads that map to the region of interest and/or average of the sequencing reads that map to the region of interest.
- comparing can comprise counting for one or more sequencing reads in one or more genomic regions that map to the set of anti-DMRs specific to the one or more reference subjects to generate one or more counts of the set of anti-DMRs specific to the subject.
- mapping can mean finding one or more genomic regions that share the same genomic coordinates as the set of DMRs and/or a set of anti-DMRs specific to the one or more reference subjects.
- the one or more counts of the set of DMRs specific to the subject can be normalized to the one or more counts of the set of anti-DMRs specific to the subject. Normalizing can be performed to ensure data consistency across different data sets and reduce noise that may exist within the data sets. In some cases, normalizing can generate a methylation score.
- normalizing can be performed by computing the ratio of the one or more counts of the set of DMRs specific to the subject to the one or more counts of the set of anti-DMRs specific to the subject.
- the one or more counts of the set of DMRs specific to the subject can be averaged to generate an average count of the set of DMRs specific to the subject.
- the one or more counts of the set of anti-DMRs specific to the subject can be averaged to generate an average count of the set of anti-DMRs specific to the subject.
- a ratio of the average count of the set of DMRs specific to the subject to the average count of the set of anti-DMRs specific to the subject can be computed.
- the ratio computed can be a methylation score.
- the methylation score can be referred to as a signal-to-noise ratio.
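The averaged ratio described above can be written compactly. The notation is introduced here only for illustration and does not appear in the disclosure: $D$ is the set of DMRs, $A$ is the set of anti-DMRs, $c_r$ is the count of sequencing reads of the subject mapping to region $r$, and $S$ is the methylation score.

```latex
\[
S = \frac{\bar{c}_{D}}{\bar{c}_{A}}, \qquad
\bar{c}_{D} = \frac{1}{|D|}\sum_{i \in D} c_i, \qquad
\bar{c}_{A} = \frac{1}{|A|}\sum_{j \in A} c_j
\]
```

Under this reading, $\bar{c}_{D}$ plays the role of signal and $\bar{c}_{A}$ the role of noise, and $S$ is undefined when $\bar{c}_{A} = 0$.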
- the methylation score can be a circulating tumor DNA quantification.
- the methylation score can be compared to a threshold score, thereby generating the output indicative of the cancer in the subject.
- the methylation score above the threshold score can be indicative of the presence of cancer.
- the methylation score above the threshold score can be indicative of the presence of circulating tumor DNAs.
- the methylation score below the threshold score can be indicative of the absence of circulating tumor DNAs.
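The comparison of a methylation score to the threshold score can be sketched as a single decision function. The output labels and the treatment of a score exactly equal to the threshold (reported here as not detected) are assumptions; the description above only addresses scores above or below the threshold.

```python
def interpret_score(score, threshold):
    """Map a methylation score to an output indicative of circulating tumor DNA."""
    # Assumption: a score equal to the threshold is treated as not detected.
    if score > threshold:
        return "ctDNA detected"
    return "ctDNA not detected"


print(interpret_score(5.0, 2.5))
print(interpret_score(1.2, 2.5))
```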
- the threshold score can be generated with a plurality of control samples obtained from a plurality of control subjects.
- the plurality of control samples can be subjected to the baseline-informed approach disclosed herein to compute a plurality of methylation scores.
- the threshold score can be set where at least 5% of the control subjects had a score defined to be false positives.
- the threshold score can be set where at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, or more of the control subjects had a score defined to be false positives.
- the threshold score can be set where at most 1%, at most 2%, at most 3%, at most 4%, at most 5%, at most 6%, at most 7%, at most 8%, at most 9%, at most 10%, or more of the control subjects had a score defined to be false positives.
- cancer specific DMRs and/or cancer specific anti-DMRs can be first identified in the baseline-agnostic approach.
- the term “cancer specific DMRs” can be used interchangeably with “DMRs specific to one or more reference subjects” herein.
- the term “cancer specific anti-DMRs” can be used interchangeably with “anti-DMRs specific to one or more reference subjects” herein.
- the identified cancer specific anti-DMRs can also be used in the baseline-informed approach.
- a set of cancer samples 1601 (e.g., one or more reference samples) and a set of non-cancer samples 1602 (e.g., control samples) can be obtained.
- the cancer samples can be tissue samples and/or non-tissue samples from cancer subjects.
- the non-cancer samples can be tissue samples and/or non-tissue samples (e.g., blood sample and/or a plasma sample) from non-cancer subjects.
- the cancer samples and non-cancer samples can comprise nucleic acid molecules derived from tissue samples and/or non-tissue samples.
- the cancer samples and non-cancer samples can comprise nucleic acid molecules (e.g., cell-free nucleic acid molecules).
- the cancer samples and the non-cancer samples can be assayed to generate a data set comprising genomic regions of the cancer samples and genomic regions of the non-cancer samples.
- the one or more genomic regions of the cancer samples and of the non-cancer samples can be compared to identify one or more regions with stable signal in both cancer and non-cancer samples 1603, thereby generating a panel of stable regions 1604.
- Stable signal can be hypomethylated regions and/or non-methylated regions in both non-cancer samples and cancer samples. By identifying hypomethylated regions and/or non-methylated regions that are present in both the one or more genomic regions of the cancer samples and one or more genomic regions of the non-cancer samples, the panel of stable regions can be generated.
- the panel of one or more stable regions can be intersected with a panel of one or more control regions (e.g., Proto-DMRs) 1605, 1606 to generate cancer specific anti-DMRs 1607.
- the intersecting can comprise finding one or more regions that are common in the stable region and in the panel of one or more genomic regions, thereby generating the cancer specific anti-DMRs.
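The stable-region workflow above can be sketched in Python. The sketch assumes per-region methylation signal for every sample, treats a region as stable when the signal is at or below a hypothetical cutoff `low_max` in all cancer and all non-cancer samples, and intersects by identical region identifiers; none of these specifics are stated in the disclosure.

```python
def stable_regions(cancer, non_cancer, low_max=1.0):
    """Regions with low (hypomethylated or non-methylated) signal in every sample."""
    stable = set()
    for region in cancer.keys() & non_cancer.keys():
        if max(cancer[region]) <= low_max and max(non_cancer[region]) <= low_max:
            stable.add(region)
    return stable


def cancer_specific_anti_dmrs(cancer, non_cancer, panel, low_max=1.0):
    """Intersect the panel of stable regions with the proto-DMR panel."""
    return sorted(stable_regions(cancer, non_cancer, low_max) & set(panel))


# Made-up signals: {region: [signal per sample]}.
cancer = {
    "chr1:100-400": [0.0, 1.0],
    "chr2:0-300": [25.0, 30.0],
    "chr3:50-350": [0.0, 0.5],
}
non_cancer = {
    "chr1:100-400": [0.5, 0.0],
    "chr2:0-300": [1.0, 0.0],
    "chr3:50-350": [0.0, 0.0],
}
panel = ["chr1:100-400", "chr2:0-300"]
print(cancer_specific_anti_dmrs(cancer, non_cancer, panel))
```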
- Cancer specific DMRs can be generated in a similar workflow. As shown in the example workflow 1700 in FIG. 17, a set of cancer samples 1701 (e.g., one or more reference samples) and a set of non-cancer samples 1702 (e.g., control samples) can be obtained.
- the cancer samples can be tissue samples and/or non-tissue samples from cancer subjects.
- the non-cancer samples can be tissue samples and/or non-tissue samples (e.g., blood sample and/or a plasma sample) from non-cancer subjects.
- the cancer samples and non-cancer samples can comprise nucleic acid molecules derived from tissue samples and/or non-tissue samples.
- the cancer samples and non-cancer samples can comprise nucleic acid molecules (e.g., cell-free nucleic acid molecules).
- the cancer samples and the non-cancer samples can be assayed to generate a data set comprising genomic regions of the cancer samples and of the non-cancer samples.
- the one or more genomic regions of the cancer samples and of the non-cancer samples can be compared to identify one or more regions that are hypermethylated DMRs 1703, thereby generating a set of one or more DMRs 1704.
- the identified one or more regions that are hypermethylated DMRs can be one or more genomic regions that are hypermethylated and/or methylated in the cancer samples as opposed to in the non-cancer samples.
- the set of one or more DMRs can be intersected with a universal panel of genomic regions (e.g., Proto-DMRs) 1705, 1706 to generate cancer specific DMRs 1707.
- the intersecting can comprise finding one or more regions that are common in the set of DMRs and in the universal panel of one or more genomic regions.
- the identified cancer specific hypermethylated DMRs can be analyzed for prevalence.
- the cancer specific hypermethylated DMRs can be compared to DMRs of a non-cancer cohort to determine whether the identified cancer specific hypermethylated DMRs are statistical outliers.
- DMRs can be statistical outliers when compared to one or more non-cancer cohorts.
- the identified cancer specific methylated and/or hypermethylated DMRs can be compared to DMRs of a cancer cohort to ensure the cancer specific methylated and/or hypermethylated DMRs are present in both the reference subjects and another cancer cohort.
- false positives in identifying the cancer specific methylated DMRs and/or hypermethylated DMRs can be reduced.
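- The prevalence analysis above can be sketched as a two-part filter: a candidate DMR is kept only if its signal is an outlier relative to a non-cancer cohort and is also observed in a sufficient fraction of a cancer cohort. The z-score test, the cutoffs, and all names below are illustrative assumptions; the disclosure does not specify a particular outlier statistic.

```python
from statistics import mean, pstdev
from typing import Dict, List


def is_outlier(value: float, non_cancer_values: List[float],
               z_cutoff: float = 3.0) -> bool:
    """Assumed outlier test: z-score of `value` against the non-cancer cohort."""
    mu = mean(non_cancer_values)
    sd = pstdev(non_cancer_values)
    if sd == 0:
        return value > mu
    return (value - mu) / sd > z_cutoff


def is_prevalent(cancer_values: List[float], signal_cutoff: float,
                 min_fraction: float) -> bool:
    """True if enough cancer-cohort samples show signal at this region."""
    hits = sum(1 for v in cancer_values if v >= signal_cutoff)
    return hits / len(cancer_values) >= min_fraction


def filter_dmrs(candidates: Dict[str, float],
                non_cancer: Dict[str, List[float]],
                cancer: Dict[str, List[float]],
                signal_cutoff: float = 5.0,
                min_fraction: float = 0.5) -> List[str]:
    """Keep candidate DMRs that pass both the outlier and prevalence checks."""
    return sorted(
        name for name, value in candidates.items()
        if is_outlier(value, non_cancer[name])
        and is_prevalent(cancer[name], signal_cutoff, min_fraction)
    )


candidates = {"dmr_a": 10.0, "dmr_b": 1.2}
non_cancer = {"dmr_a": [1.0, 1.2, 0.8, 1.0], "dmr_b": [1.0, 1.2, 0.8, 1.0]}
cancer = {"dmr_a": [8.0, 9.0, 0.5, 7.0], "dmr_b": [1.1, 0.9, 1.0, 1.2]}
kept = filter_dmrs(candidates, non_cancer, cancer)
```

In this toy input only `dmr_a` is both far above the non-cancer distribution and present in most cancer-cohort samples.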
- the identified one or more cancer specific DMRs and/or one or more cancer specific anti-DMRs can be applied to a baseline-agnostic approach.
- An example of the workflow for the baseline-agnostic approach 1800 is shown in FIG. 18.
- a sample can be first obtained from a subject.
- the sample can be a tissue sample and/or a non-tissue sample.
- the sample can comprise cell-free nucleic acid molecules derived from a tissue sample and/or a non-tissue sample.
- the sample can comprise nucleic acid molecules (e.g., cell-free nucleic acid molecules) 1801.
- the sample can be subjected to a methylation enrichment assay 1802 to enrich for one or more methylation regions.
- the nucleic acid molecules (e.g., cell-free nucleic acid molecules) can be subjected to methylated DNA immunoprecipitation (cfMeDIP).
- the sample can be subjected to further proto-DMR enrichment by using a universal panel of genomic regions (e.g., Proto-DMRs) 1803.
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of interest.
- the one or more reference genomic regions of interest can correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation enrichment (e.g., enzymatic methylation enrichment), or any combinations thereof in controls subject without cancer (e.g., correspond to proto-DMRs).
- the one or more reference genomic regions can have one or more genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the enriched methylated nucleic acids (e.g., cell-free methylated nucleic acids) can next be sequenced.
- Sequencing can be performed with next generation sequencing (NGS) 1804 or with any sequencing methods disclosed herein. Sequencing can generate a dataset comprising methylated regions, hypermethylated regions, hypomethylated regions, or non-methylated regions, or combination thereof that can be mapped along a genome (e.g., human genome).
- the dataset can be processed to identify DMR counts and anti-DMR counts 1807.
- the sequencing reads can be counted in genomic regions that correspond or map to the identified cancer specific hypermethylated DMRs 1805 to generate DMR counts.
- the DMR counts can be averaged.
- the sequencing reads can be counted in genomic regions that correspond or map to the identified cancer specific anti-DMRs 1806 to generate anti-DMR counts.
- the anti-DMR counts can be averaged. In some cases, the average of the DMR counts can be normalized to the average of the anti-DMR counts 1807. In some cases, normalizing the average of the DMR counts to the average of the anti-DMR counts can generate a cancer specific methylation score 1808.
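- The normalization described above (average DMR count divided by average anti-DMR count) can be sketched as follows. The function name and the error handling are assumptions added for illustration.

```python
from statistics import mean
from typing import Sequence


def methylation_score(dmr_counts: Sequence[float],
                      anti_dmr_counts: Sequence[float]) -> float:
    """Average DMR read count normalized to the average anti-DMR read count."""
    if not dmr_counts or not anti_dmr_counts:
        raise ValueError("both count lists must be non-empty")
    denominator = mean(anti_dmr_counts)
    if denominator == 0:
        raise ValueError("average anti-DMR count is zero; score is undefined")
    return mean(dmr_counts) / denominator


# Toy read counts per region (illustrative values only).
score = methylation_score(dmr_counts=[12, 8, 10], anti_dmr_counts=[4, 6])
```

With these toy counts the average DMR count is 10 and the average anti-DMR count is 5, giving a score of 2.0.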
- the joint approach can use both the baseline-informed approach disclosed herein and the baseline-agnostic approach disclosed herein.
- the joint approach can leverage both the baseline-informed approach and the baseline-agnostic approach to maximize sensitivity and specificity in detecting cancer. Detection of cancer can be measured by detection of circulating tumor DNA (ctDNA).
- the joint approach can integrate both scores from the baseline-informed approach and the baseline-agnostic approach for improved detection of cancer.
- the joint model can comprise subject monitoring of a cancer without a personalized signature.
- the joint model can comprise subject monitoring of a cancer without a personalized signature derived from the baseline-informed approach.
- the joint approach can comprise using a personalized signature derived from the baseline-informed approach to enhance sensitivity and specificity.
- the one or more methylation scores obtained for the baseline-informed approach and the baseline-agnostic approach can be integrated to generate a single score.
- the methylation score generated from the baseline-informed approach and the additional methylation score generated from the baseline-agnostic approach can be integrated to generate a single score.
- the single score can be indicative of the cancer.
- integrating can comprise using a Support Vector Machine, logistic regression, a Bayesian inference model, a weighted average, decision trees, and/or random forests.
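- Two of the integration options listed above, a weighted average and a logistic-regression-style combination, can be sketched in a few lines. The weights and bias shown are hypothetical placeholders; in practice such parameters would be fitted on a training cohort, which is not shown here.

```python
import math
from typing import Tuple


def weighted_average(informed: float, agnostic: float,
                     w_informed: float = 0.5) -> float:
    """Combine the two methylation scores with a fixed weight."""
    if not 0.0 <= w_informed <= 1.0:
        raise ValueError("w_informed must lie in [0, 1]")
    return w_informed * informed + (1.0 - w_informed) * agnostic


def logistic_joint_score(informed: float, agnostic: float,
                         weights: Tuple[float, float], bias: float) -> float:
    """Map the two scores to a single value in (0, 1).

    `weights` and `bias` are assumed to come from a previously fitted
    logistic regression; the values used below are illustrative only.
    """
    z = bias + weights[0] * informed + weights[1] * agnostic
    return 1.0 / (1.0 + math.exp(-z))


joint_avg = weighted_average(informed=2.0, agnostic=1.0, w_informed=0.75)
joint_prob = logistic_joint_score(2.0, 1.0, weights=(1.2, 0.8), bias=-2.0)
```

The weighted average keeps the joint score on the same scale as its inputs, while the logistic form yields a bounded value that can be read as a probability-like score.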
- An example of the workflow for a joint model approach 1900 is shown in FIG. 19. Following the example workflow shown in FIG. 15 and FIG. 18, cancer specific DMRs and cancer specific anti-DMRs can be generated for use in the joint model approach.
- Patient specific DMRs can be identified following the baseline-informed approach, as shown in FIG. 19, and as also shown in FIG. 15.
- patient specific DMRs can be generated by using a baseline sample.
- the baseline sample can be obtained from the subject at a time prior to obtaining the sample of interest.
- the baseline sample can be obtained from the subject subsequent to diagnosis and/or prior to treatment with a therapy.
- the baseline sample can be a tissue sample 1901.
- the baseline sample can be a blood sample and/or a plasma sample.
- the baseline sample can comprise nucleic acid molecules derived from a tissue sample and/or a non-tissue sample.
- the baseline sample can comprise nucleic acid molecules (e.g., cell-free nucleic acid molecules) 1901.
- the baseline sample can be subjected to a methylation enrichment assay 1902 to enrich for one or more methylation regions.
- the nucleic acid molecules can be subjected to methylated DNA immunoprecipitation (cfMeDIP) or similar methylation enrichment thereof disclosed herein to enrich for one or more methylated nucleic acids (e.g., cell-free methylated nucleic acid molecules) for subsequent sequencing.
- the baseline sample can be subjected to further proto-DMR enrichment by using a panel of one or more control regions disclosed herein (e.g., Proto-DMRs) 1903.
- the universal panel of genomic regions (e.g., proto-DMRs) can be used to enrich for one or more genomic regions of interest.
- the one or more reference genomic regions of interest can correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation enrichment, or any combinations thereof in controls subject without cancer (e.g., correspond to proto-DMRs).
- the one or more reference genomic regions can have one or more genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the enriched methylated nucleic acids (e.g., cell-free methylated nucleic acids) can next be sequenced.
- Sequencing can be performed with next generation sequencing (NGS) 1904 or with any sequencing methods disclosed herein.
- Sequencing can generate a dataset comprising methylated regions, hypermethylated regions, hypomethylated regions, or non-methylated regions, or combination thereof that can be mapped along a genome (e.g., human genome) and/or the genomic coordinates of the cancer specific DMRs.
- the dataset can be processed to identify one or more patient specific hypermethylated DMRs 1905.
- the sequencing reads can be analyzed to identify one or more regions with a high number of reads, thereby identifying methylated and/or hypermethylated regions (e.g., DMRs).
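- One simple way to flag regions with a high number of reads is to scale each region's count by library size and apply a cutoff. The counts-per-million scaling and the `min_cpm` cutoff below are assumptions for illustration; the disclosure only states that regions with a high number of reads are identified.

```python
from typing import Dict, List


def high_read_regions(read_counts: Dict[str, int], total_reads: int,
                      min_cpm: float) -> List[str]:
    """Return region names whose counts per million reach `min_cpm`."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return sorted(
        region for region, count in read_counts.items()
        if count * scale >= min_cpm
    )


# Toy counts from a library of 2 million reads (illustrative values only).
counts = {"region_1": 50, "region_2": 2, "region_3": 20}
candidate_dmrs = high_read_regions(counts, total_reads=2_000_000, min_cpm=10.0)
```

Scaling by library size keeps the cutoff comparable between samples sequenced to different depths.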
- the identified patient specific hypermethylated DMRs can be analyzed for prevalence.
- the patient specific hypermethylated DMRs can be compared to DMRs of a non-cancer cohort to determine whether the identified patient specific hypermethylated DMRs are statistical outliers.
- the identified patient specific hypermethylated DMRs can be compared to DMRs of a cancer cohort to ensure the patient specific hypermethylated DMRs are present in both the subject’s sample and the cancer cohort. By analyzing for prevalence, false positives can be reduced.
- Another sample can be obtained from the subject at a time point subsequent to obtaining the baseline sample to compute the cancer specific methylation score generated from the baseline-agnostic approach and the patient specific methylation score generated from the baseline-informed approach.
- a 3rd sample can be obtained at a 3rd time point after obtaining the baseline sample.
- a 4th sample can be obtained at a 4th time point after obtaining the baseline sample.
- a 5th sample can be obtained at a 5th time point after obtaining the baseline sample.
- a 6th sample can be obtained at a 6th time point after obtaining the baseline sample.
- a 7th sample can be obtained at a 7th time point after obtaining the baseline sample.
- an 8th sample can be obtained at an 8th time point after obtaining the baseline sample.
- one or more additional samples can be obtained at one or more additional time points (e.g., subsequent to one another).
- the another sample or any subsequent one or more samples can comprise cell-free nucleic acid molecules 1906 derived from a tissue sample and/or a non-tissue sample from the subject.
- the another sample can be subjected to a methylation enrichment assay to enrich for one or more methylation regions 1907.
- the nucleic acid molecules (e.g., cell-free nucleic acid molecules) can be subjected to methylated DNA immunoprecipitation (cfMeDIP) or similar methylation enrichment thereof disclosed herein to enrich for one or more methylated nucleic acids (e.g., cell-free methylated nucleic acid molecules) for subsequent sequencing.
- the sample can be subjected to further proto-DMR enrichment by using a panel of one or more control regions disclosed herein (e.g., Proto-DMRs) 1908.
- the universal panel of genomic regions (e.g., proto-DMRs) can be used to enrich for one or more genomic regions of interest.
- the one or more genomic regions of interest can correspond to one or more control regions that are hypomethylated regions, regions of non-methylation, and/or regions that are amenable to methylation enrichment, or any combinations thereof in controls subject without cancer (e.g., correspond to proto-DMRs).
- the one or more reference genomic regions can have one or more genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the enriched methylated nucleic acids can next be sequenced.
- Sequencing can be performed with next generation sequencing (NGS) 1909 or with any sequencing methods disclosed herein.
- Sequencing can generate a dataset comprising methylated regions, hypermethylated regions, hypomethylated regions, or non-methylated regions, or combination thereof that can be mapped along a genome (e.g., human genome).
- the dataset can be processed to identify DMR counts and anti-DMR counts 1914.
- the sequencing reads can be counted in genomic regions that correspond or map to the identified patient specific hypermethylated DMRs from the baseline sample 1912 to generate DMR counts. In some cases, the DMR counts can be averaged.
- the sequencing reads can be counted in genomic regions that correspond or map to the identified cancer specific anti-DMRs from one or more cancer subjects 1910 to generate anti-DMR counts.
- the anti-DMR counts can be averaged.
- the average of the DMR counts can be normalized to the average of the anti-DMR counts 1914.
- normalizing the average of the DMR counts to the average of the anti-DMR counts can generate a patient specific methylation score.
- a machine learning classifier can be used to generate the patient specific score.
- a Bayesian-based statistical inference method for estimating parameters and/or performing regression can be used to generate the patient specific score.
- the sequencing reads can be counted in genomic regions that correspond or map to the identified cancer specific hypermethylated DMRs from one or more cancer subjects 1911 to generate DMR counts.
- the DMR counts can be averaged.
- the average of the DMR counts can be normalized to the average of the anti-DMR counts 1915.
- normalizing the average of the DMR counts to the average of the anti-DMR counts can generate a cancer specific methylation score.
- the score from the baseline-agnostic approach and the score from the baseline-informed approach can be integrated.
- the scores can be integrated by Support Vector Machine (SVM) 1915. Integration of the scores can generate a single joint methylation score 1916 that can be indicative of the cancer.
- the method can comprise assaying a biological sample of the subject.
- the biological sample can be assayed for one or more markers specific to the subject, using a universal panel of genomic regions.
- assaying comprises sequencing the nucleic acid molecules from the biological sample at a depth of at most 50 M single reads. In some cases, assaying comprises sequencing the nucleic acid molecules at a depth of at most 10 M single reads.
- the sequencing depth can be a depth of at most 10 million (M) single reads, at most 20 M single reads, at most 30 M single reads, at most 40 M single reads, at most 50 M single reads, at most 60 M single reads, at most 70 M single reads, at most 80 M single reads, at most 90 M single reads, or at most 100 M single reads.
- the sequencing depth can be a depth from 1 M single reads to 10 M single reads, from 10 M single reads to 20 M single reads, from 20 M single reads to 30 M single reads, from 30 M single reads to 40 M single reads, from 40 M single reads to 50 M single reads, from 50 M single reads to 60 M single reads, from 60 M single reads to 70 M single reads, from 70 M single reads to 80 M single reads, from 80 M single reads to 90 M single reads, or from 90 M single reads to 100 M single reads.
- the sequencing depth can be a depth of at least 1 M single reads, at least 10 M single reads, at least 20 M single reads, at least 30 M single reads, at least 40 M single reads, at least 50 M single reads, at least 60 M single reads, at least 70 M single reads, at least 80 M single reads, at least 90 M single reads, at least 100 M single reads, or at least 200 M single reads.
- the method for monitoring the subject can comprise using a baseline-informed approach disclosed herein.
- Baseline-informed approach can involve using a reference sample from the subject, one or more control samples from non-cancer subjects, and/or one or more reference samples from cancer subjects as disclosed herein.
- the reference sample can be from the same subject.
- the reference sample can be obtained at a time prior to obtaining the sample.
- the reference sample can be obtained from the subject subsequent to diagnosis with the cancer.
- the reference sample can be obtained from the subject prior to treatment with a therapy.
- the reference sample can be obtained from the subject subsequent to diagnosis with the cancer and prior to treatment with a therapy.
- the reference sample can be a blood sample and/or a plasma sample.
- the reference sample can be a non-tissue sample.
- the one or more control samples from non-cancer subjects, and/or one or more reference samples can be one or more blood samples or one or more plasma samples.
- the one or more control samples from non-cancer subjects and/or one or more reference samples can be a tissue sample.
- the reference sample can comprise reference nucleic acid molecules (e.g., cell-free nucleic acid molecules) derived from a non-tissue sample and/or a tissue sample.
- the one or more control samples from non-cancer subjects comprise control nucleic acid molecules (e.g., cell-free nucleic acid molecules) derived from one or more tissue samples and/or one or more non-tissue samples.
- the one or more reference samples can comprise one or more reference nucleic acid molecules (e.g., cell-free nucleic acid molecules) derived from one or more tissue samples and/or one or more non-tissue samples.
- the method for monitoring comprises comparing the universal panel of genomic regions to one or more reference genomic regions of the reference sample to generate a set of DMRs specific to the reference sample and/or a set of anti-DMRs specific to the reference sample. In some cases, the method for monitoring comprises comparing the universal panel of genomic regions to one or more reference genomic regions of the one or more reference samples to generate a set of DMRs specific to the one or more reference samples and/or a set of anti-DMRs specific to the one or more reference samples. In some cases, the set of DMRs specific to the one or more reference samples can further comprise comparing one or more reference genomic regions to one or more control regions of the one or more control samples.
- Comparing can identify one or more regions that are hypermethylated and/or methylated in the reference genomic regions as opposed to the one or more control regions.
- the one or more identified hypermethylated and/or methylated regions can be compared to the universal panel of genomic regions to further identify the set of DMRs specific to the one or more reference samples.
- the set of anti-DMRs specific to the one or more reference samples can further comprise comparing one or more reference genomic regions to one or more control regions of the one or more control samples.
- Comparing can identify one or more regions that are non-methylated and/or hypomethylated in both the reference genomic regions and the one or more control regions.
- the one or more identified hypomethylated and/or non-methylated regions can be compared to the universal panel of genomic regions to further identify the set of anti-DMRs specific to the one or more reference samples.
- one or more markers comprise DMRs specific to the subject.
- the anti-DMRs specific to the subject can be generated by comparing one or more genomic regions of the biological sample to a set of anti-DMRs specific to the one or more reference samples.
- DMRs specific to the subject can be compared to the set of DMRs specific to the reference sample to generate one or more counts of the DMRs specific to the subject.
- the one or more counts of the DMRs specific to the subject can be sequencing reads of the one or more genomic regions of the biological sample that map to one or more regions corresponding to the DMRs specific to the reference sample.
- the anti-DMRs specific to the subject can be compared to the set of anti-DMRs specific to the reference sample to generate one or more counts of the anti-DMRs specific to the subject.
- the one or more counts of the anti-DMRs specific to the subject can be sequencing reads of the one or more genomic regions of the biological sample that map to one or more regions corresponding to the anti-DMRs specific to the reference sample.
- the anti-DMRs specific to the subject can be compared to the set of anti-DMRs specific to the one or more reference samples to generate one or more counts of the anti-DMRs specific to the subject.
- the one or more counts of the anti-DMRs specific to the subject can be sequencing reads of the one or more genomic regions of the biological sample that map to one or more regions corresponding to the anti-DMRs specific to the one or more reference samples.
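- The counts described above are numbers of sequencing reads that map to a given set of regions. A minimal sketch of that counting step is shown below, assuming each read is reduced to a single mapped position and that the regions are non-overlapping, half-open intervals; both simplifications and all names are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open
Read = Tuple[str, int]         # (chromosome, mapped position)


def count_reads(reads: List[Read], regions: List[Region]) -> Dict[Region, int]:
    """Count reads whose mapped position falls inside each region."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in ivs] for c, ivs in by_chrom.items()}

    counts = {region: 0 for region in regions}
    for chrom, pos in reads:
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue
        # Rightmost region starting at or before the read position.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            counts[(chrom, intervals[i][0], intervals[i][1])] += 1
    return counts


dmr_regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
mapped_reads = [("chr1", 150), ("chr1", 199), ("chr1", 200),
                ("chr1", 350), ("chr2", 10), ("chr3", 5)]
dmr_counts = count_reads(mapped_reads, dmr_regions)
```

The same function can be applied to a set of anti-DMR regions to obtain the anti-DMR counts used as the normalizing denominator.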
- the one or more counts of the DMRs specific to the subject can be normalized to one or more counts of the anti-DMRs specific to the subject.
- the one or more counts of the DMRs specific to the subject can be averaged to generate an average count of the DMRs specific to the subject.
- the one or more counts of the anti-DMRs specific to the subject can be averaged to generate an average count of the anti-DMRs specific to the subject.
- the average count of the DMRs specific to the subject can be normalized to the average count of the anti-DMRs specific to the subject.
- the ratio of the average count of the DMRs specific to the subject to the average count of the anti-DMRs specific to the subject can be computed.
- the computed ratio can represent a methylation score.
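- The ratio described above can be written compactly as follows. The symbols are introduced here only for illustration: $D$ is the set of DMRs specific to the subject, $A$ is the set of anti-DMRs specific to the subject, and $c_r$ is the read count for region $r$.

```latex
S \;=\; \frac{\dfrac{1}{|D|}\sum_{d \in D} c_d}{\dfrac{1}{|A|}\sum_{a \in A} c_a}
% Worked example with illustrative counts:
% DMR counts 30, 18, 24      -> average 24
% anti-DMR counts 10, 6      -> average 8
% S = 24 / 8 = 3
```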
- a first methylation score can be an output indicative of cancer.
- the first methylation score can be compared to a threshold score disclosed herein. For example, a methylation score higher than a threshold score can be indicative of the presence of cancer. In some cases, a methylation score lower than a threshold score can be indicative of the absence of cancer.
- another biological sample can be obtained at a time before or after obtaining the biological sample.
- the another biological sample can be used to monitor regression or progression of the disease or condition.
- the another biological sample can comprise another one or more markers specific to the subject.
- the another one or more markers can comprise another DMRs specific to the subject.
- the another DMRs specific to the subject can be generated by comparing the genomic regions of the another biological sample to the set of DMRs specific to the reference sample.
- the method further comprises comparing the genomic regions of another biological sample to the set of anti-DMRs to generate another anti-DMRs specific to the subject.
- the another DMRs specific to the subject can be compared to the set of DMRs specific to the reference sample to generate one or more counts of the another DMRs specific to the subject.
- the one or more counts of the another DMRs specific to the subject can be sequencing reads of the one or more genomic regions of the another biological sample that map to one or more regions corresponding to the DMRs specific to the reference sample.
- the another anti-DMRs specific to the subject can be compared to the set of anti-DMRs specific to the reference sample to generate one or more counts of the another anti-DMRs specific to the subject.
- the one or more counts of the another anti-DMRs specific to the subject can be sequencing reads of the one or more genomic regions of the another biological sample that map to one or more regions corresponding to the anti-DMRs specific to the reference sample.
- the another anti-DMRs specific to the subject can be compared to the set of anti-DMRs specific to the one or more reference samples to generate one or more counts of the another anti-DMRs specific to the subject.
- the one or more counts of the another anti-DMRs specific to the subject can be sequencing reads of the one or more genomic regions of the another biological sample that map to one or more regions corresponding to the anti-DMRs specific to the one or more reference samples.
- the one or more counts of the another DMRs specific to the subject can be normalized to one or more counts of the another anti-DMRs specific to the subject. In some cases, the one or more counts of the another DMRs specific to the subject can be averaged to generate an average count of the another DMRs specific to the subject. In some cases, the one or more counts of the another anti-DMRs specific to the subject can be averaged to generate an average count of the another anti-DMRs specific to the subject. In some cases, the average count of the another DMRs specific to the subject can be normalized to the average count of the another anti-DMRs specific to the subject. For example, the ratio of the average count of the another DMRs specific to the subject to the average count of the another anti-DMRs specific to the subject can be computed. The computed ratio can represent a second methylation score.
- the first methylation score and the second methylation score can be compared, thereby generating an output indicative of the regression or the progression of the disease or the condition. For example, a first methylation score greater than the second methylation score can be indicative of progression of the disease or condition. A first methylation score smaller than the second methylation score can be indicative of regression of the disease or condition. Similarly, an additional biological sample can be obtained after the another biological sample to generate another methylation score for comparison.
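- Comparing two methylation scores can be sketched as computing the direction of change between the earlier and the later sample. The relative tolerance, the day-based timestamps, and the neutral labels below are assumptions for illustration; how an increase or decrease is interpreted as progression or regression follows the convention stated in the text and depends on which sample was collected first.

```python
from typing import NamedTuple


class ScoredSample(NamedTuple):
    day: int      # hypothetical collection day relative to some reference date
    score: float  # methylation score computed for the sample


def score_trend(a: ScoredSample, b: ScoredSample,
                tolerance: float = 0.1) -> str:
    """Report the direction of change from the earlier to the later sample.

    `tolerance` is an assumed relative change below which the two scores
    are treated as unchanged.
    """
    earlier, later = sorted((a, b), key=lambda s: s.day)
    if earlier.score == 0:
        raise ValueError("earlier score is zero; relative change is undefined")
    change = (later.score - earlier.score) / earlier.score
    if change > tolerance:
        return "increase"
    if change < -tolerance:
        return "decrease"
    return "stable"


trend = score_trend(ScoredSample(day=0, score=2.0), ScoredSample(day=90, score=1.0))
```

Sorting by collection day makes the result independent of the order in which the two samples are passed in.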
- the method for monitoring the subject can comprise using a baseline-agnostic approach disclosed herein.
- Baseline-agnostic approach can involve using one or more control samples from non-cancer subjects, and/or one or more reference samples from cancer subjects.
- the one or more control samples from non-cancer subjects, and/or one or more reference samples can be one or more blood samples or one or more plasma samples.
- the one or more control samples from non-cancer subjects, and/or one or more reference samples can be one or more tissue samples.
- the one or more control samples from non-cancer subjects comprise control nucleic acid molecules derived from one or more tissue samples and/or non-tissue samples.
- the one or more reference samples can comprise one or more reference nucleic acid molecules derived from one or more tissue samples and/or non-tissue samples.
- the method for monitoring comprises comparing the universal panel of genomic regions to one or more reference genomic regions of the one or more reference samples to generate a set of DMRs specific to the one or more reference samples and/or a set of anti-DMRs specific to the one or more reference samples.
- the set of DMRs specific to the one or more reference samples can further comprise comparing one or more reference genomic regions to one or more control regions of the one or more control samples. Comparing can identify one or more regions that are hypermethylated and/or methylated in the reference genomic regions as opposed to the one or more control regions.
- the one or more identified hypermethylated and/or methylated regions can be compared to the universal panel of genomic regions to further identify the set of DMRs specific to the one or more reference samples.
- the set of anti-DMRs specific to the one or more reference samples can further comprise comparing one or more reference genomic regions to one or more control regions of the one or more control samples. Comparing can identify one or more regions that are non-methylated and/or hypomethylated in both the reference genomic regions and the one or more control regions.
- the one or more identified hypomethylated and/or non-methylated regions can be compared to the universal panel of genomic regions to further identify the set of anti-DMRs specific to the one or more reference samples.
- one or more markers comprise DMRs specific to the subject.
- the anti-DMRs specific to the subject can be generated by comparing one or more genomic regions of the biological sample to a set of anti-DMRs specific to the one or more reference samples.
- the anti-DMRs specific to the subject can be compared to the set of anti-DMRs specific to the one or more reference samples to generate one or more counts of the anti-DMRs specific to the subject.
- the one or more counts of the anti-DMRs specific to the subject can be sequencing reads of the one or more genomic regions of the biological sample that map to one or more regions corresponding to the anti-DMRs specific to the one or more reference samples.
- the one or more counts of the DMRs specific to the subject can be normalized to one or more counts of the anti-DMRs specific to the subject. In some cases, the one or more counts of the DMRs specific to the subject can be averaged to generate an average count of the DMRs specific to the subject. In some cases, the one or more counts of the anti-DMRs specific to the subject can be averaged to generate an average count of the anti-DMRs specific to the subject. In some cases, the average count of the DMRs specific to the subject can be normalized to the average count of the anti-DMRs specific to the subject. For example, the ratio of the average count of the DMRs specific to the subject to the average count of the anti-DMRs specific to the subject can be computed.
- the computed ratio can represent a methylation score.
- a first methylation score can be an output indicative of cancer.
- the first methylation score can be compared to a threshold score disclosed herein. For example, a methylation score higher than a threshold score can be indicative of the presence of cancer. In some cases, a methylation score lower than a threshold score can be indicative of the absence of cancer.
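- The threshold comparison above can be sketched as a small decision function. The text specifies the outcome for scores higher and lower than the threshold; the handling of a score exactly equal to the threshold is not specified, so this sketch returns `None` in that case as an assumption. The threshold value shown is a placeholder.

```python
from typing import Optional


def cancer_call(score: float, threshold: float) -> Optional[bool]:
    """Return True if the score indicates presence of cancer, False if it
    indicates absence, and None when the score equals the threshold
    (a case the source text does not define)."""
    if score > threshold:
        return True
    if score < threshold:
        return False
    return None


THRESHOLD = 1.5  # hypothetical threshold score, for illustration only
calls = [cancer_call(s, THRESHOLD) for s in (2.0, 0.7, 1.5)]
```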
- a method for classifying a sample derived from a subject is disclosed. The sample can be analyzed for at least a portion of a set of DMRs specific to the subject. The analysis for at least a portion of a set of DMRs specific to the subject can generate an output indicative of cancer.
- Analyzing can comprise sequencing, wherein the sequencing can have a depth of at most 50 M single reads.
- the sequencing depth can be a depth of at most 10 million (M) single reads. In some cases, the sequencing depth can be a depth of at most 10 M single reads, at most 20 M single reads, at most 30 M single reads, at most 40 M single reads, at most 50 M single reads, at most 60 M single reads, at most 70 M single reads, at most 80 M single reads, at most 90 M single reads, or at most 100 M single reads.
- the sequencing depth can be a depth from 1 M single reads to 10 M single reads, from 10 M single reads to 20 M single reads, from 20 M single reads to 30 M single reads, from 30 M single reads to 40 M single reads, from 40 M single reads to 50 M single reads, from 50 M single reads to 60 M single reads, from 60 M single reads to 70 M single reads, from 70 M single reads to 80 M single reads, from 80 M single reads to 90 M single reads, or from 90 M single reads to 100 M single reads.
- the sequencing depth can be a depth of at least 1 M single reads, at least 10 M single reads, at least 20 M single reads, at least 30 M single reads, at least 40 M single reads, at least 50 M single reads, at least 60 M single reads, at least 70 M single reads, at least 80 M single reads, at least 90 M single reads, at least 100 M single reads, or at least 200 M single reads.
- the method for classifying further comprises providing a universal panel of genomic regions (e.g., proto-DMRs).
- the universal panel of genomic regions can be used to enrich for one or more genomic regions of the subject or one or more subjects (e.g., cancer subjects, non-cancer subjects).
- the universal panel of genomic regions can be compared to one or more reference genomic regions to generate a set of DMRs specific to a reference sample disclosed herein and/or a set of anti-DMRs specific to the reference sample disclosed herein.
- the universal panel of genomic regions can be compared to one or more reference genomic regions to generate a set of DMRs specific to one or more reference samples disclosed herein and/or a set of anti-DMRs specific to the one or more reference samples disclosed herein.
- the anti-DMRs specific to the one or more reference samples, the DMRs specific to the one or more reference samples, the anti-DMRs specific to the reference sample, and/or the DMRs specific to the reference sample can be used as disclosed herein to generate a set of DMRs specific to the subject and a set of anti-DMRs specific to the subject.
- the set of DMRs specific to the subject and the set of anti-DMRs specific to the subject can be used as disclosed herein to compute a methylation score indicative of the cancer.
- the output indicative of the cancer can be used to determine progression of the cancer. In some cases, the output indicative of the cancer can be used to determine regression of the cancer. In some cases, the output indicative of the cancer can be used to determine therapy of the cancer. In some cases, the output indicative of the cancer can be used to determine therapy response for the cancer.
- MRD Minimal Residual Disease
- a biological sample from a subject can be assayed. Assaying may not comprise assaying a solid tumor sample of the subject.
- the biological sample can be a nontissue sample (e.g., a blood sample and/or a plasma sample).
- the biological sample can be a blood sample and/or a plasma sample.
- the method can be a tumor naive approach.
- the method disclosed herein can detect MRD at a specificity of at least 90%.
- the method disclosed herein can detect MRD at a specificity of at least about 60%, at least about 70%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, at least about 99.6%, at least about 99.7%, at least about 99.8%, at least about 99.9%, or any percentage in between the numbers.
- the method disclosed herein can detect MRD at a specificity of at most about 80%, at most about 81%, at most about 82%, at most about 83%, at most about 84%, at most about 85%, at most about 86%, at most about 87%, at most about 88%, at most about 89%, at most about 90%, at most about 91%, at most about 92%, at most about 93%, at most about 94%, at most about 95%, at most about 96%, at most about 97%, at most about 98%, at most about 99%, at most about 99.5%, at most about 99.6%, at most about 99.7%, at most about 99.8%, at most about 99.9%, or any percentage in between the numbers. In some cases, the method disclosed herein can detect MRD at a specificity of 100%.
- the method disclosed herein can detect MRD at a sensitivity of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least 50%, at least about 60%, at least about 70%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, at least about 99.6%, at least about 99.7%, at least about 99.8%, at least about 99.9%, or any percentage in between the numbers.
- the method disclosed herein can detect MRD at a sensitivity of at most about 10%, at most about 20%, at most about 30%, at most about 40%, at most 50%, at most about 60%, at most about 70%, at most about 80%, at most about 81%, at most about 82%, at most about 83%, at most about 84%, at most about 85%, at most about 86%, at most about 87%, at most about 88%, at most about 89%, at most about 90%, at most about 91%, at most about 92%, at most about 93%, at most about 94%, at most about 95%, at most about 96%, at most about 97%, at most about 98%, at most about 99%, at most about 99.5%, at most about 99.6%, at most about 99.7%, at most about 99.8%, at most about 99.9%, or any percentage in between the numbers. In some cases, the method disclosed herein can detect MRD at a sensitivity of 100%.
- a biological sample from a subject can be assayed.
- assaying may not comprise assaying a solid tumor sample of the subject.
- assaying can comprise assaying a tumor sample of the subject.
- assaying can comprise assaying one or more non-tissue samples of the subject (e.g., a blood sample and/or plasma sample).
- assaying may not comprise assaying one or more non-tissue samples of the subject (e.g., a blood sample and/or plasma sample).
- assaying comprises sequencing one or more genomic regions in the biological sample. Sequencing can have a depth of at most 50 M single reads.
- the MRD can be head and neck cancer.
- the method can comprise assaying at least a portion of a set of methylated genomic regions of a sample to generate an output indicative of cancer.
- the assaying can comprise sequencing the at least a portion of the set of methylated genomic regions.
- the sequencing depth of one or more methylated genomic regions can be at most 50 M single reads.
- the sequencing depth of one or more genomic regions can be a depth of at most 10 million (M) single reads, at most 20 M single reads, at most 30 M single reads, at most 40 M single reads, at most 50 M single reads, at most 60 M single reads, at most 70 M single reads, at most 80 M single reads, at most 90 M single reads, or at most 100 M single reads.
- the sequencing depth can be a depth from 1 M single reads to 10 M single reads, from 10 M single reads to 20 M single reads, from 20 M single reads to 30 M single reads, from 30 M single reads to 40 M single reads, from 40 M single reads to 50 M single reads, from 50 M single reads to 60 M single reads, from 60 M single reads to 70 M single reads, from 70 M single reads to 80 M single reads, from 80 M single reads to 90 M single reads, or from 90 M single reads to 100 M single reads.
- the sequencing depth can be a depth of at least 1 M single reads, at least 10 M single reads, at least 20 M single reads, at least 30 M single reads, at least 40 M single reads, at least 50 M single reads, at least 60 M single reads, at least 70 M single reads, at least 80 M single reads, at least 90 M single reads, at least 100 M single reads, or at least 200 M single reads.
- the sequencing depth of one or more methylated genomic regions can be a depth of at most 10 million (M) single reads, at most 20 M single reads, at most 30 M single reads, at most 40 M single reads, at most 50 M single reads, at most 60 M single reads, at most 70 M single reads, at most 80 M single reads, at most 90 M single reads, or at most 100 M single reads.
- the sequencing depth can be a depth from 1 M single reads to 10 M single reads, from 10 M single reads to 20 M single reads, from 20 M single reads to 30 M single reads, from 30 M single reads to 40 M single reads, from 40 M single reads to 50 M single reads, from 50 M single reads to 60 M single reads, from 60 M single reads to 70 M single reads, from 70 M single reads to 80 M single reads, from 80 M single reads to 90 M single reads, or from 90 M single reads to 100 M single reads.
- the sequencing depth can be a depth of at least 1 M single reads, at least 10 M single reads, at least 20 M single reads, at least 30 M single reads, at least 40 M single reads, at least 50 M single reads, at least 60 M single reads, at least 70 M single reads, at least 80 M single reads, at least 90 M single reads, at least 100 M single reads, or at least 200 M single reads.
- the sequencing depth of one or more methylated genomic regions after proto-DMR enrichment can be a depth of at most 10 million (M) single reads, at most 20 M single reads, at most 30 M single reads, at most 40 M single reads, at most 50 M single reads, at most 60 M single reads, at most 70 M single reads, at most 80 M single reads, at most 90 M single reads, or at most 100 M single reads.
- the sequencing depth can be a depth from 1 M single reads to 10 M single reads, from 10 M single reads to 20 M single reads, from 20 M single reads to 30 M single reads, from 30 M single reads to 40 M single reads, from 40 M single reads to 50 M single reads, from 50 M single reads to 60 M single reads, from 60 M single reads to 70 M single reads, from 70 M single reads to 80 M single reads, from 80 M single reads to 90 M single reads, or from 90 M single reads to 100 M single reads.
- the sequencing depth can be a depth of at least 1 M single reads, at least 10 M single reads, at least 20 M single reads, at least 30 M single reads, at least 40 M single reads, at least 50 M single reads, at least 60 M single reads, at least 70 M single reads, at least 80 M single reads, at least 90 M single reads, at least 100 M single reads, or at least 200 M single reads.
- a method for generating an output indicative of cancer, comprising assaying a sample for at least a portion of a set of differentially methylated regions (DMRs) specific to a subject to generate the output indicative of cancer.
- the output indicative of cancer can be the presence or absence of cancer.
- the output indicative of cancer can be generated at a specificity of at least 90%.
- the output indicative of cancer can be generated at a specificity of at least about 60%, at least about 70%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, at least about 99.6%, at least about 99.7%, at least about 99.8%, at least about 99.9%, or any percentage in between the numbers.
- the output indicative of cancer can be generated at a specificity of at most about 60%, at most about 70%, at most about 80%, at most about 81%, at most about 82%, at most about 83%, at most about 84%, at most about 85%, at most about 86%, at most about 87%, at most about 88%, at most about 89%, at most about 90%, at most about 91%, at most about 92%, at most about 93%, at most about 94%, at most about 95%, at most about 96%, at most about 97%, at most about 98%, at most about 99%, at most about 99.5%, at most about 99.6%, at most about 99.7%, at most about 99.8%, at most about 99.9%, or any percentage in between the numbers.
- the output indicative of cancer can be generated at a specificity of 100%.
- the output indicative of cancer can be generated at a sensitivity of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least 50%, at least about 60%, at least about 70%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, at least about 99.6%, at least about 99.7%, at least about 99.8%, at least about 99.9%, or any percentage in between the numbers.
- the output indicative of cancer can be generated at a sensitivity of at most about 10%, at most about 20%, at most about 30%, at most about 40%, at most 50%, at most about 60%, at most about 70%, at most about 80%, at most about 81%, at most about 82%, at most about 83%, at most about 84%, at most about 85%, at most about 86%, at most about 87%, at most about 88%, at most about 89%, at most about 90%, at most about 91%, at most about 92%, at most about 93%, at most about 94%, at most about 95%, at most about 96%, at most about 97%, at most about 98%, at most about 99%, at most about 99.5%, at most about 99.6%, at most about 99.7%, at most about 99.8%, at most about 99.9%, or any percentage in between the numbers.
- the output indicative of cancer can be generated at a sensitivity of 100%.
- the method disclosed herein can detect one or more cancers. In some cases, the method disclosed herein can detect one or more cancers early on (e.g., multi-cancer early detection). In some cases, the method disclosed herein can detect one or more cancers at a specificity of at least about 60%, at least about 70%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, at least about 99.6%, at least about 99.7%, at least about 99.8%, at least about 99.9%, or any percentage in between the numbers.
- the method disclosed herein can detect one or more cancers at a specificity of at most about 60%, at most about 70%, at most about 80%, at most about 81%, at most about 82%, at most about 83%, at most about 84%, at most about 85%, at most about 86%, at most about 87%, at most about 88%, at most about 89%, at most about 90%, at most about 91%, at most about 92%, at most about 93%, at most about 94%, at most about 95%, at most about 96%, at most about 97%, at most about 98%, at most about 99%, at most about 99.5%, at most about 99.6%, at most about 99.7%, at most about 99.8%, at most about 99.9%, or any percentage in between the numbers. In some cases, the method disclosed herein can detect one or more cancers at a specificity of 100%.
- the method disclosed herein can detect one or more cancers at a sensitivity of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least 50%, at least about 60%, at least about 70%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, at least about 99.6%, at least about 99.7%, at least about 99.8%, at least about 99.9%, or any percentage in between the numbers.
- the method disclosed herein can detect one or more cancers at a sensitivity of at most about 10%, at most about 20%, at most about 30%, at most about 40%, at most 50%, at most about 60%, at most about 70%, at most about 80%, at most about 81%, at most about 82%, at most about 83%, at most about 84%, at most about 85%, at most about 86%, at most about 87%, at most about 88%, at most about 89%, at most about 90%, at most about 91%, at most about 92%, at most about 93%, at most about 94%, at most about 95%, at most about 96%, at most about 97%, at most about 98%, at most about 99%, at most about 99.5%, at most about 99.6%, at most about 99.7%, at most about 99.8%, at most about 99.9%, or any percentage in between the numbers.
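The sensitivity and specificity figures recited above follow the standard definitions: sensitivity is the fraction of true cancer (or MRD) cases that are detected, and specificity is the fraction of non-cancer cases correctly reported as negative. A minimal sketch of that arithmetic is shown here; the function name and the counts are hypothetical and are not taken from this disclosure.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: cancer subjects called positive/negative.
    tn/fp: non-cancer subjects called negative/positive.
    """
    # Guard against empty groups so the ratio is never 0/0.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical cohort, for illustration only.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=95, fp=5)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

In practice the two values trade off against each other as the decision threshold on the methylation score is moved.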
- a biological sample can be taken from a subject.
- a nucleic acid sample can be derived from a sample obtained from a subject.
- the biological sample comprises a blood sample and/or a plasma sample.
- the biological sample comprises a tissue or cell sample.
- the subject can be healthy.
- the subject can be non-diseased.
- the subject can be a non-cancer subject.
- the subject can have or be suspected of having a disease or condition.
- the disease or condition can be a cancer or a tumor.
- the cancer can include breast cancer, bladder cancer, colorectal cancer, endometrial cancer, prostate cancer, renal cancer, pancreatic cancer, or lung cancer.
- the cell-free nucleic acids (e.g., cell-free DNA (cfDNA)) in a biological sample can be further treated to increase methylation level (e.g., in vitro enzymatic methylation).
- cell-free nucleic acids (e.g., cfDNA) can be partially methylated.
- cell-free nucleic acids (e.g., cfDNA) can be fully methylated.
- the one or more sites that are amenable to methylation enrichment are determined by determining methylation levels in a fully methylated control sample.
- the methylation control sample comprises cell-free DNA.
- the methylation control sample comprises genomic DNA.
- the genomic DNA can be subjected to shearing.
- the fully methylated control sample comprises nucleic acids subjected to in vitro methylation (e.g., in vitro enzymatic methylation).
- increasing methylation level can be performed using in vitro methylation by an enzyme directed to alter methylation levels of nucleic acid fragments.
- the enzyme can be CpG methyltransferase.
- in vitro methylation can result in partial methylation.
- in vitro methylation can result in full methylation.
- in vitro methylation can result in hypermethylation.
- a methylated DNA binder e.g., a 5mC antibody
- cfMeDIP can be used to obtain and/or enrich for methylated or hypermethylated nucleic acids (e.g., fully methylated control sample).
- methylation enrichment can be performed by subjecting the nucleic acid molecules (e.g., cell-free nucleic acid molecules) to methylated DNA immunoprecipitation (MeDIP), cell-free methyl-CpG binding domain (cfMBD), methyl-CpG binding domain (MBD), methylation-dependent immunoprecipitation (MDIP), methylation-sensitive restriction enzyme (MSRE), TET-assisted pyridine borane sequencing (TAPS) with methylation-specific PCR, bisulfite conversion with methylation-specific PCR, and/or methylation-specific hybrid capture, or other derivatives thereof.
- MeDIP methylated DNA immunoprecipitation
- cfMBD cell-free methyl-CpG binding domain
- MBD methyl-CpG binding domain
- MDIP methylation-dependent immunoprecipitation
- MSRE methylation-sensitive restriction enzyme
- Hypermethylated nucleic acids can be further incubated and amplified. Hypermethylated nucleic acids can create a validated set of genomic regions which can more accurately distinguish between cancer and healthy samples in cfDNA.
- a hypermethylated section of DNA may be methylated at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or more across all bases.
- a hypermethylated section of DNA may be methylated at most 50%, at most 55%, at most 60%, at most 65%, at most 70%, at most 75%, at most 80%, at most 85%, at most 90%, at most 95%, or more across all bases.
- a hypermethylated section of DNA may be tightly packed, resulting in a silenced gene.
- assaying methylation levels of a plurality of regions comprises sequencing the cell-free nucleic acids, or derivatives thereof.
- sequencing can be bisulfite sequencing.
- sequencing does not comprise bisulfite sequencing.
- sequencing comprises targeted sequencing.
- sequencing comprises using a plurality of capture probes.
- the plurality of capture probes comprises probes that are homologous or complementary to the plurality of regions.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with a known amount of methylation.
- the plurality of capture probes comprises probes that are homologous or complementary to regions with no CpG methylation.
- sequencing generates sequencing reads corresponding to the plurality of regions.
- assaying comprises counting a number of sequencing reads corresponding to a region of a plurality of regions.
- assaying methylation levels of a plurality of regions comprises preparing cell-free nucleic acids (e.g., cfDNA) to undergo Cell-free Methylated DNA ImmunoPrecipitation sequencing (cfMeDIP-seq), as illustrated in FIG. 1.
- cell-free nucleic acids e.g., cfDNA
- cfMeDIP-seq Cell-free Methylated DNA ImmunoPrecipitation sequencing
- spike-in control DNA can undergo end-repair, A-tailing, adapter ligation, and/or other preparation to permit sequencing.
- the cell-free nucleic acids may not be supplemented with a spike-in control DNA.
- a first plurality of nucleic acid molecules (e.g., comprising nucleic acid molecules, such as cfDNA, from a biological sample of a subject) may be combined (e.g., mixed) with a second plurality of nucleic acid molecules (e.g., wherein the second plurality of nucleic acid molecules may not be from the subject from whom the biological sample was taken), for instance, as shown in FIG. 1.
- the second plurality of nucleic acid molecules comprises supplemental processed nucleic acid (e.g., supplemental processed DNA comprising lambda DNA).
- each of the second plurality of nucleic acid molecules does not align to a human genome.
- cell-free nucleic acids can further undergo enrichment of methylated nucleic acids.
- cell-free nucleic acids can undergo immunoprecipitation by pulling down the hypermethylated regions with a binder disclosed herein (e.g., a protein comprising a methyl-CpG binding domain).
- the binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- a binder can be a binder selective for a methylated region of nucleic acid molecules (e.g., a methylcytosine binder (MBD), such as an MBD-Fc fusion protein).
- a binder may be specific to one or more methylated nucleotide species (e.g., 5-methylcytosine (5mC)), for instance, as shown in FIG. 1.
- Filler DNAs as disclosed herein can also be added to the cell-free nucleic acids (e.g., cfDNA). In some cases, filler DNAs may not be added.
- enriching of methylated nucleic acids can comprise enriching for one or more methylated regions that correspond to one or more control genomic regions.
- the one or more reference genomic regions can have one or more genomic sequences that are complementary to one or more sequences in the proto-DMRs.
- the one or more control genomic regions can comprise one or more hypomethylated regions, regions of nonmethylation, or regions that are amenable to methylation enrichment, or any combinations thereof.
- methylated regions of nucleic acid molecules may be purified (e.g., after library creation) to yield a plurality of purified nucleic acid molecules, for example, prior to or as part of a process of determining or identifying a sequence of all or a portion of the methylated nucleic acid molecule population.
- all, or a portion of the plurality of purified nucleic acid molecules may be amplified (e.g., via polymerase chain reaction), for instance, prior to or as part of a process of determining or identifying a sequence of all or a portion of the methylated nucleic acid molecule population.
- a population of amplified nucleic acid molecules or a derivative thereof e.g., comprising amplicons of all or a portion of the plurality of purified nucleic acid molecules
- sequencing e.g., for the determination and/or identification of a sequence of the nucleic acid molecules.
- enriched hypermethylated regions can undergo sequencing that does not comprise bisulfite sequencing.
- determining methylation levels in the fully methylated control sample comprises sequencing.
- sequencing can be bisulfite sequencing.
- sequencing can be bisulfite sequencing paired with methylation-specific PCR.
- sequencing does not comprise bisulfite sequencing.
- sequencing comprises targeted sequencing.
- determining methylation levels in the fully methylated control sample comprises preparing a methylation control sample (e.g., nucleic acids subjected to in vitro methylation enrichment) to undergo cfMeDIP-seq, as illustrated in FIG. 1.
- methylation control sample e.g., nucleic acids subjected to in vitro methylation
- the methylation control sample may not be supplemented with a spike-in control DNA.
- a first plurality of nucleic acid molecules e.g., comprising nucleic acid molecules, such as cfDNA, from a biological sample of a subject
- a second plurality of nucleic acid molecules e.g., wherein the second plurality of nucleic acid molecules may not be from the subject from whom the biological sample was taken
- the second plurality of nucleic acid molecules comprises supplemental processed nucleic acid (e.g., comprising lambda DNA).
- each of the second plurality of nucleic acid molecules does not align to a human genome.
- a binder exhibits a reduced level of non-specific binding to non-methylated nucleotides of the cell-free nucleic acid molecule or a sheared genomic nucleic acid molecule.
- a binder can be a binder selective for a methylated region of nucleic acid molecules (e.g., a methylcytosine binder (MBD), such as an MBD-Fc fusion protein).
- MBD methylcytosine binder
- a binder may be specific to one or more methylated nucleotide species (e.g., 5-methylcytosine (5mC)), for instance, as shown in FIG. 1.
- Filler DNAs as disclosed herein can also be added to the methylation control sample.
- methylated regions of the methylation control sample may be purified (e.g., after library creation) to yield a plurality of purified nucleic acid molecules, for example, prior to or as part of a process of determining or identifying a sequence of all or a portion of the methylated nucleic acid molecule population.
- all, or a portion of the plurality of purified nucleic acid molecules may be amplified (e.g., via polymerase chain reaction), for instance, prior to or as part of a process of determining or identifying a sequence of all or a portion of the methylated nucleic acid molecule population.
- enriched hypermethylated regions can undergo sequencing that does not comprise bisulfite sequencing.
- a methylation level of particular nucleic acid fragments may be considered to reach the threshold methylation level when a binder with sufficient specificity for methylated cytosines is able to bind to the particular nucleic acid fragments, either with or without using filler DNA as described herein.
- a methylation level of particular nucleic acid fragments e.g., DNA fragments, plurality of regions described herein
- depletion of a plurality of nucleic acid molecules results in a remainder population of the plurality of nucleic acid molecules, wherein the remainder of the plurality of nucleic acid molecules comprises (or, in some cases, consists of) nucleic acid molecules having a methylation level below the threshold methylation level (e.g., wherein the remainder population can be hypomethylated/unmethylated relative to one or more nucleic acid molecules removed from the plurality of nucleic acid molecules during depletion).
- a methylation level may be calculated as a percentage of hypermethylated nucleic acid fragments compared to all the nucleic acid fragments contained in a sample.
- a methylation level may be calculated as a percentage of hypermethylated nucleic acid fragments compared to nucleic acid fragments that are in regions that are amenable to methylation enrichment that are contained in a sample.
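The two preceding statements describe the same percentage with different denominators: all fragments in the sample, or only fragments falling in regions amenable to methylation enrichment. The following is a minimal sketch of that calculation; the function name and the fragment counts are hypothetical illustrations, not values from this disclosure.

```python
def methylation_level_percent(n_hypermethylated, n_denominator):
    """Percentage of hypermethylated fragments relative to a chosen denominator."""
    if n_denominator <= 0:
        raise ValueError("denominator fragment count must be positive")
    return 100.0 * n_hypermethylated / n_denominator


# Hypothetical fragment counts, for illustration only.
n_hyper = 1_200          # fragments called hypermethylated
n_all = 40_000           # all fragments in the sample
n_amenable = 8_000       # fragments in regions amenable to enrichment

level_vs_all = methylation_level_percent(n_hyper, n_all)
level_vs_amenable = methylation_level_percent(n_hyper, n_amenable)
print(level_vs_all, level_vs_amenable)
```

The choice of denominator changes the scale of the result, so a threshold methylation level is only meaningful when paired with the denominator it was defined against.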
- a threshold methylation level can be from 0.1% to 1%, 1% to 5%, 5% to 10%, 10% to 15%, 15% to 20%, 20% to 25%, 25% to 30%, 30% to 35%, 35% to 40%, 40% to 45%, 45% to 50%, 50% to 55%, 55% to 60%, 65% to 70%, 70% to 75%, 75% to 80%, 80% to 85%, 85% to 90%, 95% to 100%, at least 1%, at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at most 1%, at most 5%, at most 10%, at most 15%, at most 20%, at most 25%, at most 30%, at most 35%, at most 40%, at most 45%, at most 50%, at most 55%, at most 60%
- processing methylation levels of the plurality of regions to identify a methylation background comprises counting a number of sequencing reads for each region of the plurality of regions. In some cases, processing further comprises generating an average of sequencing read counts for all regions of the plurality of regions, thereby generating the methylation background.
- the methylation background corresponds to the total number of reads in a region. For example, the methylation background may correspond to the total number of reads in a region comprising an anti-DMR.
- processing methylation levels for a subset of the plurality of regions comprises counting a number of sequencing reads for each region of the subset of the plurality of regions.
- processing further comprises generating an average of sequencing read counts for all regions of the subset of the plurality of regions, thereby generating the DMR-specific methylation level.
- Sequencing reads may be filtered for particular characteristics (e.g., length or number of CpGs), and reads that are filtered out may be ignored or omitted from the read counts.
- the average of sequencing read counts may be a weighted average.
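The averaging described above (per-region read counts averaged over all regions for the background, or over a subset for the DMR-specific level, optionally weighted) can be sketched as follows. The helper name, region labels, counts, and weights are hypothetical; how weights would be chosen is not specified in this passage.

```python
def average_read_count(counts, weights=None):
    """Plain or weighted average of per-region sequencing read counts."""
    if not counts:
        raise ValueError("at least one region is required")
    if weights is None:
        return sum(counts) / len(counts)
    if len(weights) != len(counts):
        raise ValueError("one weight per region is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(c * w for c, w in zip(counts, weights)) / total_weight


# Hypothetical per-region read counts, for illustration only.
region_counts = {"region_a": 12, "region_b": 30, "region_c": 18}

# Background: average over all regions of the plurality.
background = average_read_count(list(region_counts.values()))
# DMR-specific level: average over a subset of the regions.
dmr_level = average_read_count([region_counts[r] for r in ("region_b", "region_c")])
# Weighted variant with illustrative weights.
weighted = average_read_count([12, 30, 18], weights=[1, 2, 1])
print(background, dmr_level, weighted)
```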
- the DMRs are identified by differential methylation analysis of a test sample and a control sample.
- the test sample can be derived from a subject with cancer.
- the test sample can be derived from a subject without cancer.
- the control sample can be derived from a subject with cancer.
- the control sample can be derived from a subject without cancer.
- the DMRs e.g., DMR subsets
- the DMRs comprise one or more regions that exhibit hypermethylation in the test sample compared to the control sample.
- the DMRs (e.g., DMR subsets) comprise one or more regions that comprise DMRs specific to a particular cancer type.
- the DMRs comprise one or more regions that comprise DMRs that are not specific to a particular cancer type. In some cases, the DMRs (e.g., DMR subset) comprise one or more regions that comprise DMRs that are specific to a subject. In some cases, the DMRs (e.g., DMR subset) comprise one or more regions that comprise DMRs that are not specific to a subject.
- processing methylation levels for a subset of the plurality of regions, wherein the subset comprises DMRs (e.g., DMR subsets), comprises counting a number of sequencing reads of nucleic acid regions with certain characteristics.
- a count for sequencing reads corresponding to a length (e.g., fragment length) may be performed.
- sequencing reads for fragments or nucleic acids that are <150 bp can be identified and can be counted.
- the lengths may be <170 bp, <165 bp, <160 bp, <155 bp, <150 bp, <145 bp, <140 bp, <135 bp, <130 bp, <125 bp, <120 bp, <115 bp, <110 bp, <105 bp, or <100 bp.
- a count for sequencing reads corresponding to amounts of methylation, such as the number of CpG motifs in a region, may be performed.
- sequence reads for regions of at least 5 CpGs may be counted.
- sequence reads for regions of at least 1 CpG, 2 CpG, 3 CpG, 4 CpG, 5 CpG, 6 CpG, 7 CpG, 8 CpG, 9 CpG, 10 CpG, 11 CpG, 12 CpG, 13 CpG, 14 CpG, 15 CpG, 16 CpG, 17 CpG, 18 CpG, 19 CpG, 20 CpG, or more may be counted.
- the sequencing reads may be counted for a methylation amount or state and a length. For example, the sequencing reads with a methylation amount of 5 CpG and of less than 150 bp may be counted.
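The read-counting criteria above (a minimum number of CpGs combined with a maximum fragment length) can be illustrated with a small filter. This is a sketch under stated assumptions: the `Read` record, its field names, and the example reads are hypothetical, the defaults of 5 CpGs and 150 bp are taken from the example in the text, and the strict "less than" length comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical summary of one sequenced fragment."""
    region: str            # region (e.g., a DMR) the read maps to
    fragment_length: int   # fragment length in bp
    cpg_count: int         # number of CpG motifs in the fragment


def count_qualifying_reads(reads, min_cpgs=5, max_length=150):
    """Count reads per region that have enough CpGs and are short enough."""
    counts = {}
    for read in reads:
        # Reads failing either criterion are omitted from the counts.
        if read.cpg_count >= min_cpgs and read.fragment_length < max_length:
            counts[read.region] = counts.get(read.region, 0) + 1
    return counts


# Hypothetical reads, for illustration only.
reads = [
    Read("dmr_1", 140, 6),  # kept
    Read("dmr_1", 160, 7),  # dropped: too long
    Read("dmr_1", 120, 3),  # dropped: too few CpGs
    Read("dmr_2", 149, 5),  # kept
    Read("dmr_2", 150, 9),  # dropped: not less than 150 bp
]
filtered_counts = count_qualifying_reads(reads)
print(filtered_counts)
```

Other thresholds recited in the text (for example 2 CpGs, or a different length cutoff) can be passed through `min_cpgs` and `max_length`.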
- levels of methylation may be normalized against a background. This normalization may reduce or eliminate noise, or otherwise improve the sensitivity. The normalization may also remove or reduce irrelevant signals.
- the levels of methylation may be determined for a set of DMRs. This set of DMRs may be normalized against a background of the methylation level of the plurality of regions comprising (i) one or more sites that are amenable to methylation enrichment and (ii) methylation below a threshold in a non-diseased control (e.g., anti-DMRs).
- the levels of methylation may be determined for a set of DMRs by counting the number of reads that have certain characteristics (e.g., 5 CpGs or more and/or a length of <150 bp) in those DMRs.
- the levels of methylation may be determined for a set of DMRs by counting the number of reads that have 2 CpGs or more and/or a length of <150 bp in those DMRs.
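The read-counting step described above can be sketched in Python as follows. This is a minimal illustration and not the disclosed implementation: the `Fragment` record, the function names, the use of "CG" dinucleotides in the fragment sequence as the CpG count, and the choice to require both criteria (the text allows "and/or") are all assumptions made for the example. The length threshold is treated as a strict upper bound.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """Hypothetical aligned fragment; coordinates are 0-based, end-exclusive."""
    chrom: str
    start: int
    end: int
    sequence: str

def cpg_count(sequence):
    """Count CpG dinucleotides (the substring 'CG') in a sequence."""
    return sequence.upper().count("CG")

def count_qualifying_reads(fragments, dmrs, min_cpg=5, max_len=150):
    """Count, per DMR, fragments with >= min_cpg CpGs and length < max_len.

    dmrs is a list of (chrom, start, end) tuples. A fragment is assigned to
    every DMR it overlaps by at least one base (an assumption of this sketch).
    """
    counts = {dmr: 0 for dmr in dmrs}
    for frag in fragments:
        if frag.end - frag.start >= max_len:
            continue  # fragment is too long
        if cpg_count(frag.sequence) < min_cpg:
            continue  # fragment has too few CpGs
        for dmr in dmrs:
            chrom, start, end = dmr
            if frag.chrom == chrom and frag.start < end and frag.end > start:
                counts[dmr] += 1
    return counts
```

Lowering `min_cpg` to 2 reproduces the second variant described in the text.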
- This set of sequencing reads may be normalized against a background of the total number of sequencing reads counted for a set of regions that comprise anti-DMRs.
- generating the normalized DMR methylation level comprises dividing an average number of sequencing reads associated with the subset by an average number of sequencing reads associated with the plurality of regions. The average may be a weighted average.
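As a hedged illustration of the ratio described above, the following Python sketch divides a (optionally weighted) average read count over the DMR subset by the corresponding average over the background regions. The function names and the weighting scheme are hypothetical; the text does not specify how weights would be chosen.

```python
def weighted_mean(values, weights=None):
    """Weighted arithmetic mean; equal weights are used when none are given."""
    if weights is None:
        weights = [1.0] * len(values)
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero (or values are empty)")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def normalized_dmr_level(dmr_counts, background_counts,
                         dmr_weights=None, background_weights=None):
    """Average reads per DMR divided by average reads per background region."""
    background = weighted_mean(background_counts, background_weights)
    if background == 0:
        raise ValueError("background average is zero; the ratio is undefined")
    return weighted_mean(dmr_counts, dmr_weights) / background
```

For example, DMR counts of 12, 8, and 10 reads against background counts of 2, 2, 4, and 0 reads give an unweighted ratio of 10 / 2 = 5.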
- generating the normalized DMR methylation level can comprise one or more Bayesian inference approaches and/or machine learning classifiers.
- the method of nucleic acids processing further comprises identifying a subject as having a disease, based on the normalized DMR methylation level.
- identifying a subject as having a disease comprises comparing the normalized DMR methylation level against a control level.
- the control level corresponds to an expected value for a non-cancerous sample.
- the control level corresponds to an expected value for a cancerous sample.
- the disease or condition can be a cancer or a tumor.
- the cancer can include breast cancer, bladder cancer, colorectal cancer, endometrial cancer, prostate cancer, renal cancer, pancreatic cancer, or lung cancer.
- the cancer can be a late-stage cancer. In some cases, the cancer can be an early-stage cancer.
- the methylation levels of a plurality of regions disclosed herein can be used to identify a background region that can be used for normalization of DMR methylation level to distinguish between healthy and cancerous cell-free nucleic acid samples.
- a background region can comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 60,000, at least 70,000, or more genomic windows or window regions.
- a genomic window or window region can have a length of at least about 10, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, at least about 500, at least about 550, at least about 600, at least about 650, at least about 700, at least about 750, at least about 800, at least about 850, at least about 900, at least about 950, at least about 1,000, at least about
- a genomic window can be about 300 bp in length.
- the genomic windows are adjacent to one another in the genome.
- the genomic windows can be non-adjacent regions on the genome.
- the genomic windows can be of dynamic length.
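A minimal Python sketch of how a background region could be tiled into adjacent, fixed-length genomic windows is shown below. The 300 bp default follows the example length given above; the function name, the 0-based end-exclusive coordinates, and the shorter final window are assumptions of this sketch. Non-adjacent or dynamic-length windows would need a different construction.

```python
def tile_windows(chrom, region_start, region_end, window=300):
    """Split [region_start, region_end) into adjacent windows of `window` bp.

    The last window is truncated at region_end if the span is not an exact
    multiple of the window length.
    """
    if window <= 0:
        raise ValueError("window length must be positive")
    return [
        (chrom, start, min(start + window, region_end))
        for start in range(region_start, region_end, window)
    ]
```

Tiling a 1,000 bp span this way yields three 300 bp windows followed by one 100 bp window.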
- the identified DMR-specific methylation level can be normalized by dividing it by the methylation background (e.g., methylation levels of a plurality of regions disclosed herein, e.g., regions of a genome that comprise (i) one or more sites that are amenable to methylation enrichment and (ii) methylation below a threshold in a healthy control).
- a background region can be used to identify DMRs from healthy and/or cancerous cell-free nucleic acid samples.
- the DMRs identified by using a background region comprise a subset of the DMRs identified by using a genome-wide background.
- using a background region identifies more DMRs as compared to using a genome-wide background.
- SNR signal-to-noise
- any one of the methods disclosed herein comprises a reduction in a noise level compared to a noise level of a corresponding sample that has a normalized DMR methylation level generated by normalizing a DMR specific level against a background derived from a whole genome.
- any one of the methods disclosed herein comprises a reduction in noise level compared to a noise level of a corresponding sample that has a normalized DMR methylation level generated by normalizing a DMR specific level against a background derived from all genomic regions that are amenable to methylation enrichment.
- the use of a background region can identify at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, at least about 500, at least about 550, at least about 600, at least about 650, at least about 700, at least about 750, at least about 800, at least about 850, at least about 900, at least about 1000, or more additional regions (e.g., DMRs, such as ctDNA specific DMRs, anti-DMRs, or proto-DMRs) compared to using a genome-wide background or a background derived from all genomic regions that are amenable to methylation enrichment.
- the use of a background region can identify at least about 1-fold, at least about 2-fold, at least about 3-fold, at least about 4-fold, at least about 5-fold, at least about 6-fold, at least about 7-fold, at least about 8-fold, at least about 9-fold, at least about 10-fold, at least about 15-fold, at least about 20-fold, at least about 25-fold, at least about 30-fold, at least about 35-fold, at least about 40-fold, at least 45-fold, at least about 50-fold, at least about 100-fold, at least about 150-fold, at least about 200-fold, at least about 250-fold, at least about 300-fold, at least about 350-fold, at least about 400-fold, at least about 450-fold, at least about 500-fold, or more regions (e.g., DMRs, such as ctDNA specific DMRs, anti-DMRs, or proto-DMRs) compared to using genome-wide background or background derived from all genomic regions that are amenable to
- the use of a background region can identify more cell-free nucleic acids (e.g., cfDNAs) derived from tumor or cancer cells (e.g., ctDNAs) specific methylation level more than at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about
- the use of a background region can identify at least about 1-fold, at least about 2-fold, at least about 3-fold, at least about 4-fold, at least about 5-fold, at least about 6-fold, at least about 7-fold, at least about 8-fold, at least about 9-fold, at least about 10-fold, at least about 15-fold, at least about 20-fold, at least about 25-fold, at least about 30-fold, at least about 35-fold, at least about 40-fold, at least 45-fold, at least about 50-fold, at least about 100-fold, at least about 150-fold, at least about 200-fold, at least about 250-fold, at least about 300-fold, at least about 350-fold, at least about 400-fold, at least about 450-fold, at least about 500-fold, or more DMRs compared to using genome-wide background or background derived from all genomic regions that are amenable to methylation enrichment.
- the use of a background region can reduce the run time to identify DMRs (e.g., ctDNA specific DMRs) by at least about 5 minutes, at least about 10 minutes, at least about 15 minutes, at least about 20 minutes, at least about 25 minutes, at least about 30 minutes, at least about 35 minutes, at least about 40 minutes, at least about 45 minutes, at least about 50 minutes, at least about 55 minutes, at least about 1 hour, at least about 1.5 hours, at least about 2 hours, at least about 2.5 hours, at least about 3 hours, at least about 3.5 hours, at least about 4 hours, at least about 4.5 hours, at least about 5 hours, at least about 5.5 hours, at least about 6 hours, at least about 6.5 hours, at least about 7 hours, at least about 7.5 hours, at least about 8 hours, at least about 8.5 hours, at least about 9 hours, at least about 9.5 hours, at least about 10 hours, or more as compared to using genome-wide background or background derived from all genomic regions that are amenable to methylation
- the use of a background region can reduce the run time to identify DMRs (e.g., ctDNA specific DMRs) by about 1-fold, at least about 2-fold, at least about 3-fold, at least about 4-fold, at least about 5-fold, at least about 6-fold, at least about 7-fold, at least about 8-fold, at least about 9-fold, at least about 10-fold, at least about 15-fold, at least about 20-fold, at least about 25-fold, at least about 30-fold, at least about 35-fold, at least about 40-fold, at least 45-fold, at least about 50-fold, or more as compared to using genome-wide background or background derived from all genomic regions that are amenable to methylation enrichment to identify DMRs (e.g., ctDNA specific DMRs).
- the use of the background region is not limited to any specific type of cancer (e.g., it can be pan-cancer).
- the panel of the background region e.g., methylation levels of a plurality of regions disclosed herein, e.g., regions of a genome that comprises (i) one or more sites that are amenable to methylation enrichment and (ii) methylation at below a threshold in a healthy control
- the same panel can be used to assess samples from any cancer type to identify DMRs within the plurality of regions described herein.
- supplemental processed nucleic acid may be added to a first plurality of nucleic acids (e.g., a plurality of nucleic acids from a biological sample, which may comprise cfDNA from healthy tissue and/or cfDNA from tumor tissue, such as ctDNA), for instance as shown in FIG. 1.
- the supplemental processed nucleic acid can be DNA and/or RNA.
- addition of supplemental processed nucleic acid (e.g., a second plurality of nucleic acid molecules) to a first plurality of nucleic acid molecules can increase the specificity and/or sensitivity of a method, or system described herein, for instance, with respect to the detection and/or identification of a nucleic acid sequence of the first plurality of nucleic acid molecules.
- addition of supplemental processed nucleic acid (e.g., a second plurality of nucleic acid molecules) to a first plurality of nucleic acid molecules may increase the rate of depletion of a methylated region of a nucleic acid sequence, e.g., during the practice of some embodiments of methods and systems described herein.
- a predetermined total mass for use in a method or system described herein can be from 20 ng to 30 ng, from 30 ng to 40 ng, from 40 ng to 50 ng, from 50 ng to 60 ng, from 60 ng to 70 ng, from 70 ng to 80 ng, from 80 ng to 90 ng, from 90 ng to 100 ng, from 100 ng to 110 ng, from 110 ng to 120 ng, from 120 ng to 130 ng, from 130 ng to 140 ng, from 140 ng to 150 ng, from 150 ng to 160 ng, from 160 ng to 170 ng, from 170 ng to 180 ng, from 180 ng to 190 ng, from 190 ng to 200 ng, greater than 200 ng, or less than 20 ng.
- an amount of supplemental processed nucleic acid from 1 ng to 5 ng, from 5 ng to 10 ng, from 10 ng to 20 ng, from 20 ng to 30 ng, from 30 ng to 40 ng, from 40 ng to 50 ng, from 50 ng to 60 ng, from 60 ng to 70 ng, from 70 ng to 80 ng, from 80 ng to 90 ng, from 90 ng to 100 ng, from 100 ng to 110 ng, from 110 ng to 120 ng, from 120 ng to 130 ng, from 130 ng to 140 ng, from 140 ng to 150 ng, from 150 ng to 160 ng, from 160 ng to 170 ng, from 170 ng to 180 ng, from 180 ng to 190 ng, from 190 ng to 200 ng, greater than 200 ng, less than 20 ng, less than 10 ng, or less than 5 ng can be added to a first plurality of nucleic acid molecules (e.g., to bring the
- the present disclosure comprises methods and systems for filling in the sample with an amount of supplemental processed nucleic acid (e.g., filler DNA) to generate a mixture sample, wherein the mixture sample comprises at least about 50 ng, 55 ng, 60 ng, 65 ng, 70 ng, 75 ng, 80 ng, 85 ng, 90 ng, 95 ng, 100 ng, 120 ng, 140 ng, 160 ng, 180 ng, 200 ng, or any amount in between the numbers of the total amount of the nucleic acid mixture.
- the supplemental processed DNA comprises at least about 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% methylated supplemental processed nucleic acid, with the remainder being unmethylated supplemental processed nucleic acid, in some cases between 5% and 50%, between 10% and 40%, or between 15% and 30% methylated supplemental processed DNA.
- the mixture sample comprises an amount of supplemental processed DNA from about 20 ng to about 100 ng, about 30 ng to about 100 ng, or about 50 ng to about 100 ng.
- the cell-free DNA from the sample and the first amount of supplemental processed DNA together comprise at least 50 ng of total nucleic acid.
- the cfDNA from the sample and the first amount of supplemental processed DNA together comprise at least 100 ng of total nucleic acid.
- the supplemental processed DNA may not be needed and/or required for any one of the methods disclosed herein.
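The arithmetic behind topping up a sample to a predetermined total mass can be sketched as below. This is an illustrative calculation only: the function name, the 100 ng default target, and the 20% methylated fraction are example values picked from within the ranges listed above, not values the disclosure prescribes.

```python
def filler_amounts(cfdna_ng, target_total_ng=100.0, methylated_fraction=0.2):
    """Return the filler mass (ng) needed to reach target_total_ng and its
    split into methylated and unmethylated filler.

    If the sample already meets the target, no filler is needed.
    """
    if cfdna_ng < 0 or target_total_ng <= 0:
        raise ValueError("masses must be non-negative and the target positive")
    if not 0.0 <= methylated_fraction <= 1.0:
        raise ValueError("methylated_fraction must be between 0 and 1")
    filler_ng = max(0.0, target_total_ng - cfdna_ng)
    methylated_ng = filler_ng * methylated_fraction
    return {
        "filler_ng": filler_ng,
        "methylated_ng": methylated_ng,
        "unmethylated_ng": filler_ng - methylated_ng,
    }
```

With 10 ng of cfDNA, a 100 ng target, and a 25% methylated fraction, this gives 90 ng of filler, of which 22.5 ng is methylated and 67.5 ng is unmethylated.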
- supplemental processed nucleic acid may be produced by fragmentation (e.g., via sonication).
- the supplemental processed nucleic acid may be about 50 bp to about 800 bp long, about 100 bp to about 600 bp long, or about 200 bp to about 600 bp long.
- the supplemental processed nucleic acid can be double stranded.
- the supplemental processed nucleic acid may be double stranded DNA.
- the supplemental processed nucleic acid may be junk nucleic acid.
- the supplemental processed nucleic acid may also be endogenous or exogenous nucleic acid.
- the supplemental processed DNA can be non-human nucleic acid, such as λ DNA.
- λ DNA refers to Enterobacteria phage λ DNA.
- the supplemental processed nucleic acid has no alignment to human nucleic acid.
- supplemental nucleic acid can be hypermethylated.
- a sample can be any biological sample isolated from a subject.
- a sample may comprise, without limitation, bodily fluid, whole blood, platelets, serum, plasma, stool, white blood cells or leukocytes, endothelial cells, tissue biopsies, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, cerebrospinal fluid, saliva, mucus, sputum, semen, sweat, urine, fluid from nasal brushings, fluid from a pap smear, or any other bodily fluids.
- a bodily fluid may include saliva, blood, or serum.
- a sample may also be a tumor sample, which may be obtained from a subject by various approaches, including, but not limited to, venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage, scraping, surgical incision, or intervention or other approaches.
- a sample may be a cell-free sample (e.g., substantially free of cells). DNA samples may be denatured, for example, using sufficient heat.
- the sample may be taken from a subject with a disease or disorder.
- the sample may be taken from a subject suspected of having a disease or a disorder.
- the sample can be taken from a subject suspected of having minimal residual disease.
- the sample can be taken from a subject with minimal residual disease.
- the sample may be obtained before and/or after treatment of a subject with a disease or disorder. Samples may be obtained from a subject during a treatment or a treatment regime. Multiple samples may be obtained from a subject to monitor the effects of the treatment over time.
- the disease or disorder may be a cancer.
- a cancer can be a late-stage or an early-stage cancer.
- cancer types suitable for detection with the methods according to the disclosure include acute lymphoblastic leukemia, acute myeloid leukemia, adrenocortical carcinoma, AIDS-related cancers, AIDS-related lymphoma, anal cancer, appendix cancer, astrocytomas, basal cell carcinoma, bile duct cancer, bladder cancer, bone cancers, brain tumors, such as cerebellar astrocytoma, cerebral astrocytoma/malignant glioma, ependymoma, medulloblastoma, supratentorial primitive neuroectodermal tumors, visual pathway and hypothalamic glioma, breast cancer, bronchial adenomas, Burkitt lymphoma, carcinoma of unknown primary origin, central nervous system lymphoma, cerebellar astrocytoma, cervical cancer, childhood cancers, chronic lymphocytic leukemia, chronic myelogenous leukemia, chronic myeloproliferative disorders, colon cancer
- the sample may be taken from a healthy individual.
- the sample may be taken from an individual that may not be suffering from a disease.
- the sample may be taken from an individual that may not be suffering from cancer.
- samples may be taken longitudinally from the same individual.
- samples acquired longitudinally may be analyzed with the goal of monitoring individual health and early detection of health issues.
- the sample may be taken at a first time to analyze for a set of markers (e.g., DMRs).
- the sample may be used to initially determine a set of markers that are specific to the subject.
- a second sample may be taken from the subject at a later time to monitor the markers.
- a subject with a cancer may initially be analyzed.
- a second sample may be analyzed after the initiation of a therapy regimen.
- the monitoring may allow for the efficacy of a therapy regimen or treatment to be analyzed.
- a DMR associated with a subject’s cancer may be analyzed and observed to be absent from a subject.
- the sample may be collected at a home setting or at a point-of-care setting and subsequently transported by a mail delivery, courier delivery, or other transport method prior to analysis.
- a home user may collect a blood sample through a finger prick, which blood sample may be subsequently transported by mail delivery prior to analysis.
- samples acquired longitudinally may be used to monitor response to stimuli expected to impact health, athletic performance, or cognitive performance. Nonlimiting examples include response to medication, dieting, or an exercise regimen.
- the present disclosure provides a system, method, or kit that includes or uses one or more biological samples.
- the one or more samples used herein may comprise any substance containing or presumed to contain nucleic acids.
- a sample may include a biological sample obtained from a subject.
- a biological sample can be a liquid sample.
- the sample comprises less than about 100 ng, 90 ng, 80 ng, 75 ng, 70 ng, 60 ng, 50 ng, 40 ng, 30 ng, 20 ng, 10 ng, 5 ng, or any amount in between the numbers of cell-free nucleic acid molecules.
- the sample comprises less than about 1 pg, less than about 5 pg, less than about 10 pg, less than about 20 pg, less than about 30 pg, less than about 40 pg, less than about 50 pg, less than about 100 pg, less than about 200 pg, less than about 500 pg, less than about 1 ng, less than about 5 ng, less than about 10 ng, less than about 20 ng, less than about 30 ng, less than about 40 ng, less than about 50 ng, less than about 100 ng, less than about 200 ng, less than about 500 ng, less than about 1000 ng, or any amount in between the numbers of cell-free nucleic acid molecules.
- creation or provision of a plurality of nucleic acid molecules from a biological sample can comprise performing one or more of end-repair, A-tailing, and/or adapter ligation on the plurality of nucleic acid molecules (e.g., after purification from the biological sample).
- a sample may be taken at a first time point and sequenced, and then another sample may be taken at a subsequent time point and sequenced.
- Such methods may be used, for example, for longitudinal monitoring purposes to track the development or progression of a disease.
- the progression of a disease may be tracked before treatment, after treatment, or during the course of treatment, to determine the treatment’s effectiveness.
- a method as described herein may be performed on a subject prior to, and after, a medical treatment to measure the disease’s progression or regression in response to the medical treatment.
- the sample may be processed to generate datasets indicative of a disease or disorder of the subject. For example, a presence, absence, or quantitative assessment of cell-free nucleic acid molecules (e.g., ctDNA molecules) of the sample at a panel of cancer-associated genomic loci or microbiome-associated loci may be indicative of a cancer of the subject.
- Processing the sample obtained from the subject may comprise (i) subjecting the sample to conditions that are sufficient to isolate, enrich, or extract a plurality of cell-free nucleic acid molecules, and (ii) assaying the plurality of cell-free nucleic acid molecules to generate the dataset (e.g., nucleic acid sequences).
- a plurality of cell-free nucleic acid molecules can be extracted from the sample and subjected to sequencing to generate a plurality of sequencing reads.
- the cell-free nucleic acid molecules may comprise cell-free ribonucleic acid (cfRNA) or cell-free deoxyribonucleic acid (cfDNA).
- the cell-free nucleic acid molecule may be extracted from the sample by a variety of methods.
- the cell-free nucleic acid molecule may be enriched by a plurality of probes configured to enrich nucleic acid (e.g., RNA or DNA) molecules corresponding to a panel of cancer-associated genomic loci.
- the probes may have sequence complementarity with nucleic acid sequences from one or more of the genomic loci of the panel of cancer-associated genomic loci.
- the panel of cancer-associated genomic loci may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least about 25, at least about 30, at least about 35, at least about 40, at least about 45, at least about 50, at least about 55, at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, at least about 90, at least about 95, at least about 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800.
- the probes may be nucleic acid molecules (e.g., RNA or DNA) having sequence complementarity with nucleic acid sequences (e.g., RNA or DNA) of the one or more genomic loci (e.g., cancer-associated genomic loci). These nucleic acid molecules may be primers or enrichment sequences.
- the assaying of the sample using probes that are selective for the one or more genomic loci may comprise use of array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing (e.g., RNA sequencing or DNA sequencing).
- Various assays may be used in methods of the present disclosure, such as library preparation (which may include polymerase chain reaction (PCR)) followed by sequencing (e.g., next-generation sequencing, Sanger sequencing, etc.).
- Next-generation sequencing (NGS) techniques, also referred to as high-throughput sequencing, may include various sequencing technologies, including Illumina (Solexa) sequencing, Roche 454 sequencing, Ion Torrent (Proton/PGM) sequencing, SOLiD sequencing, and long-read sequencing (Oxford Nanopore and PacBio).
- NGS allows for the sequencing of DNA and RNA much more quickly and cheaply than Sanger sequencing.
- the sequencing can be optimized for short read sequencing.
- the sequencing assays do not require amplification.
- Sequencing libraries that are hypermethylated may improve the specificity, the sensitivity, and/or the efficiency of methods and systems for processing nucleic acids.
- hypermethylated sequencing libraries may improve the specificity, the sensitivity, and/or the efficiency of assays for determining the presence and/or sequence identity of a nucleic acid sequence.
- a hypermethylated sequencing library may comprise a plurality of nucleic acids and/or fragments thereof.
- a hypermethylated sequencing library may comprise a plurality of nucleic acid molecules (e.g., a population of nucleic acids and/or fragments thereof).
- the plurality of nucleic acid molecules may comprise all or a portion of a first plurality of nucleic acid molecules, e.g., wherein the first plurality of nucleic acid molecules comprises one or more nucleic acid molecules that comprise a methylated nucleic acid residue and one or more nucleic acid molecules that does not comprise a methylated nucleic acid residue.
- a methylated nucleic acid may comprise one or more methylated nucleic acid residues.
- a methylated nucleic acid may comprise one or more methylated cytosines (e.g., one or more 5-methylcytosines (5mC) and/or one or more 5-hydroxymethylcytosines (5hmC)).
- a plurality of nucleic acid molecules may be hypermethylated and enriched by using a binder, e.g., as described herein, to form a hypermethylated sequencing library which can be used as a background as opposed to a whole-genome background for use in analysis of cfDNA.
- DNA may be hypermethylated before use of a binder to create a sequencing library with a background.
- the background sequencing library may comprise a set of background genomic regions that are enriched by the binder.
- the present disclosure provides methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides.
- the polynucleotides may be, for example, nucleic acid molecules such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA). Sequencing may be performed by various systems currently available, such as, without limitation, a sequencing system by Illumina®, Pacific Biosciences (PacBio®), Oxford Nanopore®, or Life Technologies (Ion Torrent®). Further, any sequencing methods that provide fragment length such as paired-end sequencing may be utilized.
- sequencing may be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR, quantitative PCR, or real time PCR), or isothermal amplification.
- Such systems may provide a plurality of raw genetic data corresponding to the genetic information of a subject (e.g., human), as generated by the systems from a sample provided by the subject.
- sequencing reads (also “reads” herein).
- a read may include a string of nucleic acid bases corresponding to a sequence of a nucleic acid molecule that has been sequenced.
- systems and methods provided herein may be used with proteomic information.
- the sequencing reads are obtained via a next-generation sequencing method or a next-next-generation sequencing method.
- the sequencing methods comprise cfMeDIP sequencing, e.g., comprising operations or systems as described by Shen et al. (“Sensitive tumor detection and classification using plasma cell-free DNA methylomes,” (2018) Nature), which is incorporated herein in its entirety.
- sequencing can be performed using methyl-CpG-binding domain sequencing (MBD-seq).
- MBD-seq can comprise capture (e.g., via a binder, such as an antibody specific to a species of methylated nucleotide) of double-stranded, methylated DNA fragments for sequencing of methylation-enriched DNA fragment libraries.
- the sequencing methods comprise CAncer Personalized Profiling by deep Sequencing (CAPP-Seq), which can be a next-generation sequencing based method used to quantify circulating DNA in cancer (ctDNA). This method may be generalized for any cancer type that has recurrent mutations and may detect one molecule of mutant DNA in 10,000 molecules of healthy DNA.
- the sequencing methods comprise chromatin immunoprecipitation sequencing (ChIP-Seq).
- the sequencing comprises bisulfite sequencing. Alternatively, in some embodiments, the sequencing does not comprise bisulfite sequencing.
- Sequencing may comprise targeted sequencing.
- the sequencing reactions may comprise capture probes that are specific to regions of interest.
- the use of targeted sequencing may increase the number of reads that are specific to regions that are informative (e.g., related to a DMR, or usable for distinguishing a healthy subject from a subject suffering from a disease).
- the capture probes may comprise one or more probes that are complementary or homologous to regions that comprise (i) one or more sites that are amenable to methylation enrichment and (ii) substantially no methylation in a healthy control.
- the capture probes may comprise one or more probes that are complementary or homologous to regions that have a known methylation state.
- a target panel that enriches for regions comprising (i) one or more sites that are amenable to methylation enrichment and (ii) substantially no methylation in a healthy control may allow for increasing the sequencing depth and signal for areas that can be used with differential methylation analysis without restricting the sequences to a specific disease or a specific type of cancer.
- the sequences generated from the targeted sequence may be pan-cancer or cancer type agnostic.
- the target sequencing may not comprise generating a custom panel for a specific subject or cancer.
- the use of panels that are not custom to a subject allows for an improved consistency in the sequencing and wet-lab protocols while still allowing for downstream in silico analysis to be personalized or customized to a subject. In some cases the use of panels that are not customized to a subject (e.g., universal panel of genomic region) can reduce the need for high sequencing depth regardless of the size of the panels.
- Sequencing can comprise analysis of the results of sequencing methods.
- sequencing analysis can comprise using Model-based Analysis for ChIP-Seq (MACS) software.
- the MACS algorithm captures the influence of genome complexity to evaluate the significance of enriched regions.
- Sequencing analysis can comprise identifying broad peaks (broad peak calling) or identifying narrow peaks (narrow peak calling).
- hypermethylated regions can be identified using narrow peak calling. Peak annotations that note regions of interest can be produced by the MACS algorithm by determining signals that differ significantly between two samples (e.g., between a sample and the background region).
- hypermethylated regions can be identified using broad peak calling.
- both narrow peak calling and broad peak calling may identify the same hypermethylated regions.
- narrow peak calling and broad peak calling may identify different hypermethylated regions. Additional processing of peak annotations can merge regions of interest across multiple samples to result in higher resolution and more accurate results. More accurate results can comprise the inclusion of regions that have very few reads in samples, but which can be leveraged to differentiate between healthy and disease samples.
- sequencing analysis can be illustrated using a gene heatmap. Alternatively or in addition, sequencing analysis can be illustrated using a uniform manifold approximation and projection (UMAP) plot.
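The merging of peak annotations across multiple samples described above can be sketched as a simple interval union. This is only an illustration under stated assumptions: the function name `merge_peaks`, the `(chromosome, start, end)` tuple layout, and the example coordinates are hypothetical and are not taken from the disclosure or from the MACS software.

```python
from typing import Dict, List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def merge_peaks(peaks_by_sample: Dict[str, List[Interval]]) -> List[Interval]:
    """Union overlapping or touching peaks from all samples into merged regions."""
    # Pool the peaks from every sample and sort by chromosome, then start, then end.
    all_peaks = sorted(p for peaks in peaks_by_sample.values() for p in peaks)
    merged: List[Interval] = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous region: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Illustrative input only; these coordinates are made up.
example = {
    "sample_a": [("chr1", 100, 200), ("chr2", 50, 80)],
    "sample_b": [("chr1", 150, 260), ("chr1", 400, 450)],
}
merged_regions = merge_peaks(example)
```

A region called in only one sample (here `chr1:400-450`) is retained in the merged set, which is one way regions with very few reads in some samples could still be carried forward for comparison.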
- a sample or portion thereof may be subjected to library preparation before sequencing.
- the samples are ligated to nucleic acid adapters and digested using enzymes.
- sequencing comprises modification of a nucleic acid molecule or fragment thereof, for example, by ligating a barcode, a unique molecular identifier (UMI), or another tag to the nucleic acid molecule or fragment thereof. Ligating a barcode, UMI, or tag to one end of a nucleic acid molecule or fragment thereof may facilitate analysis of the nucleic acid molecule or fragment thereof following sequencing.
- a barcode is a unique barcode (e.g., a UMI).
- a barcode can be nonunique, and barcode sequences may be used in connection with endogenous sequence information such as the start and stop sequences of a target nucleic acid (e.g., the target nucleic acid can be flanked by the barcode, and the barcode sequences, in connection with the sequences at the beginning and end of the target nucleic acid, create a uniquely tagged molecule).
- a barcode, UMI, or tag may be a known sequence used to associate a polynucleotide or fragment thereof with an input or target nucleic acid molecule or fragment thereof.
- a barcode, UMI, or tag may comprise natural nucleotides or non-natural (e.g., modified) nucleotides (e.g., as described herein).
- a barcode sequence may be contained within an adapter sequence such that the barcode sequence may be contained within a sequencing read.
- a barcode sequence may be at least 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, or more nucleotides in length. In some cases, a barcode sequence may be of sufficient length and may be sufficiently different from another barcode sequence to allow the identification of a sample based on a barcode sequence with which it can be associated.
- a barcode sequence, or a combination of barcode sequences may be used to tag and subsequently identify an “original” nucleic acid molecule or fragment thereof (e.g., a nucleic acid molecule or fragment thereof present in a sample from a subject).
- a barcode sequence, or a combination of barcode sequences can be used in conjunction with endogenous sequence information to identify an original nucleic acid molecule or fragment thereof.
- a barcode sequence, or a combination of barcode sequences may be used with endogenous sequences adjacent to a barcode, UMI, or tag (e.g., the beginning and end of the endogenous sequences).
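The use of a nonunique barcode together with endogenous start and end positions to identify an original molecule can be sketched as a grouping step. The `Read` type, its field names, and the example reads below are hypothetical; this is a minimal sketch of the idea, not the disclosed implementation.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical aligned read: barcode plus endogenous start/end coordinates."""
    barcode: str
    chrom: str
    start: int
    end: int


MoleculeKey = Tuple[str, str, int, int]


def group_by_molecule(reads: List[Read]) -> Dict[MoleculeKey, List[Read]]:
    """Group reads sharing the same barcode AND the same fragment start/end."""
    groups: Dict[MoleculeKey, List[Read]] = defaultdict(list)
    for r in reads:
        groups[(r.barcode, r.chrom, r.start, r.end)].append(r)
    return dict(groups)


# Illustrative reads only.
reads = [
    Read("ACGT", "chr1", 1000, 1167),
    Read("ACGT", "chr1", 1000, 1167),  # same barcode and coordinates: same molecule
    Read("ACGT", "chr1", 5000, 5150),  # same barcode, different coordinates
    Read("TTAG", "chr1", 1000, 1167),  # same coordinates, different barcode
]
molecules = group_by_molecule(reads)
```

Here the barcode `ACGT` alone is not unique, but the combined key separates the four reads into three original molecules.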
- the prepared libraries may be combined with filler nucleic acids (e.g., filler DNAs) to minimize the effect of low abundance ctDNA in the prepared libraries and generate mixed samples.
- the amount of ctDNA can be low and may not be easily and accurately measured and quantified.
- the mixed samples are brought to at least about 50 ng, 80 ng, 100 ng, 120 ng, 150 ng, or 200 ng and are subjected to further enrichment.
- Processing a nucleic acid molecule or fragment thereof may comprise performing nucleic acid amplification.
- any type of nucleic acid amplification reaction may be used to amplify a target nucleic acid molecule or fragment thereof and generate an amplified product.
- nucleic acid amplification methods include reverse transcription, primer extension, polymerase chain reaction (PCR), ligase chain reaction, asymmetric amplification, rolling circle amplification, and multiple displacement amplification (MDA).
- Examples of PCR include, but are not limited to, quantitative PCR, real-time PCR, digital PCR, emulsion PCR, hot start PCR, multiplex PCR, asymmetric PCR, nested PCR, and assembly PCR.
- Nucleic acid amplification may involve one or more reagents such as one or more primers, probes, polymerases, buffers, enzymes, and deoxyribonucleotides. Nucleic acid amplification may be isothermal or may comprise thermal cycling, and/or with the length of the endogenous sequence.
- a binder may be used to deplete a population of nucleic acid molecules (e.g., a plurality of nucleic acid molecules derived from a biological sample). In some cases, a binder can be used to deplete a plurality of nucleic acid molecules of one or more nucleic acid molecules having a methylation level at or above a threshold methylation level (e.g., by binding to one or more methylated nucleotides of the one or more nucleic acid molecules). A binder may be used to enrich a population of nucleic acid molecules (e.g., a plurality of nucleic acids derived from a biological sample). In some cases, a binder can exhibit a reduced level of non-specific binding to non-methylated nucleotides, non-methylated genomic regions, or sheared genomic nucleic acid molecules.
- a binder can be specific to one or more methylated nucleotide species (e.g., 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 4-methylcytosine (4mC), or 6-methyladenine (6mA)).
- a binder can be selected from the group consisting of an anti-5-methylcytosine antibody or a derivative thereof, an anti-5-carboxylcytosine antibody or a derivative thereof, an anti-5-formylcytosine antibody or a derivative thereof, an anti-5-hydroxymethylcytosine antibody or a derivative thereof, an anti-3-methylcytosine antibody or a derivative thereof, and any combinations thereof.
- the binder can be an anti-5-methylcytosine antibody or a derivative thereof.
- the binder can be a protein comprising a Methyl-CpG-binding domain.
- One such protein may be MBD2 protein.
- MBD refers to certain domains of proteins and enzymes that can be approximately 70 residues long and bind to DNA that contains one or more symmetrically methylated CpGs.
- MBD of MeCP2, MBD1, MBD2, MBD4 and BAZ2 mediates binding to DNA, and in cases of MeCP2, MBD1 and MBD2, preferentially to methylated CpG.
- Human proteins MECP2, MBD1, MBD2, MBD3, and MBD4 comprise a family of nuclear proteins related by the presence in each of a methyl-CpG-binding domain (MBD). Each of these proteins, with the exception of MBD3, is capable of binding specifically to methylated DNA.
- the binder can be an antibody and capturing cell-free methylated DNA comprises immunoprecipitating the cell-free methylated DNA using the antibody.
- immunoprecipitation refers to a technique of precipitating an antigen (such as polypeptides and nucleotides) out of solution using an antibody that specifically binds to that particular antigen. This process may be used to isolate and concentrate a particular protein or DNA from a sample and requires that the antibody be coupled to a solid substrate at some point in the procedure.
- the solid substrate includes for example beads, such as magnetic beads. Other types of beads and solid substrates may be used.
- a 5-mC antibody (e.g., wherein the 5-mC antibody specifically binds to 5-methylcytosine) may be used as a binder.
- in the immunoprecipitation procedure, in some embodiments, at least 0.05 µg of the antibody can be added to the sample; alternatively, at least 0.16 µg of the antibody can be added to the sample.
- 0.05 µg to 0.80 µg, 0.16 µg to 0.80 µg, 0.40 µg to 0.80 µg, 0.16 µg to 0.40 µg, 0.10 µg to 0.80 µg, 0.20 µg to 0.60 µg, 0.30 µg to 0.50 µg, or 0.40 µg to 0.50 µg of the antibody can be used.
- the method described herein further comprises adding a second amount of control DNA to the sample.
- a methylation profile can comprise analysis (e.g., comprising sequencing) of a plurality of nucleic acids (e.g., a plurality of nucleic acid molecules of a depleted sequencing library, as described herein).
- a methylation profile can comprise detection of methylated nucleotides and/or quantification of methylated nucleotide counts.
- a methylation profile can comprise determination of a methylated signal, e.g., in a population of nucleic acids of a depleted sequencing library, as described herein.
- a methylation profile can be compared to a genome-wide background profile.
- a methylation profile can be compared to a background profile created using hypermethylated cfDNA.
- a methylation profile can be compared to a background profile comprising a plurality of regions, wherein the plurality of regions have been identified as regions of a genome that comprise (i) one or more sites that are amenable to methylation enrichment, (ii) methylation below a threshold in a healthy control, or a combination of (i) and (ii).
- the methylation profile may comprise a subset of possible DMRs or markers that are specific to the subject’s cancer or tumor.
- the methylation profile may include a subset of regions that are determined to be specific to the subject’s cancer.
- the methylation profile may comprise a smaller subset to allow for efficient data analysis. For example, by restricting the methylation profile to a smaller subset, the profile may maintain accuracy of monitoring a subject’s cancer, while reducing the amount of data to be processed. In some cases, by restricting the methylation profile to a universal panel of genomic regions disclosed herein,
- In various embodiments, generation of a methylation profile does not comprise individual analysis of the methylation state of each DMR of interest.
- generating a profile may comprise comparing an aggregate signal of a subset of DMRs or markers in a test sample to that of a control sample, or a set of control samples.
- the methylation profile may comprise DMRs that are specific to the subject’s cancer or tumor.
- the generation may comprise identifying the aggregate signal from the subject-specific DMRs and comparing this aggregate signal against a reference or control signal.
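Comparing an aggregate signal over subject-specific DMRs against a control signal, rather than analyzing each DMR individually, can be sketched as follows. The aggregation used here (a mean of per-region signal values, with missing regions counted as zero) is an assumption chosen for illustration; the disclosure does not specify this formula, and the region names and values are made up.

```python
from statistics import mean
from typing import Dict, Iterable


def aggregate_signal(signal_by_region: Dict[str, float], dmrs: Iterable[str]) -> float:
    """Mean signal over the selected DMRs; regions with no signal count as 0.

    Assumption: a simple mean stands in for whatever aggregate the method uses.
    """
    dmr_list = list(dmrs)
    if not dmr_list:
        raise ValueError("at least one DMR is required")
    return sum(signal_by_region.get(d, 0.0) for d in dmr_list) / len(dmr_list)


# Hypothetical subject-specific DMR set and per-region signals.
subject_dmrs = ["dmr_1", "dmr_2", "dmr_3"]
test_sample = {"dmr_1": 12.0, "dmr_2": 9.0, "dmr_3": 0.0, "dmr_9": 40.0}
controls = [{"dmr_1": 1.0, "dmr_2": 2.0}, {"dmr_1": 0.0, "dmr_3": 3.0}]

test_signal = aggregate_signal(test_sample, subject_dmrs)
control_signal = mean(aggregate_signal(c, subject_dmrs) for c in controls)
elevated = test_signal > control_signal
```

Note that `dmr_9` contributes nothing to the test signal because it is outside the subject-specific subset, which is the point of restricting the profile.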
- the present disclosure provides methods and systems for producing a methylation profile of a subject that has a disease and/or condition or is suspected of having such disease and/or condition, wherein the methylation profile may be used to determine whether the subject has the disease and/or condition or is at risk of having the disease and/or condition.
- the samples disclosed herein can be subjected to library preparation and next generation deep sequencing, for example to a depth of 1 million (M) to 60 M single reads, 10 M to 60 M single reads, 10 M to 100 M single reads, 40 M to 60 M single reads, 40 M to 100 M single reads, 60 M to 100 M single reads, 60 M to 200 M single reads, 1 M to 10 M single reads, 1 M to 40 M single reads, 1 M single reads to 100 M single reads, 1 M single reads to 200 M single reads, at least 1 M single reads, at least 10 M single reads, at least 40 M single reads, at least 60 M single reads, at least 100 M single reads, or at least 200 M single reads.
- sequencing can be performed at low sequencing depth (e.g., 10 M single reads, 20 M single reads, 30 M single reads, 40 M single reads, from 1 M single reads to 10 M single reads, from 10 M single reads to 20 M single reads, from 20 M single reads to 30 M single reads, from 30 M single reads to 40 M single reads, at most 10 M single reads, at most 20 M single reads, at most 30 M single reads, or at most 40 M single reads).
- a sample disclosed herein can be subjected to sequencing at a depth of 0.1X to 100X, 0.1X to 60X, 0.1X to 40X, 0.1X to 30X, O.
- a plurality of sequencing reads can be generated and analyzed. In some embodiments, deep sequencing may be configured to maximize identifying genomic mutations associated with the disease and/or condition.
- the relative measure of ctDNA abundance can be calculated from the mean mutant allele fractions (MAFs).
- the mean MAF of mutations identified in a subject and comprised in the subject’s mutation profile ranges from at least about 0.01% to at least about 10%.
- the MAF of a ctDNA fraction of a sample can be about at least 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%, 0.15%, 0.2%, 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5%, 10%, or any percentage in between.
- the MAF of a ctDNA fraction of a sample can be about at most 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%, 0.15%, 0.2%, 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5%, 10%, or any percentage in between.
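The calculation of a relative ctDNA abundance from the mean mutant allele fraction can be sketched as below. The input layout (pairs of mutant-supporting and total read counts per mutation) and the function name `mean_maf_percent` are assumptions for illustration, and the counts are invented.

```python
from typing import List, Tuple


def mean_maf_percent(variants: List[Tuple[int, int]]) -> float:
    """Mean mutant allele fraction, in percent.

    Each variant is (mutant_read_count, total_read_count) at one mutated site.
    Sites with zero coverage are skipped.
    """
    fractions = [alt / total for alt, total in variants if total > 0]
    if not fractions:
        raise ValueError("no covered variants")
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical counts for three mutations in one subject's mutation profile.
variants = [(5, 1000), (10, 2000), (20, 1000)]
maf = mean_maf_percent(variants)  # per-site fractions: 0.5%, 0.5%, 2.0%
```

The resulting value can then be checked against the percentage ranges recited above (e.g., between about 0.01% and about 10%).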
- a mutation profile of a subject can be generated from sequencing results.
- the mutation profile comprises genetic polymorphisms, such as a missense variant, a nonsense variant, a deletion variant, an insertion variant, a duplication variant, an inversion variant, a frameshift variant, or a repeat expansion variant.
- the mutation profile may comprise mutation variants derived from a fraction of cell-free nucleic acid molecules of a specific size range.
- the present disclosure provides methods, systems, and kits for producing a mutation profile of a subject that has a disease and/or condition or can be suspected of having such disease and/or condition, wherein the mutation profile may be used to determine whether the subject has the disease and/or condition or can be at risk of having the disease and/or condition.
- Producing a genomic mutation profile can comprise subjecting a plurality of nucleic acid molecules to library preparation and next generation deep sequencing (e.g., cfMeDIP-seq).
- a plurality of sequencing reads can be generated and analyzed, and, in some cases, deep sequencing may be configured to maximize identifying genomic mutations associated with the disease and/or condition.
- a panel of canonical cancer driver genes may be included in a selector for sequencing results analysis.
- the relative measure of ctDNA abundance can be calculated from the mean mutant allele fractions (MAFs).
- the mean MAF of mutations identified in a subject and comprised in the subject’s mutation profile ranges from at least about 0.01% to at least about 10%.
- the ctDNA fraction of a sample disclosed herein can be about at least 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%, 0.15%, 0.2%, 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5%, 10%, or any percentage in between.
- the generated mutation profile of a subject does not include mutation variants derived from cell-free nucleic acid molecules derived from a biological sample.
- the mutation profile comprises genetic polymorphisms, such as a missense variant, a nonsense variant, a deletion variant, an insertion variant, a duplication variant, an inversion variant, a frameshift variant, or a repeat expansion variant.
- the mutation profile may comprise mutation variants derived from a fraction of cell-free nucleic acid molecules of a specific size range.
- the length of ctDNA fragments can be shorter than that of cell-free nucleic acid molecules derived from a healthy subject. In some embodiments, the length of ctDNA comprising at least one mutation can be shorter than the length of a cell-free nucleic acid molecule containing a corresponding reference allele.
- the sequencing does not utilize bisulfite sequencing because it causes degradation of ctDNA fragments and prevents the preservation of the length distribution of ctDNAs.
- the fragment length of a plurality of nucleic acids of the present disclosure can be from 1 to about 800 base pairs (bp), from about 50 bp to about 800 bp, from about 100 bp to about 200 bp, from about 120 bp to about 150 bp, from about 60 to about 500 bp, from about 80 to about 300 bp, from 90 to about 250 bp, from 80 to 170 bp, or from about 100 to about 150 bp.
- the fragment length of a plurality of nucleic acids of the present disclosure can be at least 800 base pairs (bp), at least 700 base pairs, at least 600 base pairs, at least 500 base pairs, at least 400 base pairs, at least 300 base pairs, at least 200 base pairs, at least 150 base pairs, at least 100 base pairs, or at least 50 base pairs.
- the fragment length of a plurality of nucleic acids of the present disclosure can be at most 800 base pairs (bp), at most 700 base pairs, at most 600 base pairs, at most 500 base pairs, at most 400 base pairs, at most 300 base pairs, at most 200 base pairs, at most 150 base pairs, at most 100 base pairs, or at most 50 base pairs.
- the present disclosure provides an enrichment of the cell free nucleic acid samples based on selecting cell free molecules of a certain size.
- the multimodal analysis comprises utilizing the mutation profile described herein and the fragment length profile by selectively including a plurality of nucleic acid molecules in the mutation profile based on their fragment length. In some embodiments, the multimodal analysis comprises utilizing the methylation profile described herein and the fragment length profile by selectively including a plurality of nucleic acid molecules in the methylation profile based on their fragment length. In some embodiments, the multimodal analysis comprises utilizing the mutation profile, methylation profile, and the fragment length profile together by selectively including a plurality of nucleic acid molecules in the mutation profile based on their fragment length and by selectively including a plurality of nucleic acid molecules in the methylation profile based on their fragment length respectively.
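Selectively including nucleic acid molecules in a profile based on fragment length can be sketched as a filter applied before the mutation or methylation profile is built. The 100 to 150 bp window is one of the ranges listed above and is used only as an example default; the function name and fragment tuples are hypothetical.

```python
from typing import List, Tuple

# Hypothetical fragment representation: (chromosome, start, end), half-open.
Fragment = Tuple[str, int, int]


def select_by_length(
    fragments: List[Fragment], min_bp: int = 100, max_bp: int = 150
) -> List[Fragment]:
    """Keep only fragments whose length falls within [min_bp, max_bp]."""
    return [f for f in fragments if min_bp <= f[2] - f[1] <= max_bp]


# Illustrative fragments with lengths of 167, 140, and 90 bp.
fragments = [("chr1", 0, 167), ("chr1", 500, 640), ("chr2", 10, 100)]
selected = select_by_length(fragments)
```

The same filtered set could feed either profile, which is how the fragment length profile is combined with the other two modalities in this sketch.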
- the methods herein may be used to monitor the efficacy of a therapeutic regimen for cancer.
- the methods herein may be used to detect minimal residual disease.
- the methods may be used to detect a cancer in an individual.
- the individual may be provided a therapeutic regimen and, throughout the therapeutic regimen, the individual may be monitored to determine whether the treatment is working.
- the individual may be identified as free of cancer.
- the individual may be monitored to identify the presence of the recurrence of the cancer.
- the methods may use subject-specific DMRs or markers to detect and monitor the progression or regression of the subject’s condition.
- the subject may be identified as having a cancer.
- the cancer may be analyzed such that a subject-specific (or tumor specific) set of DMRs may be identified.
- This set of the DMRs may be used as markers for monitoring the progression and regression (e.g., observing a therapeutic efficacy or a recurrence of a cancer) of the cancer.
- a method of identifying a subject as having cancer comprising (a) obtaining a sample comprising cell-free nucleic acid from a subject described herein; (b) assaying the cell-free nucleic acid to identify a methylation level of a subset of the cell-free nucleic acids, wherein the subset of cell-free nucleic acids corresponds to a plurality of regions of a genome that comprise (i) one or more sites that are amenable to methylation enrichment and (ii) substantially no methylation in a healthy control; and/or (c) based at least on the methylation state of the subset of the cell-free nucleic acids, identifying the subject as having cancer.
- assaying comprises sequencing (e.g., cfMeDIP-seq described herein) with any one of the sequencers disclosed herein.
- based at least on (b), the method identifies a plurality of DMRs.
- the subject can be identified as having early-stage cancer. In some cases, the subject can be identified as having late-stage cancer.
- the methods provide a sensitivity of at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the methods provide a specificity of at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the methylation state can be compared to a control sample.
- the control sample may be a sample that is derived from a subject with cancer, or otherwise derived from a sample with a known characteristic (e.g., healthy, having a specific type of cancer).
- the control sample may be processed to identify a methylation state, for example by normalizing the methylation of regions of DMRs against regions that comprise (i) one or more sites that are amenable to methylation enrichment and (ii) substantially no methylation in a healthy control.
- the method described herein is not limited to identifying or monitoring any specific type of cancer of the subject (e.g., the method can be pan-cancer).
- because the plurality of regions disclosed herein is not specific to any cancer, but is instead defined by the absence of a cancer signal (e.g., methylation below a threshold in a healthy control), the plurality of regions can be used to assess samples from any cancer type to identify DMRs within the plurality of regions described herein.
- the DMRs may be subject-specific; the DMRs may be related to a specific type of cancer.
- identification of the subject-specific DMRs may be agnostic to a specific type of cancer, and the method described in this disclosure may be robust such that any cancer may be monitored.
- the present disclosure provides methods and systems for determining a tissue origin of a tumor, comprising identifying a nucleotide sequence specific for a particular cancer (e.g., breast cancer, colon cancer, prostate cancer, HNSCC, or lung cancer) from which a fraction of cell-free nucleic acid molecules is derived.
- the fraction of the cell-free nucleic acid molecules can be derived from ctDNA.
- the methods provide a sensitivity of at least about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the methods provide a specificity of at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the present disclosure provides methods and systems for determining whether a subject has or is at risk of having a disease, wherein the methods and systems comprise subjecting a plurality of nucleic acid molecules derived from a cell-free nucleic acid sample obtained from the subject to sequencing to generate at least one profile of (i) a methylation profile, (ii) a mutation profile, and (iii) a fragment length profile; and processing the at least one profile to determine whether the subject has or is at risk of having the disease at a sensitivity of at least about 80% or at a specificity of at least about 90%, wherein the cell-free nucleic acid sample comprises less than 30 ng/ml of the plurality of nucleic acid molecules.
- the sensitivity can be at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the specificity can be at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
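For reference, the sensitivity and specificity figures recited above follow the standard definitions: sensitivity is the fraction of diseased subjects correctly called positive, and specificity is the fraction of disease-free subjects correctly called negative. A minimal sketch, with invented counts and hypothetical function names, checking results against the 80% and 90% floors mentioned in the text:

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """TP / (TP + FN): fraction of subjects with the disease that are detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """TN / (TN + FP): fraction of subjects without the disease called negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical evaluation cohort: 50 subjects with disease, 100 without.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=95, false_pos=5)

# Floors taken from the ranges recited in the text (at least about 80% / 90%).
meets_floors = sens >= 0.80 and spec >= 0.90
```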
- the methods and systems can comprise subjecting a plurality of nucleic acid molecules derived from a cell-free nucleic acid sample obtained from the subject to sequencing to generate at least two profiles of (i) a methylation profile, (ii) a mutation profile, and (iii) a fragment length profile.
- the methods provide a sensitivity of at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the sensitivity when using two profiles can be increased by at least about 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, or percentage in between any of the numbers compared to the sensitivity when using one profile.
- the sensitivity when using three profiles can be increased by at least about 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, or percentage in between any of the numbers compared to the sensitivity when using two profiles.
- the methods provide a specificity of at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the specificity when using two profiles can be increased by at least about 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, or percentage in between any of the numbers compared to the specificity when using one profile.
- the specificity when using three profiles can be increased by at least about 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, or percentage in between any of the numbers compared to the specificity when using two profiles.
- the present disclosure provides methods and systems for processing a cell-free nucleic acid sample of a subject to determine whether the subject has or is at risk of having a disease.
- the methods and systems comprise providing the cell-free nucleic acid sample comprising a plurality of nucleic acid molecules; subjecting the plurality of nucleic acid molecules or derivatives thereof to sequencing to generate a plurality of sequencing reads; computer processing the plurality of sequencing reads to identify, for the plurality of nucleic acid molecules, (i) a methylation profile, (ii) a mutation profile, and (iii) a fragment length profile; and using at least the methylation profile, the mutation profile, and the fragment length profile to determine whether the subject has or is at risk of having the disease.
- the methods provide a sensitivity of at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the methods provide a specificity of at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or any percentage in between the numbers.
- the present disclosure describes methods and systems for providing a prognosis to a subject after receiving a treatment for a disease and/or condition.
- the treatment comprises a surgical removal of a tumor, a chemotherapy designed for a specific type of cancer, a radiotherapy, or an immunotherapy (e.g., TCR, CAR, etc.).
- the methods or systems comprise subjecting a plurality of nucleic acid molecules derived from a cell-free nucleic acid sample obtained from the subject to sequencing to generate at least one profile of (i) a methylation profile, (ii) a mutation profile, and (iii) a fragment length profile; and monitoring or detecting minimal residual disease (MRD) based at least on the at least one profile.
- the method further comprises adding a second amount of control DNA to the sample for confirming the immunoprecipitation reaction.
- the method further comprises adding a second amount of control DNA to the sample for confirming the capture of cell-free methylated DNA.
- identifying the presence of DNA from cancer cells further includes identifying the cancer cell tissue of origin.
- tumor tissue sampling may be challenging or carry significant risks, in which case diagnosing and/or subtyping the cancer without the need for tumor tissue sampling may be achieved.
- lung tumor tissue sampling may require invasive procedures such as mediastinoscopy, thoracotomy, or percutaneous needle biopsy; these procedures may result in a need for hospitalization, chest tube, mechanical ventilation, antibiotics, or other medical interventions.
- Some individuals may not undergo the invasive procedures needed for tumor tissue sampling either because of medical comorbidities or due to preference.
- the actual procedure for tumor tissue procurement may depend on the suspected cancer subtype.
- cancer subtype may evolve over time within the same individual; serial assessment with invasive tumor tissue sampling procedures is often impractical and not well tolerated by patients.
- non-invasive cancer subtyping via blood test may have many advantageous applications in the practice of clinical oncology.
- identifying the cancer cell tissue of origin further includes identifying a cancer subtype.
- the cancer subtype differentiates the cancer based on stage (e.g., early stage lung cancer treated with surgery vs late stage lung cancer treated with chemotherapy), histology (e.g., small cell carcinoma vs adenocarcinoma vs squamous cell carcinoma in lung cancer), gene expression pattern or transcription factor activity (e.g., ER status in breast cancer), copy number aberrations (e.g., HER2 status in breast cancer), specific rearrangements (e.g., FLT3 in AML), specific gene point mutational status (e.g., IDH gene point mutations), and DNA methylation patterns (e.g., MGMT gene promoter methylation in brain cancer).
- comparisons can be carried out genome wide.
- the comparisons can be restricted from genome-wide to specific regulatory regions, such as, but not limited to, long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), long terminal repeats (LTRs), FANTOM5 enhancers, CpG Islands, CpG shores, CpG Shelves, or any combination of the foregoing.
- the comparisons can be restricted from genome-wide to the background defined by enrichment of hypermethylated regions.
- comparisons can be restricted to the background defined by methylation levels of a plurality of regions, wherein the plurality of regions have been identified as regions of a genome that comprise (i) one or more sites that are amenable to methylation enrichment, (ii) methylation below a threshold in a healthy control, or a combination of (i) and (ii).
- the methods and/or systems disclosed herein can be used for monitoring, selecting therapy, assessing therapy response, and/or prognostic risk stratification.
- the methods and/or systems can be for multi-cancer early detection (MCED).
- the methods and/or systems can be used when a tissue sample is available (e.g., tissue informed).
- methods and/or systems disclosed herein can be used when a plasma sample or a blood sample is available.
- methods and/or systems disclosed herein can be used when both tissue and non-tissue samples are available.
- the methods and/or systems can use a tissue-informed approach.
- the methods and/or systems can use a tissue-naive approach.
- the methods and/or systems disclosed herein can be used to generate various outputs indicative of cancer.
- the various outputs can include, but are not limited to, screening for the presence and/or absence of cancer, detecting minimal residual disease (MRD), and/or a therapy response.
- the baseline-informed approach, the baseline-agnostic approach, and/or the joint model can generate a methylation score indicative of a presence or absence of a cancer.
- a methylation score can be compared to a threshold score.
- a methylation score higher than a threshold score can be indicative of a presence of a cancer.
- a methylation score lower than a threshold score can be indicative of an absence of a cancer.
- the baseline-informed approach, the baseline-agnostic approach, and/or the joint model can generate a methylation score indicative of MRD.
- a methylation score compared to a threshold score can be indicative of MRD.
- the presence of MRD can further indicate possible recurrence of a cancer.
- a methylation score lower than a threshold score can be indicative of an absence of MRD.
- the absence of MRD can further indicate possible regression of a cancer.
- the threshold score can be generated with a plurality of control samples obtained from a plurality of control subjects.
- the control subjects can be subjects without cancer.
- the plurality of control samples can be subjected to the baseline/tissue-informed approach and/or baseline/tissue-agnostic approach disclosed herein to compute a plurality of methylation scores.
- the plurality of methylation scores can be analyzed to generate a threshold score. For example, a threshold score can be set where at most 5% of the control subjects had a score defined to be a false positive, ensuring that at least 95% of the control subjects fall below the threshold score.
- the threshold score can be set where at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10% or more of the control subjects had a score defined to be false positives. In some cases, the threshold score can be set where at most 1%, at most 2%, at most 3%, at most 4%, at most 5%, at most 6%, at most 7%, at most 8%, at most 9%, at most 10% or more of the control subjects had a score defined to be false positives.
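The threshold derivation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the use of a simple order statistic, and the example scores are assumptions made for the example.

```python
import math


def threshold_from_controls(control_scores, max_false_positive_rate=0.05):
    """Return a cutoff that at most the given fraction of control scores exceed.

    Hypothetical helper: the disclosure does not specify how the cutoff is computed.
    """
    if not control_scores:
        raise ValueError("at least one control score is required")
    ordered = sorted(control_scores)
    n = len(ordered)
    # Number of controls that are allowed to score above the cutoff (false positives).
    allowed = math.floor(max_false_positive_rate * n)
    # The cutoff is the largest control score that must not be called positive.
    return ordered[n - allowed - 1]


def call_sample(score, threshold):
    """A score strictly above the threshold is called positive."""
    return "positive" if score > threshold else "negative"


control_scores = list(range(1, 21))  # 20 made-up control scores
cutoff = threshold_from_controls(control_scores, 0.05)
false_positives = sum(1 for s in control_scores if s > cutoff)
```

With 20 controls and a 5% false-positive allowance, one control is permitted to exceed the cutoff, so the remaining 95% of controls are at or below it.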
- the methods and systems disclosed herein can generate an output indicative of a therapy response.
- a methylation score can be generated with a baseline sample, and another methylation score can be generated with a sample obtained subsequent to obtaining the baseline sample.
- the methylation score and another methylation score can be compared to predict a therapy response.
- a methylation score higher than another methylation score obtained at a later time point can indicate that a treatment may be effective.
- An increase in methylation score compared to a methylation score obtained at an earlier time point may indicate recurrence and/or resistance to a therapy.
- a methylation score lower than another methylation score can indicate that a treatment may not be effective.
- the methods and systems disclosed herein may comprise algorithms or uses thereof.
- the one or more algorithms may be used to classify one or more samples from one or more subjects.
- the one or more algorithms may be applied to data from one or more samples.
- the data may comprise biomarker expression data.
- the algorithms may analyze a set of subject-specific DMRs.
- the algorithms may filter data that does not pertain to the set of subject-specific DMRs.
- Assigning the classification to the sample may comprise applying an algorithm to the methylation profile, mutation profile, and fragment length profile.
- at least one profile can be inputted to a data analysis system comprising a trained algorithm for classifying the sample as obtained from a subject which has a disease or minor injuries.
- a data analysis system may be a trained algorithm.
- the algorithm may comprise a linear classifier.
- the linear classifier comprises one or more of linear discriminant analysis, Fisher's linear discriminant, Naive Bayes classifier, Logistic regression, Perceptron, Support vector machine, or a combination thereof.
- the linear classifier may be a support vector machine (SVM) algorithm.
- the algorithm may comprise a two-way classifier.
- the two-way classifier may comprise one or more decision tree, random forest, Bayesian network, support vector machine, neural network, or logistic regression algorithms.
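As one concrete instance of the linear classifiers listed above, a perceptron can be sketched in a few lines. This is only an illustration under stated assumptions: the two-feature inputs (for example, a methylation score and a short-fragment fraction), the labels, and the function names are hypothetical and are not taken from the disclosure.

```python
def train_perceptron(features, labels, epochs=50, learning_rate=1.0):
    """Train a perceptron; labels are +1 (e.g., cancer) or -1 (e.g., non-cancer)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(features, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else -1
            if predicted != y:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Made-up, linearly separable toy profiles.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = [-1, -1, 1, 1]
weights, bias = train_perceptron(train_x, train_y)
```

A support vector machine or logistic regression, which the text also names, would replace the update rule but keep the same linear decision function.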
- the method can comprise determining the presence of differentially methylated regions (DMRs) in a plurality of regions in a set of biological samples.
- the set of biological samples can be non-tissue samples.
- the set of biological samples can be tissue samples.
- the biological samples can comprise nucleic acid molecules from non-tissue samples or tissue samples.
- nucleic acid molecules (e.g., cell-free nucleic acid)
- the characteristics comprise an age or sex of a subject. In some cases, the characteristics comprise one or more comorbidities. In some cases, a subset of the set of biological samples is derived from subjects having cancer. In some cases, a subset of the set of biological samples is derived from subjects that do not have cancer.
- the present disclosure describes methods and systems for providing a prognosis to a subject after receiving a treatment for a disease and/or condition.
- the treatment comprises a surgical removal of a tumor, a chemotherapy designed for a specific type of cancer, a radiotherapy, or an immunotherapy (e.g., TCR, CAR, etc.).
- the methods or systems comprise subjecting a plurality of nucleic acid molecules derived from a cell-free nucleic acid sample obtained from the subject to sequencing to generate at least one profile of (i) a methylation profile, (ii) a mutation profile, and (iii) a fragment length profile; and monitoring or detecting minimal residual disease (MRD) based on the at least one profile.
- FIG. 2 shows a computer system 201 that can be programmed or otherwise configured to identify, analyze, and compare sequences of hypermethylated cell-free nucleic acids (e.g., cfDNA).
- the computer system 201 can be an electronic device of a user or a computer system that can be remotely located with respect to the electronic device.
- the electronic device can be a mobile electronic device.
- the computer system 201 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 205, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
- the computer system 201 also includes memory or memory location 210 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 215 (e.g., hard disk), communication interface 220 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 225, such as cache, other memory, data storage and/or electronic display adapters.
- the memory 210, storage unit 215, interface 220 and peripheral devices 225 are in communication with the CPU 205 through a communication bus (solid lines), such as a motherboard.
- the storage unit 215 can be a data storage unit (or data repository) for storing data.
- the computer system 201 can be operatively coupled to a computer network (“network”) 230 with the aid of the communication interface 220.
- the network 230 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that can be in communication with the Internet.
- the network 230 in some cases can be a telecommunication and/or data network.
- the network 230 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
- the network 230 in some cases with the aid of the computer system 201, can implement a peer-to-peer network, which may enable devices coupled to the computer system 201 to behave as a client or a server.
- the CPU 205 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
- the instructions may be stored in a memory location, such as the memory 210.
- the instructions can be directed to the CPU 205, which can subsequently program or otherwise configure the CPU 205 to implement methods of the present disclosure. Examples of operations performed by the CPU 205 can include fetch, decode, execute, and writeback.
- the CPU 205 can be part of a circuit, such as an integrated circuit.
- One or more other components of the system 201 can be included in the circuit.
- the circuit can be an application-specific integrated circuit (ASIC).
- the storage unit 215 can store files, such as drivers, libraries, and saved programs.
- the storage unit 215 can store user data, e.g., user preferences and user programs.
- the computer system 201 in some cases can include one or more additional data storage units that are external to the computer system 201, such as located on a remote server that can be in communication with the computer system 201 through an intranet or the Internet.
- the computer system 201 can communicate with one or more remote computer systems through the network 230.
- the computer system 201 can communicate with a remote computer system of a user.
- remote computer systems include personal computers (e.g., portable PC), slate or tablet PC’s (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
- the user can access the computer system 201 via the network 230.
- Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 201, such as, for example, on the memory 210 or electronic storage unit 215.
- the machine executable or machine-readable code can be provided in the form of software.
- the code can be executed by the processor 205.
- the code can be retrieved from the storage unit 215 and stored on the memory 210 for ready access by the processor 205.
- the electronic storage unit 215 can be precluded, and machine-executable instructions are stored on memory 210.
- the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code or can be compiled during runtime.
- the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
- aspects of the systems and methods provided herein can be embodied in programming.
- Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that can be carried on or embodied in a type of machine readable medium.
- Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
- “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
- another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
- a machine readable medium such as computer-executable code
- a tangible storage medium such as computer-executable code
- Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
- Volatile storage media include dynamic memory, such as main memory of such a computer platform.
- Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
- Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
- Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
- the computer system 201 can include or be in communication with an electronic display 235 that comprises a user interface (UI) 240.
- Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
- Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
- An algorithm can be implemented by way of software upon execution by the central processing unit 205.
- the algorithm can, for example, identify, sequence, and analyze hypermethylated DNA regions.
- Example 1 Selection of blood quiet regions (proto-DMRs)
- cfDNA from a collection of blood samples from presumed cancer free individuals was assessed for regions of low signal using the cfMeDIP assay. These regions were then intersected with regions that have high binding affinity as assessed by synthetically fully methylating DNA samples, as shown in FIG. 3. These regions consist of genomic sections (e.g., 300 base pairs (bp)) that have little or no signal in presumed cancer free individual’s cfDNA, while also having high binding affinity to a methylation specific binder in the presence of methylated DNA of the fully methylated synthetic samples.
- regions can be useful for contrasting signal in methylated regions in cfDNA samples from individuals who have been diagnosed with cancer versus those that are presumed cancer free.
- the low background signal in the cancer-free population plus high binding affinity can be a powerful tool for finding biologically relevant hypermethylated regions that can distinguish a cancer-specific signature from non-cancer signatures in the methylome.
- each 300bp genomic region was assessed for the maximum raw counts (as measured by reads aligning within the 300bp region) associated with pull down events of the antibody of methylated single stranded DNA. If that maximum count across all non-cancer samples for a single 300bp region was lower than the threshold, it was included as a blood quiet region.
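The blood quiet region filter described above can be sketched as follows. The region identifiers, the counts, and the threshold value are hypothetical; the disclosure does not state the threshold that was used.

```python
def select_blood_quiet_regions(counts_by_region, max_count_threshold):
    """Keep regions whose maximum raw count across all non-cancer samples is below the threshold.

    counts_by_region maps a region id (e.g., a 300 bp window) to a list of raw
    read counts, one per non-cancer sample.
    """
    quiet = []
    for region, counts in counts_by_region.items():
        if max(counts) < max_count_threshold:
            quiet.append(region)
    return quiet


# Made-up counts for three 300 bp windows across four non-cancer samples.
example_counts = {
    "chr1:1000-1300": [0, 1, 0, 2],
    "chr1:1300-1600": [0, 0, 9, 1],   # one noisy sample disqualifies the window
    "chr2:5000-5300": [0, 0, 0, 0],
}
quiet_regions = select_blood_quiet_regions(example_counts, max_count_threshold=5)
```

Using the maximum rather than the mean means a single non-cancer sample with appreciable signal is enough to exclude a region.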
- DMRs can be identified using several methodologies. For example, a Wilcoxon rank sum test can be applied between the normalized counts of all non-cancer participants for an individual blood quiet region (e.g., 300 bp) and the normalized counts of all cancer participants across various tissue types and clinical stages at the same genomic region.
- the DESeq2 algorithm can be leveraged for identifying differentially methylated regions by comparing counts of non-cancer and cancer participants from individual blood quiet regions.
- DMRs can be ranked either by the p-value from the statistical test applied or by the fold change of the normalized counts.
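A per-region rank sum test followed by ranking on p-value can be sketched as below. This is an assumption-laden illustration: it uses the normal approximation to the rank sum statistic with average ranks for ties and no tie correction of the variance, and all names and counts are made up.

```python
from statistics import NormalDist


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_pvalue(group_a, group_b):
    """Two-sided rank sum p-value via the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (w - mean_w) / sd_w
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rank_regions(non_cancer, cancer):
    """Return (region, p-value) pairs sorted from most to least significant."""
    results = [(r, rank_sum_pvalue(cancer[r], non_cancer[r])) for r in non_cancer]
    return sorted(results, key=lambda item: item[1])


non_cancer = {"r1": [0, 0, 1, 0, 1], "r2": [1, 2, 3, 4, 5]}
cancer = {"r1": [8, 9, 7, 10, 6], "r2": [1.5, 2.5, 3.5, 4.5, 0.5]}
ranked = rank_regions(non_cancer, cancer)
```

Ranking by fold change instead would only change the sort key, for example the ratio of mean normalized counts between the two groups.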
- the DMRs can be normalized based on the blood quiet regions.
- the method of normalizing counts based on the universe of quiet regions (signal-to-noise ratio (SNR); FIG. 4B) was shown to be powerful in removing non-cancer-specific background signal present in the blood quiet regions.
- This methodology can be superior at removing biological variability in non-cancer controls when compared to other normalization techniques, as assessed by measuring LOD in the 3 cell-line titration experiment described in Example 1.
- the SNR normalization method was compared to normalizing the mean counts from hypermethylated DMRs to the mean unique (deduplicated) methylated counts from high binding affinity regions as determined by the fully methylated synthetic molecules.
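The text does not give the SNR formula, so the following is only one plausible reading: the mean count over the DMRs (signal) divided by the mean count over the universe of blood quiet regions (noise). The pseudocount, the function name, and the example counts are assumptions.

```python
def snr_score(dmr_counts, quiet_region_counts, pseudocount=1.0):
    """Hypothetical signal-to-noise ratio for one sample.

    signal: mean read count over the selected DMRs
    noise:  mean read count over all blood quiet regions (or a subset)
    The pseudocount avoids division by zero when the quiet regions are empty of reads.
    """
    signal = sum(dmr_counts) / len(dmr_counts)
    noise = sum(quiet_region_counts) / len(quiet_region_counts)
    return signal / (noise + pseudocount)


# Made-up counts for a cancer-like and a non-cancer-like sample.
cancer_like = snr_score(dmr_counts=[10, 20], quiet_region_counts=[0, 2])
control_like = snr_score(dmr_counts=[1, 1], quiet_region_counts=[0, 2])
```

Because both numerator and denominator come from the same library, a sample-wide increase in background pull-down raises both and is partly cancelled in the ratio.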
- Example 3 Selection of cancer samples for differentially methylated regions (DMRs) calculations
- a model was built using a cohort of samples that were characterized with the TSO500 assay.
- the TSO500 assay was used to estimate the ctDNA fraction of these samples and subsequently used to titrate the cfDNA into pooled normal cfDNA at varying concentrations.
- This model was able to produce a relationship between the methylated ctDNA quantification of the cfMeDIP assay and the determined ctDNA levels of the clinical samples.
- the model was bootstrapped using DMRs that were first selected by using all late-stage samples (stage 3 and 4) from a cohort of lung samples. DMRs were identified as described in Example 2 (e.g., using a Wilcoxon rank sum test or the DESeq2 algorithm).
- Developing a targeted panel a priori that can be used for all future cancer types can be a common challenge in the liquid biopsy space.
- regions of the human methylome that have little basal methylation in cancer-free individuals can be used. These “blood quiet” regions contain proto-hypermethylated DMRs that can be used to contrast cancer free signatures from ctDNA signatures in some future cancer types.
- Because the targeted panel can be designed with samples from cancer-free individuals, it may not be designed for any specific cancer, but instead for the lack of a cancer signature. When the methylation signal is significantly higher than the expected basal signal found in these regions (hypermethylation), it can then be deduced that the signature is likely to be cancer specific.
- a cohort of cancer free blood samples can be used to explore regions that have high binding affinity but exceptionally low methylation in presumed cancer free individuals.
- This set of quiet regions can then be used as proto-DMRs for potential hypermethylated regions in cancer, but in no way is limited to any specific type of cancer.
- Designing a panel to target these regions has multiple benefits.
- the first is that these regions may not have high methylation in cancer-free individuals, which may mean that even if the panel is large, the amount of sequencing consumed by these targets may be low (e.g., cfMeDIP pulls down single-stranded DNA that is methylated).
- the advantage of this is that the panel can be quite large (e.g., tens of thousands of regions or larger) without taking much real estate on a flow cell, thus keeping sequencing depth requirements low. This contrasts with technologies like bisulfite conversion and sequencing, which require sequencing depth commensurate with the panel's size, since in this technique unmethylated DNA is sequenced too.
- the panel also includes “housekeeping” genes that may have high basal expression in a cohort of cancer-free individuals. These regions can be used for both normalization and as a positive control to ensure the chemistry is functioning optimally (in contrast, the quiet regions are a poor positive control since they are expected to have little signal in the absence of ctDNA).
- the binding specificity of the antibody has been demonstrated to be important to the ability for the assay to distinguish between ctDNA and background noise of non-specific pulldown.
- a set of 100 probes was added to the panel that target regions of the genome that may lack CpGs (e.g., the most commonly methylated motif of the human genome). By adding these probes, it is possible to measure the binding specificity of the reaction, since it is not expected that there may be a large amount of endogenous pull down from these regions. This can therefore serve as a measure of background noise as well as a general QC of the reaction.
- a set of probes that represent the continuum of CpG density of the human genome is also added to the panel. These probes can target regions that have consistent basal expression in cancer-free individuals and that represent the full continuum of CpG densities. The presence of these probes allows for the estimate of the absolute quantification of methylated ctDNA in a sample by providing a known standard of methylation within the sample.
- a data set for housekeeping methylation (HKM) was refined to lower the number of ROIs. The top 20 HKM ROIs correlating with sequencing depth (after removing extremely low count ROIs) were selected. Specifically, HKMs that reflect expected counts (e.g., 1 to 20) at 100% methylation based on the CpG enrichment effect were identified. A 300 bp window for CpG counts 1 to 20 was selected. Then, the mean count across controls was calculated to determine the median count across windows. A window that matches the median count was then randomly selected.
- Non-cancer noise DMR regions were also refined. About 16k moderately quiet regions (e.g., regions with max control count 7) were optimized using grid search for the signal-to-noise ratio approach described in the prior Examples. Subsets of 2000 and 4000 ROIs were randomly generated and then LOD was evaluated.
- Endogenous Binding Specificity (EBS) ROIs were selected from hg38 zero-regions by identifying regions with size > 500 bp, no overlap with repeats (using data from the UCSC Genome Browser), no overlap with common SNPs, and GC content of 35-50%. The center was kept at 300 bp to avoid capturing fragments with CpGs close to the 0-CpG region. 100 EBS ROIs were randomly selected from the remaining 264 candidate regions. The mean counts of the 100 EBS ROIs showed high negative correlation with binding specificity, suggesting it may be a good quality control metric for monitoring non-specific binding in the absence of spike-ins.
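The EBS ROI filters listed above can be sketched as a simple predicate followed by random sampling. The `Region` record, its precomputed annotation fields, and the reading of "the center was kept at 300 bp" as keeping the central 300 bp of each region are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int
    end: int
    gc_fraction: float          # GC content of the region, 0.0 to 1.0
    overlaps_repeat: bool       # e.g., from a repeat annotation track
    overlaps_common_snp: bool   # e.g., from a common-SNP track


def is_ebs_candidate(region):
    """Apply the size, repeat, SNP, and GC-content filters from the text."""
    return (
        region.end - region.start > 500
        and not region.overlaps_repeat
        and not region.overlaps_common_snp
        and 0.35 <= region.gc_fraction <= 0.50
    )


def center_window(region, width=300):
    """Keep only the central `width` bp, away from flanking CpG-containing sequence."""
    mid = (region.start + region.end) // 2
    return (region.chrom, mid - width // 2, mid + width // 2)


def select_ebs_rois(regions, n_select=100, seed=0):
    candidates = [r for r in regions if is_ebs_candidate(r)]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    chosen = rng.sample(candidates, min(n_select, len(candidates)))
    return [center_window(r) for r in chosen]


zero_cpg_regions = [
    Region("chr1", 1000, 1700, 0.40, False, False),  # passes every filter
    Region("chr1", 5000, 5400, 0.40, False, False),  # too short
    Region("chr2", 0, 900, 0.60, False, False),      # GC content too high
    Region("chr3", 0, 900, 0.45, True, False),       # overlaps a repeat
]
ebs_rois = select_ebs_rois(zero_cpg_regions)
```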
- a set of circulating tumor DNA molecules is subjected to enrichment via a methylation-specific binder.
- the panel is then used to enrich the ctDNA to the regions identified for the panel (e.g., blood quiet regions) to get blood quiet ctDNA using probes designed for the optimized panel and subjected to a sequencing reaction.
- Differentially methylated regions are identified by comparing cfDNA from non-cancer participants to those with cancer at varying stages and tissues.
- the sequencing reads pertaining to the DMRs can be identified and counted to generate a DMR specific methylation level. Finally, the DMRs can then be normalized to all blood quiet regions (or a subset of the blood quiet regions) to obtain ctDNA specific methylation level. This methylation level can then be compared to other samples or compared against an expected value to determine that the subject that the sample is derived from has a disease.
- a different normalization scheme may be used.
- the total number of sequencing reads pertaining to the DMRs can be identified and counted. Additionally, the sequencing reads that are less than 150 bp in length and comprise 2 or more CpGs are counted.
- the DMRs can then be normalized by dividing the number of sequencing reads that are less than 150 bp in length and comprise 2 or more CpGs by the total number of sequencing reads pertaining to the DMRs. This methylation level can then be compared to other samples or compared against an expected value to determine that the subject that the sample is derived from has a disease.
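The alternative normalization above is a simple ratio and can be sketched as follows. Representing each DMR-aligned read as a plain sequence string, and counting CpGs as occurrences of "CG" in that string, are simplifying assumptions for the example.

```python
def short_cpg_fraction(reads, max_length=150, min_cpgs=2):
    """Fraction of DMR-aligned reads shorter than max_length with at least min_cpgs CpGs.

    reads: iterable of read (or fragment) sequences that aligned within the DMRs.
    """
    total = 0
    qualifying = 0
    for seq in reads:
        total += 1
        if len(seq) < max_length and seq.upper().count("CG") >= min_cpgs:
            qualifying += 1
    return qualifying / total if total else 0.0


# Made-up reads: only the first is both short and has two or more CpGs.
dmr_reads = [
    "ACGTCGAA",          # 8 bp, 2 CpGs: counted
    "A" * 200,           # too long, no CpGs
    "ACGT",              # short, but only 1 CpG
    "CGCG" + "A" * 150,  # 2 CpGs, but 154 bp
]
methylation_level = short_cpg_fraction(dmr_reads)
```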
- FIG. 6 shows data relating to using a targeted panel for the cancer methylome as opposed to sequencing the whole methylome. Dilutions of the samples were performed and then subjected to methylation pull down, targeted panels, and sequencing. The targeted panel resulted in a technical level of detection of 0.04% with ~800K total reads. The whole methylome resulted in a technical level of detection of 0.06% with 140 M total reads.
- Subject-specific DMRs can be monitored to track the progression of a disease of a subject. Observing a subset of DMRs can allow for more efficient and more personalized monitoring of a subject. As opposed to an analysis of a large set of markers which may be irrelevant to a given subject, subject-specific markers can reduce the amount of data needing to be analyzed without sacrificing the accuracy of the outputs. Additionally, the signal to noise can be improved as the data processed is known to be specific to the subject, thereby reducing non-relevant signals.
- FIG. 5A shows a schematic of the method.
- A sample 501 is obtained from a subject’s tumor.
- the sample can be a tissue biopsy or a plasma sample that has been obtained from a subject subsequent to diagnosis and before treatment of a disease or cancer.
- a set of control samples (e.g., non-cancer) 505 are initially run to obtain a dataset representing a noncancer control.
- An analysis 510 to determine differential methylation states is run.
- Methylation states of the control samples are analyzed via cfMeDIP, or other suitable approaches.
- the subject-derived sample is analyzed via cfMeDIP (or other suitable approaches) to identify methylation states, which are compared against a non-cancer control data set to identify a set of differentially methylated regions (DMRs).
- the non-cancer control data set can comprise plasma samples taken from other individuals that do not have the cancer of interest.
- These DMRs 515 (e.g., subject-specific DMRs), or subsets of the DMRs
- the same DMRs (e.g., 515) can be tracked to monitor progression.
- a sample 550 is taken from the same subject at a later time.
- a blood draw or other non-invasive sample
- a cfMeDIP assay is run and sequencing reads and data relating to the methylated regions are obtained.
- the regions comprising DMRs that were identified are analyzed for methylation states and the presence of the subject-specific DMRs, and a report 560 can be generated regarding these particular DMRs.
- the treatment regimen can be updated to reflect the efficacy (or lack of efficacy).
- the samples are initially analyzed using a targeted sequencing or a panel, such as a subject agnostic probe panel 520.
- the panel can be one such as those described in Example 4. Similar to the panel described in Example 4, the panel is a set panel that is agnostic to a given tissue or subject. The panel is identified a priori based on blood quiet regions (e.g., regions that are amenable to methylation and show a methylation state that is below a threshold in a non-cancer/non-diseased control), or other areas of interest.
- the panel is a set panel that is specific to a large number of potential markers of which a subset of these is relevant to a given subject.
- This panel design allows for a standardized sequencing workflow for all subjects while still having an analysis platform that can be customized for a given subject.
- the panel 520 can be used in the sequencing and differential methylation analysis.
- the resulting subject-specific DMRs may then be those that are present in the subject agnostic probe panel (as opposed to the whole genome).
- the panel 520 can also be used during the sequencing. Because the subject-specific DMRs overlap with the regions of the panel, the subject-specific DMRs can still be analyzed.
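Tracking subject-specific DMRs on a subject-agnostic panel amounts to restricting each time point's panel counts to the DMRs found at baseline. The sketch below assumes per-region counts keyed by a region id and uses the mean count as a stand-in summary; the names and numbers are hypothetical.

```python
def subject_dmr_signal(panel_counts, subject_dmrs):
    """Mean count over the subject-specific DMRs that are present on the panel."""
    tracked = [panel_counts[r] for r in subject_dmrs if r in panel_counts]
    return sum(tracked) / len(tracked) if tracked else 0.0


def monitoring_report(timepoints, subject_dmrs):
    """timepoints: list of (label, panel_counts) in chronological order."""
    return [(label, subject_dmr_signal(counts, subject_dmrs))
            for label, counts in timepoints]


# Made-up panel counts; "r9" is a subject-specific DMR that is not on the panel.
baseline = {"r1": 40, "r2": 10, "r3": 500}
follow_up = {"r1": 4, "r2": 0, "r3": 480}
subject_dmrs = ["r1", "r2", "r9"]
report = monitoring_report([("baseline", baseline), ("follow-up", follow_up)], subject_dmrs)
```

Region "r3" has high counts at both time points but is ignored because it is not one of this subject's DMRs, which is the noise reduction the text describes.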
- MRD molecular residual disease
- Biomarkers, such as specific DMRs and methylation patterns, may stratify patients with cancer (e.g., head and neck cancer (HNC)) into those who are likely or unlikely to relapse after definitive curative-intent treatment.
- This identified DMR signature was used as MRD detection, wherein 62 out of the 163 patients had MRD detected and 101 out of the 163 patients had no MRD detected.
- the identified DMR signature was characterized to demonstrate the biological relevance of the DMR signature and its possible use as a biomarker for cancer detection or cancer recurrence.
- TCGA The Cancer Genome Atlas
- the signature DMRs were significantly enriched in differential HNC CpG sites in TCGA, CpG islands, shores and shelves, reflecting cancer hallmarks.
- the enrichment of DMRs was also found to be associated with the PAX (paired box) group of transcription factors, specifically with PAX6, which are hypermethylated in HNC tumors.
- HNSC head and neck squamous cell carcinoma
- LUSC lung squamous cell carcinoma
- CEC cervical squamous cell carcinoma
- the method for identifying the proto-DMR, anti-DMR, DMR is detailed.
- the identified proto-DMR, anti-DMR, DMR can be used for detecting cancer as shown in Examples 8 and 9.
- Proto-DMRs were identified as regions exhibiting minimal methylation in non-cancer samples but strong methylation signal when fully methylated, as determined by high binding affinity of the cfMeDIP assay to an in-vitro methylated non-cancer sample. Proto-DMR selection was hypothesis-free with respect to their association with any specific cancer type; rather, regions were chosen solely based on their low signal in non-cancer controls and high enrichment in fully methylated samples. The proto-DMR panel was used in subsequent Examples.
- a fully methylated control sample was used to define these high methylation enrichment regions.
- In vitro methylation was carried out on cfDNA from bulk non-cancer donor plasma using a CpG methyltransferase enzyme (M.SssI). Selection criteria included differential read depth in non-cancer vs. fully methylated controls, peak calling algorithms (e.g., MACS3), and statistical methods assessing methylation variability across populations.
- Default parameters for MACS3 were used, with the following key settings: --broad (enables broad peak calling), --broad-cutoff 0.1 (sets the threshold for broad peaks), -g hs (specifies the human genome size), and --qvalue 0.05 (false discovery rate (FDR) threshold for peak calling). Any called peak was considered a high methylation enrichment region without further optimization.
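The MACS3 settings above can be assembled into a `macs3 callpeak` invocation as sketched below. Only the four settings named in the text come from the source; the treatment and control file names, the use of a control track at all, and the output name and directory are assumptions. The sketch only builds the argument list and does not run MACS3.

```python
def build_macs3_command(treatment_bam, control_bam, name, outdir="."):
    """Build a `macs3 callpeak` argument list with the settings named in the text."""
    return [
        "macs3", "callpeak",
        "-t", treatment_bam,      # assumed: fully methylated control library
        "-c", control_bam,        # assumed: non-cancer library used as background
        "-g", "hs",               # human effective genome size
        "--broad",                # broad peak calling
        "--broad-cutoff", "0.1",  # threshold for broad peaks
        "--qvalue", "0.05",       # FDR threshold for peak calling
        "-n", name,
        "--outdir", outdir,
    ]


# Hypothetical file names for illustration only.
command = build_macs3_command("fully_methylated.bam", "non_cancer.bam", "high_enrichment")
command_line = " ".join(command)
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine where MACS3 is installed.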
- Plasma cfDNA from commercial biobanked non-cancer controls (10 ng input per sample) was used to identify regions with little to no detectable methylation signal.
- Proto-DMRs were derived from an independent cohort of non-cancer controls obtained from commercial biobanks.
- the proto-DMR panel was designed before receiving these study samples or training for DMRs in head and neck cancer, demonstrating its applicability across multiple cancer types.
- the distribution of read counts across these regions follows a Zero-Inflated Negative Binomial (ZINB) model, where many regions have no detectable signal, while others exhibit measurable but low levels of methylation, and a subset display robust methylation.
- A Zero-Inflated Negative Binomial (ZINB) vs. Negative Binomial (NB) model comparison was used. Each region was tested by fitting both a ZINB model (accounting for excess zeros) and a standard NB model (assuming no excess zeros). A Likelihood Ratio Test (LRT) was used to compare the likelihood of the two models. Regions where the ZINB model was significantly favored (p < 0.01) were classified as technical noise and selected as Proto-DMR candidates.
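The likelihood ratio step can be sketched once the two fitted log-likelihoods are available for a region. The sketch assumes the ZINB model has exactly one more free parameter than the NB model (the zero-inflation probability), so the statistic is referred to a chi-square distribution with 1 degree of freedom; fitting the models themselves is outside this example, and the function names are hypothetical. Because the zero-inflation parameter lies on the boundary under the null, this reference distribution is commonly regarded as conservative.

```python
import math


def lrt_pvalue(loglik_nb, loglik_zinb):
    """Likelihood ratio test of NB (null) against ZINB (alternative), 1 degree of freedom."""
    stat = max(0.0, 2.0 * (loglik_zinb - loglik_nb))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def favors_zinb(loglik_nb, loglik_zinb, alpha=0.01):
    """True when the ZINB model is significantly favored at the given level."""
    _, p_value = lrt_pvalue(loglik_nb, loglik_zinb)
    return p_value < alpha


# Made-up log-likelihoods for two regions.
strong_stat, strong_p = lrt_pvalue(-100.0, -95.0)   # statistic of 10
weak_stat, weak_p = lrt_pvalue(-100.0, -99.0)       # statistic of 2
```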
- bedtools intersect was used to overlap high methylation enrichment regions with low-signal regions in non-cancer controls.
- the resulting intersected regions were designated as Proto-DMRs, forming the basis for downstream DMR and Anti-DMR selection.
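The overlap step performed with bedtools intersect can be illustrated in plain Python. This sketch mimics the default behavior of reporting the overlapping portion of each pair of intervals, using half-open BED-style coordinates; it is a quadratic-time illustration with made-up coordinates, not a replacement for bedtools on genome-scale data.

```python
def intersect(regions_a, regions_b):
    """Return the overlapping portions of intervals given as (chrom, start, end) tuples."""
    by_chrom = {}
    for chrom, start, end in regions_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    overlaps = []
    for chrom, a_start, a_end in regions_a:
        for b_start, b_end in by_chrom.get(chrom, []):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # half-open intervals that merely touch do not overlap
                overlaps.append((chrom, start, end))
    return sorted(overlaps)


# Made-up coordinates for illustration.
high_enrichment = [("chr1", 100, 400), ("chr2", 0, 300)]
low_signal = [("chr1", 250, 700), ("chr1", 900, 1200), ("chr3", 0, 300)]
proto_dmrs = intersect(high_enrichment, low_signal)
```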
- Proto-DMRs can be compatible with other methods, such as methylation-specific PCR, digital droplet PCR (ddPCR), or other amplification-based enrichment techniques. Since Proto-DMRs exhibit low signal in non-cancer controls, the number of Proto-DMRs that can be enriched may not inherently be limited by sequencing depth, allowing for broad panel scalability constrained primarily by hybrid capture panel size.
- ddPCR digital droplet PCR
- Targeted quantification was performed using sequencing-based approaches (e.g., Next Generation Sequencing on an Illumina NovaSeq 6000 platform).
- the target sequencing depth for the Proto-DMR panel was approximately 50-60 million reads.
- a deduplication rate of approximately 10:1 was observed, indicating that effective sequencing depth was closer to 5 million unique reads per sample.
- DMRs Differentially Methylated Regions
- Proto-DMRs can be the foundation for DMR and Anti-DMR selection, serving as candidate regions that demonstrate meaningful differential methylation in cancer vs. non-cancer samples. All DMRs and Anti-DMRs can be subsets of Proto-DMRs, but not all Proto-DMRs become DMRs or Anti-DMRs. Anti-DMRs were selected by requiring stability (e.g., low methylation signal) in both non-cancer and cancer samples, ensuring they provided reliable normalization across sample types. DMRs were defined as regions exhibiting significant differential methylation in cancer samples relative to non-cancer controls. This selection strategy can ensure that targeted regions are pre-filtered for the absence of biological signal in non-cancer controls while maintaining high binding affinity, making them optimal candidates for detecting cancer-associated methylation changes. The methodology was applied to subsequent Examples.
- stability e.g., low methylation signal
- a region was considered stable if both the cancer cohort and the non-cancer cohort exhibited a preference for the ZINB model (p < 0.01), indicating that the observed signal was likely due to technical noise rather than biological variation.
- regions that were classified as Proto-DMRs were eligible for selection as Anti-DMRs.
- bedtools intersect was used to extract stable regions that were fully contained within Proto-DMR boundaries, ensuring that Anti-DMRs remained within the preselected regions lacking biological signal.
- Anti-DMRs were selected exclusively for normalization purposes and were not expected to carry any cancer-specific signal. They were chosen based on their ability to reflect the technical background of the assay, ensuring reliable baseline adjustment for downstream analyses. Anti-DMRs were selected based solely on how many regions met the stability criteria, without any predefined matching to DMR counts.
- DMR e.g., hypermethylated DMRs
- a Wilcoxon rank-sum test was applied to compare normalized methylation counts between the non-cancer and cancer populations. A one-tailed test was used with the null hypothesis that the non-cancer mean was lower than the cancer mean. Any region with p < 0.01 was classified as a DMR. Reads ⁇ 150 bp in length were considered. Reads containing at least two CpGs were included in the methylation quantification. The number of reads mapping to a particular Proto-DMR region was divided by the total number of reads mapping to all Proto-DMR regions.
- DMRs were exclusively selected from the Proto-DMR universe, ensuring that differential methylation analysis was performed on pre-filtered regions lacking signal in non-cancer controls. Any region that passed the Wilcoxon rank-sum test (p < 0.01) was classified as a DMR.
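The normalization and the one-tailed rank-sum comparison described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the p-value uses the normal approximation to the Mann-Whitney U statistic without tie or continuity correction, the tested direction (cancer values tending to be larger) is assumed from the hypermethylated-DMR example, and all counts are invented.

```python
import math


def normalize(region_counts):
    """Divide each region's read count by the total across all Proto-DMR regions."""
    total = sum(region_counts.values())
    if total == 0:
        raise ValueError("no reads mapped to any Proto-DMR region")
    return {region: count / total for region, count in region_counts.items()}


def rank_sum_greater(cancer, noncancer):
    """One-sided rank-sum test that cancer values tend to exceed non-cancer values.

    Returns (U, p) using a normal approximation (no tie or continuity correction).
    """
    n1, n2 = len(cancer), len(noncancer)
    # U counts pairs where the cancer value is larger; ties count one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0
            for x in cancer for y in noncancer)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return u, p_value


# Hypothetical normalized values for one region in two cohorts.
cancer_values = [5, 6, 7, 8, 9, 10, 11, 12]
noncancer_values = [0, 0, 1, 1, 2, 2, 3, 4]
u, p = rank_sum_greater(cancer_values, noncancer_values)
is_dmr = p < 0.01
print(u, p, is_dmr)
```

For small cohorts or heavily tied data, an exact test from a statistics library would be a more appropriate choice than this approximation.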
- Table 1 shows examples of DMRs and Anti-DMRs selected within the Proto-DMR region, and other example Proto-DMRs.
- the DMRs from which these 3 example regions were picked can be identified following the baseline-informed cell line titration experiment described in Example 8.
- the other example Proto-DMRs can be useful for identifying subject-specific DMRs, subject-specific Anti-DMRs, cancer-specific Anti-DMRs, and/or cancer-specific DMRs.
- Table 1 Genomic locations of Proto-DMRs, DMRs and anti-DMRs selected from within the Proto-DMR regions
- Normalization was performed using a depth-normalized mean of DMRs divided by a depth-normalized mean of Anti-DMRs to compute a score (baseline-informed score, baseline/tissue agnostic score).
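The score described above is a ratio of two means. The sketch below is one possible reading, in which "depth-normalized" is taken to mean dividing each region's read count by the sample's total read count. That interpretation, the function names, and the counts are assumptions, not details given in the source.

```python
def depth_normalized_mean(region_counts, total_reads):
    """Mean of per-region read counts after dividing by total sequencing depth."""
    return sum(count / total_reads for count in region_counts) / len(region_counts)


def methylation_score(dmr_counts, anti_dmr_counts, total_reads):
    """Depth-normalized mean over DMRs divided by the same over Anti-DMRs."""
    numerator = depth_normalized_mean(dmr_counts, total_reads)
    denominator = depth_normalized_mean(anti_dmr_counts, total_reads)
    if denominator == 0:
        raise ValueError("Anti-DMR signal is zero; score is undefined")
    return numerator / denominator


# Hypothetical read counts for one sample.
score = methylation_score(dmr_counts=[30, 50], anti_dmr_counts=[10, 10],
                          total_reads=1_000_000)
print(score)
```

Under this reading the depth term cancels when both means come from the same sample, so the score reduces to the ratio of mean DMR counts to mean Anti-DMR counts.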
- Statistical processing included signal-to-noise ratio analysis, prevalence filtering, and computational threshold calibration based on specificity requirements.
- Example 8 Assessing baseline-informed and baseline/tissue agnostic approaches using cancer cell line and patient derived cell free DNA
- HNSCC FaDu head and neck squamous cell carcinoma
- Tumor-derived cfDNA was diluted into a background of pooled non-cancer control cfDNA at defined ctDNA percentages.
- the tumor-derived cfDNA was spiked into pooled non-cancer cfDNA backgrounds at variant allele fractions (VAFs) ranging from 0% to 0.3%.
- VAFs variant allele fractions
- Each reaction was prepared with a fixed total input of 10 ng cfDNA, which included both the spiked-in tumor-derived DNA and the background non-cancer cfDNA. This ensured that the total cfDNA input remained constant across all dilution levels, allowing for a direct assessment of MRD detection at varying ctDNA fractions.
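The fixed-input dilution design can be checked with simple arithmetic. The sketch below assumes the stated fraction is a mass fraction of the 10 ng input. The intermediate dilution levels are illustrative values within the stated 0% to 0.3% range, not the levels actually used.

```python
def spike_in_masses(total_ng, tumor_fraction):
    """Split a fixed cfDNA input into tumor-derived and background mass (ng)."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    tumor_ng = total_ng * tumor_fraction
    return tumor_ng, total_ng - tumor_ng


# Hypothetical dilution levels, in percent, within the stated range.
for percent in (0.0, 0.01, 0.1, 0.3):
    tumor_ng, background_ng = spike_in_masses(10.0, percent / 100.0)
    print(f"{percent}%: {tumor_ng:.4f} ng tumor-derived, "
          f"{background_ng:.4f} ng background")
```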
- To subject the FaDu cell line to the baseline-informed approach (e.g., FIG. 15), a 100% undiluted FaDu cell line sample was used to establish a patient-specific methylation signature, mimicking a personalized baseline sample. This was to allow for reduced technical and biological variability in post-treatment samples by directly capturing tumor-specific methylation patterns. Sensitivity and specificity were assessed across longitudinal samples to determine the effect of baseline-informed normalization in minimizing both sources of noise.
- a universal classification algorithm for cancer detection e.g., MRD
- the cell line samples were either subjected to Proto-DMR enrichment prior to sequencing, as described in Example 7, or subjected to whole methylome sequencing (WM).
- WM whole methylome sequencing
- every sample was sequenced to a depth of approximately 200 million reads on an Illumina NovaSeq 6000 platform to ensure sufficient coverage for detecting low-frequency methylation signals.
- Proto-DMR enrichment: the samples were sequenced to a depth of approximately 50 million reads on an Illumina NovaSeq 6000 platform.
- a deduplication ratio was calculated to be approximately 10:1, indicating that nearly all unique molecules had been captured, ensuring maximal detection efficiency.
- Two sets of controls were also included for analysis: (1) Technical controls, representing the pooled cfDNA background used across all dilution points, and (2) Biological controls, which were unique individuals not included in the dilution series.
- the limit of detection was determined using the Probit approach.
- the technical LOD was estimated by setting a false positive rate (FPR) of 5% in technical replicates of pooled non-cancer control cfDNA.
- FPR false positive rate
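One simple way to fix a false positive rate in control replicates is to place the decision threshold at an empirical quantile of the control scores. The sketch below shows only that thresholding step. The Probit fit used for the LOD itself is not reproduced, and the control scores and function name are hypothetical.

```python
def threshold_at_fpr(control_scores, fpr=0.05):
    """Smallest observed control score with at most `fpr` of controls above it.

    A sample is then called positive when its score exceeds the threshold.
    """
    if not control_scores:
        raise ValueError("at least one control score is required")
    ordered = sorted(control_scores)
    n = len(ordered)
    for candidate in ordered:
        exceeding = sum(1 for score in ordered if score > candidate)
        if exceeding / n <= fpr:
            return candidate
    return ordered[-1]  # not reached: the maximum always satisfies the condition


# Hypothetical scores from ten non-cancer control replicates.
controls = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(threshold_at_fpr(controls, fpr=0.05))
print(threshold_at_fpr(controls, fpr=0.25))
```

With only ten controls a 5% rate permits no false positives, so the threshold falls at the highest control score. Larger control sets give a finer-grained threshold.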
- the biological LOD was estimated by setting an FPR of 5% in biological replicates from unique non-cancer donors, capturing natural inter-individual variation.
- cfDNA from ctDNA positive patient samples
- Cell-free DNA (cfDNA) from patient samples that were circulating tumor DNA (ctDNA) positive, as determined by the TruSight Oncology 500 (TSO 500) panel, was used.
- cfDNA was mixed into a pooled non-cancer cfDNA background to simulate varying ctDNA fractions.
- Selected cfDNA samples were serially diluted into a pooled non-cancer cfDNA background at 0.025% to 1.0% ctDNA fractions, with each dilution performed using 10 ng of input cfDNA.
- the baseline/tissue agnostic scores were generated from subjecting FaDu cells at various dilutions (e.g., 0.01%-0.3%) to the baseline/tissue agnostic approach with whole methylome (WM).
- the baseline-informed scores were generated from subjecting FaDu cells at various dilutions (e.g., 0.01%-0.3%) to the baseline-informed approach with whole methylome (WM).
- the baseline-informed scores showed improved biological and technical LODs compared to the baseline/tissue agnostic scores, which suggested enhanced MRD detection in samples with low ctDNA presence with the baseline-informed approach.
- the baseline/tissue agnostic scores were also generated from subjecting FaDu cells at various dilutions (e.g., 0.01%-0.3%) to the baseline/tissue agnostic approach with Proto-DMR enrichment (CM, cancer methylome).
- the baseline-informed scores were generated from subjecting FaDu cells at various dilutions (e.g., 0.01%-0.3%) to the baseline-informed approach with Proto-DMR enrichment (CM, cancer methylome).
- the technical LOD and the biological LOD were found to be improved, with the biological LOD improved by an order of magnitude when using the baseline-informed approach versus the baseline/tissue agnostic approach.
- the results also showed the proto-DMR can be used for detecting cancer.
- the baseline/tissue agnostic scores were generated from subjecting cfDNA (derived from colorectal Stage III patient samples) at various dilutions (0.025%-1%) to the baseline/tissue agnostic approach using whole methylome (WM) sequencing.
- cfDNA derived from colorectal Stage III patient samples
- WM whole methylome
- the baseline-informed scores were generated from subjecting cfDNA (derived from colorectal cancer Stage IV patient samples) at various dilutions (0.025%-1%) to the baseline-informed approach using the Proto-DMR panel (CM).
- the baseline/tissue agnostic scores were generated from subjecting cfDNA (derived from colorectal cancer Stage IV patient samples) at various dilutions (0.025%-1%) to the baseline/tissue agnostic approach using the Proto-DMR panel (CM).
- the baseline-informed scoring had lower biological and technical LODs than the baseline/tissue agnostic scoring, suggesting that baseline-informed scoring may improve technical and biological LOD, reducing false negatives at lower ctDNA fractions.
- the results also showed the proto-DMR can be used for detecting cancer.
- Example 9 Assessing baseline-informed and baseline/tissue agnostic approaches using patient samples
- HNC stage I-IVB head and neck cancer
- the blood samples were collected in EDTA tubes and processed within two hours of collection to isolate the plasma. A total of 559 plasma samples from this training cohort were obtained and stored at -80°C until processing. These longitudinal samples, with confirmed recurrence or non-recurrence status through clinical follow-up, allowed evaluation of molecular residual disease (MRD) detection.
- MRD molecular residual disease
- cfDNA cell-free DNA
- cfMeDIP-seq cell-free Methylated DNA Immunoprecipitation sequencing
- 5mC 5-methylcytosine
- The Proto-DMR panel was further used prior to sequencing, as described above in Example 7.
- the cfDNA samples were also subjected to the baseline-informed model or the baseline/tissue agnostic model.
- baseline-informed model: pre-treatment, post-diagnosis plasma samples were used to create a personalized methylation signature for each patient to enhance specificity in detecting recurrence.
- baseline/tissue agnostic model: a universal classification algorithm was applied for cancer detection (e.g., MRD detection) without requiring a prior patient-specific sample to ensure broad clinical applicability.
- cancer detection e.g., MRD detection
- joint model: both baseline-informed and baseline/tissue agnostic scores generated as described above were integrated into a single predictive framework to optimize both sensitivity and specificity.
- FIG. 28A shows the sensitivity and specificity measured across the three different approaches: baseline-informed approach, joint model approach, and baseline/tissue agnostic approach, each performed using whole methylome sequencing.
- FIG. 28B shows the sensitivity and specificity measured across the three different approaches: baseline-informed approach, joint model approach, and baseline/tissue agnostic approach, each performed using the Proto-DMR panel. High specificity for detecting MRD was observed across the different approaches, with even greater specificity observed when using the Proto-DMR panel. A trade-off between sensitivity and specificity was found across the different approaches. The results showed that the baseline-informed and tissue-agnostic models were complementary, and that using Proto-DMR enrichment may further improve detection of MRD.
- the Baseline-Informed and Tissue-Agnostic models were complementary, and the Joint Model improved MRD detection accuracy, reinforcing the utility of integrating both approaches for enhanced clinical performance.
- the Proto-DMR panel approach improved sensitivity across all three models while requiring about 25% of the sequencing depth compared to the whole methylome approach, making it a more cost-effective and scalable solution.
- the whole methylome approach targeted 200 million reads per sample, whereas the Proto-DMR panel required about 50 million reads, significantly reducing sequencing costs while maintaining high detection sensitivity.
- the whole methylome approach had a deduplication ratio of approximately 1:1, indicating that it remained in the linear phase of unique molecule discovery, meaning significantly higher sequencing depth may be required to reach saturation.
- the Proto-DMR panel achieved a deduplication ratio of 10:1, demonstrating that it fully saturated all available unique molecules, maximizing detection efficiency while minimizing sequencing costs.
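The depth figures quoted above can be checked with simple arithmetic. The sketch below assumes the deduplication ratio means total reads divided by unique reads, which matches the earlier statement that about 50 million reads at 10:1 correspond to roughly 5 million unique reads. The function name is illustrative.

```python
def unique_reads(total_reads, dedup_ratio):
    """Estimate unique reads from total reads and a total:unique ratio."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return total_reads / dedup_ratio


whole_methylome_total = 200_000_000  # targeted reads per sample
panel_total = 50_000_000             # targeted reads per sample

whole_methylome_unique = unique_reads(whole_methylome_total, 1)  # ratio about 1:1
panel_unique = unique_reads(panel_total, 10)                     # ratio about 10:1
depth_fraction = panel_total / whole_methylome_total             # panel vs. WM depth

print(whole_methylome_unique, panel_unique, depth_fraction)
```

A 10:1 ratio means each unique molecule is read about ten times on average. A 1:1 ratio means nearly every read is a new molecule, so additional depth would still uncover new ones.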
- FIG. 29 is an example method for evaluating and thresholding joint predictive models leveraging Baseline-Informed (BI) and Baseline/Tissue-Agnostic (BA) algorithms for circulating tumor DNA (ctDNA) quantification in patients with Head & Neck (H&N) cancer.
- BI Baseline-Informed
- BA Baseline/Tissue-Agnostic
- MRD minimal residual disease
- a nested cross-validation (CV) approach is employed to ensure robust performance evaluation while mitigating information leakage.
- a machine learning model, such as a linear support vector machine (SVM), is trained and validated.
- Threshold determination occurs within the inner CV loop, where per-fold thresholds are computed and averaged to yield a final decision threshold.
- the trained model is then applied to the test set from the outer CV loop, producing predictive scores. These scores are subsequently binarized using the calibrated threshold to generate final recurrence versus nonrecurrence predictions.
- Performance metrics, including sensitivity, specificity, and overall classification accuracy, are computed to evaluate model efficacy.
- the disclosed system provides an optimized, leakage-free methodology for integrating multiple ctDNA detection algorithms into a single predictive model.
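The nested cross-validation flow described above can be sketched in plain Python. This is an illustration under several assumptions. A standardized sum of the BI and BA scores stands in for the linear SVM. Each inner-fold threshold is chosen to reach a target specificity on that fold's non-recurrence samples, a criterion the source does not specify. The fold counts and the data are hypothetical.

```python
import statistics


def kfold(n, k):
    """Yield (train, test) index lists; fold f takes every k-th index from f."""
    for f in range(k):
        test = list(range(f, n, k))
        held_out = set(test)
        yield [i for i in range(n) if i not in held_out], test


def fit(rows):
    """Stand-in for the SVM: learn mean and spread of each score on training rows."""
    columns = zip(*[(bi, ba) for bi, ba, _ in rows])
    return [(statistics.mean(c), statistics.pstdev(c) or 1.0) for c in columns]


def predict(model, row):
    """Joint score: sum of the standardized BI and BA scores."""
    return sum((x - m) / s for x, (m, s) in zip(row[:2], model))


def specificity_threshold(scores, labels, specificity=0.95):
    """Smallest non-recurrence score with at most (1 - specificity) above it."""
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not negatives:
        raise ValueError("validation fold has no non-recurrence samples")
    allowed = (1.0 - specificity) * len(negatives)
    for candidate in negatives:
        if sum(1 for s in negatives if s > candidate) <= allowed:
            return candidate
    return negatives[-1]


def nested_cv(rows, outer_k=3, inner_k=2):
    """rows: (bi_score, ba_score, label) with label 1 = recurrence."""
    tp = fp = tn = fn = 0
    for train_idx, test_idx in kfold(len(rows), outer_k):
        train = [rows[i] for i in train_idx]
        fold_thresholds = []
        for fit_idx, val_idx in kfold(len(train), inner_k):  # inner loop
            model = fit([train[i] for i in fit_idx])
            val = [train[i] for i in val_idx]
            fold_thresholds.append(specificity_threshold(
                [predict(model, r) for r in val], [r[2] for r in val]))
        threshold = statistics.mean(fold_thresholds)  # averaged per-fold thresholds
        model = fit(train)                            # refit on the full outer train set
        for i in test_idx:                            # outer test set is untouched so far
            called = predict(model, rows[i]) > threshold
            if rows[i][2] == 1:
                tp, fn = tp + called, fn + (not called)
            else:
                fp, tn = fp + called, tn + (not called)
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / len(rows)}


# Hypothetical (BI score, BA score, recurrence label) triples.
rows = [(0.10, 0.20, 0), (0.15, 0.10, 0), (0.20, 0.25, 0),
        (2.0, 2.5, 1), (2.4, 2.1, 1), (3.0, 2.8, 1),
        (0.12, 0.18, 0), (0.22, 0.14, 0), (0.18, 0.30, 0),
        (2.6, 2.2, 1), (2.1, 2.9, 1), (2.8, 2.4, 1)]
print(nested_cv(rows))
```

The threshold is derived only from the outer training set, so the outer test samples never influence either the model or the cut-off. That separation is the leakage safeguard the text describes.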
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Medical Informatics (AREA)
- Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Organic Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Genetics & Genomics (AREA)
- Analytical Chemistry (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Data Mining & Analysis (AREA)
- Immunology (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Primary Health Care (AREA)
- Microbiology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biochemistry (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Oncology (AREA)
- Hospice & Palliative Care (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to a method for classifying a sample obtained from a subject. The method may comprise obtaining a sample containing nucleic acid molecules from the subject. The method may comprise assaying the nucleic acid molecules to generate a data set containing methylation states of one or more genomic regions. The one or more genomic regions may comprise differentially methylated regions (DMRs). The method may comprise processing at least a portion of the data set to generate a result indicating the presence of cancer in the subject. The portion of the data set may relate to a set of subject-specific DMRs.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463556347P | 2024-02-21 | 2024-02-21 | |
| US63/556,347 | 2024-02-21 | | |
| US202463711616P | 2024-10-24 | 2024-10-24 | |
| US63/711,616 | 2024-10-24 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025179073A1 (fr) | 2025-08-28 |
Family
ID=96847730
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2025/016672 Pending WO2025179073A1 (fr) | 2024-02-21 | 2025-02-20 | Methods and systems for tissue-based analysis of differentially methylated regions |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025179073A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120282613A1 (en) * | 2010-01-26 | 2012-11-08 | Nipd Genetics Ltd | Methods and compositions for noninvasive prenatal diagnosis of fetal aneuploidies |
| US20210156863A1 (en) * | 2017-11-03 | 2021-05-27 | University Health Network | Cancer detection, classification, prognostication, therapy prediction and therapy monitoring using methylome analysis |
| US20220177956A1 (en) * | 2019-03-18 | 2022-06-09 | Nucleix Ltd. | Methods and systems for detecting methylation changes in dna samples |
| WO2023107709A1 (fr) * | 2021-12-10 | 2023-06-15 | Adela, Inc. | Methods and systems for generating sequencing libraries |
| US20230374601A1 (en) * | 2022-02-07 | 2023-11-23 | Centre For Novostics Limited | Fragmentation for measuring methylation and disease |
-
2025
- 2025-02-20 WO PCT/US2025/016672 patent/WO2025179073A1/fr active Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12410480B2 (en) | | Methods and systems for detecting colorectal cancer via nucleic acid methylation analysis |
| US20240084397A1 (en) | | Methods and systems for detecting cancer via nucleic acid methylation analysis |
| JP2022539443A (ja) | | Methods and systems for high-depth sequencing of methylated nucleic acids |
| JP2021521536A (ja) | | Machine learning implementation for multi-analyte assay of biological samples |
| JP7620803B2 (ja) | | Tissue-specific methylation markers |
| JP7665659B2 (ja) | | Multimodal analysis of circulating tumor nucleic acid molecules |
| JP2022501033A (ja) | | Cell-free DNA hydroxymethylation profiles in the evaluation of pancreatic lesions |
| Wang et al. | | Terminal modifications independent cell-free RNA sequencing enables sensitive early cancer detection and classification |
| US20250002904A1 (en) | | Methods and systems for generating sequencing libraries |
| WO2025059485A1 (fr) | | Methods and systems for methylation sequencing |
| WO2025179073A1 (fr) | | Methods and systems for tissue-based analysis of differentially methylated regions |
| US20250230507A1 (en) | | Methods and systems for cell-free nucleic acid processing |
| US11427874B1 (en) | | Methods and systems for detection of prostate cancer by DNA methylation analysis |
| WO2024216205A1 (fr) | | Methods and systems for cell-free nucleic acid processing |
| WO2024192294A1 (fr) | | Methods and systems for generating sequencing libraries |
| WO2024155681A1 (fr) | | Methods and systems for detecting and assessing liver diseases |
| WO2025254592A1 (fr) | | Method for quantifying nucleic acids |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25758814; Country of ref document: EP; Kind code of ref document: A1 |