WO2025024497A1 - Significance modeling of clonal-level target variants using methylation detection - Google Patents

Significance modeling of clonal-level target variants using methylation detection

Info

Publication number
WO2025024497A1
Authority
WO
WIPO (PCT)
Prior art keywords
determining
nucleic acid
value
sample
target nucleic
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/039252
Other languages
French (fr)
Inventor
Andrew M. Gross
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guardant Health Inc
Original Assignee
Guardant Health Inc
Application filed by Guardant Health Inc
Publication of WO2025024497A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Definitions

  • Cell-free circulating tumor DNA (ctDNA) tests have been used as rule-in tests for positive detection of tumor-derived genomic alterations and microsatellite instability (MSI), with high concordance to tissue sequencing.
  • MSI microsatellite instability
  • Using ctDNA or other nucleic acids to determine the wild-type status of specific genes within a tumor with high confidence would facilitate timely therapeutic decision making and avoid tissue biopsy for confirmation of wild-type status. [0003] As such, the ability to identify whether a subject has any actionable variants would greatly improve therapeutic approaches.
  • Described herein are methods for determining that no actionable variants are present in a sample.
  • An informative likelihood of correctly identifying the absence of a set of actionable variants is achieved.
  • Described herein is a method of determining that a first variant of interest at a first locus is absent at a clonal level in a cell-free deoxyribonucleic acid (cfDNA) sample of a Attorney Docket No.
  • cfDNA cell-free deoxyribonucleic acid
  • the method comprising: accessing a plurality of sequence reads of the cfDNA sample; [0006] determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; generating a first likelihood value based on a probability that the first variant is absent at the clonal level and/or a second likelihood value based on a probability that the first variant is not absent at the clonal level; optionally, determining a quantitative value based on the first likelihood value and/or the second likelihood value; [0007] comparing the quantitative value and/or the first likelihood value and/or the second likelihood value to a threshold; and determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison.
  • generating the first likelihood value and the second likelihood value comprises determining a tumor fraction estimate of the sample, wherein the first likelihood value and the second likelihood value are based on the tumor fraction estimate.
  • determining the tumor fraction estimate comprises determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the sample.
  • determining the epi MAF comprises determining a molecule count associated with the tumor mutation based on the plurality of sequence reads.
  • generating the first likelihood value and the second likelihood value comprises determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF.
  • the method includes comparing the allele frequency with a second threshold that is based on the epi MAF, wherein determining that the first variant of interest at the first locus is absent at the clonal level is based further on the comparison of the MAF with the second threshold.
  • determining the allele frequency comprises determining a first molecule count associated with the first variant based on the plurality of sequence reads.
  • determining the quantitative value comprises accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information.
  • the method includes determining a prevalence of at least a second variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information. In other embodiments, determining the quantitative value comprises accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information. In other embodiments, the method includes determining a prevalence of at least a second variant in the cfDNA sample, wherein the quantitative value is based further on the prevalence of the second variant. In other embodiments, the quantitative value is based on the ratio of the first likelihood value to the second likelihood value.
  • the method includes determining a level of confidence that the first variant is absent at the clonal level in the cfDNA sample based on the quantitative value. In other embodiments, the method includes determining a treatment plan to treat a disease in the human subject. In other embodiments, the disease is cancer. In other embodiments, the method includes determining a prevalence of at least a second variant in the cfDNA sample; and adjusting the quantitative value based on the prevalence of at least a second variant in the cfDNA sample.
  • a method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type at least partially using a computer comprising: [0009] determining that the first target nucleic acid variant at the first genetic locus is not detected in the cfNA sample; determining, by the computer, a coverage of the first genetic locus from sequence information generated from the cfNA sample; determining, by the computer, a tumor fraction from the sequence information generated from the cfNA sample; determining, by the computer, a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value.
  • cfNA cell-free nucleic acid
  • a method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject at least partially using a computer comprising: determining that the first target nucleic acid variant is not detected in the cfNA sample obtained from the subject to generate a first test result; determining that at least a second target nucleic acid variant is detected in the cfNA sample obtained from the subject to generate a second test result; determining, by the computer, a first probability that the first target nucleic acid variant is absent in the cfNA sample given the second test result and/or a second probability that the first target nucleic acid is not absent in the cfNA sample given the second test result; generating, by the computer, a quantitative value using the first probability, the second probability, and/or a ratio thereof; and determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value.
  • a method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type at least partially using a computer comprising: [0012] determining that the first target nucleic acid variant is not detected in the cfNA sample obtained from the subject; generating, by the computer, at least one tumor fraction based value; generating, by the computer, at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value.
  • the quantitative value is less than the threshold value. In other embodiments, the quantitative value is greater than the threshold value. In other embodiments, the first and second test results are dependent upon one another. In other embodiments, the method includes determining that a plurality of other selected target nucleic variants are absent at one or more other genetic loci. In other embodiments, the quantitative value comprises a log likelihood ratio (LLR) threshold value. In other embodiments, the method includes determining that the first target nucleic acid variant is absent at the first genetic locus in a plurality of reference cfNA samples to generate the threshold value. In other embodiments, the threshold value comprises a clonality or a sub-clonality threshold value.
  • the first target nucleic acid variant comprises a driver mutation.
  • the method includes administering one or more therapies to the subject based upon the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample.
  • the method includes estimating a probability of detecting the first target nucleic acid variant at the first genetic locus in the cfNA sample using the tumor fraction and a binomial model.
  • the binomial model comprises information about the given cancer type and/or the second target nucleic acid variant.
  • the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample indicates that the first genetic locus is wild type.
  • the given cancer type is colorectal cancer, wherein the first genetic locus is KRAS, BRAF, or NRAS, and wherein the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample indicates that the first genetic locus is wild type KRAS, BRAF, or NRAS.
  • the method includes administering Cetuximab and/or Panitumumab to the subject.
  • the cfNA comprises cfDNA.
  • the cfNA comprises cfRNA.
  • the method includes repeating the method one or more times to monitor whether the first target nucleic acid variant is absent at the first genetic locus in different cfNA samples obtained from the subject at different time points. In other embodiments, the method includes performing one or more additional tests to confirm or refute the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample. In other embodiments, the method includes determining a maximum mutant allele frequency (epi MAF) for the cfNA sample and using the epi MAF as an estimate of the tumor fraction.
  • epi MAF maximum mutant allele frequency
  • the method includes determining that the first target nucleic acid variant at the first genetic locus is not detected in the cfNA sample based upon a plurality of sequencing reads obtained from the cfNA sample. In other embodiments, the method includes determining that the first target nucleic acid variant is absent at a clonal level in the cfNA sample. In other embodiments, the method includes generating a first likelihood value based on the first probability and a second likelihood value based on the second probability. In other embodiments, the method includes determining the quantitative value based on the first likelihood value and the second likelihood value.
  • generating the first likelihood value and the second likelihood value comprises determining the tumor fraction estimate of the cfNA sample, wherein the first likelihood value and the second likelihood value are based on the tumor fraction estimate.
  • determining the tumor fraction estimate comprises determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the cfNA sample.
  • determining the epi MAF comprises determining a molecule count associated with the tumor mutation based on the plurality of sequence reads.
  • generating the first likelihood value and the second likelihood value comprises determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF.
  • the method includes comparing the allele frequency with a second threshold that is based on the epi MAF, wherein determining that the first target nucleic acid variant of interest at the first genetic locus is absent at the clonal level is based further on the comparison of the MAF with the second threshold.
  • determining the allele frequency comprises determining a first molecule count associated with the first target nucleic acid variant based on the plurality of sequence reads.
  • determining the quantitative value comprises accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information. In other embodiments, the method includes determining a prevalence of at least the second target nucleic acid variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information. [0015] In other embodiments, determining the quantitative value comprises accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first target nucleic acid variant, wherein the quantitative value is based on the covariable information.
  • the method includes determining a prevalence of at least the second target nucleic acid variant in the cfNA sample, wherein the quantitative value is based further on the prevalence of the second target nucleic acid variant. In other embodiments, the quantitative value is based on the ratio of the first likelihood value to the second likelihood value. In other embodiments, the method includes determining a level of confidence that the first target nucleic acid variant is absent at a clonal level in the cfNA sample based on the quantitative value. In other embodiments, the method includes determining a prevalence of at least the second target nucleic acid variant in the cfNA sample; and adjusting the quantitative value based on the prevalence of at least the second target nucleic acid variant in the cfNA sample.
  • the ratio comprises a log posterior probability ratio (LPPR) equal to a sum of a log likelihood tumor fraction value, a log likelihood mutual exclusivity value, and a log prior value.
  • the first genetic locus or a second genetic locus comprises the second target nucleic acid variant.
  • the quantitative value comprises a negative predictive value (NPV) score.
  • the given cancer type comprises lung cancer and the first target nucleic acid variant is a mutation in a gene selected from the group consisting of: EGFR, BRAF, ALK, ROS1, and MET.
  • the given cancer type comprises colorectal cancer and the first target nucleic acid variant is a mutation in a gene selected from the group consisting of: KRAS, BRAF, and NRAS.
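The tumor-fraction-based part of the method embodiments above can be illustrated with a small sketch. It is not the implementation described in the source. It assumes that the maximum MAF stands in for the tumor fraction, that a clonal variant would appear at roughly that allele fraction, that molecules are sampled independently (a binomial model), and that "not detected" means zero mutant molecules were observed. The function names, the `background_af` parameter, the counts, and the threshold are all hypothetical.

```python
from math import exp, log1p


def max_maf(counts):
    """Largest mutant allele fraction among the detected tumor mutations.

    counts maps a variant label to (mutant_molecules, total_molecules).
    """
    return max(mut / total for mut, total in counts.values() if total > 0)


def log_prob_not_detected(coverage, allele_fraction):
    """log P(zero mutant molecules) under Binomial(coverage, allele_fraction)."""
    return coverage * log1p(-allele_fraction)


def log_likelihood_ratio(coverage, clonal_af, background_af=0.0):
    """log P(not detected | absent) minus log P(not detected | clonal)."""
    return (log_prob_not_detected(coverage, background_af)
            - log_prob_not_detected(coverage, clonal_af))


# Illustrative molecule counts, not data from the source.
counts = {"variant_a": (40, 2000), "variant_b": (25, 2500)}
tf_estimate = max_maf(counts)            # max MAF used as a tumor fraction proxy
p_miss = exp(log_prob_not_detected(1500, tf_estimate))
llr = log_likelihood_ratio(coverage=1500, clonal_af=tf_estimate)
LLR_THRESHOLD = 5.0                      # hypothetical cutoff
call_absent = llr > LLR_THRESHOLD
```

Working in log space keeps the calculation stable at high coverage, where the probability of missing a clonal variant becomes too small to represent directly. The direction of the threshold comparison is a choice made for this sketch; the embodiments above allow the quantitative value to be either less than or greater than the threshold.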
  • a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing a plurality of sequence reads of the cfDNA sample; determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; [0018] generating a first likelihood value based on a probability that the first variant is absent at the clonal level and a second likelihood value based on a probability that the first variant is not absent at the clonal level; determining a quantitative value based on the first likelihood value and the second likelihood value; comparing the quantitative value to a threshold; and [0019] determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison.
  • a system comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: [0021] accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type; determining that a first target nucleic acid variant at a first genetic locus is not detected in the cfNA sample from the sequence information; determining a coverage of the first genetic locus from the sequence information; [0022] determining a tumor fraction from the sequence information; determining a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value.
  • Described herein is a system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: [0024] accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information to generate a first test result; determining that at least a second target nucleic acid variant is detected in the cfNA sample from the sequence information to generate a second test result; determining a first probability that the first target nucleic acid variant is absent in the cfNA sample given the second test result and/or a second probability that the first target nucleic acid is not absent in the cfNA sample given the second test result; generating a quantitative value using the first probability, the second probability, and/or a ratio thereof; and, determining that the first target nucleic acid variant is absent
  • Described herein is a system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: [0026] accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information; generating at least one tumor fraction based value; generating at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value.
  • Described herein is a computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing a plurality of sequence reads of the cfDNA sample; determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; generating a first likelihood value based on a probability that the first variant is absent at the clonal level and a second likelihood value based on a probability that the first variant is not absent at the clonal level; determining a quantitative value based on the first likelihood value and the second likelihood value; comparing the quantitative value to a threshold; and determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison.
  • Described herein is a computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type; determining that a first target nucleic acid variant at a first genetic locus is not detected in the cfNA sample from the sequence information; determining a coverage of the first genetic locus from the sequence information; determining a tumor fraction from the sequence information; determining a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value.
  • Described herein is a computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in
  • a computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information; generating at least one tumor fraction based value; generating at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value.
  • the system or computer readable media of any one of the preceding claims wherein the quantitative value is less than the threshold value.
  • the system or computer readable media of any one of the preceding claims wherein the quantitative value is greater than the threshold value. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the first and second test results are dependent upon one another. In other embodiments, the system or computer readable media of any one of the preceding claims, comprising determining that a plurality of other selected target nucleic variants are absent at one or more other genetic loci. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the quantitative value comprises a log likelihood ratio (LLR) threshold value.
  • LLR log likelihood ratio
  • the system or computer readable media of any one of the preceding claims comprising determining that the first target nucleic acid variant is absent at the first genetic locus in a plurality of reference cfNA samples to generate the threshold value.
  • the system or computer readable media of claim 74 wherein the threshold value comprises a clonality or sub-clonality threshold value.
  • the system or computer readable media of any one of the preceding claims wherein the first target nucleic acid variant comprises a driver mutation.
  • the instructions further perform at least: outputting one or more therapy recommendations for the subject based upon the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample.
  • the system or computer readable media of any one of the preceding claims wherein the instructions further perform at least: estimating a probability of detecting the first target nucleic acid variant at the first genetic locus in the cfNA sample using the tumor fraction and a binomial model.
  • the system or computer readable media of any one of the preceding claims wherein the instructions further perform at least: determining a maximum mutant allele frequency (epi MAF) for the cfNA sample and using the epi MAF as an estimate of the tumor fraction. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: determining that the first target nucleic acid variant is absent at a clonal level in the cfNA sample. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: generating a first likelihood value based on the first probability and a second likelihood value based on the second probability.
  • the system or computer readable media of any one of the preceding claims wherein the instructions further perform at least: determining the quantitative value based on the first likelihood value and the second likelihood value. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: generating the first likelihood value and the second likelihood value by determining the tumor fraction estimate of the cfNA sample, wherein the first likelihood value and the second likelihood value is based on the tumor fraction estimate. In other embodiments, the system or computer readable media of claim 83, wherein the instructions further perform at least: determining the tumor fraction estimate by determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the cfNA sample.
  • the system or computer readable media of claim 84 wherein the instructions further perform at least: determining the epi MAF by determining a molecule count associated with the tumor mutation based on the plurality of sequence reads. In other embodiments, the system or computer readable media of claim 84, wherein the instructions further perform at least: generating the first likelihood value and the second likelihood value by determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF. In other embodiments, the system or computer readable media of claim 86, wherein the instructions further perform at least: comparing the allele frequency with a second
  • the system or computer readable media of claim 86 wherein the instructions further perform at least: determining the allele frequency by determining a first molecule count associated with the first target nucleic acid variant based on the plurality of sequence reads.
  • the system or computer readable media of claim 86 wherein the instructions further perform at least: determining the quantitative value by accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information.
  • the system or computer readable media of claim 89 wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information.
  • the system or computer readable media of claim 83 wherein the instructions further perform at least: determining the quantitative value by accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first target nucleic acid variant, wherein the quantitative value is based on the covariable information.
  • the system or computer readable media of claim 91 wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfNA sample, wherein the quantitative value is based further on the prevalence of the second target nucleic acid variant.
  • the system or computer readable media of claim 83 wherein the instructions further perform at least: determining a level of confidence that the first target nucleic acid variant is absent at a clonal level in the cfNA sample based on the quantitative value. In other embodiments, the system or computer readable media of claim 83, wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfNA sample; and adjusting the quantitative value based on the prevalence of at least the second target nucleic acid variant in the cfNA sample.
  • the system or computer readable media of any one of the preceding claims wherein the ratio comprises a log posterior probability ratio (LPPR) equal to a sum of a log likelihood tumor fraction value, a log likelihood mutual exclusivity value, and a log prior value.
  • LPPR log posterior probability ratio
  • the system of any one of the preceding further comprising generating a report which optionally includes information on, and/or information derived from, the absence of the first target nucleic acid variant at the first genetic locus in the sample.
  • the method or system of any of the preceding further comprising communicating the report to a third party, such as the subject from whom the sample derived or a health care practitioner.
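  • The log posterior probability ratio (LPPR) described above is a plain sum of three log-scale terms. The following is a minimal sketch of that arithmetic only. The function names, the use of natural logarithms, and the conversion of the ratio to a probability by treating it as log odds are illustrative assumptions, not details taken from the disclosure.

```python
import math


def log_posterior_probability_ratio(ll_tumor_fraction, ll_mutual_exclusivity, log_prior):
    """Sum the three log-scale terms named in the text (hypothetical helper)."""
    return ll_tumor_fraction + ll_mutual_exclusivity + log_prior


def probability_from_log_ratio(lppr):
    """Assumption: interpret the ratio as natural-log odds and map it to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-lppr))


# Illustrative inputs only; these numbers are not from the disclosure.
lppr = log_posterior_probability_ratio(1.0, 0.5, -0.25)
confidence = probability_from_log_ratio(lppr)
```

  Because the terms add in log space, a strongly negative prior can offset favorable tumor fraction evidence, which is why each term is kept as a separate input in this sketch.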
  • FIGURE 1 illustrates an example of how sample-level NPV values increase with TF.
  • Per-sample NPV values increase with respect to TF for different alteration types. Limited to NSCLC, breast, colorectal, pancreatic, and prostate cancers.
  • FIGURE 2 illustrates an example of how biomarker-level NPV values increase with TF. Per-biomarker NPV values increase with respect to TF for different alteration types. Limited to NSCLC, breast, colorectal, pancreatic, and prostate cancers.
  • FIGURE 3 illustrates an example of blood-tissue PPA of FDA-approved biomarkers.
  • FIGURE 4 illustrates an example of how negative prediction power is directly tied to tumor fraction.
  • FIGURE 5 illustrates an example of the allele fraction of CRC hotspot variants in the Epigenome cohort.
  • FIGURE 6 illustrates an example of a system for generating negative predictions of a target variant in a sample of a subject, according to an embodiment of the disclosure.
  • FIGURE 7 illustrates a schematic diagram of inputs and outputs of a negative prediction analyzer, according to an embodiment.
  • FIGURE 8 illustrates an example of a method for generating negative predictions of a target variant in a sample of a subject, according to an embodiment of the disclosure.
  • FIGURE 9A illustrates a graph of a test hypothesis in which a target variant is absent from the sample (or present at a sub-clonal MAF), according to an embodiment.
  • FIGURE 9B illustrates a graph of a null hypothesis in which the target variant is not absent in the sample, according to an embodiment.
  • DETAILED DESCRIPTION
  • [0040] While various embodiments of the disclosure have been shown and described herein, those skilled in the art will understand that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed.
  • the term “about” and its grammatical equivalents in relation to a reference numerical value can include a range of values up to plus or minus 10% from that value.
  • the amount “about 10” can include amounts from 9 to 11.
  • the term “about” in relation to a reference numerical value can include a range of values plus or minus 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% from that value.
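  • The definition of “about” above amounts to a symmetric percentage window around a reference value. A minimal sketch follows; the helper names are hypothetical, and the default of 10% is taken from the first definition given in the text.

```python
def about_range(value, tolerance_pct=10.0):
    """Return the (low, high) window of plus or minus tolerance_pct around value."""
    delta = abs(value) * tolerance_pct / 100.0
    return (value - delta, value + delta)


def is_about(candidate, reference, tolerance_pct=10.0):
    """True when candidate falls inside the window around reference."""
    low, high = about_range(reference, tolerance_pct)
    return low <= candidate <= high


# "about 10" at the default 10% tolerance spans 9 to 11, as in the text.
window = about_range(10)
```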
  • the term “at least” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and greater than that value.
  • the amount “at least 10” can include the value 10 and any numerical value above 10, such as 11, 100, and 1,000.
  • the term “at most” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and less than that value.
  • the amount “at most 10” can include the value 10 and any numerical value under 10, such as 9, 8, 5, 1, 0.5, and 0.1.
  • the singular forms “a”, “an”, and “the” can include plural referents unless the context clearly dictates otherwise.
  • reference to “a cell” can include a plurality of such cells and reference to “the culture” can include reference to one or more cultures and equivalents thereof known to those skilled in the art, and so forth.
  • Cancer can be indicated by epigenetic variations, such as methylation.
  • methylation changes in cancer include local gains of DNA methylation in the CpG islands at the transcription start site (TSS) of genes involved in normal growth control, DNA repair, cell cycle regulation, and/or cell differentiation. This hypermethylation can be associated with an aberrant loss of transcriptional capacity of involved genes and occurs at least as frequently as point mutations and deletions as a cause of altered gene expression.
  • DNA methylation profiling can be used to detect regions with different extents of methylation (“differentially methylated regions” or “DMRs”) of the genome that are altered during development or that are perturbed by disease, for example, cancer or any cancer-associated disease.
  • the genomes of cancer cells harbor imbalances in the above DNA methylation patterns, and therefore in the functional packaging of the DNA.
  • the abnormalities of chromatin organization are therefore coupled with methylation changes and may contribute to enhanced cancer profiling when analyzed jointly.
  • Methylation profiling can involve determining methylation patterns across different regions of the genome. For example, after partitioning molecules based on extent of methylation (e.g., relative number of methylated sites per molecule) and sequencing, the sequences of molecules in the different partitions can be mapped to a reference genome. This can show regions of the genome that, compared with other regions, are more highly methylated or are less highly methylated.
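  • The partition-then-map approach described above can be sketched as a per-region tally of which partition each mapped molecule came from. This is an illustrative sketch only: the two partition labels (`"hyper"` and `"hypo"`), the region identifiers, and the function name are assumptions, and alignment to a reference genome is taken as already done.

```python
from collections import Counter, defaultdict


def hyper_fraction_by_region(mapped_reads):
    """Fraction of molecules per region that came from the hypermethylated partition.

    mapped_reads: iterable of (region_id, partition_label) pairs, where
    partition_label is "hyper" or "hypo" (hypothetical labels).
    """
    counts = defaultdict(Counter)
    for region, partition in mapped_reads:
        counts[region][partition] += 1
    return {
        region: tally["hyper"] / sum(tally.values())
        for region, tally in counts.items()
    }


# Toy input: three molecules map to one region, one to another.
reads = [
    ("chr1:100-200", "hyper"),
    ("chr1:100-200", "hyper"),
    ("chr1:100-200", "hypo"),
    ("chr2:50-150", "hypo"),
]
fractions = hyper_fraction_by_region(reads)
```

  Regions with a high fraction are the ones that, compared with other regions, are more highly methylated in the sample.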
  • a characteristic of nucleic acid molecules may be a modification, which may include various chemical or protein modifications (i.e. epigenetic modifications).
  • chemical modifications may include, but are not limited to, covalent DNA modifications, including DNA methylation.
  • DNA methylation includes addition of a methyl group to a cytosine at a CpG site (a cytosine followed by a guanine in a nucleic acid sequence).
  • DNA methylation includes addition of a methyl group to adenine, such as in N6-methyladenine.
  • DNA methylation is 5-methylation (modification of the 5th carbon of the 6-carbon ring of cytosine).
  • 5-methylation includes addition of a methyl group to the 5C position of the cytosine to create 5-methylcytosine (m5c).
  • methylation includes a derivative of m5c. Derivatives of m5c include, but are not limited to, 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC).
  • DNA methylation is 3C methylation (modification of the 3rd carbon of the 6-carbon ring of cytosine).
  • 3C methylation includes addition of a methyl group to the 3C position of the cytosine to generate 3-methylcytosine (3mC).
  • Other examples include N6-methyladenine or glycosylation.
  • DNA methylation includes addition of methyl groups to DNA (e.g. CpG) and can change the expression of the methylated DNA region. Methylation can also occur at non-CpG sites, for example, at a CpA, CpT, or CpC site. DNA methylation can change the activity of the methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed.
  • a CpG dyad is the dinucleotide CpG (cytosine-phosphate-guanine, i.e. a cytosine followed by a guanine in a 5’ → 3’ direction of the nucleic acid sequence) on the sense strand and its complementary CpG on the antisense strand of a double-stranded DNA molecule.
  • CpG dyads can be either fully methylated or hemi-methylated (methylated on one strand only).
  • the CpG dinucleotide is underrepresented in the normal human genome, with the majority of CpG dinucleotide sequences being transcriptionally inert (e.g. DNA heterochromatic regions in pericentromeric parts of the chromosome and in repeat elements) and methylated. However, many CpG islands are protected from such methylation especially around transcription start sites (TSS).
  • tumor fraction prediction CNV on epi-MAF gene, CHIP, etc.
  • TDD tumor-fraction
  • Static parameters include sub-clonal variant purity boundary (30%), prior variant likelihood, and mutual exclusivity or co-occurrence with other variants.
  • Key derived parameters include tumor fraction and cancer tissue of origin. Taking into account these parameters, a probability distribution of tumor fraction based on methylation data can be measured.
  • Protein modifications include binding to components of chromatin, particularly histones including modified forms thereof, and binding to other proteins, such as proteins involved in replication or transcription.
  • the disclosure provides methods of processing and analyzing nucleic acids with different extents of modification, such that the nature of their original modification is correlated with a nucleic acid tag and can be decoded by sequencing the tag when nucleic acids are analyzed. Genetic variation of sample nucleic acid modifications can then be associated with the extent of modification (epigenetic variation) of that nucleic acid in the original sample. include single stranded (e.g., ssDNA or RNA) or double stranded molecules (e.g., dsDNA).
  • the loss of DNA can reduce the presence of one or more types of DNA such that the presence of the one or more types of DNA such as cfDNA, is difficult to detect.
  • existing methods to measure DNA methylation such as enrichment or depletion methods, can have a relatively high level of resolution, such as about 100 base pairs (bp) to about 200 bp that can make accurately determining an amount of methylation of DNA difficult.
  • the accuracy with which DNA methylation is determined can impact the accuracy of estimates of tumor fraction for samples.
  • a sample can be any biological sample isolated from a subject.
  • a sample can be a bodily sample.
  • Samples can include body tissues, such as known or suspected solid tumors, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, cerebrospinal fluid, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, pleural effusions, saliva, mucous, sputum, semen, sweat, and urine. Samples are preferably body fluids, particularly blood and fractions thereof, and urine.
  • a sample can be in the form originally isolated from a subject or can have been subjected to further processing to remove or add components, such as cells, or enrich for one component relative to another.
  • a preferred body fluid for analysis is plasma or serum containing cell-free nucleic acids.
  • a sample can be isolated or obtained from a subject and transported to a site of sample analysis. The sample may be preserved and shipped at a desirable temperature, e.g., room temperature, 4°C, -20°C, and/or -80°C.
  • a sample can be isolated or obtained from a subject at the site of the sample analysis.
  • the subject can be a human, a mammal, an animal, a companion animal, a service animal, or a pet.
  • the subject may have a cancer.
  • the subject may not have cancer or a detectable cancer symptom.
  • the subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines or biologics.
  • the subject may be in remission.
  • the subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations/disorders.
  • the volume of plasma can depend on the desired read depth for sequenced regions. Exemplary volumes are 0.4-40 mL, 5-20 mL, and 10-20 mL.
  • the volume can be 0.5 mL, 1 mL, 5 mL, 10 mL, 20 mL, 30 mL, or 40 mL.
  • a volume of sampled plasma may be 5 to 20 mL.
  • a sample can comprise various amounts of nucleic acid that contains genome equivalents.
  • a sample of about 30 ng DNA can contain about 10,000 (10^4) haploid human genome equivalents and, in the case of cfDNA, about 200 billion (2×10^11) individual polynucleotide molecules.
  • a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents and, in the case of cfDNA, about 600 billion individual molecules.
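  • The orders of magnitude quoted above can be reproduced with back-of-the-envelope arithmetic. The sketch below is not the disclosure's calculation; it assumes roughly 3 pg of DNA per haploid human genome, an average of about 650 g/mol per base pair of double-stranded DNA, and a mean cfDNA fragment length of 168 bp (the mode quoted elsewhere in this document). All function names and constants are illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole


def haploid_genome_equivalents(mass_ng, pg_per_haploid_genome=3.0):
    """Approximate haploid genome copies in mass_ng of DNA (assumed ~3 pg each)."""
    return mass_ng * 1000.0 / pg_per_haploid_genome


def fragment_count(mass_ng, mean_length_bp=168, g_per_mol_per_bp=650.0):
    """Approximate number of dsDNA fragments of mean_length_bp in mass_ng of DNA."""
    grams = mass_ng * 1e-9
    moles = grams / (mean_length_bp * g_per_mol_per_bp)
    return moles * AVOGADRO


genomes_30ng = haploid_genome_equivalents(30)   # 10,000 under the 3 pg assumption
molecules_30ng = fragment_count(30)             # on the order of 10^11
molecules_100ng = fragment_count(100)           # several hundred billion
```

  Under these assumptions 100 ng gives about 33,000 genome equivalents, which matches the text's "about 30,000" to within rounding.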
  • a sample can comprise nucleic acids from different sources, e.g., from cells and cell-free sources of the same subject, or from cells and cell-free sources of different subjects.
  • a sample can comprise nucleic acids carrying mutations.
  • a sample can comprise DNA carrying germline mutations and/or somatic mutations.
  • Germline mutations refer to mutations existing in germline DNA of a subject.
  • Somatic mutations refer to mutations originating in somatic cells of a subject, e.g., cancer cells.
  • a sample can comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations).
  • a sample can comprise an epigenetic variant (i.e. a chemical or protein modification), wherein the epigenetic variant is associated with the presence of a genetic variant such as a cancer-associated mutation.
  • the sample includes an epigenetic variant associated with the presence of a genetic variant, wherein the sample does not comprise the genetic variant.
  • Exemplary amounts of cell-free nucleic acids in a sample before amplification range from about 1 fg to about 1 µg, e.g., 1 pg to 200 ng, 1 ng to 100 ng, 10 ng to 1000 ng.
  • the amount can be up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of cell-free nucleic acid molecules.
  • the amount can be at least 1 fg, at least 10 fg, at least 100 fg, at least 1 pg, at least 10 pg, at least 100 pg, at least 1 ng, at least 10 ng, at least 100 ng, at least 150 ng, or at least 200 ng of cell-free nucleic acid molecules.
  • the amount can be up to 1 femtogram (fg), 10 fg, 100 fg, 1 picogram (pg), 10 pg, 100 pg, 1 ng, 10 ng, 100 ng, 150 ng, or 200 ng of cell-free nucleic acid molecules.
  • the method can comprise obtaining 1 femtogram (fg) to 200 ng.
  • Cell-free nucleic acids are nucleic acids not contained within or otherwise bound to a cell or in other words nucleic acids remaining in a sample after removing intact cells.
  • Cell-free nucleic acids include DNA, RNA, and hybrids thereof, including genomic DNA, mitochondrial DNA, siRNA, miRNA, circulating RNA (cRNA), tRNA, rRNA, small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), long non-coding RNA (long ncRNA), or fragments of any of these.
  • Cell-free nucleic acids can be double-stranded, single-stranded, or a hybrid thereof.
  • a cell-free nucleic acid can be released into bodily fluid through secretion or cell death processes, e.g., cellular necrosis and apoptosis.
  • Some cell-free nucleic acids are released into bodily fluid from cancer cells, e.g., circulating tumor DNA (ctDNA). Others are released from healthy cells.
  • cfDNA is cell-free fetal DNA (cffDNA)
  • cell free nucleic acids are produced by tumor cells.
  • cell free nucleic acids are produced by a mixture of tumor cells and non-tumor cells.
  • Cell-free nucleic acids have an exemplary size distribution of about 100-500 nucleotides, with molecules of 110 to about 230 nucleotides representing about 90% of molecules, with a mode of about 168 nucleotides and a second minor peak in a range between 240 to 440 nucleotides.
  • Cell-free nucleic acids can be isolated from bodily fluids through a fractionation or partitioning step in which cell-free nucleic acids, as found in solution, are separated from intact cells and other non-soluble components of the bodily fluid. Partitioning may include techniques such as centrifugation or filtration. Alternatively, cells in bodily fluids can be lysed and cell-free and cellular nucleic acids processed together.
  • nucleic acids can be precipitated with an alcohol. Further clean up steps may be used such as silica based columns to remove contaminants or salts.
  • Non-specific bulk carrier nucleic acids such as Cot-1 DNA, DNA or protein for bisulfite sequencing, hybridization, and/or ligation, may be added throughout the reaction to optimize certain aspects of the procedure such as yield.
  • samples can include various forms of nucleic acid including double stranded DNA, single stranded DNA and single stranded RNA.
  • single stranded DNA and RNA can be converted to double stranded forms so they are included in subsequent processing and analysis steps.
  • Analytes can include nucleic acid analytes, and non-nucleic acid analytes.
  • the disclosure provides for detecting genetic variations in biological samples from a subject.
  • Biological samples may include polynucleotides from cancer cells. Polynucleotides may be DNA (e.g., genomic DNA, cDNA), RNA (e.g., mRNA, small RNAs), or any combination thereof.
  • Biological samples may include tumor tissue, e.g., from a biopsy. In some cases, biological samples may include blood or saliva. In particular cases, biological samples may comprise cell free DNA (“cfDNA”) or circulating tumor DNA (“ctDNA”). Cell free DNA can be present in, e.g., blood.
  • non-nucleic acid analytes include, but are not limited to, lipids, carbohydrates, peptides, proteins, glycoproteins (N-linked or O-linked), lipoproteins, phosphoproteins, specific phosphorylated or acetylated variants of proteins, amidation variants of proteins, hydroxylation variants of proteins, methylation variants of proteins, ubiquitylation variants of proteins, sulfation variants of proteins, viral proteins (e.g., viral capsid, viral envelope, viral coat, viral accessory, viral glycoproteins, viral spike, etc.), extracellular and intracellular proteins, antibodies, and antigen binding fragments.
  • a posttranslational modification e.g., phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation or lipidation
  • the systems, apparatus, methods, and compositions can be used to analyze any number of analytes, further including both nucleic acid analytes and non-nucleic acid analytes.
  • the number of analytes that are analyzed can be at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 20, at least about 25, at least about 30, at least about 40, at least about 50, at least about 100, at least about 1,000, at least about 10,000, at least about 100,000 or more different analytes present in a region of the sample or within an individual feature of the substrate.
  • nucleic acid analytes and/or non-nucleic acid analytes constitute a set of molecular interactions in a biological system under study (e.g., cells), which may be regarded as “interactome” – the molecular interactions that occur between molecules belonging to different biochemical families (proteins, nucleic acids, lipids, carbohydrates, etc.) and also within a given family.
  • an interactome is a protein-DNA interactome (a network formed by transcription factors (and DNA or chromatin regulatory proteins) and their target genes).
  • interactome refers to a protein-protein interaction network (PPI), or protein interaction network (PIN).
  • the methods described herein allow for study and analysis of the interactome. Techniques such as proteogenomics (whole genome sequencing, whole exome sequencing and RNA-seq, and mass spectrometry as examples) can support study of the interactome.
  • Analysis
  • the present methods can be used to diagnose the presence of conditions, particularly cancer, in a subject, to characterize conditions (e.g., staging cancer or determining heterogeneity of a cancer), to monitor response to treatment of a condition, and to assess the prognosis, the risk of developing a condition, or the subsequent course of a condition.
  • the present disclosure can also be useful in determining the efficacy of a particular treatment option.
  • Successful treatment options may increase the amount of copy number variation or rare mutations detected in a subject's blood, as more cancer cells may die and shed DNA. In other examples, this may not occur.
  • certain treatment options may be correlated with genetic profiles of cancers over time. This correlation may be useful in selecting a therapy.
  • the present methods can be used to monitor residual disease or recurrence of disease.
  • the types and number of cancers that may be detected may include blood cancers, brain cancers, lung cancers, skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, solid state tumors, heterogeneous tumors, homogenous tumors and the like.
  • Type and/or stage of cancer can be detected from genetic variations including mutations, rare mutations, indels, copy number variations, transversions, translocations, inversion, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and abnormal changes in nucleic acid 5-methylcytosine.
  • Genetic and other analyte data can also be used for characterizing a specific form of cancer. Cancers are often heterogeneous in both composition and staging.
  • Genetic profile data may allow characterization of specific sub-types of cancer that may be important in the diagnosis or treatment of that specific sub-type. This information may also provide a subject or practitioner clues regarding the prognosis of a specific type of cancer and allow either a subject or practitioner to adapt treatment options in accord with the progress of the disease. Some cancers can progress to become more aggressive and genetically unstable. Other cancers may remain benign, inactive or dormant. The system and methods of this disclosure may be useful in determining disease progression. [0068] The present analyses are also useful in determining the efficacy of a particular treatment option. Successful treatment options may increase the amount of copy number variation or rare mutations detected in a subject's blood, as more cancer cells may die and shed DNA. In other examples, this may not occur.
  • the present methods can also be used for detecting genetic variations in conditions other than cancer. Immune cells, such as B cells, may undergo rapid clonal expansion in the presence of certain diseases. Clonal expansions may be monitored using copy number variation detection and certain immune states may be monitored. In this example, copy number variation analysis may be performed over time to produce a profile of how a particular disease may be progressing.
  • Copy number variation or even rare mutation detection may be used to determine how a population of pathogens is changing during the course of infection. This may be particularly important during chronic infections, such as HIV/AIDS or Hepatitis infections, whereby viruses may change life cycle state and/or mutate into more virulent forms during the course of infection.
  • the present methods may be used to determine or profile rejection activities of the host body, as immune cells attempt to destroy transplanted tissue, to monitor the status of transplanted tissue, and to alter the course of treatment or prevention of rejection. [0070] Further, the methods of the disclosure may be used to characterize the heterogeneity of an abnormal condition in a subject.
  • an abnormal condition is cancer.
  • the abnormal condition may be one resulting in a heterogeneous genomic population.
  • some tumors are known to comprise tumor cells in different stages of the cancer.
  • heterogeneity may comprise multiple foci of disease. Again, in the example of cancer, there may be multiple tumor foci, perhaps where one or more foci are the result of metastases that have spread from a primary site.
  • the present methods can be used to generate a profile, fingerprint, or set of data that is a summation of genetic information derived from different cells in a heterogeneous disease. This set of data may comprise copy number variation and mutation analyses alone or in combination.
  • the present methods can be used to diagnose, prognose, monitor or observe cancers or other diseases. In some embodiments, the methods herein do not involve diagnosing, prognosing or monitoring a fetus and as such are not directed to non-invasive prenatal testing. In other embodiments, these methodologies may be employed in a pregnant subject to diagnose, prognose, monitor or observe cancers or other diseases in an unborn subject whose DNA and other polynucleotides may co-circulate with maternal molecules.
  • Determination of the 5-methylcytosine pattern of nucleic acids includes distinguishing 5-methylcytosine (5mC) from non-methylated cytosine. In some embodiments, determining the methylation pattern includes distinguishing N6-methyladenine from non-methylated adenine.
  • determining the methylation pattern includes distinguishing 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) from non-methylated cytosine.
  • bisulfite sequencing include, but are not limited to, oxidative bisulfite sequencing (OX-BS-seq), Tet-assisted bisulfite sequencing (TAB-seq), and reduced bisulfite sequencing (redBS-seq).
  • Oxidative bisulfite sequencing (OX-BS-seq) is used to distinguish between 5mC and 5hmC, by first converting the 5hmC to 5fC, and then proceeding with bisulfite sequencing as previously described.
  • Tet-assisted bisulfite sequencing can also be used to distinguish 5mC and 5hmC.
  • In TAB-seq, 5hmC is protected by glucosylation.
  • a Tet enzyme is then used to convert 5mC to 5caC before proceeding with bisulfite sequencing, as previously described.
  • Reduced bisulfite sequencing is used to distinguish 5fC from modified cytosines.
  • In bisulfite sequencing, a nucleic acid sample is divided into two aliquots and one aliquot is treated with bisulfite.
  • the bisulfite converts native cytosine and certain modified cytosine nucleotides (e.g., 5-formylcytosine or 5-carboxylcytosine) to uracil whereas other modified cytosines (e.g., 5-methylcytosine, 5-hydroxymethylcytosine) are not converted.
  • the initial splitting of the sample into two aliquots is disadvantageous for samples containing only small amounts of nucleic acids, and/or composed of heterogeneous cell/tissue origins such as bodily fluids containing cell-free DNA.
  • the present disclosure provides methods allowing bisulfite sequencing and variants thereof. These methods work by linking nucleic acids in a population to a capture moiety, i.e., a label that can be captured or immobilized.
  • Capture moieties include, without limitation, biotin, avidin, streptavidin, a nucleic acid including a particular nucleotide sequence, a hapten recognized by an antibody, and magnetically attractable particles.
  • the extraction moiety can be a member of a binding pair, such as biotin/streptavidin or hapten/antibody.
  • a capture moiety that is attached to an analyte is captured by its binding pair which is attached to an isolatable moiety, such as a magnetically attractable particle or a large particle that can be sedimented through centrifugation.
  • the capture moiety can be any type of molecule that allows affinity separation of nucleic acids bearing the capture moiety from nucleic acids lacking the capture moiety.
  • Exemplary capture moieties are biotin which allows affinity separation by binding to streptavidin linked or linkable to a solid phase or an oligonucleotide, which allows affinity separation through binding to a complementary oligonucleotide linked or linkable to a solid phase.
  • the sample nucleic acids serve as templates for amplification.
  • the original templates remain linked to the capture moieties but amplicons are not linked to capture moieties.
  • the capture moiety can be linked to sample nucleic acids as a component of an adapter, which may also provide amplification and/or sequencing primer binding sites.
  • sample nucleic acids are linked to adapters at both ends, with both adapters bearing a capture moiety.
  • any cytosine residues in the adapters are modified, such as by 5-methylcytosine, to protect against the action of bisulfite.
  • the capture moieties are linked to the original templates by a cleavable linkage (e.g., photocleavable desthiobiotin-TEG or uracil residues cleavable with USER™ enzyme, Chem. Commun. (Camb). 2015 Feb 21; 51(15): 3266-3269), in which case the capture moieties can, if desired, be removed.
  • the amplicons are denatured and contacted with an affinity reagent for the capture tag.
  • Original templates bind to the affinity reagent whereas nucleic acid molecules resulting from amplification do not.
  • the original templates can be separated from nucleic acid molecules resulting from amplification.
  • the respective populations of nucleic acids i.e., original templates and amplification products
  • the amplification products can be subjected to bisulfite treatment and the original template population not.
  • the respective populations can be amplified (which in the case of the original template population converts uracils to thymines).
  • the populations can also be subjected to biotin probe hybridization for enrichment.
  • the respective populations are then analyzed and sequences compared to determine which cytosines were 5-methylated (or 5-hydroxymethylated) in the original. Detection of a T nucleotide in the template population (corresponding to an unmethylated cytosine converted to uracil) and a C nucleotide at the corresponding position of the amplified population indicates an unmodified C.
  • the presence of C's at corresponding positions of the original template and amplified populations indicates a modified C in the original sample.
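  • The comparison rule described above (C in both populations indicates a modified C; C in one and T in the bisulfite-treated other indicates an unmodified C) can be sketched as a position-by-position check on two aligned sequences. This is an illustrative sketch under simplifying assumptions: the two reads are already aligned and of equal length, only one strand is considered, and the names `untreated` and `treated` are hypothetical labels for whichever population was or was not exposed to bisulfite.

```python
def classify_cytosines(untreated, treated):
    """Classify each cytosine in the untreated read by its treated counterpart.

    Returns {position: "modified" | "unmodified" | "ambiguous"} for every
    position where the untreated read has a C.
    """
    if len(untreated) != len(treated):
        raise ValueError("reads must be aligned to the same length")
    calls = {}
    for pos, (u, t) in enumerate(zip(untreated.upper(), treated.upper())):
        if u != "C":
            continue
        if t == "C":
            calls[pos] = "modified"      # C survived bisulfite treatment
        elif t == "T":
            calls[pos] = "unmodified"    # C was converted to U and read as T
        else:
            calls[pos] = "ambiguous"     # e.g. sequencing error or mismatch
    return calls


calls = classify_cytosines("ACGTCCA", "ACGTTCA")
```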
  • a method uses sequential DNA-seq and bisulfite-seq (BIS-seq) NGS library preparation of molecular tagged DNA libraries. This process is performed by labeling of adapters (e.g., biotin), DNA-seq amplification of whole library, parent molecule recovery (e.g. streptavidin bead pull down), bisulfite conversion and BIS-seq.
  • the method identifies 5-methylcytosine with single-base resolution, through sequential NGS-preparative amplification of parent library molecules with and without bisulfite treatment.
  • sample DNA molecules are adapter ligated, and amplified (e.g., by PCR). As only the parent molecules will have a labeled adapter end, they can be selectively recovered from their amplified progeny by label-specific capture methods (e.g., streptavidin-magnetic beads).
  • the bisulfite treated library can be combined with a non-treated library prior to enrichment/NGS by addition of a sample tag DNA sequence in standard multiplexed NGS workflow.
  • bioinformatics analysis can be carried out for genomic alignment and 5-methylated base identification. In sum, this method provides the ability to selectively recover the parent, ligated molecules, carrying 5-methylcytosine marks, after library amplification, thereby allowing for parallel processing for bisulfite converted DNA.
  • the disclosure provides alternative methods for analyzing modified nucleic acids (e.g., methylated, linked to histones and other modifications discussed above).
  • a population of nucleic acids bearing the modification to different extents e.g., 0, 1, 2, 3, 4, 5 or more methyl groups per nucleic acid molecule
  • adapters attach to either one end or both ends of nucleic acid molecules in the population.
  • the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points receive different combinations of tags.
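  • How many tags are "sufficient" can be estimated with a birthday-problem calculation. The sketch below assumes each molecule draws one tag per adapter end uniformly at random, so that n tags per end give n × n combinations; the function names and the uniform-draw model are illustrative assumptions, not taken from the disclosure.

```python
def prob_all_distinct(n_combinations: int, n_molecules: int) -> float:
    """Probability that n_molecules, each drawing one tag combination uniformly
    at random, all receive different combinations (birthday-problem product)."""
    if n_molecules > n_combinations:
        return 0.0  # pigeonhole: a repeat is unavoidable
    p = 1.0
    for i in range(n_molecules):
        p *= (n_combinations - i) / n_combinations
    return p


def min_tags_per_end(n_molecules: int, target: float) -> int:
    """Smallest number of distinct tags per adapter end such that the
    n_tags * n_tags end-pair combinations reach the target probability."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n_tags = 1
    while prob_all_distinct(n_tags * n_tags, n_molecules) < target:
        n_tags += 1
    return n_tags


# Two molecules sharing start and stop points, 98% chance of distinct tags.
tags_needed = min_tags_per_end(n_molecules=2, target=0.98)
```

  • Under this model the required tag count grows with the number of molecules expected to share the same start and stop points, which is why the tag set only needs to be large relative to that local multiplicity rather than to the whole sample.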
  • the nucleic acids are amplified from primers binding to the primer binding sites within the adapters.
  • Adapters, whether bearing the same or different tags, can include the same or different primer binding sites, but preferably adapters include the same primer binding site.
  • the nucleic acids are contacted with an agent that preferentially binds to nucleic acids bearing the modification (such as the agents described previously).
  • the nucleic acids are separated into at least two partitions differing in the extent to which the nucleic acids bear the modification from binding to the agents. For example, if the agent has affinity for nucleic acids bearing the modification, nucleic acids overrepresented in the modification (compared with median representation in the population) preferentially bind to the agent, whereas nucleic acids underrepresented for the modification do not bind or are more easily eluted from the agent.
  • the different partitions can then be subject to further processing steps, which typically include further amplification, and sequence analysis, in parallel but separately. Sequence data from the different partitions can then be compared.
  • Nucleic acids can be linked at both ends to Y-shaped adapters including primer binding sites and tags. The molecules are amplified.
  • the amplified molecules are then fractionated by contact with an antibody preferentially binding to 5-methylcytosine to produce two partitions.
  • One partition includes original molecules lacking methylation and amplification copies having lost methylation.
  • the other partition includes original DNA molecules with methylation.
  • the two partitions are then processed and sequenced separately with further amplification of the methylated partition.
  • the sequence data of the two partitions can then be compared.
  • tags are not used to distinguish between methylated and unmethylated DNA but rather to distinguish between different molecules within these partitions so that one can determine whether reads with the same start and stop points are based on the same or different molecules.
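  • The use of tags to tell molecules apart within a partition can be sketched as grouping reads by start point, stop point and tag. This is a minimal sketch; the tuple layout of a read and the function name `group_reads_into_molecules` are hypothetical.

```python
from collections import defaultdict


def group_reads_into_molecules(reads):
    """Group reads that share start, stop and tag into one inferred molecule.

    reads: iterable of (start, stop, tag) tuples (assumed representation).
    Returns {(start, stop, tag): read_count}. Reads with the same start and
    stop points but different tags are treated as different molecules.
    """
    families = defaultdict(int)
    for start, stop, tag in reads:
        families[(start, stop, tag)] += 1
    return dict(families)


reads = [
    (100, 260, "AC"),
    (100, 260, "AC"),  # same coordinates and tag: copy of the same molecule
    (100, 260, "GT"),  # same coordinates, different tag: a different molecule
    (300, 450, "AC"),
]
molecules = group_reads_into_molecules(reads)
```

  • Here four reads collapse into three inferred molecules, two of which share the same start and stop points.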
  • the disclosure provides further methods for analyzing a population of nucleic acid in which at least some of the nucleic acids include one or more modified cytosine residues, such as 5-methylcytosine and any of the other modifications described previously.
  • the population of nucleic acids is contacted with adapters including one or more cytosine residues modified at the 5C position, such as 5-methylcytosine.
  • cytosine residues in such adapters are also modified, or all such cytosines in a primer binding region of the adapters are modified.
  • Adapters attach to both ends of nucleic acid molecules in the population.
  • the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points receive different combinations of tags.
  • the primer binding sites in such adapters can be the same or different, but are preferably the same.
  • the nucleic acids are amplified from primers binding to the primer binding sites of the adapters.
  • the amplified nucleic acids are split into first and second aliquots.
  • the first aliquot is assayed for sequence data with or without further processing.
  • the sequence data on molecules in the first aliquot is thus determined irrespective of the initial methylation state of the nucleic acid molecules.
  • the nucleic acid molecules in the second aliquot are treated with bisulfite. This treatment converts unmodified cytosines to uracils.
  • the bisulfite treated nucleic acids are then subjected to amplification primed by primers to the original primer binding sites of the adapters linked to nucleic acid. Only the nucleic acid molecules originally linked to adapters (as distinct from amplification products thereof) are now amplifiable because these nucleic acids retain cytosines in the primer binding sites of the adapters, whereas amplification products have lost the methylation of these cytosine residues, which have undergone conversion to uracils in the bisulfite treatment. Thus, only original molecules in the populations, at least some of which are methylated, undergo amplification.
  • After amplification, these nucleic acids are subject to sequence analysis. Comparison of sequences determined from the first and second aliquots can indicate, among other things, which cytosines in the nucleic acid population were subject to methylation.
Partitioning the Sample into a Plurality of Subsamples; Aspects of Samples; Analysis of Epigenetic Characteristics
  • a population of different forms of nucleic acids e.g., hypermethylated and hypomethylated DNA in a sample, such as a captured set of cfDNA as described herein
  • nucleobase tagging, and/or sequencing. This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated.
  • hypermethylation variable epigenetic target regions are analyzed to determine whether they show hypermethylation characteristic of tumor cells and/or hypomethylation variable epigenetic target regions are analyzed to determine whether they show hypomethylation characteristic of tumor cells.
  • partitioning a heterogeneous nucleic acid population one may increase rare signals, e.g., by enriching rare nucleic acid molecules that are more prevalent in one fraction (or partition) of the population.
  • a genetic variation present in hyper-methylated DNA but less (or not) in hypomethylated DNA can be more easily detected by partitioning a sample into hyper-methylated and hypo-methylated nucleic acid molecules.
  • a multi-dimensional analysis of a single locus of a genome or species of nucleic acid can be performed and hence, greater sensitivity can be achieved.
  • a heterogeneous nucleic acid sample is partitioned into two or more partitions (e.g., at least 3, 4, 5, 6 or 7 partitions).
  • each partition is differentially tagged. Tagged partitions can then be pooled together for collective sample prep and/or sequencing.
  • partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on a different characteristic (examples provided herein), and tagged using differential tags that are distinguished from other partitions and partitioning means.
  • characteristics that can be used for partitioning include sequence length, methylation level, nucleosome binding, sequence mismatch, immunoprecipitation, and/or proteins that bind to DNA.
  • Resulting partitions can include one or more of the following nucleic acid forms: single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), shorter DNA fragments and longer DNA fragments.
  • partitioning based on a cytosine modification e.g., cytosine methylation
  • methylation generally is performed and is optionally combined with at least one additional partitioning step, which may be based on any of the foregoing characteristics or forms of DNA.
  • a heterogeneous population of nucleic acids is partitioned into nucleic acids with one or more epigenetic modifications and without the one or more epigenetic modifications.
  • epigenetic modifications include presence or absence of methylation; level of methylation; type of methylation (e.g., 5-methylcytosine versus other types of methylation, such as adenine methylation and/or cytosine hydroxymethylation); and association and level of association with one or more proteins, such as histones.
  • a heterogeneous population of nucleic acids can be partitioned into nucleic acid molecules associated with nucleosomes and nucleic acid molecules devoid of nucleosomes.
  • a heterogeneous population of nucleic acids may be partitioned into single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA).
  • a heterogeneous population of nucleic acids may be partitioned based on nucleic acid length (e.g., molecules of up to 160 bp and molecules having a length of greater than 160 bp).
  • each partition representative of a different nucleic acid form
  • the partitions are pooled together prior to sequencing.
  • the different forms are separately sequenced.
  • a population of different nucleic acids is partitioned into two or more different partitions.
  • Each partition is representative of a different nucleic acid form, and a first partition (also referred to as a subsample) includes DNA with a cytosine modification in a greater proportion than a second subsample. Each partition is distinctly tagged.
  • the first subsample is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity.
  • the tagged nucleic acids are pooled together prior to sequencing. Sequence reads are obtained and analyzed, including to distinguish the first nucleobase from the second nucleobase in the DNA of the first subsample, in silico. Tags are used to sort reads from different partitions. Analysis to detect genetic variants can be performed on a partition-by-partition level, as well as whole nucleic acid population level. For example, analysis can include in silico analysis to determine genetic variants, such as CNV, SNV, indel, fusion in nucleic acids in each partition. In some instances, in silico analysis can include determining chromatin structure. For example, coverage of sequence reads can be used to determine nucleosome positioning in chromatin.
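  • Sorting pooled reads by partition tag and tallying variant support per partition and for the whole population can be sketched as below. The read representation (a partition tag plus a flag saying whether the read supports a given variant), the tag names, and the function name are hypothetical.

```python
from collections import defaultdict


def variant_support(reads, tag_to_partition):
    """Tally variant-supporting reads per partition and overall.

    reads: iterable of (partition_tag, supports_variant) pairs (assumed layout).
    tag_to_partition: mapping from partition tag to partition name.
    Returns {partition: (variant_reads, total_reads)} plus an 'all' entry
    for the whole nucleic acid population.
    """
    counts = defaultdict(lambda: [0, 0])
    for tag, supports_variant in reads:
        partition = tag_to_partition[tag]
        for key in (partition, "all"):
            counts[key][0] += int(supports_variant)
            counts[key][1] += 1
    return {key: tuple(value) for key, value in counts.items()}


tag_to_partition = {"T1": "hyper", "T2": "hypo"}  # hypothetical tags
pooled_reads = [("T1", True), ("T1", False), ("T2", False), ("T2", False)]
support = variant_support(pooled_reads, tag_to_partition)
```

  • In this toy input the variant is seen in 1 of 2 reads in the hypermethylated partition but only 1 of 4 reads overall, which illustrates why a partition-level view can make a variant concentrated in one partition easier to detect.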
  • Samples can include nucleic acids varying in modifications including post-replication modifications to nucleotides and binding, usually noncovalently, to one or more proteins.
  • the population of nucleic acids is one obtained from a serum, plasma or blood sample from a subject suspected of having neoplasia, a tumor, or cancer or previously diagnosed with neoplasia, a tumor, or cancer.
  • the population of nucleic acids includes nucleic acids having varying levels of methylation.
  • Methylation can occur from any one or more post-replication or transcriptional modifications.
  • Post-replication modifications include modifications of the nucleotide cytosine, particularly at the 5-position of the nucleobase, e.g., 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxylcytosine.
  • the affinity agents can be antibodies with the desired specificity, natural binding partners or variants thereof (Bock et al., Nat Biotech 28: 1106-1114 (2010); Song et al., Nat Biotech 29: 68-72 (2011)), or artificial peptides selected e.g., by phage display to have specificity to a given target.
  • capture moieties contemplated herein include methyl binding domain (MBDs) and methyl binding proteins (MBPs) as described herein, including proteins such as MeCP2 and antibodies preferentially binding to 5-methylcytosine.
  • partitioning of different forms of nucleic acids can be performed using histone binding proteins which can separate nucleic acids bound to histones from free or unbound nucleic acids.
  • histone binding proteins examples include RBBP4, RbAp48 and SANT domain peptides.
  • nucleic acids overrepresented in a modification bind to the agent to a greater extent than nucleic acids underrepresented in the modification.
  • nucleic acids having modifications may bind in an all or nothing manner. But then, various levels of modifications may be sequentially eluted from the binding agent.
  • partitioning can be binary or based on degree/level of modifications. For example, all methylated fragments can be partitioned from unmethylated fragments using methyl-binding domain proteins (e.g., MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific)).
  • additional partitioning may involve eluting fragments having different levels of methylation by adjusting the salt concentration in a solution with the methyl-binding domain and bound fragments. As salt concentration increases, fragments having greater methylation levels are eluted.
  • the final partitions are representative of nucleic acids having different extents of modifications (overrepresentative or underrepresentative of modifications). Overrepresentation and underrepresentation can be defined by the number of modifications borne by a nucleic acid relative to the median number of modifications per strand in a population. For example, if the median number of 5-methylcytosine residues in nucleic acid in a sample is 2, a nucleic acid including more than two 5-methylcytosine residues is overrepresented in this modification and a nucleic acid with 1 or zero 5-methylcytosine residues is underrepresented.
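  • The median-based definition above can be written out directly. This is a minimal sketch; the function name and the "at median" label for molecules exactly at the median (a case the text does not classify) are assumptions.

```python
from statistics import median


def classify_by_median(mc_counts):
    """Label each molecule by its 5-methylcytosine count relative to the
    population median. Molecules exactly at the median get 'at median',
    a label introduced here because the text leaves that case open."""
    med = median(mc_counts)
    labels = []
    for n in mc_counts:
        if n > med:
            labels.append("overrepresented")
        elif n < med:
            labels.append("underrepresented")
        else:
            labels.append("at median")
    return labels


# Per-molecule 5-methylcytosine counts; the median of this toy set is 2.
labels = classify_by_median([0, 1, 2, 2, 3, 5])
```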
  • the effect of the affinity separation is to enrich for nucleic acids overrepresented in a modification in a bound phase and for nucleic acids underrepresented in a modification in an unbound phase (i.e. in solution).
  • the nucleic acids in the bound phase can be eluted before subsequent processing.
  • MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific) various levels of methylation can be partitioned using sequential elutions.
  • a hypomethylated partition (e.g., no methylation) can be separated from a methylated partition by contacting the nucleic acid population with the MBD from the kit, which is attached to magnetic beads. The beads are used to separate out the methylated nucleic acids from the non-methylated nucleic acids. Subsequently, one or more elution steps are performed sequentially to elute nucleic acids having different levels of methylation.
  • a first set of methylated nucleic acids can be eluted at a salt concentration of 160 mM or higher, e.g., at least 150 mM, at least 200 mM, at least 300 mM, at least 400 mM, at least 500 mM, at least 600 mM, at least 700 mM, at least 800 mM, at least 900 mM, at least 1000 mM, or at least 2000 mM.
  • magnetic separation is once again used to separate higher level of methylated nucleic acids from those with lower level of methylation.
  • nucleic acids bound to an agent used for affinity separation are subjected to a wash step.
  • the wash step washes off nucleic acids weakly bound to the affinity agent.
  • nucleic acids can be enriched in nucleic acids having the modification to an extent close to the mean or median (i.e., intermediate between nucleic acids remaining bound to the solid phase and nucleic acids not binding to the solid phase on initial contacting of the sample with the agent).
  • the affinity separation results in at least two, and sometimes three or more partitions of nucleic acids with different extents of a modification. While the partitions are still separate, the nucleic acids of at least one partition, and usually two or three (or more) partitions are linked to nucleic acid tags, usually provided as components of adapters, with the nucleic acids in different partitions receiving different tags that distinguish members of one partition from another.
  • the tags linked to nucleic acid molecules of the same partition can be the same or different from one another. But if different from one another, the tags may have part of their code in common so as to identify the molecules to which they are attached as being of a particular partition.
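  • One way to picture tags that share part of their code is a fixed-length partition prefix followed by a per-molecule barcode. The layout, the two-base prefix length and the code table below are purely hypothetical; the disclosure does not specify how the common part is encoded.

```python
# Hypothetical partition codes; any shared sub-code would serve the same purpose.
PARTITION_CODES = {"AA": "hypermethylated", "CC": "intermediate", "GG": "hypomethylated"}


def split_tag(tag: str, code_len: int = 2) -> tuple[str, str]:
    """Split a composite tag into (partition name, molecular barcode).

    Assumes the first code_len bases identify the partition and the rest
    distinguishes molecules within that partition.
    """
    code, barcode = tag[:code_len], tag[code_len:]
    if code not in PARTITION_CODES:
        raise ValueError(f"unknown partition code: {code!r}")
    return PARTITION_CODES[code], barcode


partition, barcode = split_tag("AAGTCA")
```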
  • nucleic acid molecules can be fractionated into different partitions based on the nucleic acid molecules that are bound to a specific protein or a fragment thereof and those that are not bound to that specific protein or fragment thereof.
  • Nucleic acid molecules can be fractionated based on DNA-protein binding. Protein-DNA complexes can be fractionated based on a specific property of a protein. Examples of such properties include various epitopes, modifications (e.g., histone methylation or acetylation) or enzymatic activity.
  • proteins which may bind to DNA and serve as a basis for fractionation may include, but are not limited to, protein A and protein G. Any suitable method can be used to fractionate the nucleic acid molecules based on protein bound regions. Examples of methods used to fractionate nucleic acid molecules based on protein bound regions include, but are not limited to, SDS-PAGE, chromatin-immuno-precipitation (ChIP), heparin chromatography, and asymmetrical field flow fractionation (AF4).
  • partitioning of the nucleic acids is performed by contacting the nucleic acids with a methylation binding domain (“MBD”) of a methylation binding protein (“MBP”). MBD binds to 5-methylcytosine (5mC).
  • MBD is coupled to paramagnetic beads, such as Dynabeads® M-280 Streptavidin via a biotin linker. Partitioning into fractions with different extents of methylation can be performed by eluting fractions by increasing the NaCl concentration.
  • An exemplary method for molecular tag identification of MBD-bead partitioned libraries through NGS is as follows:
  • [0097] Physical partitioning of an extracted DNA sample (e.g., extracted blood plasma DNA from a human sample) using a methyl-binding domain protein-bead purification kit, saving all elutions from the process for downstream processing.
  • [0098] Parallel application of differential molecular tags and NGS-enabling adapter sequences to each partition.
  • the hypermethylated, residual methylation ('wash'), and hypomethylated partitions are ligated with NGS-adapters with molecular tags.
  • genomic regions of interest e.g., cancer-specific genetic variants and differentially methylated regions.
  • Re-amplification of the enriched total DNA library, appending a sample tag. Different samples are pooled, and assayed in multiplex on an NGS instrument.
  • MBPs contemplated herein include, but are not limited to:
  • [0104] (a) MeCP2 is a protein preferentially binding to 5-methyl-cytosine over unmodified cytosine.
  • [0105] (b) RPL26, PRP8 and the DNA mismatch repair protein MSH6 preferentially bind to 5-hydroxymethyl-cytosine over unmodified cytosine.
  • (c) FOXK1, FOXK2, FOXP1, FOXP4 and FOXI3 preferentially bind to 5-formyl-cytosine over unmodified cytosine (Iurlaro et al., Genome Biol. 14: R119 (2013)).
  • elution is a function of number of methylated sites per molecule, with molecules having more methylation eluting under increased salt concentrations. To elute the DNA into distinct populations based on the extent of methylation, one can use a series of elution buffers of increasing NaCl concentration.
  • Salt concentration can range from about 100 mM to about 2500 mM NaCl.
  • the process results in three (3) partitions. Molecules are contacted with a solution at a first salt concentration and including a molecule including a methyl binding domain, which molecule can be attached to a capture moiety, such as streptavidin. At the first salt concentration a population of molecules will bind to the MBD and a population will remain unbound. The unbound population can be separated as a “hypomethylated” population. For example, a first partition representative of the hypomethylated form of DNA is that which remains unbound at a low salt concentration, e.g., 100 mM or 160 mM.
  • a second partition representative of intermediate methylated DNA is eluted using an intermediate salt concentration, e.g., between 100 mM and 2000 mM concentration. This is also separated from the sample.
  • a third partition representative of hypermethylated form of DNA is eluted using a high salt concentration, e.g., at least about 2000 mM.
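  • The three-partition scheme can be summarized as a threshold rule on the NaCl concentration at which a molecule leaves the MBD. The sketch below assumes 160 mM as the low cut-off and 2000 mM as the high cut-off, both taken from the example values in the text; the function name and the idea of a single per-molecule "release concentration" are simplifying assumptions.

```python
def assign_partition(release_mM: float, low_mM: float = 160.0, high_mM: float = 2000.0) -> str:
    """Assign a molecule to a partition from the NaCl concentration (mM)
    at which it is no longer bound to the MBD.

    release_mM <= low_mM            -> unbound at low salt: hypomethylated
    low_mM < release_mM < high_mM   -> eluted at intermediate salt
    release_mM >= high_mM           -> eluted only at high salt: hypermethylated
    """
    if release_mM < 0:
        raise ValueError("salt concentration cannot be negative")
    if release_mM <= low_mM:
        return "hypomethylated"
    if release_mM < high_mM:
        return "intermediate"
    return "hypermethylated"


partitions = [assign_partition(c) for c in (100, 500, 2000)]
```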
  • the disclosure provides further methods for analyzing a population of nucleic acids in which at least some of the nucleic acids include one or more modified cytosine residues, such as 5-methylcytosine and any of the other modifications described previously. In these methods, after partitioning, the subsamples of nucleic acids are contacted with adapters including one or more cytosine residues modified at the 5C position, such as 5-methylcytosine.
  • cytosine residues in such adapters are also modified, or all such cytosines in a primer binding region of the adapters are modified.
  • Adapters attach to both ends of nucleic acid molecules in the population.
  • the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points receive different combinations of tags.
  • the primer binding sites in such adapters can be the same or different, but are preferably the same.
  • After attachment of adapters, the nucleic acids are amplified from primers binding to the primer binding sites of the adapters.
  • the amplified nucleic acids are split into first and second aliquots.
  • the first aliquot is assayed for sequence data with or without further processing.
  • the sequence data on molecules in the first aliquot is thus determined irrespective of the initial methylation state of the nucleic acid molecules.
  • the nucleic acid molecules in the second aliquot are subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, wherein the first nucleobase includes a cytosine modified at the 5 position, and the second nucleobase includes unmodified cytosine.
  • This procedure may be bisulfite treatment or another procedure that converts unmodified cytosines to uracils.
  • the nucleic acids subjected to the procedure are then amplified with primers to the original primer binding sites of the adapters linked to nucleic acid. Only the nucleic acid molecules originally linked to adapters (as distinct from amplification products thereof) are now amplifiable because these nucleic acids retain cytosines in the primer binding sites of the adapters, whereas amplification products have lost the methylation of these cytosine residues, which have undergone conversion to uracils in the bisulfite treatment. Thus, only original molecules in the populations, at least some of which are methylated, undergo amplification.
  • cytosines in the nucleic acid population were subject to methylation.
  • Such an analysis can be performed using the following exemplary procedure. After partitioning, methylated DNA is linked to Y-shaped adapters at both ends including primer binding sites and tags. The cytosines in the adapters are modified at the 5 position (e.g., 5-methylated). The modification of the adapters serves to protect the primer binding sites in a subsequent conversion step (e.g., bisulfite treatment, TAP conversion, or any other conversion that does not affect the modified cytosine but affects unmodified cytosine).
  • the DNA molecules are amplified.
  • the amplification product is split into two aliquots for sequencing with and without conversion.
  • the aliquot not subjected to conversion can be subjected to sequence analysis with or without further processing.
  • the other aliquot is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, wherein the first nucleobase includes a cytosine modified at the 5 position, and the second nucleobase includes unmodified cytosine.
  • This procedure may be bisulfite treatment or another procedure that converts unmodified cytosines to uracils.
  • nucleic acid tags in adapters are not used to distinguish between methylated and unmethylated DNA but to distinguish nucleic acid molecules within the same partition.
  • Methods disclosed herein comprise a step of subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity.
  • if the first nucleobase is a modified or unmodified adenine, then the second nucleobase is a modified or unmodified adenine; if the first nucleobase is a modified or unmodified cytosine, then the second nucleobase is a modified or unmodified cytosine; if the first nucleobase is a modified or unmodified guanine, then the second nucleobase is a modified or unmodified guanine; and if the first nucleobase is a modified or unmodified thymine, then the second nucleobase is a modified or unmodified thymine (where modified and unmodified uracil are encompassed within modified thymine for the purpose of this step).
  • the first nucleobase is a modified or unmodified cytosine
  • the second nucleobase is a modified or unmodified cytosine.
  • first nucleobase may comprise unmodified cytosine (C) and the second nucleobase may comprise one or more of 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hmC).
  • second nucleobase may comprise C and the first nucleobase may comprise one or more of mC and hmC.
  • Other combinations are also possible, as indicated, e.g., in the Summary above.
  • the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes bisulfite conversion.
  • Treatment with bisulfite converts unmodified cytosine and certain modified cytosine nucleotides (e.g., 5-formylcytosine (fC) or 5-carboxylcytosine (caC)) to uracil whereas other modified cytosines (e.g., 5-methylcytosine, 5-hydroxymethylcytosine) are not converted.
  • the first nucleobase includes one or more of unmodified cytosine, 5-formyl cytosine, 5-carboxylcytosine, or other cytosine forms affected by bisulfite
  • the second nucleobase may comprise one or more of mC and hmC, such as mC and optionally hmC.
  • Sequencing of bisulfite-treated DNA identifies positions that are read as cytosine as being mC or hmC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5-formyl cytosine, or 5-carboxylcytosine.
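  • The read-out rule for bisulfite-treated DNA can be illustrated with a small in silico conversion. The token names for base forms ("C", "mC", "hmC", "fC", "caC") and the function name are hypothetical notation for this sketch; the conversion behavior follows the description above.

```python
CONVERTED = {"C", "fC", "caC"}   # bisulfite-susceptible forms, read as T
PROTECTED = {"mC", "hmC"}        # bisulfite-resistant forms, read as C
UNAFFECTED = {"A", "G", "T"}


def bisulfite_read(bases):
    """Return the sequence read after bisulfite treatment and amplification.

    bases: list of tokens, one per position, naming the base form present
    in the original molecule (assumed notation).
    """
    read = []
    for base in bases:
        if base in CONVERTED:
            read.append("T")   # deaminated to uracil, sequenced as T
        elif base in PROTECTED:
            read.append("C")
        elif base in UNAFFECTED:
            read.append(base)
        else:
            raise ValueError(f"unknown base form: {base!r}")
    return "".join(read)


read = bisulfite_read(["A", "C", "mC", "G", "hmC", "fC", "caC", "T"])
```

  • A C in the resulting read therefore marks an mC or hmC position, while a T may be a genuine T or any bisulfite-susceptible form of C, which is why a comparison against unconverted sequence is needed to resolve the two.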
  • the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes oxidative bisulfite (Ox-BS) conversion.
  • the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes Tet-assisted bisulfite (TAB) conversion.
  • the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes Tet-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane.
  • the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes chemical-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane.
  • the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes APOBEC-coupled epigenetic (ACE) conversion.
  • the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes enzymatic conversion of the first nucleobase, e.g., as in EM-Seq. See, e.g., Vaisvila R, et al. (2019) EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA. bioRxiv; DOI: 10.1101/2019.12.20.884692, available at www.biorxiv.org/content/10.1101/2019.12.20.884692v1.
  • TET2 and T4-βGT can be used to convert 5mC and 5hmC into substrates that cannot be deaminated by a deaminase (e.g., APOBEC3A), and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils.
  • the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes separating DNA originally including the first nucleobase from DNA not originally including the first nucleobase.
  • the first nucleobase is a modified or unmodified adenine
  • the second nucleobase is a modified or unmodified adenine.
  • the modified adenine is N6-methyladenine (mA).
  • the modified adenine is one or more of N6-methyladenine (mA), N6-hydroxymethyladenine (hmA), or N6-formyladenine (fA).
  • Techniques including methylated DNA immunoprecipitation (MeDIP) can be used to separate DNA containing modified bases such as mA from other DNA.
  • methods disclosed herein comprise a step of capturing one or more sets of target regions of DNA, such as cfDNA. Capture may be performed using any suitable approach known in the art. In some embodiments, capturing includes contacting the DNA to be captured with a set of target-specific probes.
  • the set of target-specific probes may have any of the features described herein for sets of target-specific probes, including but not limited to in the embodiments set forth above and the sections relating to probes below.
  • Capturing may be performed on one or more subsamples prepared during methods disclosed herein.
  • DNA is captured from at least the first subsample or the second subsample, e.g., at least the first subsample and the second subsample.
  • capturing may be performed on any, any two, or all of the DNA originally including the first nucleobase (e.g., hmC), the DNA not originally including the first nucleobase, and the second subsample.
  • the subsamples are differentially tagged (e.g., as described herein) and then pooled before undergoing capture.
  • the capturing step may be performed using conditions suitable for specific nucleic acid hybridization, which generally depend to some extent on features of the probes such as length, base composition, etc. Those skilled in the art will be familiar with appropriate conditions given general knowledge in the art regarding nucleic acid hybridization.
  • complexes of target-specific probes and DNA are formed.
  • a method described herein includes capturing cfDNA obtained from a test subject for a plurality of sets of target regions.
  • the target regions comprise epigenetic target regions, which may show differences in methylation levels and/or fragmentation patterns depending on whether they originated from a tumor or from healthy cells.
  • the target regions also comprise sequence-variable target regions, which may show differences in sequence depending on whether they originated from a tumor or from healthy cells.
  • the capturing step produces a captured set of cfDNA molecules, and the cfDNA molecules corresponding to the sequence-variable target region set are captured at a greater capture yield in the captured set of cfDNA molecules than cfDNA molecules corresponding to the epigenetic target region set.
  • For additional discussion of capturing steps, capture yields, and related aspects, see WO2020/160414, which is incorporated herein by reference for all purposes.
  • a method described herein includes contacting cfDNA obtained from a test subject with a set of target-specific probes, wherein the set of target-specific probes is configured to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set.
  • the set of target-specific probes is configured to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set.
  • the volume of data needed to determine fragmentation patterns (e.g., to test for perturbation of transcription start sites or CTCF binding sites) or fragment abundance (e.g., in hypermethylated and hypomethylated partitions) is generally less than the volume of data needed to determine the presence or absence of cancer-related sequence mutations.
  • Capturing the target region sets at different yields can facilitate sequencing the target regions to different depths of sequencing in the same sequencing run (e.g., using a pooled mixture and/or in the same sequencing cell).
  • the methods further comprise sequencing the captured cfDNA, e.g., to different degrees of sequencing depth for the epigenetic and sequence-variable target region sets, consistent with the discussion herein.
  • complexes of target-specific probes and DNA are separated from DNA not bound to target-specific probes.
  • a washing or aspiration step can be used to separate unbound material.
  • the complexes have chromatographic properties distinct from unbound material (e.g., where the probes comprise a ligand that binds a chromatographic resin), chromatography can be used.
  • the set of target-specific probes may comprise a plurality of sets such as probes for a sequence-variable target region set and probes for an epigenetic target region set.
  • the capturing step is performed with the probes for the sequence-variable target region set and the probes for the epigenetic target region set in the same vessel at the same time, e.g., the probes for the sequence-variable and epigenetic target region sets are in the same composition.
  • the concentration of the probes for the sequence-variable target region set is greater than the concentration of the probes for the epigenetic target region set.
  • the capturing step is performed with the sequence-variable target region probe set in a first vessel and with the epigenetic target region probe set in a second vessel, or the contacting step is performed with the sequence-variable target region probe set at a first time and a first vessel and the epigenetic target region probe set at a second time before or after the first time.
  • This approach allows for preparation of separate first and second compositions including captured DNA corresponding to the sequence-variable target region set and captured DNA corresponding to the epigenetic target region set.
  • the compositions can be processed separately as desired (e.g., to fractionate based on methylation as described
  • the DNA is amplified. In some embodiments, amplification is performed before the capturing step. In some embodiments, amplification is performed after the capturing step.
  • adapters are included in the DNA. This may be done concurrently with an amplification procedure, e.g., by providing the adapters in a 5’ portion of a primer, e.g., as described above. Alternatively, adapters can be added by other approaches, such as ligation.
  • tags which may be or include barcodes, are included in the DNA.
  • Tags can facilitate identification of the origin of a nucleic acid.
  • barcodes can be used to allow the origin (e.g., subject) whence the DNA came to be identified following pooling of a plurality of samples for parallel sequencing. This may be done concurrently with an amplification procedure, e.g., by providing the barcodes in a 5’ portion of a primer, e.g., as described above.
  • adapters and tags/barcodes are provided by the same primer or primer set.
  • the barcode may be located 3’ of the adapter and 5’ of the target-hybridizing portion of the primer.
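The primer layout described above (adapter in the 5' portion, barcode 3' of the adapter and 5' of the target-hybridizing portion) can be sketched as follows. This is a minimal illustration only: the adapter and barcode sequences, the function names, and the exact-match lookup are assumptions made for the example and are not taken from the disclosure.

```python
from typing import Dict, Optional

# Hypothetical adapter sequence, for illustration only.
ADAPTER = "ACACTCTTTCCCTACACGAC"


def build_primer(adapter: str, barcode: str, target_portion: str) -> str:
    """Return a primer written 5'->3': adapter, then barcode, then target-hybridizing portion."""
    return adapter + barcode + target_portion


def assign_sample(read: str, adapter: str, barcodes: Dict[str, str]) -> Optional[str]:
    """Return the sample whose barcode immediately follows the adapter, or None if no match."""
    if not read.startswith(adapter):
        return None
    rest = read[len(adapter):]
    for barcode, sample in barcodes.items():
        if rest.startswith(barcode):
            return sample
    return None
```

After pooling, a read that begins with the adapter can be traced back to its subject by looking up the barcode that follows it. A practical demultiplexer would typically also tolerate sequencing errors in the barcode, which this sketch does not attempt.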
  • a captured set of DNA (e.g., cfDNA) is provided.
  • the captured set of DNA may be provided, e.g., by performing a capturing step after a partitioning step as described herein.
  • the captured set may comprise DNA corresponding to a sequence-variable target region set, an epigenetic target region set, or a combination thereof.
  • the quantity of captured sequence-variable target region DNA is greater than the quantity of the captured epigenetic target region DNA, when normalized for the difference in the size of the targeted regions (footprint size).
  • first and second captured sets may be provided, including, respectively, DNA corresponding to a sequence-variable target region set and DNA corresponding to an epigenetic target region set. The first and second captured sets may be combined to provide a combined captured set.
  • the DNA corresponding to the sequence-variable target region set may be present at a greater concentration than the DNA corresponding to the epigenetic target region set, e.g., a 1.1- to 1.2-fold greater concentration, a 1.2- to 1.4-fold greater concentration, a 1.4- to 1.6-fold greater concentration, a 1.6- to 1.8-fold greater concentration, a 1.8- to 2.0-fold greater concentration, a 2.0- to 2.2-fold greater concentration, a 2.2- to 2.4-fold greater concentration, a 2.4- to 2.6-fold greater concentration, a 2.6- to 2.8-fold greater concentration, a 2.8- to 3.0-fold greater concentration, a 3.0- to 3.5-fold greater concentration, a 3.5- to 4.0-fold greater concentration, a 4.0- to 4.5-fold greater concentration, a 4.5- to 5.0-fold
  • the degree of difference in concentrations accounts for normalization for the footprint sizes of the target regions, as discussed in the definition section.
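One way to read the footprint normalization is as a comparison of captured DNA per kilobase of targeted region. The sketch below assumes that reading; the function name, the per-kb normalization, and the example amounts are illustrative assumptions, and the definition section of the disclosure (not reproduced here) governs the actual meaning.

```python
def footprint_normalized_fold(
    seq_var_amount: float,
    seq_var_footprint_kb: float,
    epi_amount: float,
    epi_footprint_kb: float,
) -> float:
    """Fold difference in captured DNA per kb of footprint (assumed normalization).

    Amounts may be in any consistent unit (e.g., ng or molecule counts).
    """
    if seq_var_footprint_kb <= 0 or epi_footprint_kb <= 0:
        raise ValueError("footprint sizes must be positive")
    if epi_amount <= 0:
        raise ValueError("epigenetic amount must be positive")
    seq_var_density = seq_var_amount / seq_var_footprint_kb
    epi_density = epi_amount / epi_footprint_kb
    return seq_var_density / epi_density
```

For example, 30 units captured over a 50 kb sequence-variable footprint against 150 units over a 500 kb epigenetic footprint gives a two-fold greater normalized concentration for the sequence-variable set, even though the epigenetic set contains more DNA in absolute terms.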
  • Epigenetic Target Region Set
  • the epigenetic target region set may comprise one or more types of target regions likely to differentiate DNA from neoplastic (e.g., tumor or cancer) cells and from healthy cells, e.g., non-neoplastic circulating cells. Exemplary types of such regions are discussed in detail herein.
  • the epigenetic target region set may also comprise one or more control regions, e.g., as described herein.
  • the epigenetic target region set has a footprint of at least 100 kb, e.g., at least 200 kb, at least 300 kb, or at least 400 kb. In some embodiments, the epigenetic target region set has a footprint in the range of 100-1000 kb, e.g., 100-200 kb, 200-300 kb, 300-400 kb, 400-500 kb, 500-600 kb, 600-700 kb, 700-800 kb, 800-900 kb, and 900-1,000 kb.
  • Hypermethylation Variable Target Regions
  • In some embodiments, the epigenetic target region set includes one or more hypermethylation variable target regions.
  • hypermethylation variable target regions refer to regions where an increase in the level of observed methylation, e.g., in a cfDNA sample, indicates an increased likelihood that a sample (e.g., of cfDNA) contains DNA produced by neoplastic cells, such as tumor or cancer cells.
  • hypermethylation of promoters of tumor suppressor genes has been observed repeatedly. See, e.g., Kang et al., Genome Biol. 18:53 (2017) and references cited therein.
  • hypermethylation variable target regions can include regions that do not necessarily differ in methylation in cancerous tissue relative to DNA from healthy tissue of the same type, but do differ in methylation (e.g., have more methylation) relative to cfDNA that is typical in healthy subjects.
  • where the presence of a cancer results in increased cell death, such as apoptosis of cells of the tissue type corresponding to the cancer, such a cancer can be detected at least in part using such hypermethylation variable target regions.
  • hypermethylation variable target regions include one or more genomic regions, where the cfDNA molecules in those regions do not differ in methylation state in cancer subjects relative to cfDNA from healthy subjects, but the presence/increased quantity of hypermethylated cfDNA in those regions is indicative of a particular tissue type (e.g., cancer origin) and is presented as cfDNA with increased apoptosis (e.g. tumor shedding) into circulation.
  • Hypermethylation target regions may be obtained, e.g., from the Cancer Genome Atlas. Kang et al., Genome Biology 18:53 (2017), describe construction of a probabilistic method called CancerLocator using hypermethylation target regions from breast, colon, kidney, liver, and lung.
  • the hypermethylation target regions can be specific to one or more types of cancer. Accordingly, in some embodiments, the hypermethylation target regions include one, two, three, four, or five subsets of hypermethylation target regions that collectively show hypermethylation in one, two, three, four, or five of breast, colon, kidney, liver, and lung cancers.
  • the probes for the epigenetic target region set comprise probes specific for one or more hypermethylation variable target regions.
  • the hypermethylation variable target regions may be any of those set forth above.
  • the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 1, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1.
  • the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 2.
  • the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 1 or Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1 or Table 2.
  • the one or more probes bind within 300 bp of the listed position, e.g., within 200 or 100 bp.
  • a probe has a hybridization site overlapping the position listed above.
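The two probe-placement conditions above (binding within a given distance of a listed position, or overlapping it) can be checked with simple interval arithmetic. The sketch below assumes 1-based inclusive coordinates on a single chromosome and measures distance from the listed position to the nearest base of the probe's hybridization site; both conventions, and the function names, are assumptions for illustration.

```python
def overlaps_position(probe_start: int, probe_end: int, position: int) -> bool:
    """True if the hybridization site [probe_start, probe_end] covers the position."""
    if probe_start > probe_end:
        raise ValueError("probe_start must not exceed probe_end")
    return probe_start <= position <= probe_end


def binds_within(probe_start: int, probe_end: int, position: int, window_bp: int = 300) -> bool:
    """True if the nearest base of the hybridization site is within window_bp of the position."""
    if overlaps_position(probe_start, probe_end, position):
        return True
    distance = min(abs(position - probe_start), abs(position - probe_end))
    return distance <= window_bp
```

A 120-nt probe spanning positions 1000-1119 lies 281 bp from a listed position at 1400, so it satisfies the 300 bp condition but not the 200 bp or 100 bp conditions.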
  • the probes specific for the hypermethylation target regions include probes specific for one, two, three, four, or five subsets of hypermethylation target regions that collectively show hypermethylation in one, two, three, four, or five of breast, colon, kidney, liver, and lung cancers.
  • Hypomethylation Variable Target Regions
  • Global hypomethylation is a commonly observed phenomenon in various cancers. See, e.g., Hon et al., Genome Res.
  • regions such as repeated elements, e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA, and intergenic regions that are ordinarily methylated in healthy cells may show reduced methylation in tumor cells.
  • the epigenetic target region set includes hypomethylation variable target regions, where a decrease in the level of observed methylation indicates an increased likelihood that a sample (e.g., of cfDNA) contains DNA produced by neoplastic cells, such as tumor or cancer cells.
  • hypomethylation variable target regions can include regions that do not necessarily differ in methylation state in cancerous tissue relative to DNA from healthy tissue of the same type, but do differ in methylation (e.g., are less methylated) relative to cfDNA that is typical in healthy subjects.
  • hypomethylation variable target regions include one or more genomic regions, where the cfDNA molecules in those regions do not differ in methylation state in cancer subjects relative to cfDNA from healthy subjects, but the presence/increased quantity of hypomethylated cfDNA in those regions is indicative of a particular tissue type (e.g., cancer origin) and is presented as cfDNA with increased apoptosis (e.g. tumor shedding) into circulation.
  • hypomethylation variable target regions include repeated elements and/or intergenic regions.
  • repeated elements include one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA.
  • Exemplary specific genomic regions that show cancer-associated hypomethylation include nucleotides 8403565-8953708 and 151104701-151106035 of human chromosome 1.
  • the hypomethylation variable target regions overlap or comprise one or both of these regions.
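Whether a candidate target region overlaps or comprises either of the two exemplary chromosome 1 intervals can be tested as below. The coordinates are taken from the text above; the inclusive-coordinate convention and the function name are assumptions, and the genome build is not stated in this section, so it is not assumed here.

```python
from typing import List, Tuple

# Exemplary cancer-associated hypomethylation intervals on human chromosome 1 (from the text).
HYPO_REGIONS_CHR1: List[Tuple[int, int]] = [
    (8403565, 8953708),
    (151104701, 151106035),
]


def overlapping_regions(
    start: int, end: int, regions: List[Tuple[int, int]] = HYPO_REGIONS_CHR1
) -> List[Tuple[int, int]]:
    """Return every reference interval that shares at least one base with [start, end]."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [(r_start, r_end) for r_start, r_end in regions if start <= r_end and r_start <= end]
```

A candidate region that spans an entire reference interval is reported as overlapping it, which covers the "comprise" case as well.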
  • the probes for the epigenetic target region set comprise probes specific for one or more hypomethylation variable target regions.
  • the hypomethylation variable target regions may be any of those set forth above.
  • the probes specific for one or more hypomethylation variable target regions may include probes for regions such as repeated elements, e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA, and intergenic regions that are ordinarily methylated in healthy cells but may show reduced methylation in tumor cells.
  • probes specific for hypomethylation variable target regions include probes specific for repeated elements and/or intergenic regions.
  • probes specific for repeated elements include probes specific for one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA.
  • Exemplary probes specific for genomic regions that show cancer-associated hypomethylation include probes specific for nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1. In some embodiments, the probes specific for hypomethylation variable target regions include probes specific for regions overlapping or including nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1.
  • Probes for detecting the panel of regions can include those for detecting genomic regions of interest (hotspot regions) as well as nucleosome-aware probes (e.g., KRAS codons 12 and 13) and may be designed to optimize capture based on analysis of cfDNA coverage and fragment size variation impacted by nucleosome binding patterns and GC sequence composition. Regions used herein can also include non-hotspot regions optimized based on nucleosome positions and GC models.
  • the DNA is obtained from a subject having a cancer. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a cancer. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having a tumor. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a tumor. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having neoplasia.
  • the DNA (e.g., cfDNA) is obtained from a subject suspected of having neoplasia. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject in remission from a tumor, cancer, or neoplasia (e.g., following chemotherapy, surgical resection, radiation, or a combination thereof).
  • the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia may be of the lung, colon, rectum, kidney, breast, prostate, or liver.
  • the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the lung. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the colon or rectum. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the breast. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the prostate. In any of the foregoing embodiments, the subject may be a human subject.
  • the sequence-variable target region probe set has a footprint of at least 0.5 kb, e.g., at least 1 kb, at least 2 kb, at least 5 kb, at least 10 kb, at least 20 kb, at least 30 kb, or at least 40 kb.
  • the epigenetic target region probe set has a footprint in the range of 0.5-100 kb, e.g., 0.5-2 kb, 2-10 kb, 10-20 kb, 20-30 kb, 30-40 kb, 40-50 kb, 50-60 kb, 60-70 kb, 70-80 kb, 80-90 kb, and 90-100 kb.
  • the probes specific for the sequence-variable target region set comprise probes specific for target regions from at least 10, 20, 30, or 35 cancer-related genes, such as AKT1, ALK, BRAF, CCND1, CDK2A, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FOXL2, GATA3, GNA11, GNAQ, GNAS, HRAS, IDH1, IDH2, KIT, KRAS, MED12, MET, MYC, NFE2L2, NRAS, PDGFRA, PIK3CA, PPP2R1A, PTEN, RET, STK11, TP53, and U2AF1.
  • Compositions Including Captured DNA
  • Provided herein is a combination including first and second populations of captured DNA.
  • the first population may comprise or be derived from DNA with a cytosine modification in a greater proportion than the second population.
  • the first population may comprise a form of a first nucleobase originally present in the DNA with altered base pairing specificity and a second nucleobase without altered base pairing specificity, wherein the form of the first nucleobase originally present in the DNA prior to alteration of base pairing specificity is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the form of the first nucleobase originally present in the DNA prior to alteration of base pairing specificity and the second nucleobase have the same base pairing specificity.
  • the second population does not comprise the form of the first nucleobase originally present in the DNA with altered base pairing specificity.
  • the cytosine modification is cytosine methylation.
  • the first nucleobase is a modified or unmodified cytosine and the second nucleobase is a modified or unmodified cytosine.
  • the first and second nucleobase may be any of those discussed herein in the Summary or with respect to subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample.
  • the first population includes a sequence tag selected from a first set of one or more sequence tags and the second population includes a sequence tag selected from a second set of one or more sequence tags, and the second set of sequence tags is different from the first set of sequence tags.
  • the sequence tags may comprise barcodes.
  • the first population includes protected hmC, such as glucosylated hmC.
  • the first population was subjected to any of the conversion procedures discussed herein, such as bisulfite conversion, Ox-BS conversion, TAB conversion, ACE conversion, TAP conversion, TAPSβ conversion, or CAP conversion.
  • the first population was subjected to protection of hmC followed by deamination of mC and/or C.
  • the first population includes or was derived from DNA with a cytosine modification in a greater proportion than the second population and the first population includes first and second subpopulations
  • the first nucleobase is a modified or unmodified nucleobase
  • the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase
  • the first nucleobase and the second nucleobase have the same base pairing specificity.
  • the second population does not comprise the first nucleobase.
  • the first nucleobase is a modified or unmodified cytosine
  • the second nucleobase is a modified or unmodified cytosine, optionally wherein the modified cytosine is mC or hmC.
  • the first nucleobase is a modified or unmodified adenine
  • the second nucleobase is a modified or unmodified adenine, optionally wherein the modified adenine is mA.
  • the first nucleobase is a product of a Huisgen cycloaddition to β-6-azide-glucosyl-5-hydroxymethylcytosine that includes an affinity label (e.g., biotin).
  • the captured DNA may comprise cfDNA.
  • the captured DNA may have any of the features described herein concerning captured sets, including, e.g., a greater concentration of the DNA corresponding to the sequence-variable target region set (normalized for footprint size as discussed above) than of the DNA corresponding to the epigenetic target region set.
  • the DNA of the captured set includes sequence tags, which may be added to the DNA as described herein.
  • inclusion of sequence tags results in the DNA molecules differing from their naturally occurring, untagged form.
  • the combination may further comprise a probe set described herein or sequencing primers, each of which may differ from naturally occurring nucleic acid molecules.
  • a probe set described herein may comprise a capture moiety
  • sequencing primers may comprise a non-naturally occurring label.
  • RWE Real World Evidence
  • such methods may comprise: partitioning the sample into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample includes DNA with a cytosine modification in a greater proportion than the second subsample; subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity; and sequencing DNA in the first subsample and DNA in the second subsample in a manner that distinguishes the first nucleobase from the second nucleobase in the DNA of the first subsample.
  • the present disclosure provides a non-transitory computer-readable medium including computer-executable instructions which, when executed by at least one electronic processor, perform at least a portion of a method including: collecting cfDNA from a test subject; capturing a plurality of sets of target regions from the cfDNA, wherein the plurality of target region sets includes a sequence-variable target region set and an epigenetic target region set, whereby a captured set of cfDNA molecules is produced; sequencing the captured cfDNA molecules, wherein the captured cfDNA molecules of the sequence-variable target region set are sequenced to a greater depth of sequencing than the captured cfDNA molecules of the epigenetic target region set; obtaining a plurality of sequence reads generated by a nucleic acid sequencer from sequencing the captured cfDNA molecules; mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads; and processing the mapped sequence reads corresponding to the sequence-variable target region set and to the epi
  • the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code or can be compiled during runtime.
  • the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
  • Additional details relating to computer systems and networks, databases, and computer program products are also provided in, for example, Peterson, Computer Networks: A Systems Approach, Morgan Kaufmann, 5th Ed. (2011), Kurose, Computer Networking: A Top-Down Approach, Pearson, 7th Ed. (2016), Elmasri, Fundamentals of Database Systems, Addison Wesley, 6th Ed. (2010), Coronel, Database Systems: Design, Implementation, & Management, Cengage Learning, 11th Ed.
  • FIG. 6 illustrates an example of a system 100 for generating negative predictions of a target variant in a sample of a subject 111, according to an embodiment of the disclosure.
  • the system 100 may process one or more samples 101 from the subject 111 to generate sequence reads for variant detection and negative predictions.
  • the system 100 may include a laboratory system 102, a computer system 110, and/or other components.
  • the laboratory system 102 and the computer system 110 may be remote from one another, and connected to one another through a computer network (not illustrated).
  • the laboratory system 102 may include a sample collection and preparation pipeline 103, a sequencing pipeline 105, a sequence read datastore 109, and/or other components.
  • the sequencing pipeline 105 may include one or more sequencing devices 107 (illustrated in FIG. 1 as sequencing devices 107a...n).
  • the computer system 110 may include a sequence analysis pipeline 112, a processor 120, a storage device 122, a variant detection pipeline 130, and/or other components.
  • the sequence analysis pipeline 112 may include a sequence quality control (QC) component 113 that may trim or trash sequence reads from the laboratory system 102, other analysis components 115 that may perform preliminary alignments to a reference genome, and an analysis QC component 116 that may perform quality control on the output of the analysis components 115.
  • Output, such as sequence reads of a sample 101 of a subject 111, from the sequence analysis pipeline 112 may be stored in an analysis datastore 117.
  • the processor 120 may implement (be programmed by) various components of the variant detection pipeline 130, such as the variant detector 132, the negative prediction analyzer 134, and/or other components. Alternatively, it should be noted that each of these components of the variant detection pipeline 130 may include a hardware module.
  • the variant detection pipeline 130 may cause the computer system 110 to identify variants, diseases from the variants (precision diagnostics), negative predictions, and/or treatment regimens.
  • the precision diagnostic and treatment regimen may be stored in a repository such as clinical result store 160 or diagnostic result store 150.
  • the variant detector 132 may determine that a target variant has not been detected based on an analysis of the sequence reads from laboratory system 102. It should be noted that at least one sequence read and/or at least one molecule that is sequenced may support the target variant – but this may not be sufficient for the variant detector 132 to detect the target variant.
  • the variant detector 132 might detect the target variant only if the number of sequence reads (and/or the number of molecules that are sequenced) which support the target variant is greater than a threshold. Additionally or alternatively, the variant detector 132 might detect a target variant only if the target variant which is supported by a sequence read and/or a molecule that is sequenced meets a quality threshold. Target variants that are supported by at least one sequence read and/or at least one molecule that is sequenced, but do not meet a threshold, may thus be ignored in some embodiments as false positives, and may not be detected by the variant detector 132.
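The thresholding behavior described for the variant detector 132 can be sketched as a filter followed by a count. The data structure, the default threshold values, and the choice to apply the quality filter before counting are all assumptions made for this illustration; the disclosure does not specify them here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SupportingMolecule:
    molecule_id: str
    quality: float  # hypothetical per-molecule quality score


def is_detected(
    support: Iterable[SupportingMolecule],
    min_count: int = 2,        # assumed count threshold
    min_quality: float = 30.0,  # assumed quality threshold
) -> bool:
    """Detect the variant only if more than min_count molecules pass the quality threshold."""
    passing = [m for m in support if m.quality >= min_quality]
    return len(passing) > min_count
```

Under these assumed defaults, a variant supported by a single high-quality molecule is not detected, which is exactly the case the negative prediction analyzer 134 is then asked to assess.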
  • the negative prediction analyzer 134 may access the output of the variant detector 132 and confirm negative predictions as an add-on to the variant detector. Alternatively, or additionally, the negative prediction analyzer 134 may be integrated with the variant detector 132.
  • FIG. 7 illustrates a schematic diagram of exemplary inputs and outputs of a negative prediction analyzer 134, according to an embodiment.
  • the negative prediction analyzer 134 may use covariable information 202, coverage information at target sites 204, disease type 206, and/or other input information for significance modeling.
  • the negative prediction analyzer 134 may generate a quantitative value output 210 that may represent a likelihood of whether a negative prediction is correct and a negative prediction assessment 212 that may include a level of confidence or precision diagnostic based on the quantitative value output 210.
  • the sequence reads from the laboratory system 102 may be aligned to a reference genome and in particular to various loci in the reference genome to determine covariable information 202.
  • the covariable information 202 may include covariance variant information that may include historical mutual exclusivity data and/or co-occurrence data of variants.
  • Covariable variants may refer to two or more variants that have a negative (mutually exclusive) or positive (co-occurrence) correlation to one another based on historical observations of sequence data from the laboratory system 102 and/or other data sources.
  • mutually exclusive variants may include variants that tend to not be observed with one another.
  • Co-occurrence variants may be observed to occur when another variant is observed, such as a driver variant mutation and its co-occurrence variant.
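One simple way to label a pair of variants as co-occurring or mutually exclusive from historical observations is a log odds ratio over a 2x2 table of sample counts. The disclosure does not state which statistic is used, so the statistic, the pseudocount, the cutoff, and the function names below are all illustrative assumptions.

```python
import math


def log_odds_ratio(both: int, only_a: int, only_b: int, neither: int, pseudocount: float = 0.5) -> float:
    """Log odds ratio for variants A and B across historical samples (pseudocount avoids division by zero)."""
    a = both + pseudocount
    b = only_a + pseudocount
    c = only_b + pseudocount
    d = neither + pseudocount
    return math.log((a * d) / (b * c))


def classify_pair(lor: float, cutoff: float = 1.0) -> str:
    """Label a variant pair from its log odds ratio using an assumed symmetric cutoff."""
    if lor >= cutoff:
        return "co-occurring"
    if lor <= -cutoff:
        return "mutually exclusive"
    return "uncorrelated"
```

A strongly positive value means the two variants are seen together more often than expected, and a strongly negative value means they tend not to be observed with one another.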
  • the significance modeling may generate and use computational estimates of tumor fraction (TF), including methylation determined TF, of a target variant based on nucleic acid sequence reads generated from the sample.
  • the significance modeling may determine and use the diversity of other variants that are detected – or not detected – in the sample.
  • the significance modeling may use detection of covariance variants that usually (based on historical covariance variant information) co-occur with the target variant or mutually exclusive variants that usually (based on the historical covariance variant information) do not co-occur with the target variant.
  • a negative predictive value (“NPV”) may be generated based on the TF, including methylation determined TF, estimates and/or diversity of variants that are detected, or not detected, in the sample. The result may be used to provide a level of confidence in a negative diagnosis and/or to further guide treatment plans based on the negative diagnosis.
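For orientation, the textbook definition of negative predictive value can be written in terms of prevalence, sensitivity, and specificity. The sketch below shows only that standard relationship; how the disclosure derives its NPV from tumor fraction estimates and variant diversity is not detailed in this passage, and the idea that sensitivity would rise with tumor fraction and coverage is an assumption used only to motivate the example.

```python
def negative_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Standard NPV: P(variant truly absent | variant not detected)."""
    for name, value in (("prevalence", prevalence), ("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(name + " must be between 0 and 1")
    true_negative = specificity * (1.0 - prevalence)
    false_negative = (1.0 - sensitivity) * prevalence
    if true_negative + false_negative == 0.0:
        raise ValueError("NPV is undefined when no negative calls are possible")
    return true_negative / (true_negative + false_negative)
```

With prevalence held fixed, a sample in which the assay is more sensitive yields a higher NPV, so a negative call from such a sample carries more confidence.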
  • FIG. 8 illustrates an example of a method 300 for generating negative predictions of a target variant in a sample of a subject, according to an embodiment of the disclosure.
  • Methods of the invention can be used for determining as a true negative result that a variant of interest is absent (e.g. absent at the clonal level).
  • the method 300 may include accessing a plurality of sequence reads of the cfDNA sample.
  • the method 300 may include determining that a target variant (the target variant) has not been detected at a first locus in the sample (e.g., a cfNA sample) based on the plurality of sequence reads.
  • the target variant (and/or other variants described herein) may include a somatic variant.
  • the target variant (and/or other variants described herein) may not include a germline variant.
  • Assessing Negative Predictions
  • the method 300 may include generating a first likelihood value based on a probability that the target variant is absent at the clonal level and a second likelihood value based on a probability that the target variant is not absent at the clonal level.
  • the method 300 may include determining a quantitative value based on the first likelihood value and the second likelihood value.
  • the method 300 may include comparing the quantitative value to a threshold.
  • the method 300 may include determining that the target variant at the first locus is absent at the clonal level based on the comparison. For example, the method 300 may include determining that the allele frequency of the target variant does not exceed the threshold (such as the sub-clonal threshold described with reference to FIGS. 4A and 4B).
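The decision steps of method 300 can be sketched in a few lines. This is a minimal illustration, not the disclosed implementation: it assumes the quantitative value is the log of the ratio of the first likelihood value to the second, and that a value above the threshold supports a clonal-level absence call. The function name and the direction of the comparison are hypothetical.

```python
import math


def call_clonal_absence(first_likelihood: float,
                        second_likelihood: float,
                        threshold: float) -> bool:
    """Return True when log(L1 / L0) exceeds the threshold.

    first_likelihood  -- likelihood that the target variant is absent at the clonal level
    second_likelihood -- likelihood that the target variant is not absent at the clonal level
    threshold         -- decision threshold (hypothetical; chosen by the caller)
    """
    if first_likelihood <= 0 or second_likelihood <= 0:
        raise ValueError("likelihood values must be positive")
    # Quantitative value: log likelihood ratio of the two hypotheses.
    quantitative_value = math.log(first_likelihood) - math.log(second_likelihood)
    # Assumption: larger values favor the clonal-level absence call.
    return quantitative_value > threshold
```

Some embodiments instead call absence when the quantitative value is below the threshold; in that case the comparison is simply inverted.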
  • the method 300 and/or the negative prediction analyzer 134 may model the probability that the target variant is absent at the clonal level (or present at a sub-clonal level of a tumor variant) as a test or alternative hypothesis (H1) to generate the first likelihood value.
  • FIG. 4A illustrates a graph 400A of a test hypothesis in which a target variant is absent (or present at a sub-clonal level of the tumor variant) from the sample, according to an embodiment.
  • the negative prediction analyzer 134 may model the probability that the target variant is not absent at the clonal level as a null hypothesis (H0) to generate the second likelihood value.
  • FIG. 4B illustrates a graph 400B of a null hypothesis in which the target variant is not absent in the sample (and correlates with an allele frequency of the tumor variant), according to an embodiment.
  • “C” reflects the minor allele at a target locus.
  • the value “0.3” reflects a weight applied to θ1 (the TF estimation based on mutant allele frequency of a tumor variant) such that the product of 0.3 × θ1 serves as a sub-clonal threshold value.
  • An allele frequency (θ2) of a target variant in the sample 101 of the subject 111 above the sub-clonal threshold value may indicate that the target variant is correlated with the tumor variant.
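The sub-clonal threshold comparison described above amounts to a single multiplication and comparison. The sketch below uses the 0.3 weight stated in the text; the function and parameter names are hypothetical.

```python
SUBCLONAL_WEIGHT = 0.3  # weight applied to the TF estimate, as stated in the text


def subclonal_threshold(tf_estimate: float, weight: float = SUBCLONAL_WEIGHT) -> float:
    """Sub-clonal threshold value: weight multiplied by the TF estimate."""
    return weight * tf_estimate


def exceeds_subclonal_threshold(target_allele_frequency: float,
                                tf_estimate: float,
                                weight: float = SUBCLONAL_WEIGHT) -> bool:
    """True when the target-variant allele frequency is above the sub-clonal threshold,
    which may indicate that the target variant is correlated with the tumor variant."""
    return target_allele_frequency > subclonal_threshold(tf_estimate, weight)
```

For a TF estimate of 10%, the threshold is 3%: a target variant at 5% allele frequency exceeds it, while one at 1% does not.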
  • the negative prediction analyzer 134 may generate the first likelihood value and the second likelihood value by determining a tumor fraction (TF) estimate, including methylation determined TF (such as θ1 in the Equations described herein), of the sample.
  • the TF estimate may indicate a fraction of tumor DNA detected in the sample.
  • the TF estimate may be determined by determining an allele frequency of a tumor variant (referred to as epi MAF) in the sample.
  • the epi MAF may be determined by determining a molecule count associated with the tumor variant based on the plurality of sequence reads.
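As an illustration of deriving a TF estimate from molecule counts, the sketch below computes an allele frequency as variant-supporting molecules divided by all molecules at the locus, and then takes the maximum across tumor variants, following the "maximum mutant allele frequency" wording used in the Summary. The data layout (a mapping from a variant label to a pair of counts) and all names are assumptions made for the example; a methylation-derived TF estimate would replace this calculation.

```python
from typing import Mapping, Tuple


def allele_frequency(variant_molecules: int, reference_molecules: int) -> float:
    """Fraction of molecules at a locus that support the variant."""
    total = variant_molecules + reference_molecules
    if total == 0:
        raise ValueError("no molecules cover this locus")
    return variant_molecules / total


def tf_estimate_from_max_maf(counts: Mapping[str, Tuple[int, int]]) -> float:
    """Use the maximum mutant allele frequency across tumor variants as the TF estimate.

    counts maps a variant label to (variant_molecules, reference_molecules).
    """
    if not counts:
        raise ValueError("at least one tumor variant is required")
    return max(allele_frequency(v, r) for v, r in counts.values())
```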
  • the first likelihood value based on the probability that the target variant is absent at the clonal level (such as L1 in the Equations described herein) and the second likelihood value that the target variant is not absent at the clonal level or is present at a sub-clonal level (such as L0 in the Equations described herein) may be based on the TF estimate.
  • the negative prediction analyzer 134 may use the TF estimate to generate the quantitative value that assesses the quality of the negative prediction (such as by indicating a probability of whether or not the negative prediction is correct or false). For example, the negative prediction analyzer 134 may determine a first allele frequency of the target variant.
  • the negative prediction analyzer 134 may determine the first allele frequency by determining a first molecule count associated with the target variant based on the plurality of sequence reads.
  • the negative prediction analyzer 134 may use the first allele frequency together with the epi MAF to determine the first likelihood value and the second likelihood value, such that both values are based further on the first allele frequency and the epi MAF.
  • the probability that the target variant is absent at the clonal level (or present at a sub-clonal level) may be based on a sub-clonal threshold value (illustrated as 0.3 × θ1).
  • L1 refers to the likelihood value for the test hypothesis where the variant is absent at the clonal level.
  • θ1 refers to an allele frequency of a tumor variant, which may be used as a TF estimate.
  • θ2 refers to an allele frequency of a target variant.
  • Mv refers to a number of molecules supporting a tumor variant at a locus of the tumor variant.
  • Mr refers to a number of molecules supporting a reference wildtype at the locus of the tumor variant.
  • Mv′ refers to a number of molecules supporting a target variant at a locus of the target variant.
  • Mr′ refers to a number of molecules supporting a reference wildtype at the locus of the target variant.
  • ε refers to an error rate for the TF estimate.
  • ε′ refers to an error rate for the target variant.
  • Error rates are typically derived from sequence information obtained from samples obtained from healthy or normal subjects (e.g., z-scores or the like).
  • ⁇ 2 ⁇ ⁇ ⁇ 1 (Eq. 4) This equation is for simplification purposes (same as for Eq. 1), but is easier to compute than the integral in Eq.1.
  • Epsilon (ε) is taken from calculation of a z-score derived from sequence information obtained from samples obtained from healthy or normal subjects.
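The numbered equations are not reproduced in this text, so the following is only a sketch of how two likelihood values could be computed from the quantities defined above. It assumes a binomial model for the target-variant molecule counts and replaces the integral over allele frequencies with two point hypotheses: the variant is absent (only the error rate contributes) or the variant is clonal (allele frequency equal to the TF estimate plus the error rate). These simplifications and all names are assumptions, not the disclosed equations.

```python
import math
from typing import Tuple


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * (p ** k) * ((1.0 - p) ** (n - k))


def point_likelihoods(target_variant_molecules: int,
                      target_reference_molecules: int,
                      tf_estimate: float,
                      target_error_rate: float) -> Tuple[float, float]:
    """Return (L1, L0) under two point hypotheses for the target locus.

    L1: variant absent, so variant-supporting molecules arise only from error.
    L0: variant clonal, so its allele frequency tracks the TF estimate.
    """
    n = target_variant_molecules + target_reference_molecules
    p_absent = min(target_error_rate, 1.0)
    p_clonal = min(tf_estimate + target_error_rate, 1.0)
    l1 = binomial_pmf(target_variant_molecules, n, p_absent)
    l0 = binomial_pmf(target_variant_molecules, n, p_clonal)
    return l1, l0
```

With no variant-supporting molecules among 1,000 at the target locus, a 5% TF estimate, and a 0.1% error rate, L1 is far larger than L0, which favors the absence hypothesis.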
  • the negative prediction analyzer 134 may adjust the quantitative value determined from the TF estimate based on the presence of one or more variants other than the target variant in a sample 101 of the subject 111.
  • the negative prediction analyzer 134 may determine a prevalence of at least a second variant in the cfDNA sample 101, and adjust the quantitative value based on the prevalence of at least a second variant.
  • the prevalence data may be determined according to Equations 7 and 8.
  • the likelihood value (L1) that the test hypothesis is correct may be adjusted based on Equation 9 to generate an adjusted likelihood value (L1a), and a likelihood ratio (LRa) may be generated according to Equation 10.
  • Eq. 10 is a likelihood ratio that uses the properties of conditional dependence.
  • the quantitative value may be based on a log likelihood ratio (LLR) between the first likelihood value and the second likelihood value. As such, the quantitative value may be based on a ratio between the first likelihood value (such as L1 of Equation 14) and the second likelihood value (such as L0 of Equation 15). In some examples, the negative prediction analyzer 134 may generate a TF-based LLR (such as LLR_tf illustrated in Equation 16).
  • the quantitative value may be based on an LLR of covariance data.
  • the negative prediction analyzer 134 may generate the LLR_me that reflects covariance data, as illustrated in Equation 18 (a conditional probability of how many times variants are observed together).
  • the quantitative value may be expressed as a log posterior probability ratio (LPPR) based on a combination of the TF-based log likelihood of whether the null or test hypothesis is correct, a covariance-based (e.g., mutual exclusivity) log likelihood of whether the null or test hypothesis is correct, and prior-data based log data, such as expressed in Equations 19 and 21 below.
  • the quantitative value (such as an LLR in Equation 11) may be based further on LogPrior data that is based on historical, observed data not necessarily limited to the sample 101 of the subject 111.
  • Such LogPrior data may be based on covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the target variant.
  • the LogPrior data ⁇ ⁇ ( ⁇ may be expressed as: log ⁇ ) ⁇ ⁇ ( ⁇ +) .
  • the LogPrior data may be used to generate the quantitative value in combination with other values, such as in Equation 19.
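The combination of the three log-scale terms can be illustrated as a plain sum, matching the description of the LPPR as the sum of a TF-based log likelihood value, a mutual exclusivity log likelihood value, and a log prior value. The helper that converts a historical prevalence into prior log-odds is an assumption made for the example, because the exact LogPrior expression is not legible in this text; all names are hypothetical.

```python
import math


def log_prior_from_prevalence(prevalence: float) -> float:
    """Assumed prior log-odds of absence: log((1 - prevalence) / prevalence).

    prevalence is the historical fraction of comparable samples carrying the target variant.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return math.log((1.0 - prevalence) / prevalence)


def log_posterior_probability_ratio(llr_tf: float, llr_me: float, log_prior: float) -> float:
    """LPPR as the sum of the TF-based LLR, the mutual exclusivity LLR, and the log prior."""
    return llr_tf + llr_me + log_prior
```

Working on the log scale keeps the three evidence sources additive, so each term can be inspected separately when a call is reviewed.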
  • the negative prediction analyzer 134 has been described as implementing the method 300 and performing the foregoing additional operations. It should be further understood that the foregoing additional operations may be part of and extend the method 300.
  • the various processing operations and/or methods depicted in the Figures may be accomplished using some or all of the system components described in detail herein and, in some implementations, various operations may be performed in different sequences and various operations may be omitted. Additional operations may be performed along with some or all of the operations shown in the depicted flow diagrams. One or more operations may be performed simultaneously. Accordingly, the operations as illustrated (and described in greater detail herein) are provided as example and, as such, should not be viewed as limiting.
  • the present methods can be computer-implemented, such that any or all of the operations described in the specification or appended claims other than wet chemistry steps can be performed in a suitable programmed computer.
  • the computer can be a mainframe, personal computer, tablet, smart phone, cloud, online data storage, remote data storage, or the like.
  • the computer can be operated in one or more locations.
  • Various operations of the present methods can utilize information and/or programs and generate results that are stored on computer-readable media (e.g., hard drive, auxiliary memory, external memory, server, database, portable memory device (e.g., CD-R, DVD, ZIP disk, flash memory cards), and the like).
  • the present disclosure also includes an article of manufacture for analyzing a nucleic acid population that includes a machine-readable medium containing one or more programs which when executed implement the steps of the present methods.
  • the disclosure can be implemented in hardware and/or software. For example, different aspects of the disclosure can be implemented in either client-side logic or server-side logic. The disclosure or components thereof can be embodied in a fixed media program component containing logic instructions and/or data that when loaded into an appropriately configured computing device cause that device to perform according to the disclosure.
  • a fixed media containing logic instructions can be delivered to a viewer on a fixed media for physically loading into a viewer's computer or a fixed media containing logic instructions may reside on a remote server that a viewer accesses through a communication medium to download a program component.
  • the present disclosure provides computer control systems that are programmed to implement methods of the disclosure.
  • the processor 120 may include a single core or multi core processor, or a plurality of processors for parallel processing.
  • the storage device 122 may include random-access memory, read-only memory, flash memory, a hard disk, and/or other type of storage.
  • the computer system 110 may include a communication interface (e.g., network adapter) for communicating with one or more other systems, and peripheral devices, such as cache, other memory, data storage and/or electronic display adapters.
  • the components of the computer system 110 may communicate with one another through an internal communication bus, such as a motherboard.
  • the storage device 122 may be a data storage unit (or data repository) for storing data.
  • the computer system 110 may be operatively coupled to a computer network ("network") with the aid of the communication interface.
  • the network may be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
  • the network in some cases is a telecommunication and/or data network.
  • the network may include a local area network.
  • the network may include one or more computer servers, which can enable distributed computing, such as cloud computing.
  • the network, in some cases with the aid of the computer system 110, may implement a peer-to-peer network, which may enable devices coupled to the computer system 110 to behave as a client or a server.
  • the processor 120 may execute a sequence of machine-readable instructions, which can be embodied in a program or software.
  • the instructions may be stored in a memory location, such as the storage device 122.
  • the instructions can be directed to the processor 120, which can subsequently program or otherwise configure the processor 120 to implement methods of the present disclosure. Examples of operations performed by the processor 120 may include fetch, decode, execute, and writeback.
  • the processor 120 may be part of a circuit, such as an integrated circuit. One or more other components of the system 100 may be included in the circuit. In some cases, the circuit may include an application specific integrated circuit (ASIC).
  • the storage device 122 may store files, such as drivers, libraries and saved programs. The storage device 122 can store user data, e.g., user preferences and user programs.
  • the computer system 110 in some cases may include one or more additional data storage units that are external to the computer system 110, such as located on a remote server that is in communication with the computer system 110 through an intranet or the Internet.
  • the computer system 110 can communicate with one or more remote computer systems through the network. For instance, the computer system 110 can communicate with a remote computer system of a user.
  • Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
  • the user can access the computer system 110 via the network.
  • Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 110, such as, for example, on the storage device 122.
  • the machine executable or machine readable code can be provided in the form of software (e.g., computer readable media). During use, the code can be executed by the processor 120.
  • the code can be retrieved from the storage device 122 and stored on the storage device 122 for ready access by the processor 120.
  • the code may be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
  • the code can be supplied in a programming language that can be selected to enable the code to execute in a precompiled or as-compiled fashion.
  • Aspects of the systems and methods provided herein, such as the computer system 110, can be embodied in programming.
  • Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
  • a machine readable medium such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium.
  • Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
  • Volatile storage media include dynamic memory, such as main memory of such a computer platform.
  • Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
  • Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • Computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
  • the computer system 110 can include or be in communication with an electronic display 935 that comprises a user interface (UI) for providing, for example, a report.
  • Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
  • Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the processor 120. Further information can be found in PCT App. No. PCT/US2021/015837.
  • Example 1 Liquid biopsy wild type prediction of negative predictors
  • Sample NPV is 0/1 (e.g., 1: neither tissue nor blood has an actionable NCCN biomarker; 0: tissue has an actionable NCCN biomarker but blood does not).
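To show how the per-sample 0/1 values roll up, the sketch below averages them over the samples that tested negative in blood, which corresponds to the standard definition of negative predictive value: true negatives divided by all negative calls. The function name and input layout are hypothetical, and the example values are illustrative only.

```python
from typing import Iterable


def cohort_npv(sample_npv_flags: Iterable[int]) -> float:
    """Cohort-level NPV from per-sample flags for blood-negative samples.

    Each flag is 1 when tissue is also negative (true negative)
    and 0 when tissue is positive (false negative).
    """
    flags = list(sample_npv_flags)
    if not flags:
        raise ValueError("at least one blood-negative sample is required")
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("flags must be 0 or 1")
    # True negatives divided by all negative calls.
    return sum(flags) / len(flags)
```

For instance, three concordant negatives and one discordant sample give `cohort_npv([1, 1, 1, 0])`, which evaluates to 0.75.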
  • Example 2 Allele fraction of CRC hotspot variants in Infinity cohort. The scatter plots are colored by the molecule count for the variants. The 1-variant-count samples appear to be spread across the mutation spectrum, indicating that they are likely noise. There is also a clear linear trend for variants that are likely in the major clone driving the tumor fraction, as well as a cluster of variants around 1% MAF at higher tumor fractions that are likely sub-clonal variants. Colorectal cancer (CRC) hotspots were measured on an epigenomic detection platform, with The Cancer Genome Atlas (TCGA) frequencies applied as priors. The allele fraction of CRC hotspots in a cohort is shown.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided herein are methods of making negative predictions. In some aspects, methods of determining, including via epigenomic detection, that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type at least partially using a computer are provided. Certain of these methods include determining that the first target nucleic acid variant is not detected in the cfNA sample obtained from the subject, generating, by the computer, at least one tumor fraction based value including methylation based estimation; generating, by the computer, at least one mutual exclusivity value; and determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value. Additional methods and related systems and computer readable media are also provided.

Description

Attorney Docket No. GH0160WO

SIGNIFICANCE MODELING OF CLONAL-LEVEL TARGET VARIANTS USING METHYLATION DETECTION

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of priority of US Provisional Patent Application No. 63/515,227, filed July 24, 2023, which is incorporated by reference herein in its entirety for all purposes.

BACKGROUND

[0002] In advanced colorectal cancer (CRC), guidelines recommend the use of anti-EGFR therapies only in patients whose tumors are wild-type for KRAS, NRAS, and BRAF. To date, cell-free circulating tumor DNA (ctDNA) tests have been used as rule-in tests for positive detection of tumor-derived genomic alterations and microsatellite instability (MSI) with high concordance to tissue sequencing. However, the ability to rule out such mutations has been limited due to the potential of low ctDNA shedding impacting sensitivity of detection. Using ctDNA or other nucleic acids to determine the wild-type status of specific genes within a tumor with high confidence would facilitate timely therapeutic decision making and avoid tissue biopsy for confirmation of wild-type status.

[0003] As such, therapeutic approaches would be greatly improved by the ability to identify whether a subject has any actionable variants. Accordingly, there remains a need to identify genetic variants, or the absence thereof, to diagnose and/or guide the treatment of diseases that are detectable through genetic analysis, especially from cell-free nucleic acid (cfNA) samples.

[0004] Described herein are methods for determining that no actionable variants are present in a sample. A variety of techniques described herein support the effective determination of the absence of an alteration, including using evidence of tumor in the sample at a purity (tumor fraction) that is above the detection limit of the assay, variants of interest in the assay that are covered with sufficient depth, and the molecules observed over hotspot variants.
In short, given an observed tumor fraction, an informative likelihood is achieved of correctly identifying the lack of a set of actionable variants.

SUMMARY OF THE INVENTION

[0005] Described herein is a method of determining that a first variant of interest at a first locus is absent at a clonal level in a cell-free deoxyribonucleic acid (cfDNA) sample of a human subject, the method comprising: accessing a plurality of sequence reads of the cfDNA sample; [0006] determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; generating a first likelihood value based on a probability that the first variant is absent at the clonal level and/or a second likelihood value based on a probability that the first variant is not absent at the clonal level; optionally, determining a quantitative value based on the first likelihood value and/or the second likelihood value; [0007] comparing the quantitative value and/or the first likelihood value and/or the second likelihood value to a threshold; and determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison. In other embodiments, generating the first likelihood value and the second likelihood value comprises: determining a tumor fraction estimate of the sample, wherein the first likelihood value and the second likelihood value are based on the tumor fraction estimate. In other embodiments, determining the tumor fraction estimate comprises: determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the sample. In other embodiments, determining the epi MAF comprises determining a molecule count associated with the tumor mutation based on the plurality of sequence reads.
In other embodiments, generating the first likelihood value and the second likelihood value comprises: determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF. In other embodiments, the method includes comparing the allele frequency with a second threshold that is based on the epi MAF, wherein determining that the first variant of interest at the first locus is absent at the clonal level is based further on the comparison of the MAF with the second threshold. In other embodiments, determining the allele frequency comprises: determining a first molecule count associated with the first variant based on the plurality of sequence reads. In other embodiments, determining the quantitative value comprises: accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information. In other embodiments, the method includes determining a prevalence of at least a second variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information. In other embodiments, determining the quantitative value comprises: accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information. In other embodiments, the method includes determining a prevalence of at least a second variant in the cfDNA sample, wherein the quantitative value is based further on the prevalence of the second variant.
In other embodiments, the quantitative value is based on the ratio of the first likelihood value to the second likelihood value. In other embodiments, the method includes determining a level of confidence that the first variant is absent at the clonal level in the cfDNA sample based on the quantitative value. In other embodiments, the method includes determining a treatment plan to treat a disease in the human subject. In other embodiments, the disease is cancer. In other embodiments, the method includes determining a prevalence of at least a second variant in the cfDNA sample; and adjusting the quantitative value based on the prevalence of at least a second variant in the cfDNA sample. [0008] Described herein is a method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type at least partially using a computer, the method comprising: [0009] determining that the first target nucleic acid variant at the first genetic locus is not detected in the cfNA sample; determining, by the computer, a coverage of the first genetic locus from sequence information generated from the cfNA sample; determining, by the computer, a tumor fraction from the sequence information generated from the cfNA sample; determining, by the computer, a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value. 
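The relationship between coverage, tumor fraction, and the chance of missing a variant that is actually present can be sketched with a binomial model, which the embodiments below also mention. The sketch assumes that each molecule covering the locus independently carries the variant with probability equal to the tumor fraction, and that a detection call requires a minimum number of supporting molecules. Both assumptions and all names are hypothetical.

```python
import math


def detection_probability(coverage: int,
                          tumor_fraction: float,
                          min_molecules: int = 1) -> float:
    """Probability of observing at least min_molecules variant-supporting molecules.

    coverage       -- number of unique molecules covering the locus
    tumor_fraction -- assumed per-molecule probability of carrying the variant
    min_molecules  -- hypothetical minimum support required for a detection call
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    # Probability of seeing fewer than min_molecules supporting molecules.
    p_miss = sum(
        math.comb(coverage, k)
        * (tumor_fraction ** k)
        * ((1.0 - tumor_fraction) ** (coverage - k))
        for k in range(min_molecules)
    )
    return 1.0 - p_miss
```

Under these assumptions, one minus the returned value is the probability that a variant present at the tumor fraction goes undetected at the given coverage, which is the kind of quantity that could be compared against a threshold.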
[0010] Described herein is a method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject at least partially using a computer, the method comprising: determining that the first target nucleic acid variant is not detected in the cfNA sample obtained from the subject to generate a first test result; determining that at least a second target nucleic acid variant is detected in the cfNA sample obtained from the subject to generate a second test result; determining, by the computer, a first probability that the first target nucleic acid variant is absent in the cfNA sample given the second test result and/or a second probability that the first target nucleic acid is not absent in the cfNA sample given the second test result; generating, by the computer, a quantitative value using the first probability, the second probability, and/or a ratio thereof; and determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value. [0011] Described herein is a method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type at least partially using a computer, the method comprising: [0012] determining that the first target nucleic acid variant is not detected in the cfNA sample obtained from the subject; generating, by the computer, at least one tumor fraction based value; generating, by the computer, at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value. In other embodiments, the quantitative value is less than the threshold value.
In other embodiments, the quantitative value is greater than the threshold value. In other embodiments, the first and second test results are dependent upon one another. In other embodiments, the method includes determining that a plurality of other selected target nucleic acid variants are absent at one or more other genetic loci. In other embodiments, the quantitative value comprises a log likelihood ratio (LLR) threshold value. In other embodiments, the method includes determining that the first target nucleic acid variant is absent at the first genetic locus in a plurality of reference cfNA samples to generate the threshold value. In other embodiments, the threshold value comprises a clonality or a sub-clonality threshold value. In other embodiments, the first target nucleic acid variant comprises a driver mutation. In other embodiments, the method includes administering one or more therapies to the subject based upon the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample. In other embodiments, the method includes estimating a probability of detecting the first target nucleic acid variant at the first genetic locus in the cfNA sample using the tumor fraction and a binomial model. In other embodiments, the binomial model comprises information about the given cancer type and/or the second target nucleic acid variant. In other embodiments, the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample indicates that the first genetic locus is wild type. In other embodiments, the given cancer type is colorectal cancer, wherein the first genetic locus is KRAS, BRAF, or NRAS, and wherein the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample indicates that the first genetic locus is wild type KRAS, BRAF, or NRAS.
In other embodiments, the method includes administering Cetuximab and/or Panitumumab to the subject. In other embodiments, the cfNA comprises cfDNA. In other embodiments, the cfNA comprises cfRNA. In other embodiments, the method includes repeating the method one or more times to monitor whether the first target nucleic acid variant is absent at the first genetic locus in different cfNA samples obtained from the subject at different time points. In other embodiments, the method includes performing one or more additional tests to confirm or refute the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample. In other embodiments, the method includes determining a maximum mutant allele frequency (epi MAF) for the cfNA sample and using the epi MAF as an estimate of the tumor fraction. In other embodiments, the method includes determining that the first target nucleic acid variant at the first genetic locus is not detected in the cfNA sample based upon a plurality of sequencing reads obtained from the cfNA sample. In other embodiments, the method includes determining that the first target nucleic acid variant is absent at a clonal level in the cfNA sample. In other embodiments, the method includes generating a first likelihood value based on the first probability and a second likelihood value based on the second probability. In other embodiments, the method includes determining the quantitative value based on the first likelihood value and the second likelihood value. [0013] In other embodiments, generating the first likelihood value and the second likelihood value comprises determining the tumor fraction estimate of the cfNA sample, wherein the first likelihood value and the second likelihood value are based on the tumor fraction estimate.
In other embodiments, determining the tumor fraction estimate comprises determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the cfNA sample. In other embodiments, determining the epi MAF comprises determining a molecule count associated with the tumor mutation based on the plurality of sequence reads. In other embodiments, generating the first likelihood value and the second likelihood value comprises determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF. In other embodiments, the method includes comparing the allele frequency with a second threshold that is based on the epi MAF, wherein determining that the first target nucleic acid variant of interest at the first genetic locus is absent at the clonal level is based further on the comparison of the MAF with the second threshold. In other embodiments, determining the allele frequency comprises determining a first molecule count associated with the first target nucleic acid variant based on the plurality of sequence reads. [0014] In other embodiments, determining the quantitative value comprises accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information. In other embodiments, the method includes determining a prevalence of at least the second target nucleic acid variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information.
[0015] In other embodiments, the method includes determining the quantitative value comprises accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first target nucleic acid variant, wherein the quantitative value is based on the covariable information. In other embodiments, the method includes determining a prevalence of at least the second target nucleic acid variant in the cfNA sample, wherein the quantitative value is based further on the prevalence of the second target nucleic acid variant. In other embodiments, the quantitative value is based on the ratio of the first likelihood value to the second likelihood value. In other embodiments, the method includes determining a level of confidence that the first target nucleic acid variant is absent at a clonal level in the cfNA sample based on the quantitative value. In other embodiments, the method includes determining a prevalence of at least the second target nucleic acid variant in the cfNA sample; and adjusting the quantitative value based on the prevalence of at least the second target nucleic acid variant in the cfNA sample. [0016] In other embodiments, the ratio comprises a log posterior probability ratio (LPPR) equal to a sum of a log likelihood tumor fraction value, a log likelihood mutual exclusivity value, and a log prior value. In other embodiments, the first genetic locus or a second genetic locus comprises the second target nucleic acid variant. In other embodiments, the quantitative value comprises a negative predictive value (NPV) score. In other embodiments, the given cancer type comprises lung cancer and the first target nucleic acid variant is a mutation in a gene selected from the group consisting of: EGFR, BRAF, ALK, ROS1, and MET. 
In other embodiments, the given cancer type comprises colorectal cancer and the first target nucleic acid variant is a mutation in a gene selected from the group consisting of: KRAS, BRAF, and NRAS. [0017] Described herein is a system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing a plurality of sequence reads of the cfDNA sample; determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; [0018] generating a first likelihood value based on a probability that the first variant is absent at the clonal level and a second likelihood value based on a probability that the first variant is not absent at the clonal level; determining a quantitative value based on the first likelihood value and the second likelihood value; comparing the quantitative value to a threshold; and [0019] determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison.
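The likelihood-ratio steps recited above, together with the log posterior probability ratio (LPPR) described as a sum of a log likelihood tumor fraction value, a log likelihood mutual exclusivity value, and a log prior value, can be sketched as follows. This is only an illustration under stated assumptions: zero mutant molecules were observed, the "not absent" hypothesis is evaluated at a clonality boundary of 30%, and the background error rate, prior, and decision threshold are hypothetical placeholder values, as are all function and parameter names.

```python
import math


def log_prob_zero_mutant(coverage: int, allele_fraction: float) -> float:
    """log P(0 mutant molecules) under Binomial(coverage, allele_fraction)."""
    return coverage * math.log1p(-allele_fraction)


def lppr(coverage: int, tumor_fraction: float, *,
         clonal_cutoff: float = 0.30,   # assumed clonality boundary
         error_rate: float = 1e-4,      # hypothetical background error rate
         log_mutual_exclusivity: float = 0.0,
         prior_present: float = 0.1) -> float:
    """Log posterior probability ratio of 'absent' versus 'not absent'."""
    # First likelihood value: variant absent, only background error is seen.
    ll_absent = log_prob_zero_mutant(coverage, error_rate)
    # Second likelihood value: variant present at the clonality boundary.
    ll_present = log_prob_zero_mutant(coverage, clonal_cutoff * tumor_fraction)
    log_tumor_fraction_term = ll_absent - ll_present
    log_prior_term = math.log((1.0 - prior_present) / prior_present)
    return log_tumor_fraction_term + log_mutual_exclusivity + log_prior_term


def call_absent(score: float, threshold: float = 3.0) -> bool:
    """Compare the quantitative value to a (hypothetical) threshold."""
    return score > threshold
```

In this sketch a larger score favors absence at the clonal level; whether the call is made when the value exceeds or falls below the threshold depends on how the ratio is oriented.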
[0020] Described herein is a system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: [0021] accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type; determining that a first target nucleic acid variant at a first genetic locus is not detected in cfNA sample from the sequence information; determining a coverage of the first genetic locus from the sequence information; [0022] determining a tumor fraction from the sequence information; determining a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value. 
[0023] Described herein is a system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: [0024] accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information to generate a first test result; determining that at least a second target nucleic acid variant is detected in the cfNA sample from the sequence information to generate a second test result; determining a first probability that the first target nucleic acid variant is absent in the cfNA sample given the second test result and/or a second probability that the first target nucleic acid is not absent in the cfNA sample given the second test result; generating a quantitative value using the first probability, the second probability, and/or a ratio thereof; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value. Attorney Docket No. 
GH0160WO [0025] Described herein is a system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: [0026] accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information; generating at least one tumor fraction based value; generating at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value. [0027] Described herein is a computer readable media comprising non-transitory computer executable instruction which, when executed by at least electronic processor perform at least: accessing a plurality of sequence reads of the cfDNA sample; determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; generating a first likelihood value based on a probability that the first variant is absent at the clonal level and a second likelihood value based on a probability that the first variant is not absent at the clonal level; determining a quantitative value based on the first likelihood value and the second likelihood value; comparing the quantitative value to a threshold; and determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison. 
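One way to picture a mutual exclusivity value is as a log ratio estimated from historical co-occurrence counts. The sketch below assumes a 2x2 table of counts from a reference cohort and a small pseudocount to avoid division by zero. The function name, arguments, and the pseudocount are hypothetical and are not drawn from the disclosure.

```python
import math


def mutual_exclusivity_value(n_both: int, n_first_only: int,
                             n_second_only: int, n_neither: int,
                             pseudocount: float = 0.5) -> float:
    """log[ P(second detected | first absent) / P(second detected | first present) ].

    Positive values mean the second variant is seen more often when the
    first variant is absent (mutual exclusivity); negative values indicate
    co-occurrence.
    """
    p_second_given_first_present = (n_both + pseudocount) / (
        n_both + n_first_only + 2 * pseudocount)
    p_second_given_first_absent = (n_second_only + pseudocount) / (
        n_second_only + n_neither + 2 * pseudocount)
    return math.log(p_second_given_first_absent / p_second_given_first_present)
```

When the second variant has been detected in the sample, a positive value of this kind would add support to the conclusion that the first variant is absent, and a negative value would subtract from it.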
[0028] Described herein is a computer readable media comprising non-transitory computer executable instruction which, when executed by at least electronic processor perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type; determining that a first target nucleic acid variant at a first genetic locus is not detected in cfNA sample from the sequence information; determining a coverage of the first genetic locus from the sequence information; determining a tumor fraction from the sequence information; determining a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value. [0029] Described herein is a computer readable media comprising non-transitory computer executable instruction which, when executed by at least electronic processor perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in Attorney Docket No. 
GH0160WO the cfNA sample from the sequence information to generate a first test result; determining that at least a second target nucleic acid variant is detected in the cfNA sample from the sequence information to generate a second test result; determining a first probability that the first target nucleic acid variant is absent in the cfNA sample given the second test result and/or a second probability that the first target nucleic acid is not absent in the cfNA sample given the second test result; generating a quantitative value using the first probability, the second probability, and/or a ratio thereof; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value. [0030] Described herein is a computer readable media comprising non-transitory computer executable instruction which, when executed by at least electronic processor perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information; generating at least one tumor fraction based value; generating at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the quantitative value is less than the threshold value. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the quantitative value is greater than the threshold value. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the first and second test results are dependent upon one another. 
In other embodiments, the system or computer readable media of any one of the preceding claims, comprising determining that a plurality of other selected target nucleic variants are absent at one or more other genetic loci. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the quantitative value comprises a log likelihood ratio (LLR) threshold value. In other embodiments, the system or computer readable media of any one of the preceding claims, comprising determining that the first target nucleic acid variant is absent at the first genetic locus in a plurality of reference cfNA samples to generate the threshold value. In other embodiments, the system or computer readable media of claim 74, wherein the threshold value comprises a clonality or sub- clonality threshold value. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the first target nucleic acid variant comprises a driver mutation. In other embodiments, the system or computer readable media of any one of Attorney Docket No. GH0160WO the preceding claims, wherein the instructions further perform at least: outputting one or more therapy recommendations for the subject based upon the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: estimating a probability of detecting the first target nucleic acid variant at the first genetic locus in the cfNA sample using the tumor fraction and a binomial model. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: determining a maximum mutant allele frequency (epi MAF) for the cfNA sample and using the epi MAF as an estimate of the tumor fraction. 
In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: determining that the first target nucleic acid variant is absent at a clonal level in the cfNA sample. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: generating a first likelihood value based on the first probability and a second likelihood value based on the second probability. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: determining the quantitative value based on the first likelihood value and the second likelihood value. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: generating the first likelihood value and the second likelihood value by determining the tumor fraction estimate of the cfNA sample, wherein the first likelihood value and the second likelihood value is based on the tumor fraction estimate. In other embodiments, the system or computer readable media of claim 83, wherein the instructions further perform at least: determining the tumor fraction estimate by determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the cfNA sample. In other embodiments, the system or computer readable media of claim 84, wherein the instructions further perform at least: determining the epi MAF by determining a molecule count associated with the tumor mutation based on the plurality of sequence reads. 
In other embodiments, the system or computer readable media of claim 84, wherein the instructions further perform at least: generating the first likelihood value and the second likelihood value by determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF. In other embodiments, the system or computer readable media of claim 86, wherein the instructions further perform at least: comparing the allele frequency with a second Attorney Docket No. GH0160WO threshold that is based on the epi MAF and determining that the first target nucleic acid variant of interest at the first genetic locus is absent at the clonal level based further on the comparison of the MAF with the second threshold. In other embodiments, the system or computer readable media of claim 86, wherein the instructions further perform at least: determining the allele frequency by determining a first molecule count associated with the first target nucleic acid variant based on the plurality of sequence reads. In other embodiments, the system or computer readable media of claim 86, wherein the instructions further perform at least: determining the quantitative value by accessing covariable information indicating a historical prevalence of one or more variants exhibiting co- occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information. In other embodiments, the system or computer readable media of claim 89, wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information. 
In other embodiments, the system or computer readable media of claim 83, wherein the instructions further perform at least: determining the quantitative value by accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first target nucleic acid variant, wherein the quantitative value is based on the covariable information. In other embodiments, the system or computer readable media of claim 91, wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfNA sample, wherein the quantitative value is based further on the prevalence of the second target nucleic acid variant. In other embodiments, the system or computer readable media of claim 83, wherein the instructions further perform at least: determining a level of confidence that the first target nucleic acid variant is absent at a clonal level in the cfNA sample based on the quantitative value. In other embodiments, the system or computer readable media of claim 83, wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfNA sample; and adjusting the quantitative value based on the prevalence of at least the second target nucleic acid variant in the cfNA sample. In other embodiments, the system or computer readable media of any one of the preceding claims, wherein the ratio comprises a log posterior probability ratio (LPPR) equal to a sum of a log likelihood tumor fraction value, a log likelihood mutual exclusivity value, and a log prior value. In other embodiments, the system of any one of the preceding, further comprising generating a report which optionally includes information on, and/or information derived from, the absence of Attorney Docket No. GH0160WO the first target nucleic acid variant at the first genetic locus in the sample. 
In other embodiments, the method or system of any of the preceding, further comprising communicating the report to a third party, such as the subject from whom the sample derived or a health care practitioner.
BRIEF DESCRIPTION OF THE FIGURES
[0031] FIGURE 1 illustrates an example in which sample-level NPV values increase with tumor fraction (TF). Per-sample NPV values increase with respect to TF for different alteration types. Limited to NSCLC, breast, colorectal, pancreatic, and prostate cancers.
[0032] FIGURE 2 illustrates an example in which biomarker-level NPV values increase with TF. Per-biomarker NPV values increase with respect to TF for different alteration types. Limited to NSCLC, breast, colorectal, pancreatic, and prostate cancers.
[0033] FIGURE 3 illustrates an example of blood-tissue PPA of FDA-approved biomarkers.
[0034] FIGURE 4 illustrates an example in which negative prediction power is directly tied to tumor fraction.
[0035] FIGURE 5 illustrates an example of allele fraction of CRC hotspot variants in the Epigenome cohort.
[0036] FIGURE 6 illustrates an example of a system for generating negative predictions of a target variant in a sample of a subject, according to an embodiment of the disclosure.
[0037] FIGURE 7 illustrates a schematic diagram of inputs and outputs of a negative prediction analyzer, according to an embodiment.
[0038] FIGURE 8 illustrates an example of a method for generating negative predictions of a target variant in a sample of a subject, according to an embodiment of the disclosure.
[0039] FIGURE 9A illustrates a graph of a test hypothesis in which the target variant is absent (or present at sub-clonal MAF) from the sample, according to an embodiment. FIGURE 9B illustrates a graph of a null hypothesis in which the target variant is not absent in the sample, according to an embodiment.
DETAILED DESCRIPTION [0040] While various embodiments of the disclosure have been shown and described herein, those skilled in the art will understand that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed. [0041] The term “about” and its grammatical equivalents in relation to a reference numerical value can include a range of values up to plus or minus 10% from that value. For example, the amount “about 10” can include amounts from 9 to 11. The term “about” in relation to a reference numerical value can include a range of values plus or minus 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% from that value. [0042] The term “at least” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and greater than that value. For example, the amount “at least 10” can include the value 10 and any numerical value above 10, such as 11, 100, and 1,000. [0043] The term “at most” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and less than that value. For example, the amount “at most 10” can include the value 10 and any numerical value under 10, such as 9, 8, 5, 1, 0.5, and 0.1. [0044] As used herein the singular forms “a”, “an”, and “the” can include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a cell” can include a plurality of such cells and reference to “the culture” can include reference to one or more cultures and equivalents thereof known to those skilled in the art, and so forth.
All technical and scientific terms used herein can have the same meaning as commonly understood to one of ordinary skill in the art to which this disclosure belongs unless clearly indicated otherwise. [0045] Cancer can be indicated by epigenetic variations, such as methylation. Examples of methylation changes in cancer include local gains of DNA methylation in the CpG islands at the transcription start site (TSS) of genes involved in normal growth control, DNA repair, cell cycle regulation, and/or cell differentiation. This hypermethylation can be associated with an aberrant loss of transcriptional capacity of involved genes and occurs at least as frequently as point mutations and deletions as a cause of altered gene expression. DNA methylation profiling can be used to detect regions with different extents of methylation (“differentially methylated regions” or “DMRs”) of the genome that are altered during development or that are perturbed by disease, for example, cancer or any cancer-associated disease. The genomes of cancer cells harbor imbalances in these DNA methylation patterns, and therefore in the functional packaging of the DNA. Abnormalities of chromatin organization are therefore coupled with methylation changes and may contribute to enhanced cancer profiling when analyzed jointly. Combining MBD-partitioning with fragmentomic data, such as mapped fragment start and stop positions (correlated with nucleosome positions), fragment length, and associated nucleosome occupancy, can be used for chromatin structure analysis in hypermethylation studies with the aim of improving the biomarker detection rate. [0046] Methylation profiling can involve determining methylation patterns across different regions of the genome.
For example, after partitioning molecules based on extent of methylation (e.g., relative number of methylated sites per molecule) and sequencing, the sequences of molecules in the different partitions can be mapped to a reference genome. This can show regions of the genome that, compared with other regions, are more highly methylated or are less highly methylated. In this way, genomic regions, in contrast to individual molecules, may differ in their extent of methylation. [0047] A characteristic of nucleic acid molecules may be a modification, which may include various chemical or protein modifications (i.e., epigenetic modifications). Non-limiting examples of chemical modification include covalent DNA modifications, including DNA methylation. In some embodiments, DNA methylation includes addition of a methyl group to a cytosine at a CpG site (a cytosine followed by a guanine in a nucleic acid sequence). In some embodiments, DNA methylation includes addition of a methyl group to adenine, such as in N6-methyladenine. In some embodiments, DNA methylation is 5-methylation (modification of the 5th carbon of the 6 carbon ring of cytosine). In some embodiments, 5-methylation includes addition of a methyl group to the 5C position of the cytosine to create 5-methylcytosine (m5c). In some embodiments, methylation includes a derivative of m5c. Derivatives of m5c include, but are not limited to, 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC). In some embodiments, DNA methylation is 3C methylation (modification of the 3rd carbon of the 6 carbon ring of cytosine). In some embodiments, 3C methylation includes addition of a methyl group to the 3C position of the cytosine to generate 3-methylcytosine (3mC). Other examples include N6-methyladenine or glycosylation. DNA methylation includes addition of methyl groups to DNA (e.g., at CpG sites) and can change the expression of the methylated DNA region.
Methylation can also occur at non-CpG sites; for example, methylation can occur at a CpA, CpT, or CpC site. DNA methylation can change the activity of the methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed. DNA methylation is critical for normal development, and abnormality in methylation may disrupt epigenetic regulation. The disruption, e.g., repression, in epigenetic regulation may cause diseases, such as cancer. Promoter methylation in DNA may be indicative of cancer. [0048] A CpG dyad is the dinucleotide CpG (cytosine-phosphate-guanine, i.e., a cytosine followed by a guanine in a 5’ → 3’ direction of the nucleic acid sequence) on the sense strand and its complementary CpG on the antisense strand of a double-stranded DNA molecule. CpG dyads can be either fully methylated or hemi-methylated (methylated on one strand only). [0049] The CpG dinucleotide is underrepresented in the normal human genome, with the majority of CpG dinucleotide sequences being transcriptionally inert (e.g., DNA heterochromatic regions in pericentromeric parts of the chromosome and in repeat elements) and methylated. However, many CpG islands are protected from such methylation, especially around transcription start sites (TSS). [0050] Specifically, in accordance with methods and techniques described herein, epigenomic measurement of tumor fraction can improve negative prediction. The approach assumes a tumor fraction based on the data and provides the likelihood that a variant is absent at >30% clonality. Inflation of the tumor fraction prediction (e.g., CNV on the epi-MAF gene, CHIP, etc.) can cause overconfidence in negative prediction, while deflation of the tumor fraction (e.g., TND, etc.) leads to an inability to make confident negative calls. Static parameters include the sub-clonal variant purity boundary (30%), the prior variant likelihood, and mutual exclusivity or co-occurrence with other variants.
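The effect of an inflated or deflated tumor fraction estimate on a negative call can be illustrated with a small sketch. It assumes a simple binomial picture in which a variant at the 30% clonality boundary has an expected allele fraction of 0.30 times the tumor fraction, and it reports the probability of observing zero mutant molecules if such a variant were actually present. The function names and the scale factors are hypothetical.

```python
def miss_probability(coverage: int, tumor_fraction: float,
                     clonal_cutoff: float = 0.30) -> float:
    """P(0 mutant molecules) if the variant were present at the clonality boundary."""
    return (1.0 - clonal_cutoff * tumor_fraction) ** coverage


def tf_sensitivity(coverage: int, tumor_fraction: float,
                   scales=(0.5, 1.0, 2.0)) -> dict:
    """Miss probability when the tumor fraction estimate is scaled by each factor."""
    return {
        scale: miss_probability(coverage, min(1.0, scale * tumor_fraction))
        for scale in scales
    }


if __name__ == "__main__":
    for scale, p_miss in tf_sensitivity(coverage=1500, tumor_fraction=0.01).items():
        print(f"estimate x{scale}: P(miss | present) = {p_miss:.4g}")
```

A smaller miss probability makes a non-detection look more convincing, so an inflated estimate overstates confidence in the negative call, while a deflated estimate makes the same non-detection look inconclusive.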
Key derived parameters include tumor fraction and cancer tissue of origin. Taking into account these parameters, a probability distribution of tumor fraction based on methylation data can be measured. [0051] Protein modifications include binding to components of chromatin, particularly histones including modified forms thereof, and binding to other proteins, such as proteins involved in replication or transcription. The disclosure provides methods of processing and analyzing nucleic acids with different extents of modification, such that the nature of their original modification is correlated with a nucleic acid tag and can be decoded by sequencing the tag when nucleic acids are analyzed. Genetic variation of sample nucleic acid modifications can then be associated with the extent of modification (epigenetic variation) of that nucleic acid in the original sample. Nucleic acids can include single-stranded (e.g., ssDNA or RNA) or double-stranded molecules (e.g., dsDNA). [0052] The loss of DNA can reduce the presence of one or more types of DNA, such as cfDNA, such that the one or more types of DNA are difficult to detect. In one or more additional scenarios, existing methods to measure DNA methylation, such as enrichment or depletion methods, can have a relatively high level of resolution, such as about 100 base pairs (bp) to about 200 bp, that can make accurately determining an amount of methylation of DNA difficult. The accuracy with which DNA methylation is determined can impact the accuracy of estimates of tumor fraction for samples. Since tumor fraction can be used to determine whether a sample is derived from a subject in which a tumor is present or not, the accuracy of determinations of tumor fraction estimates can impact diagnosis and/or treatment decisions for individuals. Samples [0053] A sample can be any biological sample isolated from a subject. A sample can be a bodily sample.
Samples can include body tissues, such as known or suspected solid tumors, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, cerebrospinal fluid, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, pleural effusions, saliva, mucus, sputum, semen, sweat, and urine. Samples are preferably body fluids, particularly blood and fractions thereof, and urine. A sample can be in the form originally isolated from a subject or can have been subjected to further processing to remove or add components, such as cells, or enrich for one component relative to another. Thus, a preferred body fluid for analysis is plasma or serum containing cell-free nucleic acids. A sample can be isolated or obtained from a subject and transported to a site of sample analysis. The sample may be preserved and shipped at a desirable temperature, e.g., room temperature, 4°C, -20°C, and/or -80°C. A sample can be isolated or obtained from a subject at the site of the sample analysis. The subject can be a human, a mammal, an animal, a companion animal, a service animal, or a pet. The subject may have a cancer. The subject may not have cancer or a detectable cancer symptom. The subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines, or biologics. The subject may be in remission. The subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations/disorders. [0054] The volume of plasma can depend on the desired read depth for sequenced regions. Exemplary volumes are 0.4-40 mL, 5-20 mL, and 10-20 mL. For example, the volume can be 0.5 mL, 1 mL, 5 mL, 10 mL, 20 mL, 30 mL, or 40 mL. A volume of sampled plasma may be 5 to 20 mL.
[0055] A sample can comprise various amounts of nucleic acid that contain genome equivalents. For example, a sample of about 30 ng DNA can contain about 10,000 (10^4) haploid human genome equivalents and, in the case of cfDNA, about 200 billion (2 × 10^11) individual polynucleotide molecules. Similarly, a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents and, in the case of cfDNA, about 600 billion individual molecules.
[0056] A sample can comprise nucleic acids from different sources, e.g., from cells and cell-free sources of the same subject, or from cells and cell-free sources of different subjects. A sample can comprise nucleic acids carrying mutations. For example, a sample can comprise DNA carrying germline mutations and/or somatic mutations. Germline mutations refer to mutations existing in germline DNA of a subject. Somatic mutations refer to mutations originating in somatic cells of a subject, e.g., cancer cells. A sample can comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations). A sample can comprise an epigenetic variant (i.e., a chemical or protein modification), wherein the epigenetic variant is associated with the presence of a genetic variant such as a cancer-associated mutation. In some embodiments, the sample includes an epigenetic variant associated with the presence of a genetic variant, wherein the sample does not comprise the genetic variant.
[0057] Exemplary amounts of cell-free nucleic acids in a sample before amplification range from about 1 fg to about 1 μg, e.g., 1 pg to 200 ng, 1 ng to 100 ng, 10 ng to 1000 ng. For example, the amount can be up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of cell-free nucleic acid molecules.
The amount can be at least 1 fg, at least 10 fg, at least 100 fg, at least 1 pg, at least 10 pg, at least 100 pg, at least 1 ng, at least 10 ng, at least 100 ng, at least 150 ng, or at least 200 ng of cell-free nucleic acid molecules. The amount can be up to 1 femtogram (fg), 10 fg, 100 fg, 1 picogram (pg), 10 pg, 100 pg, 1 ng, 10 ng, 100 ng, 150 ng, or 200 ng of cell-free nucleic acid molecules. The method can comprise obtaining 1 femtogram (fg) to 200 ng.
[0058] Cell-free nucleic acids are nucleic acids not contained within or otherwise bound to a cell or, in other words, nucleic acids remaining in a sample after removing intact cells. Cell-free nucleic acids include DNA, RNA, and hybrids thereof, including genomic DNA, mitochondrial DNA, siRNA, miRNA, circulating RNA (cRNA), tRNA, rRNA, small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), long non-coding RNA (long ncRNA), or fragments of any of these. Cell-free nucleic acids can be double-stranded, single-stranded, or a hybrid thereof. A cell-free nucleic acid can be released into bodily fluid through secretion or cell death processes, e.g., cellular necrosis and apoptosis. Some cell-free nucleic acids are released into bodily fluid from cancer cells, e.g., circulating tumor DNA (ctDNA). Others are released from healthy cells. In some embodiments, cfDNA is cell-free fetal DNA (cffDNA). In some embodiments, cell-free nucleic acids are produced by tumor cells. In some embodiments, cell-free nucleic acids are produced by a mixture of tumor cells and non-tumor cells.
[0059] Cell-free nucleic acids have an exemplary size distribution of about 100-500 nucleotides, with molecules of 110 to about 230 nucleotides representing about 90% of molecules, with a mode of about 168 nucleotides and a second minor peak in a range between 240 and 440 nucleotides.
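By way of non-limiting illustration, the genome-equivalent and molecule counts given above for a 30 ng sample can be reproduced to within rounding by a short calculation. The sketch below is not part of the disclosed method; it assumes an average mass of about 650 g/mol per double-stranded base pair, a haploid human genome of about 3.1 × 10^9 bp, and a representative cfDNA fragment length equal to the modal length of about 168 nucleotides. The constant and function names are hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one double-stranded base pair
HAPLOID_GENOME_BP = 3.1e9    # assumed haploid human genome size in base pairs


def haploid_genome_equivalents(dna_ng: float) -> float:
    """Approximate number of haploid human genome copies in dna_ng nanograms of DNA."""
    genome_mass_g = HAPLOID_GENOME_BP * BP_MASS_G_PER_MOL / AVOGADRO
    return dna_ng * 1e-9 / genome_mass_g


def fragment_count(dna_ng: float, fragment_bp: float = 168.0) -> float:
    """Approximate number of fragments if every molecule has length fragment_bp."""
    fragment_mass_g = fragment_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return dna_ng * 1e-9 / fragment_mass_g


if __name__ == "__main__":
    for ng in (30.0, 100.0):
        print(ng, round(haploid_genome_equivalents(ng)), f"{fragment_count(ng):.2e}")
```

Under these assumptions, 30 ng corresponds to on the order of 10^4 haploid genome equivalents and on the order of 10^11 fragments, consistent with the approximate values stated above. Actual counts depend on the true fragment-length distribution of the sample.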
Cell-free nucleic acids can be isolated from bodily fluids through a fractionation or partitioning step in which cell-free nucleic acids, as found in solution, are separated from intact cells and other non-soluble components of the bodily fluid. Partitioning may include techniques such as centrifugation or filtration. Alternatively, cells in bodily fluids can be lysed and cell-free and cellular nucleic acids processed together. Generally, after addition of buffers and wash steps, nucleic acids can be precipitated with an alcohol. Further clean-up steps may be used, such as silica-based columns to remove contaminants or salts. Non-specific bulk carrier nucleic acids, such as Cot-1 DNA, DNA or protein for bisulfite sequencing, hybridization, and/or ligation, may be added throughout the reaction to optimize certain aspects of the procedure such as yield.
[0060] After such processing, samples can include various forms of nucleic acid including double stranded DNA, single stranded DNA and single stranded RNA. In some embodiments, single stranded DNA and RNA can be converted to double stranded forms so they are included in subsequent processing and analysis steps.
Analytes
[0061] Analytes can include nucleic acid analytes and non-nucleic acid analytes. The disclosure provides for detecting genetic variations in biological samples from a subject. Biological samples may include polynucleotides from cancer cells. Polynucleotides may be DNA (e.g., genomic DNA, cDNA), RNA (e.g., mRNA, small RNAs), or any combination thereof. Biological samples may include tumor tissue, e.g., from a biopsy. In some cases, biological samples may include blood or saliva. In particular cases, biological samples may comprise cell free DNA (“cfDNA”) or circulating tumor DNA (“ctDNA”). Cell free DNA can be present in, e.g., blood.
[0062] Examples of non-nucleic acid analytes include, but are not limited to, lipids, carbohydrates, peptides, proteins, glycoproteins (N-linked or O-linked), lipoproteins, phosphoproteins, specific phosphorylated or acetylated variants of proteins, amidation variants of proteins, hydroxylation variants of proteins, methylation variants of proteins, ubiquitylation variants of proteins, sulfation variants of proteins, viral proteins (e.g., viral capsid, viral envelope, viral coat, viral accessory, viral glycoproteins, viral spike, etc.), extracellular and intracellular proteins, antibodies, and antigen binding fragments. This further includes a receptor, an antigen, a surface protein, a transmembrane protein, a cluster of differentiation protein, a protein channel, a protein pump, a carrier protein, a phospholipid, a glycoprotein, a glycolipid, a cell-cell interaction protein complex, an antigen-presenting complex, a major histocompatibility complex, an engineered T-cell receptor, a T-cell receptor, a B-cell receptor, a chimeric antigen receptor, an extracellular matrix protein, a posttranslational modification (e.g., phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation or lipidation) state of a cell surface protein, a gap junction, and an adherens junction.
[0063] In general, the systems, apparatus, methods, and compositions can be used to analyze any number of analytes, further including both nucleic acid analytes and non-nucleic acid analytes.
For example, the number of analytes that are analyzed can be at least about 2, at least about 3, at least about 4, at least about 5, at least about 6, at least about 7, at least about 8, at least about 9, at least about 10, at least about 11, at least about 12, at least about 13, at least about 14, at least about 15, at least about 20, at least about 25, at least about 30, at least about 40, at least about 50, at least about 100, at least about 1,000, at least about 10,000, at least about 100,000 or more different analytes present in a region of the sample or within an individual feature of the substrate. Methods for performing multiplexed assays to analyze two or more different analytes will be discussed in a subsequent section of this disclosure.
[0064] One or more nucleic acid analytes and/or non-nucleic acid analytes constitute a set of molecular interactions in a biological system under study (e.g., cells), which may be regarded as an “interactome”, i.e., the molecular interactions that occur between molecules belonging to different biochemical families (proteins, nucleic acids, lipids, carbohydrates, etc.) and also within a given family. In various embodiments, an interactome is a protein-DNA interactome (a network formed by transcription factors (and DNA or chromatin regulatory proteins) and their target genes). In other embodiments, interactome refers to a protein-protein interaction network (PPI), or protein interaction network (PIN). The methods described herein allow for study and analysis of the interactome. Techniques such as proteogenomics (whole genome sequencing, whole exome sequencing and RNA-seq, and mass spectrometry as examples) can support study of the interactome.
Analysis
[0065] The present methods can be used to diagnose presence of conditions, particularly cancer, in a subject, to characterize conditions (e.g., staging cancer or determining heterogeneity of a cancer), to monitor response to treatment of a condition, and to prognose the risk of developing a condition or the subsequent course of a condition. The present disclosure can also be useful in determining the efficacy of a particular treatment option. Successful treatment options may increase the amount of copy number variation or rare mutations detected in a subject's blood, as more cancer cells may die and shed DNA. In other examples, this may not occur. In another example, certain treatment options may be correlated with genetic profiles of cancers over time. This correlation may be useful in selecting a therapy. Additionally, if a cancer is observed to be in remission after treatment, the present methods can be used to monitor residual disease or recurrence of disease.
[0066] The types and number of cancers that may be detected may include blood cancers, brain cancers, lung cancers, skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, solid state tumors, heterogeneous tumors, homogenous tumors and the like. Type and/or stage of cancer can be detected from genetic variations including mutations, rare mutations, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and abnormal changes in nucleic acid 5-methylcytosine.
[0067] Genetic and other analyte data can also be used for characterizing a specific form of cancer. Cancers are often heterogeneous in both composition and staging.
Genetic profile data may allow characterization of specific sub-types of cancer that may be important in the diagnosis or treatment of that specific sub-type. This information may also provide a subject or practitioner clues regarding the prognosis of a specific type of cancer and allow either a subject or practitioner to adapt treatment options in accord with the progress of the disease. Some cancers can progress to become more aggressive and genetically unstable. Other cancers may remain benign, inactive or dormant. The system and methods of this disclosure may be useful in determining disease progression.
[0068] The present analyses are also useful in determining the efficacy of a particular treatment option. Successful treatment options may increase the amount of copy number variation or rare mutations detected in a subject's blood, as more cancer cells may die and shed DNA. In other examples, this may not occur. In another example, certain treatment options may be correlated with genetic profiles of cancers over time. This correlation may be useful in selecting a therapy. Additionally, if a cancer is observed to be in remission after treatment, the present methods can be used to monitor residual disease or recurrence of disease.
[0069] The present methods can also be used for detecting genetic variations in conditions other than cancer. Immune cells, such as B cells, may undergo rapid clonal expansion in the presence of certain diseases. Clonal expansions may be monitored using copy number variation detection and certain immune states may be monitored. In this example, copy number variation analysis may be performed over time to produce a profile of how a particular disease may be progressing. Copy number variation or even rare mutation detection may be used to determine how a population of pathogens is changing during the course of infection.
This may be particularly important during chronic infections, such as HIV/AIDS or Hepatitis infections, whereby viruses may change life cycle state and/or mutate into more virulent forms during the course of infection. The present methods may be used to determine or profile rejection activities of the host body, as immune cells attempt to destroy transplanted tissue, to monitor the status of transplanted tissue, and to alter the course of treatment or prevention of rejection.
[0070] Further, the methods of the disclosure may be used to characterize the heterogeneity of an abnormal condition in a subject. Such methods can include, e.g., generating a genetic profile of extracellular polynucleotides derived from the subject, wherein the genetic profile includes a plurality of data resulting from copy number variation and rare mutation analyses. In some embodiments, an abnormal condition is cancer. In some embodiments, the abnormal condition may be one resulting in a heterogeneous genomic population. In the example of cancer, some tumors are known to comprise tumor cells in different stages of the cancer. In other examples, heterogeneity may comprise multiple foci of disease. Again, in the example of cancer, there may be multiple tumor foci, perhaps where one or more foci are the result of metastases that have spread from a primary site.
[0071] The present methods can be used to generate a profile, fingerprint or set of data that is a summation of genetic information derived from different cells in a heterogeneous disease. This set of data may comprise copy number variation and mutation analyses alone or in combination.
[0072] The present methods can be used to diagnose, prognose, monitor or observe cancers or other diseases. In some embodiments, the methods herein do not involve diagnosing, prognosing or monitoring a fetus and as such are not directed to non-invasive prenatal testing.
In other embodiments, these methodologies may be employed in a pregnant subject to diagnose, prognose, monitor or observe cancers or other diseases in an unborn subject whose DNA and other polynucleotides may co-circulate with maternal molecules.
Determination of 5-methylcytosine pattern of nucleic acids
[0073] Bisulfite-based sequencing and variants thereof provide a means of determining the methylation pattern of a nucleic acid. In some embodiments, determining the methylation pattern includes distinguishing 5-methylcytosine (5mC) from non-methylated cytosine. In some embodiments, determining the methylation pattern includes distinguishing N6-methyladenine from non-methylated adenine. In some embodiments, determining the methylation pattern includes distinguishing 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) from non-methylated cytosine. Examples of bisulfite sequencing include, but are not limited to, oxidative bisulfite sequencing (OX-BS-seq), Tet-assisted bisulfite sequencing (TAB-seq), and reduced bisulfite sequencing (redBS-seq).
[0074] Oxidative bisulfite sequencing (OX-BS-seq) is used to distinguish between 5mC and 5hmC, by first converting the 5hmC to 5fC, and then proceeding with bisulfite sequencing as previously described. Tet-assisted bisulfite sequencing (TAB-seq) can also be used to distinguish 5mC and 5hmC. In TAB-seq, 5hmC is protected by glucosylation. A Tet enzyme is then used to convert 5mC to 5caC before proceeding with bisulfite sequencing, as previously described. Reduced bisulfite sequencing is used to distinguish 5fC from other modified cytosines.
[0075] Generally, in bisulfite sequencing, a nucleic acid sample is divided into two aliquots and one aliquot is treated with bisulfite.
The bisulfite converts native cytosine and certain modified cytosine nucleotides (e.g., 5-formylcytosine or 5-carboxylcytosine) to uracil, whereas other modified cytosines (e.g., 5-methylcytosine, 5-hydroxymethylcytosine) are not converted. Comparison of nucleic acid sequences of molecules from the two aliquots indicates which cytosines were and were not converted to uracils. Consequently, cytosines which were and were not modified can be determined. The initial splitting of the sample into two aliquots is disadvantageous for samples containing only small amounts of nucleic acids, and/or composed of heterogeneous cell/tissue origins such as bodily fluids containing cell-free DNA.
[0076] The present disclosure provides methods allowing bisulfite sequencing and variants thereof. These methods work by linking nucleic acids in a population to a capture moiety, i.e., a label that can be captured or immobilized. Capture moieties include, without limitation, biotin, avidin, streptavidin, a nucleic acid including a particular nucleotide sequence, a hapten recognized by an antibody, and magnetically attractable particles. The capture moiety can be a member of a binding pair, such as biotin/streptavidin or hapten/antibody. In some embodiments, a capture moiety that is attached to an analyte is captured by its binding pair, which is attached to an isolatable moiety, such as a magnetically attractable particle or a large particle that can be sedimented through centrifugation. The capture moiety can be any type of molecule that allows affinity separation of nucleic acids bearing the capture moiety from nucleic acids lacking the capture moiety. Exemplary capture moieties are biotin, which allows affinity separation by binding to streptavidin linked or linkable to a solid phase, or an oligonucleotide, which allows affinity separation through binding to a complementary oligonucleotide linked or linkable to a solid phase.
Following linking of capture moieties to sample nucleic acids, the sample nucleic acids serve as templates for amplification. Following amplification, the original templates remain linked to the capture moieties but amplicons are not linked to capture moieties.
[0077] The capture moiety can be linked to sample nucleic acids as a component of an adapter, which may also provide amplification and/or sequencing primer binding sites. In some methods, sample nucleic acids are linked to adapters at both ends, with both adapters bearing a capture moiety. Preferably any cytosine residues in the adapters are modified, such as to 5-methylcytosine, to protect against the action of bisulfite. In some instances, the capture moieties are linked to the original templates by a cleavable linkage (e.g., photocleavable desthiobiotin-TEG or uracil residues cleavable with USER™ enzyme, Chem. Commun. (Camb). 2015 Feb 21; 51(15): 3266-3269), in which case the capture moieties can, if desired, be removed.
[0078] The amplicons are denatured and contacted with an affinity reagent for the capture moiety. Original templates bind to the affinity reagent whereas nucleic acid molecules resulting from amplification do not. Thus, the original templates can be separated from nucleic acid molecules resulting from amplification.
[0079] Following separation or partition, the respective populations of nucleic acids (i.e., original templates and amplification products) can be subjected to bisulfite treatment, with the original template population receiving bisulfite treatment and the amplification products not. Alternatively, the amplification products can be subjected to bisulfite treatment and the original template population not. Following such treatment, the respective populations can be amplified (which in the case of the original template population converts uracils to thymines). The populations can also be subjected to biotin probe hybridization for enrichment.
The respective populations are then analyzed and sequences compared to determine which cytosines were 5-methylated (or 5-hydroxymethylated) in the original. Detection of a T nucleotide in the template population (corresponding to an unmethylated cytosine converted to uracil) and a C nucleotide at the corresponding position of the amplified population indicates an unmodified C. The presence of C's at corresponding positions of the original template and amplified populations indicates a modified C in the original sample.
[0080] In some embodiments, a method uses sequential DNA-seq and bisulfite-seq (BIS-seq) NGS library preparation of molecular tagged DNA libraries. This process is performed by labeling of adapters (e.g., biotin), DNA-seq amplification of whole library, parent molecule recovery (e.g., streptavidin bead pull down), bisulfite conversion and BIS-seq. In some embodiments, the method identifies 5-methylcytosine with single-base resolution, through sequential NGS-preparative amplification of parent library molecules with and without bisulfite treatment. This can be achieved by modifying the 5-methylated NGS adapters (directional adapters; Y-shaped/forked, with 5-methylcytosine replacing cytosine) used in BIS-seq with a label (e.g., biotin) on one of the two adapter strands. Sample DNA molecules are adapter ligated, and amplified (e.g., by PCR). As only the parent molecules will have a labeled adapter end, they can be selectively recovered from their amplified progeny by label-specific capture methods (e.g., streptavidin-magnetic beads). As the parent molecules retain 5-methylation marks, bisulfite conversion on the captured library will yield single-base resolution 5-methylation status upon BIS-seq, retaining molecular information to corresponding DNA-seq.
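By way of non-limiting illustration, the comparison rule stated above (T in the bisulfite-treated original template population and C in the untreated amplified population indicates an unmodified C; C in both indicates a modified C) can be sketched as follows. The sketch assumes two already aligned, equal-length, same-strand sequences with no insertions or deletions; the function name and labels are hypothetical and are not taken from the disclosure.

```python
def classify_cytosines(untreated: str, bisulfite_treated: str) -> dict:
    """Classify each cytosine position by comparing untreated and bisulfite-treated reads.

    untreated:          sequence from the population not exposed to bisulfite
    bisulfite_treated:  sequence from the bisulfite-treated population after
                        amplification (uracils read out as T)
    """
    if len(untreated) != len(bisulfite_treated):
        raise ValueError("sequences must be aligned to the same length")

    calls = {}
    for pos, (plain, treated) in enumerate(zip(untreated, bisulfite_treated)):
        if plain != "C":
            continue  # only positions that are C without bisulfite are informative
        if treated == "C":
            calls[pos] = "modified"      # protected from conversion (e.g., 5mC or 5hmC)
        elif treated == "T":
            calls[pos] = "unmodified"    # converted to uracil, read as T
        else:
            calls[pos] = "ambiguous"     # mismatch not explained by conversion
    return calls


if __name__ == "__main__":
    print(classify_cytosines("ACGTCC", "ATGTCT"))
```

In practice, calls would be made from many reads per molecule and would account for sequencing error and incomplete conversion, which this sketch deliberately omits.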
In some embodiments, the bisulfite treated library can be combined with a non-treated library prior to enrichment/NGS by addition of a sample tag DNA sequence in a standard multiplexed NGS workflow. As with BIS-seq workflows, bioinformatics analysis can be carried out for genomic alignment and 5-methylated base identification. In sum, this method provides the ability to selectively recover the parent, ligated molecules, carrying 5-methylcytosine marks, after library amplification, thereby allowing for parallel processing for bisulfite converted DNA. This overcomes the destructive nature of bisulfite treatment on the quality/sensitivity of the DNA-seq information extracted from a workflow. With this method, the recovered ligated, parent DNA molecules (via labeled adapters) allow amplification of the complete DNA library and parallel application of treatments that elicit epigenetic DNA modifications. The present disclosure discusses the use of BIS-seq methods to identify cytosine 5-methylation (5-methylcytosine), but this is not limiting. Variants of BIS-seq have been developed to identify hydroxymethylated cytosines (5hmC; OX-BS-seq, TAB-seq), formylcytosine (5fC; redBS-seq) and carboxylcytosines. These methodologies can be implemented with the sequential/parallel library preparation described herein.
Alternative Methods of Modified Nucleic Acid Analysis
[0081] The disclosure provides alternative methods for analyzing modified nucleic acids (e.g., methylated, linked to histones and other modifications discussed above). In some such methods, a population of nucleic acids bearing the modification to different extents (e.g., 0, 1, 2, 3, 4, 5 or more methyl groups per nucleic acid molecule) is contacted with adapters before fractionation of the population depending on the extent of the modification. Adapters attach to either one end or both ends of nucleic acid molecules in the population.
Preferably, the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points do not receive the same combination of tags. Following attachment of adapters, the nucleic acids are amplified from primers binding to the primer binding sites within the adapters. Adapters, whether bearing the same or different tags, can include the same or different primer binding sites, but preferably adapters include the same primer binding site. Following amplification, the nucleic acids are contacted with an agent that preferentially binds to nucleic acids bearing the modification (such as the agents previously described). The nucleic acids are separated into at least two partitions differing in the extent to which the nucleic acids bear the modification from binding to the agents. For example, if the agent has affinity for nucleic acids bearing the modification, nucleic acids overrepresented in the modification (compared with median representation in the population) preferentially bind to the agent, whereas nucleic acids underrepresented for the modification do not bind or are more easily eluted from the agent. Following separation, the different partitions can then be subject to further processing steps, which typically include further amplification, and sequence analysis, in parallel but separately. Sequence data from the different partitions can then be compared.
[0082] Nucleic acids can be linked at both ends to Y-shaped adapters including primer binding sites and tags. The molecules are amplified. The amplified molecules are then fractionated by contact with an antibody preferentially binding to 5-methylcytosine to produce two partitions. One partition includes original molecules lacking methylation and amplification copies having lost methylation. The other partition includes original DNA molecules with methylation.
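By way of non-limiting illustration, the number of tags needed so that molecules sharing the same start and stop points rarely receive the same tag combination can be estimated with a birthday-problem calculation. The sketch below assumes that one tag is attached at each end, that tags are drawn uniformly and independently, and that the two ends are distinguishable, so that n tags give n × n combinations. These assumptions and the function names are hypothetical and are not specified by the disclosure.

```python
def prob_all_unique(num_tag_combinations: int, num_molecules: int) -> float:
    """Probability that num_molecules molecules all receive distinct tag combinations."""
    if num_molecules > num_tag_combinations:
        return 0.0  # pigeonhole: at least two molecules must share a combination
    p = 1.0
    for i in range(num_molecules):
        p *= (num_tag_combinations - i) / num_tag_combinations
    return p


def min_tags_per_end(num_molecules: int, target: float = 0.99) -> int:
    """Smallest number of tags per end giving at least the target uniqueness probability."""
    n = 1
    while prob_all_unique(n * n, num_molecules) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Example: 5 molecules sharing the same start and stop points.
    for target in (0.95, 0.99, 0.999):
        print(target, min_tags_per_end(5, target))
```

The number of molecules expected to share start and stop points depends on input amount and fragment-length distribution, so the value passed as `num_molecules` is itself an estimate.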
The two partitions are then processed and sequenced separately with further amplification of the methylated partition. The sequence data of the two partitions can then be compared. In this example, tags are not used to distinguish between methylated and unmethylated DNA but rather to distinguish between different molecules within these partitions so that one can determine whether reads with the same start and stop points are based on the same or different molecules.
[0083] The disclosure provides further methods for analyzing a population of nucleic acids in which at least some of the nucleic acids include one or more modified cytosine residues, such as 5-methylcytosine and any of the other modifications described previously. In these methods, the population of nucleic acids is contacted with adapters including one or more cytosine residues modified at the 5C position, such as 5-methylcytosine. Preferably all cytosine residues in such adapters are also modified, or all such cytosines in a primer binding region of the adapters are modified. Adapters attach to both ends of nucleic acid molecules in the population. Preferably, the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points do not receive the same combination of tags. The primer binding sites in such adapters can be the same or different, but are preferably the same. After attachment of adapters, the nucleic acids are amplified from primers binding to the primer binding sites of the adapters. The amplified nucleic acids are split into first and second aliquots. The first aliquot is assayed for sequence data with or without further processing. The sequence data on molecules in the first aliquot is thus determined irrespective of the initial methylation state of the nucleic acid molecules.
The nucleic acid molecules in the second aliquot are treated with bisulfite. This treatment converts unmodified cytosines to uracils. The bisulfite treated nucleic acids are then subjected to amplification primed by primers to the original primer binding sites of the adapters linked to nucleic acid. Only the nucleic acid molecules originally linked to adapters (as distinct from amplification products thereof) are now amplifiable because these nucleic acids retain cytosines in the primer binding sites of the adapters, whereas amplification products have lost the methylation of these cytosine residues, which have undergone conversion to uracils in the bisulfite treatment. Thus, only original molecules in the populations, at least some of which are methylated, undergo amplification. After amplification, these nucleic acids are subject to sequence analysis. Comparison of sequences determined from the first and second aliquots can indicate, among other things, which cytosines in the nucleic acid population were subject to methylation.
Partitioning the Sample into a Plurality of Subsamples; Aspects of Samples; Analysis of Epigenetic Characteristics
[0084] In certain embodiments described herein, a population of different forms of nucleic acids (e.g., hypermethylated and hypomethylated DNA in a sample, such as a captured set of cfDNA as described herein) can be physically partitioned based on one or more characteristics of the nucleic acids prior to further analysis, e.g., differentially modifying or isolating a nucleobase, tagging, and/or sequencing. This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated.
In some embodiments, hypermethylation variable epigenetic target regions are analyzed to determine whether they show hypermethylation characteristic of tumor cells and/or hypomethylation variable epigenetic target regions are analyzed to determine whether they show hypomethylation characteristic of tumor cells. Additionally, by partitioning a heterogeneous nucleic acid population, one may increase rare signals, e.g., by enriching rare nucleic acid molecules that are more prevalent in one fraction (or partition) of the population. For example, a genetic variation present in hypermethylated DNA but less (or not) in hypomethylated DNA can be more easily detected by partitioning a sample into hypermethylated and hypomethylated nucleic acid molecules. By analyzing multiple fractions of a sample, a multi-dimensional analysis of a single locus of a genome or species of nucleic acid can be performed and hence, greater sensitivity can be achieved.
[0085] In some instances, a heterogeneous nucleic acid sample is partitioned into two or more partitions (e.g., at least 3, 4, 5, 6 or 7 partitions). In some embodiments, each partition is differentially tagged. Tagged partitions can then be pooled together for collective sample prep and/or sequencing. The partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on a different characteristic (examples provided herein), and tagged using differential tags that are distinguished from other partitions and partitioning means.
[0086] Examples of characteristics that can be used for partitioning include sequence length, methylation level, nucleosome binding, sequence mismatch, immunoprecipitation, and/or proteins that bind to DNA. Resulting partitions can include one or more of the following nucleic acid forms: single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), shorter DNA fragments and longer DNA fragments.
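By way of non-limiting illustration, once differentially tagged partitions have been pooled and sequenced together, reads can be assigned back to their partition of origin from the tag alone. The sketch below assumes a fixed-length partition tag at the start of each read; the tag sequences, tag length, and partition names are hypothetical placeholders and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical partition tags; the sequences are illustrative only.
PARTITION_TAGS = {"ACGT": "hyper", "TGCA": "hypo"}
TAG_LEN = 4  # assumed fixed tag length at the 5' end of each read


def sort_reads_by_partition(reads):
    """Group pooled reads by partition tag, returning tag-trimmed inserts per partition."""
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LEN], read[TAG_LEN:]
        # Reads whose tag is not recognized are kept separately rather than discarded.
        bins[PARTITION_TAGS.get(tag, "unassigned")].append(insert)
    return dict(bins)


if __name__ == "__main__":
    pooled = ["ACGTAAAA", "TGCACCCC", "GGGGTTTT", "ACGTGGGG"]
    for partition, inserts in sort_reads_by_partition(pooled).items():
        print(partition, inserts)
```

A practical implementation would typically tolerate sequencing errors in the tag (for example, by allowing a small edit distance), which this exact-match sketch does not attempt.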
In some embodiments, partitioning based on a cytosine modification (e.g., cytosine methylation) or methylation generally is performed and is optionally combined with at least one additional partitioning step, which may be based on any of the foregoing characteristics or forms of DNA. In some embodiments, a heterogeneous population of nucleic acids is partitioned into nucleic acids with one or more epigenetic modifications and without the one or more epigenetic modifications. Examples of epigenetic modifications include presence or absence of methylation; level of methylation; type of methylation (e.g., 5-methylcytosine versus other types of methylation, such as adenine methylation and/or cytosine hydroxymethylation); and association and level of association with one or more proteins, such as histones. Alternatively or additionally, a heterogeneous population of nucleic acids can be partitioned into nucleic acid molecules associated with nucleosomes and nucleic acid molecules devoid of nucleosomes. Alternatively or additionally, a heterogeneous population of nucleic acids may be partitioned into single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA). Alternatively or additionally, a heterogeneous population of nucleic acids may be partitioned based on nucleic acid length (e.g., molecules of up to 160 bp and molecules having a length of greater than 160 bp).
[0087] In some instances, each partition (representative of a different nucleic acid form) is differentially labelled, and the partitions are pooled together prior to sequencing. In other instances, the different forms are separately sequenced. In some embodiments, a population of different nucleic acids is partitioned into two or more different partitions. Each partition is representative of a different nucleic acid form, and a first partition (also referred to as a subsample) includes DNA with a cytosine modification in a greater proportion than a second subsample.
Each partition is distinctly tagged. The first subsample is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. The tagged nucleic acids are pooled together prior to sequencing. Sequence reads are obtained and analyzed in silico, including to distinguish the first nucleobase from the second nucleobase in the DNA of the first subsample. Tags are used to sort reads from different partitions. Analysis to detect genetic variants can be performed on a partition-by-partition level, as well as at the whole nucleic acid population level. For example, analysis can include in silico analysis to determine genetic variants, such as CNVs, SNVs, indels, and fusions, in the nucleic acids in each partition. In some instances, in silico analysis can include determining chromatin structure. For example, coverage of sequence reads can be used to determine nucleosome positioning in chromatin. Higher coverage can correlate with higher nucleosome occupancy in a genomic region, while lower coverage can correlate with lower nucleosome occupancy or a nucleosome depleted region (NDR).
[0088] Samples can include nucleic acids varying in modifications including post-replication modifications to nucleotides and binding, usually noncovalently, to one or more proteins.
[0089] In an embodiment, the population of nucleic acids is one obtained from a serum, plasma or blood sample from a subject suspected of having neoplasia, a tumor, or cancer or previously diagnosed with neoplasia, a tumor, or cancer. The population of nucleic acids includes nucleic acids having varying levels of methylation.
Methylation can occur from any one or more post-replication or transcriptional modifications. Post-replication modifications include modifications of the nucleotide cytosine, particularly at the 5-position of the nucleobase, e.g., 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxylcytosine. The affinity agents can be antibodies with the desired specificity, natural binding partners or variants thereof (Bock et al., Nat Biotech 28: 1106-1114 (2010); Song et al., Nat Biotech 29: 68-72 (2011)), or artificial peptides selected, e.g., by phage display to have specificity to a given target.
[0090] Examples of capture moieties contemplated herein include methyl binding domains (MBDs) and methyl binding proteins (MBPs) as described herein, including proteins such as MeCP2 and antibodies preferentially binding to 5-methylcytosine. Likewise, partitioning of different forms of nucleic acids can be performed using histone binding proteins, which can separate nucleic acids bound to histones from free or unbound nucleic acids. Examples of histone binding proteins that can be used in the methods disclosed herein include RBBP4, RbAp48 and SANT domain peptides. Although for some affinity agents and modifications, binding to the agent may occur in an essentially all or none manner depending on whether a nucleic acid bears a modification, the separation may be one of degree. In such instances, nucleic acids overrepresented in a modification bind to the agent to a greater extent than nucleic acids underrepresented in the modification. Alternatively, nucleic acids having modifications may bind in an all or nothing manner, but various levels of modification may then be sequentially eluted from the binding agent.
[0091] For example, in some embodiments, partitioning can be binary or based on degree/level of modifications.
For example, all methylated fragments can be partitioned from unmethylated fragments using methyl-binding domain proteins (e.g., MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific)). Subsequently, additional partitioning may involve eluting fragments having different levels of methylation by adjusting the salt concentration in a solution with the methyl-binding domain and bound fragments. As salt concentration increases, fragments having greater methylation levels are eluted. In some instances, the final partitions are representative of nucleic acids having different extents of modifications (overrepresentative or underrepresentative of modifications). Overrepresentation and underrepresentation can be defined by the number of modifications borne by a nucleic acid relative to the median number of modifications per strand in a population. For example, if the median number of 5-methylcytosine residues in nucleic acid in a sample is 2, a nucleic acid including more than two 5-methylcytosine residues is overrepresented in this modification and a nucleic acid with 1 or zero 5-methylcytosine residues is underrepresented. The effect of the affinity separation is to enrich for nucleic acids overrepresented in a modification in a bound phase and for nucleic acids underrepresented in a modification in an unbound phase (i.e., in solution). The nucleic acids in the bound phase can be eluted before subsequent processing.
[0092] When using the MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific), various levels of methylation can be partitioned using sequential elutions. For example, a hypomethylated partition (e.g., no methylation) can be separated from a methylated partition by contacting the nucleic acid population with the MBD from the kit, which is attached to magnetic beads. The beads are used to separate out the methylated nucleic acids from the non-methylated nucleic acids.
Subsequently, one or more elution steps are performed sequentially to elute nucleic acids having different levels of methylation. For example, a first set of methylated nucleic acids can be eluted at a salt concentration of 160 mM or higher, e.g., at least 150 mM, at least 200 mM, at least 300 mM, at least 400 mM, at least 500 mM, at least 600 mM, at least 700 mM, at least 800 mM, at least 900 mM, at least 1000 mM, or at least 2000 mM. After such methylated nucleic acids are eluted, magnetic separation is once again used to separate nucleic acids with a higher level of methylation from those with a lower level of methylation. The elution and magnetic separation steps can be repeated to create various partitions such as a hypomethylated partition (representative of no methylation), a methylated partition (representative of a low level of methylation), and a hypermethylated partition (representative of a high level of methylation).
[0093] In some methods, nucleic acids bound to an agent used for affinity separation are subjected to a wash step. The wash step washes off nucleic acids weakly bound to the affinity agent. Such nucleic acids can be enriched in nucleic acids having the modification to an extent close to the mean or median (i.e., intermediate between nucleic acids remaining bound to the solid phase and nucleic acids not binding to the solid phase on initial contacting of the sample with the agent). The affinity separation results in at least two, and sometimes three or more, partitions of nucleic acids with different extents of a modification. While the partitions are still separate, the nucleic acids of at least one partition, and usually two or three (or more) partitions, are linked to nucleic acid tags, usually provided as components of adapters, with the nucleic acids in different partitions receiving different tags that distinguish members of one partition from another.
The tags linked to nucleic acid molecules of the same partition can be the same or different from one another. But if different from one another, the tags may have part of their code in common so as to identify the molecules to which they are attached as being of a particular partition. For further details regarding partitioning nucleic acid samples based on characteristics such as methylation, see WO2018/119452, which is incorporated herein by reference. In some embodiments, the nucleic acid molecules can be fractionated into different partitions based on the nucleic acid molecules that are bound to a specific protein or a fragment thereof and those that are not bound to that specific protein or fragment thereof.
[0094] Nucleic acid molecules can be fractionated based on DNA-protein binding. Protein-DNA complexes can be fractionated based on a specific property of a protein. Examples of such properties include various epitopes, modifications (e.g., histone methylation or acetylation) or enzymatic activity. Examples of proteins which may bind to DNA and serve as a basis for fractionation may include, but are not limited to, protein A and protein G. Any suitable method can be used to fractionate the nucleic acid molecules based on protein bound regions. Examples of methods used to fractionate nucleic acid molecules based on protein bound regions include, but are not limited to, SDS-PAGE, chromatin-immuno-precipitation (ChIP), heparin chromatography, and asymmetrical field flow fractionation (AF4).
[0095] In some embodiments, partitioning of the nucleic acids is performed by contacting the nucleic acids with a methylation binding domain (“MBD”) of a methylation binding protein (“MBP”). MBD binds to 5-methylcytosine (5mC). MBD is coupled to paramagnetic beads, such as Dynabeads® M-280 Streptavidin, via a biotin linker.
Partitioning into fractions with different extents of methylation can be performed by eluting fractions by increasing the NaCl concentration.
[0096] An exemplary method for molecular tag identification of MBD-bead partitioned libraries through NGS is as follows:
[0097] Physical partitioning of an extracted DNA sample (e.g., extracted blood plasma DNA from a human sample) using a methyl-binding domain protein-bead purification kit, saving all elutions from the process for downstream processing.
[0098] Parallel application of differential molecular tags and NGS-enabling adapter sequences to each partition. For example, the hypermethylated, residual methylation ('wash'), and hypomethylated partitions are ligated with NGS-adapters with molecular tags.
[0099] Re-combining all molecular tagged partitions, and subsequent amplification using adapter-specific DNA primer sequences.
[0100] Enrichment/hybridization of the re-combined and amplified total library, targeting genomic regions of interest (e.g., cancer-specific genetic variants and differentially methylated regions).
[0101] Re-amplification of the enriched total DNA library, appending a sample tag. Different samples are pooled, and assayed in multiplex on an NGS instrument.
[0102] Bioinformatics analysis of NGS data, with the molecular tags being used to identify unique molecules, as well as deconvolution of the sample into molecules that were differentially MBD-partitioned. This analysis can yield information on relative 5-methylcytosine for genomic regions, concurrent with standard genetic sequencing/variant detection.
[0103] Examples of MBPs contemplated herein include, but are not limited to:
[0104] (a) MeCP2 is a protein preferentially binding to 5-methyl-cytosine over unmodified cytosine.
[0105] (b) RPL26, PRP8 and the DNA mismatch repair protein MSH6 preferentially bind to 5-hydroxymethyl-cytosine over unmodified cytosine.
[0106] (c) FOXK1, FOXK2, FOXP1, FOXP4 and FOXI3 preferentially bind to 5-formyl-cytosine over unmodified cytosine (Iurlaro et al., Genome Biol. 14: R119 (2013)).
[0107] (d) Antibodies specific to one or more methylated nucleotide bases.
[0108] In general, elution is a function of the number of methylated sites per molecule, with molecules having more methylation eluting under increased salt concentrations. To elute the DNA into distinct populations based on the extent of methylation, one can use a series of elution buffers of increasing NaCl concentration. Salt concentration can range from about 100 mM to about 2500 mM NaCl. In one embodiment, the process results in three (3) partitions. Molecules are contacted with a solution at a first salt concentration and including a molecule including a methyl binding domain, which molecule can be attached to a capture moiety, such as streptavidin. At the first salt concentration a population of molecules will bind to the MBD and a population will remain unbound. The unbound population can be separated as a “hypomethylated” population. For example, a first partition representative of the hypomethylated form of DNA is that which remains unbound at a low salt concentration, e.g., 100 mM or 160 mM. A second partition representative of intermediate methylated DNA is eluted using an intermediate salt concentration, e.g., between 100 mM and 2000 mM concentration. This is also separated from the sample. A third partition representative of the hypermethylated form of DNA is eluted using a high salt concentration, e.g., at least about 2000 mM.
[0109] The disclosure provides further methods for analyzing a population of nucleic acids in which at least some of the nucleic acids include one or more modified cytosine residues, such as 5-methylcytosine and any of the other modifications described previously. In these methods, after partitioning, the subsamples of nucleic acids are contacted with adapters including one or more cytosine residues modified at the 5C position, such as 5-methylcytosine. Preferably all cytosine residues in such adapters are also modified, or all such cytosines in a primer binding region of the adapters are modified. Adapters attach to both ends of nucleic acid molecules in the population. Preferably, the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points do not receive the same combination of tags. The primer binding sites in such adapters can be the same or different, but are preferably the same. After attachment of adapters, the nucleic acids are amplified from primers binding to the primer binding sites of the adapters. The amplified nucleic acids are split into first and second aliquots. The first aliquot is assayed for sequence data with or without further processing. The sequence data on molecules in the first aliquot is thus determined irrespective of the initial methylation state of the nucleic acid molecules. The nucleic acid molecules in the second aliquot are subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, wherein the first nucleobase includes a cytosine modified at the 5 position, and the second nucleobase includes unmodified cytosine. This procedure may be bisulfite treatment or another procedure that converts unmodified cytosines to uracils. The nucleic acids subjected to the procedure are then amplified with primers to the original primer binding sites of the adapters linked to nucleic acid.
Only the nucleic acid molecules originally linked to adapters (as distinct from amplification products thereof) are now amplifiable because these nucleic acids retain cytosines in the primer binding sites of the adapters, whereas amplification products have lost the methylation of these cytosine residues, which have undergone conversion to uracils in the bisulfite treatment. Thus, only original molecules in the populations, at least some of which are methylated, undergo amplification. After amplification, these nucleic acids are subject to sequence analysis. Comparison of sequences determined from the first and second aliquots can indicate, among other things, which cytosines in the nucleic acid population were subject to methylation.
[0110] Such an analysis can be performed using the following exemplary procedure. After partitioning, methylated DNA is linked to Y-shaped adapters at both ends including primer binding sites and tags. The cytosines in the adapters are modified at the 5 position (e.g., 5-methylated). The modification of the adapters serves to protect the primer binding sites in a subsequent conversion step (e.g., bisulfite treatment, TAP conversion, or any other conversion that does not affect the modified cytosine but affects unmodified cytosine). After attachment of adapters, the DNA molecules are amplified. The amplification product is split into two aliquots for sequencing with and without conversion. The aliquot not subjected to conversion can be subjected to sequence analysis with or without further processing. The other aliquot is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, wherein the first nucleobase includes a cytosine modified at the 5 position, and the second nucleobase includes unmodified cytosine. This procedure may be bisulfite treatment or another procedure that converts unmodified cytosines to uracils.
Only primer binding sites protected by modification of cytosines can support amplification when contacted with primers specific for original primer binding sites. Thus, only original molecules and not copies from the first amplification are subjected to further amplification. The further amplified molecules are then subjected to sequence analysis. Sequences can then be compared from the two aliquots. As in the separation scheme discussed above, nucleic acid tags in adapters are not used to distinguish between methylated and unmethylated DNA but to distinguish nucleic acid molecules within the same partition.
Subjecting the First Subsample to a Procedure that Affects a First Nucleobase in the DNA Differently from a Second Nucleobase in the DNA of the First Subsample
[0111] Methods disclosed herein comprise a step of subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, if the first nucleobase is a modified or unmodified adenine, then the second nucleobase is a modified or unmodified adenine; if the first nucleobase is a modified or unmodified cytosine, then the second nucleobase is a modified or unmodified cytosine; if the first nucleobase is a modified or unmodified guanine, then the second nucleobase is a modified or unmodified guanine; and if the first nucleobase is a modified or unmodified thymine, then the second nucleobase is a modified or unmodified thymine (where modified and unmodified uracil are encompassed within modified thymine for the purpose of this step).
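As a toy illustration of how a conversion procedure of this kind can be read out in silico, the following Python sketch compares an unconverted read with its converted counterpart, in the manner of the two-aliquot comparison described above. It is a minimal sketch under stated assumptions, not the disclosed implementation: the function names `simulate_conversion` and `call_protected_cytosines` are hypothetical, both reads are assumed to be aligned to the same strand and coordinates, sequencing errors are ignored, and converted cytosines (uracils) are assumed to be read as T.

```python
def simulate_conversion(seq: str, protected: set[int]) -> str:
    """Toy conversion (assumption): an unprotected C is read as T, a protected C stays C."""
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def call_protected_cytosines(unconverted: str, converted: str) -> list[int]:
    """Return 0-based positions read as C in both the unconverted and converted reads."""
    if len(unconverted) != len(converted):
        raise ValueError("reads must cover the same coordinates")
    return [
        i
        for i, (u, c) in enumerate(zip(unconverted, converted))
        if u == "C" and c == "C"
    ]


# Hypothetical example: cytosines at positions 1 and 5 are protected (e.g., mC or hmC).
original = "ACGTCCGA"
converted = simulate_conversion(original, {1, 5})
print(converted)                                      # ACGTTCGA
print(call_protected_cytosines(original, converted))  # [1, 5]
```

Positions that read as C before conversion and T afterwards correspond to cytosine forms affected by the procedure; the sketch does not distinguish among the protected forms.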
[0112] In some embodiments, the first nucleobase is a modified or unmodified cytosine, and the second nucleobase is a modified or unmodified cytosine. For example, the first nucleobase may comprise unmodified cytosine (C) and the second nucleobase may comprise one or more of 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hmC). Alternatively, the second nucleobase may comprise C and the first nucleobase may comprise one or more of mC and hmC. Other combinations are also possible, as indicated, e.g., in the Summary above and the following discussion, such as where one of the first and second nucleobases includes mC and the other includes hmC.
[0113] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes bisulfite conversion. Treatment with bisulfite converts unmodified cytosine and certain modified cytosine nucleotides (e.g., 5-formylcytosine (fC) or 5-carboxylcytosine (caC)) to uracil, whereas other modified cytosines (e.g., 5-methylcytosine, 5-hydroxymethylcytosine) are not converted. Thus, where bisulfite conversion is used, the first nucleobase includes one or more of unmodified cytosine, 5-formylcytosine, 5-carboxylcytosine, or other cytosine forms affected by bisulfite, and the second nucleobase may comprise one or more of mC and hmC, such as mC and optionally hmC. Sequencing of bisulfite-treated DNA identifies positions that are read as cytosine as being mC or hmC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5-formylcytosine, or 5-carboxylcytosine. Performing bisulfite conversion on a first subsample as described herein thus facilitates identifying positions containing mC or hmC using the sequence reads obtained from the first subsample.
For an exemplary description of bisulfite conversion, see, e.g., Moss et al., Nat Commun. 2018; 9: 5068.
[0114] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes oxidative bisulfite (Ox-BS) conversion. In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes Tet-assisted bisulfite (TAB) conversion. In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes Tet-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane. In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes chemical-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane. In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes APOBEC-coupled epigenetic (ACE) conversion.
[0115] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes enzymatic conversion of the first nucleobase, e.g., as in EM-Seq. See, e.g., Vaisvila R, et al. (2019) EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA. bioRxiv; DOI: 10.1101/2019.12.20.884692, available at www.biorxiv.org/content/10.1101/2019.12.20.884692v1.
For example, TET2 and T4-βGT can be used to convert 5mC and 5hmC into substrates that cannot be deaminated by a deaminase (e.g., APOBEC3A), and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils.
[0116] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample includes separating DNA originally including the first nucleobase from DNA not originally including the first nucleobase.
[0117] In some embodiments, the first nucleobase is a modified or unmodified adenine, and the second nucleobase is a modified or unmodified adenine. In some embodiments, the modified adenine is N6-methyladenine (mA). In some embodiments, the modified adenine is one or more of N6-methyladenine (mA), N6-hydroxymethyladenine (hmA), or N6-formyladenine (fA).
[0118] Techniques including methylated DNA immunoprecipitation (MeDIP) can be used to separate DNA containing modified bases such as mA from other DNA. See, e.g., Kumar et al., Frontiers Genet. 2018; 9: 640; Greer et al., Cell 2015; 161: 868-878. An antibody specific for mA is described in Sun et al., Bioessays 2015; 37: 1155-62. Antibodies for various modified nucleobases, such as forms of thymine/uracil including halogenated forms such as 5-bromouracil, are commercially available. Various modified bases can also be detected based on alterations in their base-pairing specificity. For example, hypoxanthine is a modified form of adenine that can result from deamination and is read in sequencing as a G. See, e.g., US Patent 8,486,630; Brown, Genomes, 2nd Ed., John Wiley & Sons, Inc., New York, N.Y., 2002, chapter 14, “Mutation, Repair, and Recombination.”
Enriching/Capturing Step, Amplification, Adaptors, Barcodes
[0119] In some embodiments, methods disclosed herein comprise a step of capturing one or more sets of target regions of DNA, such as cfDNA.
Capture may be performed using any suitable approach known in the art. In some embodiments, capturing includes contacting the DNA to be captured with a set of target-specific probes. The set of target-specific probes may have any of the features described herein for sets of target-specific probes, including but not limited to in the embodiments set forth above and the sections relating to probes below. Capturing may be performed on one or more subsamples prepared during methods disclosed herein. In some embodiments, DNA is captured from at least the first subsample or the second subsample, e.g., at least the first subsample and the second subsample. Where the first subsample undergoes a separation step (e.g., separating DNA originally including the first nucleobase (e.g., hmC) from DNA not originally including the first nucleobase, such as hmC-seal), capturing may be performed on any, any two, or all of the DNA originally including the first nucleobase (e.g., hmC), the DNA not originally including the first nucleobase, and the second subsample. In some embodiments, the subsamples are differentially tagged (e.g., as described herein) and then pooled before undergoing capture.
[0120] The capturing step may be performed using conditions suitable for specific nucleic acid hybridization, which generally depend to some extent on features of the probes such as length, base composition, etc. Those skilled in the art will be familiar with appropriate conditions given general knowledge in the art regarding nucleic acid hybridization. In some embodiments, complexes of target-specific probes and DNA are formed.
[0121] In some embodiments, a method described herein includes capturing cfDNA obtained from a test subject for a plurality of sets of target regions.
The target regions comprise epigenetic target regions, which may show differences in methylation levels and/or fragmentation patterns depending on whether they originated from a tumor or from healthy cells. The target regions also comprise sequence-variable target regions, which may show differences in sequence depending on whether they originated from a tumor or from healthy cells. The capturing step produces a captured set of cfDNA molecules, and the cfDNA molecules corresponding to the sequence-variable target region set are captured at a greater capture yield in the captured set of cfDNA molecules than cfDNA molecules corresponding to the epigenetic target region set. For additional discussion of capturing steps, capture yields, and related aspects, see WO2020/160414, which is incorporated herein by reference for all purposes.
[0122] In some embodiments, a method described herein includes contacting cfDNA obtained from a test subject with a set of target-specific probes, wherein the set of target-specific probes is configured to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set.
[0123] It can be beneficial to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set because a greater depth of sequencing may be necessary to analyze the sequence-variable target regions with sufficient confidence or accuracy than may be necessary to analyze the epigenetic target regions. The volume of data needed to determine fragmentation patterns (e.g., to test for perturbation of transcription start sites or CTCF binding sites) or fragment abundance (e.g., in hypermethylated and hypomethylated partitions) is generally less than the volume of data needed to determine the presence or absence of cancer-related sequence mutations.
Capturing the target region sets at different yields can facilitate sequencing the target regions to different depths of sequencing in the same sequencing run (e.g., using a pooled mixture and/or in the same sequencing cell).
[0124] In various embodiments, the methods further comprise sequencing the captured cfDNA, e.g., to different degrees of sequencing depth for the epigenetic and sequence-variable target region sets, consistent with the discussion herein. In some embodiments, complexes of target-specific probes and DNA are separated from DNA not bound to target-specific probes. For example, where target-specific probes are bound covalently or noncovalently to a solid support, a washing or aspiration step can be used to separate unbound material. Alternatively, where the complexes have chromatographic properties distinct from unbound material (e.g., where the probes comprise a ligand that binds a chromatographic resin), chromatography can be used.
[0125] As discussed in detail elsewhere herein, the set of target-specific probes may comprise a plurality of sets such as probes for a sequence-variable target region set and probes for an epigenetic target region set. In some such embodiments, the capturing step is performed with the probes for the sequence-variable target region set and the probes for the epigenetic target region set in the same vessel at the same time, e.g., the probes for the sequence-variable and epigenetic target region sets are in the same composition. This approach provides a relatively streamlined workflow. In some embodiments, the concentration of the probes for the sequence-variable target region set is greater than the concentration of the probes for the epigenetic target region set.
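The link between capture yield and sequencing depth discussed above can be illustrated with a back-of-the-envelope calculation. The Python sketch below is hypothetical: the function name `expected_mean_depth`, the footprint sizes, the relative yields, and the read budget are illustrative values that do not appear in this disclosure, and the model assumes that on-target reads in a pooled run are split in proportion to footprint size multiplied by relative capture yield.

```python
def expected_mean_depth(
    total_reads: int,
    read_length_bp: int,
    regions: dict[str, tuple[int, float]],
) -> dict[str, float]:
    """Estimate mean depth per target region set in one pooled sequencing run.

    regions maps a set name to (footprint_bp, relative_capture_yield).
    Assumption: reads are shared in proportion to footprint_bp * relative yield.
    """
    weights = {name: fp * y for name, (fp, y) in regions.items()}
    total_weight = sum(weights.values())
    depths = {}
    for name, (footprint_bp, _) in regions.items():
        reads_for_set = total_reads * weights[name] / total_weight
        depths[name] = reads_for_set * read_length_bp / footprint_bp
    return depths


# Hypothetical panel: a small sequence-variable set captured at 5x the relative yield
# of a larger epigenetic set.
panel = {
    "sequence_variable": (50_000, 5.0),
    "epigenetic": (500_000, 1.0),
}
depths = expected_mean_depth(total_reads=1_000_000, read_length_bp=150, regions=panel)
for name, depth in depths.items():
    print(f"{name}: ~{depth:.0f}x")
```

Under this simplified model, the ratio of mean depths between the two sets equals the ratio of their relative capture yields, which is why capturing the two sets at different yields allows them to be sequenced to different depths in one run.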
[0126] Alternatively, the capturing step is performed with the sequence-variable target region probe set in a first vessel and with the epigenetic target region probe set in a second vessel, or the contacting step is performed with the sequence-variable target region probe set at a first time and in a first vessel and the epigenetic target region probe set at a second time before or after the first time. This approach allows for preparation of separate first and second compositions including captured DNA corresponding to the sequence-variable target region set and captured DNA corresponding to the epigenetic target region set. The compositions can be processed separately as desired (e.g., to fractionate based on methylation as described elsewhere herein) and recombined in appropriate proportions to provide material for further processing and analysis such as sequencing.
[0127] In some embodiments, the DNA is amplified. In some embodiments, amplification is performed before the capturing step. In some embodiments, amplification is performed after the capturing step.
[0128] In some embodiments, adapters are included in the DNA. This may be done concurrently with an amplification procedure, e.g., by providing the adapters in a 5’ portion of a primer, e.g., as described above. Alternatively, adapters can be added by other approaches, such as ligation.
[0129] In some embodiments, tags, which may be or include barcodes, are included in the DNA. Tags can facilitate identification of the origin of a nucleic acid. For example, barcodes can be used to allow the origin (e.g., subject) from which the DNA came to be identified following pooling of a plurality of samples for parallel sequencing. This may be done concurrently with an amplification procedure, e.g., by providing the barcodes in a 5’ portion of a primer, e.g., as described above. In some embodiments, adapters and tags/barcodes are provided by the same primer or primer set.
For example, the barcode may be located 3’ of the adapter and 5’ of the target-hybridizing portion of the primer. Alternatively, barcodes can be added by other approaches, such as ligation, optionally together with adapters in the same ligation substrate. [0130] Additional details regarding amplification, tags, and barcodes are discussed in the “General Features of the Methods” section below, which can be combined to the extent practicable with any of the foregoing embodiments and the embodiments set forth in the introduction and summary section. Captured Set [0131] In some embodiments, a captured set of DNA (e.g., cfDNA) is provided. With respect to the disclosed methods, the captured set of DNA may be provided, e.g., by performing a capturing step after a partitioning step as described herein. The captured set may comprise DNA corresponding to a sequence-variable target region set, an epigenetic target region set, or a combination thereof. In some embodiments the quantity of captured sequence-variable target region DNA is greater than the quantity of the captured epigenetic target region DNA, when normalized for the difference in the size of the targeted regions (footprint size). [0132] Alternatively, first and second captured sets may be provided, including, respectively, DNA corresponding to a sequence-variable target region set and DNA corresponding to an epigenetic target region set. The first and second captured sets may be combined to provide a combined captured set.
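The footprint normalization mentioned in [0131] can be illustrated with a short sketch. It assumes, for illustration only, that normalizing means dividing each set's concentration by its footprint size in kb; the definition section governs the actual meaning. The function name and the numeric values are hypothetical and do not come from the disclosure.

```python
def footprint_normalized_fold(seq_conc, seq_footprint_kb, epi_conc, epi_footprint_kb):
    """Fold difference in per-kb concentration, sequence-variable over epigenetic.

    Assumption: normalization divides each concentration by its footprint (kb).
    """
    if seq_footprint_kb <= 0 or epi_footprint_kb <= 0 or epi_conc <= 0:
        raise ValueError("footprints and the epigenetic concentration must be positive")
    seq_per_kb = seq_conc / seq_footprint_kb
    epi_per_kb = epi_conc / epi_footprint_kb
    return seq_per_kb / epi_per_kb


# Hypothetical inputs: 2 ng/uL over a 50 kb footprint versus 10 ng/uL over a 500 kb footprint.
fold = footprint_normalized_fold(2.0, 50.0, 10.0, 500.0)
print(f"{fold:.1f}-fold greater (footprint-normalized)")
```

Under this assumption, the sequence-variable set can be at a lower raw concentration than the epigenetic set and still be at a greater footprint-normalized concentration.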
[0133] In some embodiments in which a captured set including DNA corresponding to the sequence-variable target region set and the epigenetic target region set includes a combined captured set as discussed above, the DNA corresponding to the sequence-variable target region set may be present at a greater concentration than the DNA corresponding to the epigenetic target region set, e.g., a 1.1- to 1.2-fold greater concentration, a 1.2- to 1.4-fold greater concentration, a 1.4- to 1.6-fold greater concentration, a 1.6- to 1.8-fold greater concentration, a 1.8- to 2.0-fold greater concentration, a 2.0- to 2.2-fold greater concentration, a 2.2- to 2.4-fold greater concentration, a 2.4- to 2.6-fold greater concentration, a 2.6- to 2.8-fold greater concentration, a 2.8- to 3.0-fold greater concentration, a 3.0- to 3.5-fold greater concentration, a 3.5- to 4.0-fold greater concentration, a 4.0- to 4.5-fold greater concentration, a 4.5- to 5.0-fold greater concentration, a 5.0- to 5.5-fold greater concentration, a 5.5- to 6.0-fold greater concentration, a 6.0- to 6.5-fold greater concentration, a 6.5- to 7.0-fold greater concentration, a 7.0- to 7.5-fold greater concentration, a 7.5- to 8.0-fold greater concentration, an 8.0- to 8.5-fold greater concentration, an 8.5- to 9.0-fold greater concentration, a 9.0- to 9.5-fold greater concentration, a 9.5- to 10.0-fold greater concentration, a 10- to 11-fold greater concentration, an 11- to 12-fold greater concentration, a 12- to 13-fold greater concentration, a 13- to 14-fold greater concentration, a 14- to 15-fold greater concentration, a 15- to 16-fold greater concentration, a 16- to 17-fold greater concentration, a 17- to 18-fold greater concentration, an 18- to 19-fold greater concentration, a 19- to 20-fold greater concentration, a 20- to 30-fold greater concentration, a 30- to 40-fold greater concentration, a 40- to 50-fold greater concentration, a 50- to 60-fold greater concentration, a 60- to 70-fold greater concentration, a 70- to 80-fold greater concentration, an 80- to 90-fold greater concentration, a 90- to 100-fold greater concentration, a 10- to 20-fold greater concentration, a 10- to 40-fold greater concentration, a 10- to 50-fold greater concentration, a 10- to 70-fold greater concentration, or a 10- to 100-fold greater concentration. The degree of difference in concentrations accounts for normalization for the footprint sizes of the target regions, as discussed in the definition section.
Epigenetic Target Region Set [0134] The epigenetic target region set may comprise one or more types of target regions likely to differentiate DNA from neoplastic (e.g., tumor or cancer) cells and from healthy cells, e.g., non-neoplastic circulating cells. Exemplary types of such regions are discussed in detail herein. The epigenetic target region set may also comprise one or more control regions, e.g., as described herein. In some embodiments, the epigenetic target region set has a footprint of at least 100 kb, e.g., at least 200 kb, at least 300 kb, or at least 400 kb. In some embodiments, the epigenetic target region set has a footprint in the range of 100-1000 kb, e.g., 100-200 kb, 200-300 kb, 300-400 kb, 400-500 kb, 500-600 kb, 600-700 kb, 700-800 kb, 800-900 kb, or 900-1,000 kb. Hypermethylation Variable Target Regions [0135] In some embodiments, the epigenetic target region set includes one or more hypermethylation variable target regions. In general, hypermethylation variable target regions refer to regions where an increase in the level of observed methylation, e.g., in a cfDNA sample, indicates an increased likelihood that a sample (e.g., of cfDNA) contains DNA produced by neoplastic cells, such as tumor or cancer cells. For example, hypermethylation of promoters of tumor suppressor genes has been observed repeatedly. See, e.g., Kang et al., Genome Biol. 18:53 (2017) and references cited therein.
In an example, hypermethylation variable target regions can include regions that do not necessarily differ in methylation in cancerous tissue relative to DNA from healthy tissue of the same type, but do differ in methylation (e.g., have more methylation) relative to cfDNA that is typical in healthy subjects. Where, for example, the presence of a cancer results in increased cell death such as apoptosis of cells of the tissue type corresponding to the cancer, such a cancer can be detected at least in part using such hypermethylation variable target regions. In some embodiments, hypermethylation variable target regions include one or more genomic regions, where the cfDNA molecules in those regions do not differ in methylation state in cancer subjects relative to cfDNA from healthy subjects, but the presence/increased quantity of hypermethylated cfDNA in those regions is indicative of a particular tissue type (e.g., cancer origin) and is presented as cfDNA with increased apoptosis (e.g. tumor shedding) into circulation. [0136] Hypermethylation target regions may be obtained, e.g., from the Cancer Genome Atlas. Kang et al., Genome Biology 18:53 (2017), describe construction of a probabilistic method called CancerLocator using hypermethylation target regions from breast, colon, kidney, liver, and lung. In some embodiments, the hypermethylation target regions can be specific to one or more types of cancer. Accordingly, in some embodiments, the hypermethylation target regions include one, two, three, four, or five subsets of hypermethylation target regions that collectively show hypermethylation in one, two, three, four, or five of breast, colon, kidney, liver, and lung cancers. [0137] In some embodiments, the probes for the epigenetic target region set comprise probes specific for one or more hypermethylation variable target regions. The hypermethylation variable target regions may be any of those set forth above.
For example, in some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 1, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1. In some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 2. In some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 1 or Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1 or Table 2. In some embodiments, for each locus included as a target region, there may be one or more probes with a hybridization site that binds between the transcription start site and the stop codon (the last stop codon for genes that are alternatively spliced) of the gene. In some embodiments, the one or more probes bind within 300 bp of the listed position, e.g., within 200 or 100 bp. In some embodiments, a probe has a hybridization site overlapping the position listed above. In some embodiments, the probes specific for the hypermethylation target regions include probes specific for one, two, three, four, or five subsets of hypermethylation target regions that collectively show hypermethylation in one, two, three, four, or five of breast, colon, kidney, liver, and lung cancers. Hypomethylation Variable Target Regions [0138] Global hypomethylation is a commonly observed phenomenon in various cancers. See, e.g., Hon et al., Genome Res. 22:246-258 (2012) (breast cancer); Ehrlich, Epigenomics 1:239-259 (2009) (review article noting observations of hypomethylation in colon, ovarian, prostate, leukemia, hepatocellular, and cervical cancers). 
For example, regions such as repeated elements, e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA, and intergenic regions that are ordinarily methylated in healthy cells may show reduced methylation in tumor cells. Accordingly, in some embodiments, the epigenetic target region set includes hypomethylation variable target regions, where a decrease in the level of observed methylation indicates an increased likelihood that a sample (e.g., of cfDNA) contains DNA produced by neoplastic cells, such as tumor or cancer cells. In an example, hypomethylation variable target regions can include regions that do not necessarily differ in methylation state in cancerous tissue relative to DNA from healthy tissue of the same type, but do differ in methylation (e.g., are less methylated) relative to cfDNA that is typical in healthy subjects. Where, for example, the presence of a cancer results in increased cell death such as apoptosis of cells of the tissue type corresponding to the cancer, such a cancer can be detected at least in part using such hypomethylation variable target regions. In some embodiments, hypomethylation variable target regions include one or more genomic regions, where the cfDNA molecules in those regions do not differ in methylation state in cancer subjects relative to cfDNA from healthy subjects, but the presence/increased quantity of hypomethylated cfDNA in those regions is indicative of a particular tissue type (e.g., cancer origin) and is presented as cfDNA with increased apoptosis (e.g. tumor shedding) into circulation. [0139] In some embodiments, hypomethylation variable target regions include repeated elements and/or intergenic regions. In some embodiments, repeated elements include one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA.
[0140] Exemplary specific genomic regions that show cancer-associated hypomethylation include nucleotides 8403565-8953708 and 151104701-151106035 of human chromosome 1. In some embodiments, the hypomethylation variable target regions overlap or comprise one or both of these regions. [0141] In some embodiments, the probes for the epigenetic target region set comprise probes specific for one or more hypomethylation variable target regions. The hypomethylation variable target regions may be any of those set forth above. For example, the probes specific for one or more hypomethylation variable target regions may include probes for regions such as repeated elements, e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA, and intergenic regions that are ordinarily methylated in healthy cells but may show reduced methylation in tumor cells. [0142] In some embodiments, probes specific for hypomethylation variable target regions include probes specific for repeated elements and/or intergenic regions. In some embodiments, probes specific for repeated elements include probes specific for one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA. [0143] Exemplary probes specific for genomic regions that show cancer-associated hypomethylation include probes specific for nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1. In some embodiments, the probes specific for hypomethylation variable target regions include probes specific for regions overlapping or including nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1.
[0144] Probes for detecting the panel of regions can include those for detecting genomic regions of interest (hotspot regions) as well as nucleosome-aware probes (e.g., KRAS codons 12 and 13) and may be designed to optimize capture based on analysis of cfDNA coverage and fragment size variation impacted by nucleosome binding patterns and GC sequence composition. Regions used herein can also include non-hotspot regions optimized based on nucleosome positions and GC models. Subjects [0145] In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having a cancer. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a cancer. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having a tumor. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a tumor. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having neoplasia. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having neoplasia. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject in remission from a tumor, cancer, or neoplasia (e.g., following chemotherapy, surgical resection, radiation, or a combination thereof). In any of the foregoing embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia may be of the lung, colon, rectum, kidney, breast, prostate, or liver. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the lung. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the colon or rectum. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the breast.
In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the prostate. In any of the foregoing embodiments, the subject may be a human subject. [0146] In some embodiments, the sequence-variable target region probe set has a footprint of at least 0.5 kb, e.g., at least 1 kb, at least 2 kb, at least 5 kb, at least 10 kb, at least 20 kb, at least 30 kb, or at least 40 kb. In some embodiments, the epigenetic target region probe set has a footprint in the range of 0.5-100 kb, e.g., 0.5-2 kb, 2-10 kb, 10-20 kb, 20-30 kb, 30-40 kb, 40-50 kb, 50-60 kb, 60-70 kb, 70-80 kb, 80-90 kb, and 90-100 kb. [0147] In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for target regions from at least 10, 20, 30, or 35 cancer-related genes, such as AKT1, ALK, BRAF, CCND1, CDK2A, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FOXL2, GATA3, GNA11, GNAQ, GNAS, HRAS, IDH1, IDH2, KIT, KRAS, MED12, MET, MYC, NFE2L2, NRAS, PDGFRA, PIK3CA, PPP2R1A, PTEN, RET, STK11, TP53, and U2AF1. Compositions Including Captured DNA [0148] Provided herein is a combination including first and second populations of captured DNA. The first population may comprise or be derived from DNA with a cytosine modification in a greater proportion than the second population.
The first population may comprise a form of a first nucleobase originally present in the DNA with altered base pairing specificity and a second nucleobase without altered base pairing specificity, wherein the form of the first nucleobase originally present in the DNA prior to alteration of base pairing specificity is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the form of the first nucleobase originally present in the DNA prior to alteration of base pairing specificity and the second nucleobase have the same base pairing specificity. The second population does not comprise the form of the first nucleobase originally present in the DNA with altered base pairing specificity. In some embodiments, the cytosine modification is cytosine methylation. In some embodiments, the first nucleobase is a modified or unmodified cytosine and the second nucleobase is a modified or unmodified cytosine. The first and second nucleobase may be any of those discussed herein in the Summary or with respect to subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample. [0149] In some embodiments, the first population includes a sequence tag selected from a first set of one or more sequence tags and the second population includes a sequence tag selected from a second set of one or more sequence tags, and the second set of sequence tags is different from the first set of sequence tags. The sequence tags may comprise barcodes. [0150] In some embodiments, the first population includes protected hmC, such as glucosylated hmC. In some embodiments, the first population was subjected to any of the conversion procedures discussed herein, such as bisulfite conversion, Ox-BS conversion, TAB conversion, ACE conversion, TAP conversion, TAPSβ conversion, or CAP conversion. 
In some embodiments, the first population was subjected to protection of hmC followed by deamination of mC and/or C. In some embodiments of the combination, the first population includes or was derived from DNA with a cytosine modification in a greater proportion than the second population and the first population includes first and second subpopulations, and the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, the second population does not comprise the first nucleobase. In some embodiments, the first nucleobase is a modified or unmodified cytosine, and the second nucleobase is a modified or unmodified cytosine, optionally wherein the modified cytosine is mC or hmC. In some embodiments, the first nucleobase is a modified or unmodified adenine, and the second nucleobase is a modified or unmodified adenine, optionally wherein the modified adenine is mA. [0151] In some embodiments, the first nucleobase (e.g., a modified cytosine) is biotinylated. In some embodiments, the first nucleobase (e.g., a modified cytosine) is a product of a Huisgen cycloaddition to β-6-azide-glucosyl-5-hydroxymethylcytosine that includes an affinity label (e.g., biotin). [0152] In any of the combinations described herein, the captured DNA may comprise cfDNA. The captured DNA may have any of the features described herein concerning captured sets, including, e.g., a greater concentration of the DNA corresponding to the sequence-variable target region set (normalized for footprint size as discussed above) than of the DNA corresponding to the epigenetic target region set. In some embodiments, the DNA of the captured set includes sequence tags, which may be added to the DNA as described herein.
In general, the inclusion of sequence tags results in the DNA molecules differing from their naturally occurring, untagged form. [0153] The combination may further comprise a probe set described herein or sequencing primers, each of which may differ from naturally occurring nucleic acid molecules. For example, a probe set described herein may comprise a capture moiety, and sequencing primers may comprise a non-naturally occurring label. Computer Systems, Processing of Real World Evidence (RWE) [0154] Methods of the present disclosure can be implemented using, or with the aid of, computer systems. For example, such methods may comprise: partitioning the sample into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample includes DNA with a cytosine modification in a greater proportion than the second subsample; subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity; and sequencing DNA in the first subsample and DNA in the second subsample in a manner that distinguishes the first nucleobase from the second nucleobase in the DNA of the first subsample.
[0155] In an aspect, the present disclosure provides a non-transitory computer-readable medium including computer-executable instructions which, when executed by at least one electronic processor, perform at least a portion of a method including: collecting cfDNA from a test subject; capturing a plurality of sets of target regions from the cfDNA, wherein the plurality of target region sets includes a sequence-variable target region set and an epigenetic target region set, whereby a captured set of cfDNA molecules is produced; sequencing the captured cfDNA molecules, wherein the captured cfDNA molecules of the sequence-variable target region set are sequenced to a greater depth of sequencing than the captured cfDNA molecules of the epigenetic target region set; obtaining a plurality of sequence reads generated by a nucleic acid sequencer from sequencing the captured cfDNA molecules; mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads; and processing the mapped sequence reads corresponding to the sequence-variable target region set and to the epigenetic target region set to determine the likelihood that the subject has cancer. [0156] The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion. [0157] Additional details relating to computer systems and networks, databases, and computer program products are also provided in, for example, Peterson, Computer Networks: A Systems Approach, Morgan Kaufmann, 5th Ed. (2011), Kurose, Computer Networking: A Top-Down Approach, Pearson, 7th Ed. (2016), Elmasri, Fundamentals of Database Systems, Addison Wesley, 6th Ed. (2010), Coronel, Database Systems: Design, Implementation, & Management, Cengage Learning, 11th Ed.
(2014), Tucker, Programming Languages, McGraw-Hill Science/Engineering/Math, 2nd Ed. (2006), and Rhoton, Cloud Computing Architected: Solution Design Handbook, Recursive Press (2011), each of which is hereby incorporated by reference in its entirety. [0158] FIG. 6 illustrates an example of a system 100 for generating negative predictions of a target variant in a sample of a subject 111, according to an embodiment of the disclosure. The system 100 may process one or more samples 101 from the subject 111 to generate sequence reads for variant detection and negative predictions. The system 100 may include a laboratory system 102, a computer system 110, and/or other components. It should be noted that the laboratory system 102 and the computer system 110 may be remote from one another, and connected to one another through a computer network (not illustrated). The laboratory system 102 may include a sample collection and preparation pipeline 103, a sequencing pipeline 105, a sequence read datastore 109, and/or other components. The sequencing pipeline 105 may include one or more sequencing devices 107 (illustrated in FIG. 1 as sequencing devices 107a…n). [0159] The computer system 110 may include a sequence analysis pipeline 112, a processor 120, a storage device 122, a variant detection pipeline 130, and/or other components. [0160] The sequence analysis pipeline 112 may include a sequence quality control (QC) component 113 that may trim or trash sequence reads from the laboratory system 102, other analysis components 115 that may perform preliminary alignments to a reference genome, and an analysis QC component 116 that may perform quality control on the output of the analysis components 115. Output, such as sequence reads of a sample 101 of a subject 111, from the sequence analysis pipeline 112 may be stored in an analysis datastore 117.
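The trim-or-discard behavior attributed to the sequence QC component 113 in [0160] can be sketched as follows. The disclosure does not specify how reads are trimmed or discarded, so this sketch assumes a common approach: trimming low-quality bases from the 3' end by Phred score and discarding reads that end up too short. The function names and the thresholds `min_quality` and `min_length` are hypothetical.

```python
def trimmed_length(qualities, min_quality=20):
    """Number of bases kept after trimming low-quality bases from the 3' end."""
    keep = len(qualities)
    while keep > 0 and qualities[keep - 1] < min_quality:
        keep -= 1
    return keep


def qc_read(sequence, qualities, min_quality=20, min_length=30):
    """Trim a read; return (sequence, qualities), or None if the read is discarded."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    keep = trimmed_length(qualities, min_quality)
    if keep < min_length:
        return None  # too short after trimming: discard the read
    return sequence[:keep], qualities[:keep]


# Hypothetical read: 40 bases whose last 5 bases have low quality.
kept = qc_read("ACGT" * 10, [30] * 35 + [5] * 5)
print(len(kept[0]) if kept else "discarded")
```

Reads that pass this step would then go to the alignment and analysis QC components described in the same paragraph.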
[0161] Generally speaking, the processor 120 may implement (be programmed by) various components of the variant detection pipeline 130, such as the variant detector 132, the negative prediction analyzer 134, and/or other components. Alternatively, it should be noted that each of these components of the variant detection pipeline 130 may include a hardware module. Although illustrated separately for convenience, one or more of the various components or instructions, such as the variant detector 132 and the negative prediction analyzer 134 may be integrated with one another. In any event, the variant detection pipeline 130 may cause the computer system 110 to identify variants, diseases from the variants (precision diagnostics), negative predictions, and/or treatment regimens. The precision diagnostic and treatment regimen may be stored in a repository such as clinical result store 160 or diagnostic result store 150. [0162] The variant detector 132 may determine that a target variant has not been detected based on an analysis of the sequence reads from laboratory system 102. It should be noted that at least one sequence read and/or at least one molecule that is sequenced may support the target variant – but this may not be sufficient for the variant detector 132 to detect the target variant. For instance, in some embodiments the variant detector 132 might detect the target variant only if the number of sequence reads (and/or the number of molecules that are sequenced) which support the target variant is greater than a threshold. Additionally or alternatively, the variant detector 132 might detect a target variant only if the target variant which is supported by a sequence read and/or a molecule that is sequenced meets a quality threshold.
Target variants that are supported by at least one sequence read and/or at least one molecule that is sequenced, but do not meet a threshold, may thus be ignored in some embodiments as false positives, and may not be detected by the variant detector 132. Other ways to determine that a target variant has not been detected based on an analysis of the sequence reads may also be used, but further details of making this determination are omitted for clarity. [0163] The negative prediction analyzer 134 may access the output of the variant detector 132 and confirm negative predictions as an add-on to the variant detector. Alternatively, or additionally, the negative prediction analyzer 134 may be integrated with the variant detector 132. [0164] FIG.7 illustrates a schematic diagram of exemplary inputs and outputs of a negative prediction analyzer 134, according to an embodiment. The negative prediction analyzer 134 may use covariable information 202, coverage information at target sites 204, disease type 206, and/or other input information for significance modeling. The negative prediction analyzer 134 may generate a quantitative value output 210 that may represent a likelihood of whether a negative prediction is correct and a negative prediction assessment 212 that may include a level of confidence or precision diagnostic based on the quantitative value output 210. [0165] For example, the sequence reads from the laboratory system 102 may be aligned to a reference genome and in particular to various loci in the reference genome to determine covariable information 202. The covariable information 202 may include covariance variant information that may include historical mutual exclusivity data and/or co-occurrence data of variants. 
Covariable variants may refer to two or more variants that have a negative (mutually exclusive) or positive (co-occurrence) correlation to one another based on historical observations of sequence data from the laboratory system 102 and/or other data sources. For example, mutually exclusive variants may include variants that tend to not be observed with one another. Co-occurrence variants may be observed to occur when another variant is observed, such as a driver variant mutation and its co-occurrence variant. [0166] In particular examples, the significance modeling may generate and use computational estimates of tumor fraction (TF), including methylation determined TF, of a target variant based on nucleic acid sequence reads generated from the sample. Alternatively, or additionally, the significance modeling may determine and use the diversity of other variants that are detected – or not detected – in the sample. For example, the significance modeling may use detection of covariance variants that usually (based on historical covariance variant information) co-occur with the target variant or mutually exclusive variants that usually (based on the historical covariance variant information) do not co-occur with the target variant. A negative predictive value (“NPV”) may be generated based on the TF, including methylation determined TF, estimates and/or diversity of variants that are detected, or not detected, in the sample. The result may be used to provide a level of confidence in a negative diagnosis and/or to further guide treatment plans based on the negative diagnosis. In the context of cancer diagnosis, for example, covariance variants may include driver variants that tend to promote oncogenesis and mutually exclusive variants may include tumor suppressor variants that tend to suppress oncogenesis. Negative Prediction [0167] FIG.
8 illustrates an example of a method 300 for generating negative predictions of a target variant in a sample of a subject, according to an embodiment of the disclosure. [0168] Methods of the invention can be used for determining as a true negative result that a variant of interest is absent (e.g. absent at the clonal level). Thus, with reference to FIG. 3, at 302 the method 300 may include accessing a plurality of sequence reads of the cfDNA sample. At 304, the method 300 may include determining that a target variant has not been detected at a first locus in the sample (e.g., a cfNA sample) based on the plurality of sequence reads. In some examples, the target variant (and/or other variants described herein) may include a somatic variant. In some examples, the target variant (and/or other variants described herein) may not include a germline variant. Assessing Negative Predictions [0169] At 306, the method 300 may include generating a first likelihood value based on a probability that the target variant is absent at the clonal level and a second likelihood value based on a probability that the target variant is not absent at the clonal level. At 308, the method 300 may include determining a quantitative value based on the first likelihood value and the second likelihood value. At 310, the method 300 may include comparing the quantitative value to a threshold. At 312, the method 300 may include determining that the target variant at the first locus is absent at the clonal level based on the comparison. For example, the method 300 may include determining that the allele frequency of the target variant does not exceed the threshold (such as the sub-clonal threshold described with reference to FIGS. 4A and 4B).
Assessing Negative Predictions Based on Tumor Fraction Estimates [0170] In some examples, the method 300 and/or the negative prediction analyzer 134 (by implementing the method 300) may model the probability that the target variant is absent at the clonal level (or present at a sub-clonal level of a tumor variant) as a test or alternative hypothesis (H1) to generate the first likelihood value. For example, FIG. 4A illustrates a graph 400A of a test hypothesis in which the target variant is absent (or present at a sub-clonal level of the tumor variant) from the sample, according to an embodiment. Correspondingly, the negative prediction analyzer 134 may model the probability that the target variant is not absent at the clonal level as a null hypothesis (H0) to generate the second likelihood value. For example, FIG. 4B illustrates a graph 400B of a null hypothesis in which the target variant is not absent in the sample (and correlates with an allele frequency of the tumor variant), according to an embodiment. In both graphs 400A and 400B, “C” reflects the minor allele at a target locus. The value “0.3” reflects a weight applied to α1 (the TF estimation based on mutant allele frequency of a tumor variant) such that the product 0.3 × α1 serves as a sub-clonal threshold value. An allele frequency (α2) of a target variant in the sample 101 of the subject 111 above the sub-clonal threshold value may indicate that the target variant is correlated with the tumor variant. [0171] In these examples, the negative prediction analyzer 134 may generate the first likelihood value and the second likelihood value by determining a tumor fraction (TF) estimate, including methylation determined TF, (such as α1 in the Equations described herein) of the sample. The TF estimate may indicate a fraction of tumor DNA detected in the sample.
In some examples, the TF estimate may be determined by determining an allele frequency of a tumor variant (referred to as epi MAF) in the sample. The epi MAF may be determined by determining a molecule count associated with the tumor variant based on the plurality of sequence reads. The first likelihood value, based on the probability that the target variant is absent at the clonal level or is present at a sub-clonal level (such as L1 in the Equations described herein), and the second likelihood value, based on the probability that the target variant is not absent at the clonal level (such as L0 in the Equations described herein), may be based on the TF estimate.
[0172] In some embodiments, the negative prediction analyzer 134 may use the TF estimate to generate the quantitative value that assesses the quality of the negative prediction (such as by indicating a probability of whether the negative prediction is correct or false). For example, the negative prediction analyzer 134 may determine a first allele frequency of the target variant. The negative prediction analyzer 134 may determine the first allele frequency by determining a first molecule count associated with the target variant based on the plurality of sequence reads. The negative prediction analyzer 134 may use the first allele frequency with the epi MAF to determine the first likelihood value and the second likelihood value, such that both likelihood values are based further on the first allele frequency and the epi MAF.
[0173] Referring to FIG. 9A, the probability that the target variant is absent at the clonal level (or present at a sub-clonal level) may be based on a sub-clonal threshold value (illustrated as 0.3*α1), which may be a sub-clonal weight (illustrated as 0.3) multiplied by a tumor fraction estimate (illustrated as an allele frequency such as epi MAF of a tumor variant). The sub-clonal threshold value may be determined based on specific genes, cancer type, or other expected values.
These values may range anywhere from 0.01 to 0.99, including but not limited to 0.01, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, and 0.99. Equations 1-3 that follow relate to generating the first and second likelihood values and resulting quantitative value in certain embodiments.
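The sub-clonal threshold comparison described above can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for illustration; the default weight of 0.3 is the illustrated sub-clonal weight, and the allowed range of 0.01 to 0.99 follows the range stated above. The allele frequency is assumed here to be the simple ratio of variant-supporting molecules to all molecules at the locus.

```python
def allele_frequency(variant_molecules: int, reference_molecules: int) -> float:
    """Fraction of molecules at a locus that support the variant (assumed simple ratio)."""
    total = variant_molecules + reference_molecules
    if total == 0:
        raise ValueError("no molecules cover the locus")
    return variant_molecules / total


def sub_clonal_threshold(tf_estimate: float, weight: float = 0.3) -> float:
    """Sub-clonal threshold: sub-clonal weight multiplied by the TF estimate (alpha1)."""
    if not 0.01 <= weight <= 0.99:
        raise ValueError("weight is expected to lie between 0.01 and 0.99")
    return weight * tf_estimate


def exceeds_sub_clonal_threshold(target_af: float, tf_estimate: float,
                                 weight: float = 0.3) -> bool:
    """True when the target allele frequency (alpha2) lies above the threshold."""
    return target_af > sub_clonal_threshold(tf_estimate, weight)
```

With a TF estimate of 0.10, for instance, the threshold is 0.03, so a target allele frequency of 0.05 would lie above it and a frequency of 0.01 would not.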
Figure imgf000053_0001
[0175] $p(\alpha_1, \alpha_2) = p(\alpha_1) \ast p(\alpha_2 \mid \alpha_1)$ (Eq. 2)
[0176] $\int_0^1 p(\alpha_2 \mid \alpha_1)\, d\alpha_2 = 1,$
Figure imgf000053_0002
(Eq. 3) (sum of probabilities for all possible values)
[0177] Referring to Eqs. 1-3, L1 refers to the likelihood value for the test hypothesis where the variant is absent at the clonal level. The likelihood value for the null hypothesis is generated using the same formula as for L1, but α2 has a different range of values (e.g., 0.3 to 1).
α1 refers to an allele frequency of a tumor variant, which may be used as a TF estimate
α2 refers to an allele frequency of a target variant
Mv refers to a number of molecules supporting a tumor variant at a locus of the tumor variant
Mr refers to a number of molecules supporting a reference wildtype at the locus of the tumor variant
Mv’ refers to a number of molecules supporting a target variant at a locus of the target variant
Mr’ refers to a number of molecules supporting a reference wildtype at the locus of the target variant
ε refers to an error rate for the TF estimate
ε’ refers to an error rate for the target variant
Error rates are typically derived from sequence information obtained from samples obtained from healthy or normal subjects (e.g., z-scores or the like).
[0178] $\alpha_2 = k \ast \alpha_1$ (Eq. 4)
This equation is for simplification purposes (same as for Eq. 1), but is easier to compute than the integral in Eq. 1.
[0179] ε and ε’ are the error rates in tumor fraction (max MAF) and target variants, respectively.
Figure imgf000054_0002
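Equation 1 is reproduced only as a figure, so its exact form is not restated here. The following is a minimal Python sketch of how a TF-based log likelihood ratio could be computed under two assumptions that are not taken from the equations themselves: (1) molecule counts follow a binomial model in which the error rate simply adds to the true variant fraction, and (2) each hypothesis is reduced to a single point, with the target allele frequency set to 0 under the test hypothesis (absent at the clonal level) and to a fixed multiple of the TF estimate under the null hypothesis, in the spirit of the simplification of Eq. 4. All function names and the default weight of 0.3 are hypothetical.

```python
from math import lgamma, log, log1p


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """Log probability of k successes in n trials, for 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log1p(-p)


def log_likelihood(m_variant: int, m_reference: int,
                   allele_freq: float, error_rate: float) -> float:
    """Log likelihood of the observed counts at the target locus.

    Assumption: error molecules add to the true variant fraction.
    """
    p = min(allele_freq + error_rate, 1.0 - 1e-12)
    return binomial_log_pmf(m_variant, m_variant + m_reference, p)


def llr_tumor_fraction(m_variant: int, m_reference: int, tf_estimate: float,
                       error_rate: float, weight: float = 0.3) -> float:
    """log(L1) - log(L0) under the point-hypothesis simplification."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    # Test hypothesis (H1): target variant absent, only errors contribute.
    log_l1 = log_likelihood(m_variant, m_reference, 0.0, error_rate)
    # Null hypothesis (H0): target variant present at weight * TF estimate.
    log_l0 = log_likelihood(m_variant, m_reference, weight * tf_estimate, error_rate)
    return log_l1 - log_l0
```

Under these assumptions, a positive value favors the test hypothesis. With no variant molecules observed at a locus, the value grows with the TF estimate, which matches the intuition that a negative call is more trustworthy when more tumor DNA is present in the sample.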
[0182] Epsilon (ε) is taken from calculation of a z-score derived from sequence information obtained from samples obtained from healthy or normal subjects.
[0183] In the equations that follow:
$V^{-}$ refers to the target variant being absent at the clonal level
$V^{+}$ refers to the target variant being present at the clonal level
$V_i^{+}$ refers to a variant (other than the target) being present ($i = 1, \ldots, n$, all other called variants)
$\mathcal{L}_i$ refers to a likelihood value (base hypothesis $i = 0$, test hypothesis $i = 1$)
Adjusting the Quantitative Value based on Prevalence of Other Variants
[0184] In some examples, the negative prediction analyzer 134 may adjust the quantitative value determined from the TF estimate based on the presence of one or more variants other than the target variant in a sample 101 of the subject 111. For example, the negative prediction analyzer 134 may determine a prevalence of at least a second variant in the cfDNA sample 101, and adjust the quantitative value based on the prevalence of at least a second variant.
[0185] For example, the prevalence data may be determined according to Equations 7 and 8:
Figure imgf000054_0001
[0188] The likelihood value (L1) that the test hypothesis is correct may be adjusted based on Equation 9 to generate an adjusted likelihood value (L1a), and a likelihood ratio (LRa) may be generated according to Equation 10:
Figure imgf000055_0001
(Eq. 10)
[0191] Eq. 10 is a likelihood ratio using the properties of conditional dependence.
Assessing Negative Predictions Based on LLRs
[0192] In some examples, the quantitative value may be based on a log likelihood ratio (LLR) between the first likelihood value and the second likelihood value. As such, the quantitative value may be based on a ratio between the first likelihood value (such as L1 of Equation 14) and the second likelihood value (such as L0 of Equation 15). In some examples, the negative prediction analyzer 134 may generate a TF-based LLR (such as LLRtf illustrated in Equation 16). The negative prediction analyzer 134 may generate the quantitative value (such as LLR) based on Equation 11:
[0193] LLR = LLRtf + LLRme (Eq. 11) (log likelihood ratio (LLR) of tumor fraction (LLRtf) and mutual exclusivity (LLRme)).
Assessing Negative Predictions Using LLR Based on Covariance (Mutual Exclusivity) Data
[0194] In some examples, the quantitative value may be based on an LLR of covariance data. For example, the negative prediction analyzer 134 may generate the LLRme that reflects covariance data, as illustrated in Equation 18 (conditional probability of how many times variants are observed together).
Figure imgf000055_0002
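The covariance-based term can be sketched in Python as a sum, over the other variants called in the sample, of the difference between two log conditional probabilities: the probability of observing that variant when the target variant is absent and when it is present. This mirrors the summation term of Equation 21 and the addition of Equation 11. The function names, the dictionary inputs, and the variant identifiers are hypothetical, and the conditional probabilities are assumed to come from historical covariable information.

```python
from math import log
from typing import Iterable, Mapping


def llr_mutual_exclusivity(detected_variants: Iterable[str],
                           p_given_absent: Mapping[str, float],
                           p_given_present: Mapping[str, float]) -> float:
    """Sum of log P(variant | target absent) - log P(variant | target present)."""
    total = 0.0
    for variant in detected_variants:
        total += log(p_given_absent[variant]) - log(p_given_present[variant])
    return total


def combined_llr(llr_tf: float, llr_me: float) -> float:
    """Eq. 11: LLR = LLRtf + LLRme."""
    return llr_tf + llr_me
```

A detected variant that is historically seen more often without the target variant than with it (a mutually exclusive variant) contributes a positive term, which raises the support for the target variant being absent; a co-occurring variant contributes a negative term.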
[0197] Assessing Negative Predictions Using Combinations of LLRs
[0198] In some embodiments, the quantitative value may be expressed as a log posterior probability ratio (LPPR) based on a combination of the TF-based log likelihood of whether the null or test hypothesis is correct, a covariance-based (e.g., mutual exclusivity) log likelihood of whether the null or test hypothesis is correct, and prior-data based log data, such as expressed in Equations 19 and 21 below. In some examples, the quantitative value (such as an LLR in Equation 11) may be based further on LogPrior data that is based on historical, observed data not necessarily limited to the sample 101 of the subject 111. Such LogPrior data may be based on covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the target variant. For example, the LogPrior data may be expressed as: $\log \frac{P(V^{-})}{P(V^{+})}$. The LogPrior data may be used to generate the quantitative value in combination with other values, such as in Equation 19.
Figure imgf000056_0001
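The combination of the three log terms into a log posterior probability ratio, and the conversion of that ratio into a probability as in Equation 22, can be sketched in Python as follows. The function names are hypothetical, and the prior is assumed to be binary, so that the prior probability of the target variant being present equals one minus the prior probability of it being absent.

```python
from math import exp, log


def log_prior(p_absent: float) -> float:
    """log P(absent) - log P(present), assuming P(present) = 1 - P(absent)."""
    if not 0.0 < p_absent < 1.0:
        raise ValueError("p_absent must be strictly between 0 and 1")
    return log(p_absent) - log(1.0 - p_absent)


def lppr(llr_tf: float, llr_me: float, log_prior_value: float) -> float:
    """Log posterior probability ratio as the sum of the three log terms."""
    return llr_tf + llr_me + log_prior_value


def posterior_absent(lppr_value: float) -> float:
    """Eq. 22: 1 / (exp(-LPPR) + 1), evaluated in a numerically stable way."""
    if lppr_value >= 0.0:
        return 1.0 / (1.0 + exp(-lppr_value))
    z = exp(lppr_value)
    return z / (1.0 + z)
```

An LPPR of zero corresponds to a probability of 0.5, and larger values move the probability that the target variant is absent at the clonal level toward 1. A threshold may then be applied either to the LPPR or to the resulting probability.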
[0206] $\mathrm{LPPR} = \log \frac{P(V^{-} \mid x, y)}{P(V^{+} \mid x, y)} = \mathrm{LLR}_{tf} + \mathrm{LLR}_{me} + \mathrm{LogPrior} = \log \frac{P(x, y \mid V^{-})}{P(x, y \mid V^{+})} + \sum_{i \in \{1, \ldots, n\}} \left( \log P(V_i^{+} \mid V^{-}) - \log P(V_i^{+} \mid V^{+}) \right) + \log P(V^{-}) - \log P(V^{+})$ (Eq. 21)
[0207] $P(V^{-} \mid x, y) = \frac{1}{e^{-\mathrm{LPPR}} + 1}$ (Eq. 22).
[0208] It should be understood that in the previous examples, the negative prediction analyzer 134 has been described as implementing the method 300 and performing the foregoing additional operations. It should be further understood that the foregoing additional operations may be part of and extend the method 300.
[0209] The various processing operations and/or methods depicted in the Figures may be accomplished using some or all of the system components described in detail herein and, in some implementations, various operations may be performed in different sequences and various operations may be omitted. Additional operations may be performed along with some or all of the operations shown in the depicted flow diagrams. One or more operations may be performed simultaneously. Accordingly, the operations as illustrated (and described in greater detail herein) are provided as examples and, as such, should not be viewed as limiting.
Computer Implementation
[0210] The present methods can be computer-implemented, such that any or all of the operations described in the specification or appended claims other than wet chemistry steps can be performed in a suitably programmed computer. The computer can be a mainframe, personal computer, tablet, smart phone, cloud, online data storage, remote data storage, or the like. The computer can be operated in one or more locations.
[0211] Various operations of the present methods can utilize information and/or programs and generate results that are stored on computer-readable media (e.g., hard drive, auxiliary memory, external memory, server; database, portable memory device (e.g., CD-R, DVD, ZIP disk, flash memory cards), and the like. [0212] The present disclosure also includes an article of manufacture for analyzing a nucleic acid population that includes a machine-readable medium containing one or more programs which when executed implement the steps of the present methods. [0213] The disclosure can be implemented in hardware and/or software. For example, different aspects of the disclosure can be implemented in either client-side logic or server-side logic. The disclosure or components thereof can be embodied in a fixed media program component containing logic instructions and/or data that when loaded into an appropriately configured computing device cause that device to perform according to the disclosure. A fixed media containing logic instructions can be delivered to a viewer on a fixed media for physically loading into a viewer's computer or a fixed media containing logic instructions may reside on a remote server that a viewer accesses through a communication medium to download a program component. [0214] The present disclosure provides computer control systems that are programmed to implement methods of the disclosure. The processor 120 may include a single core or multi core processor, or a plurality of processors for parallel processing. The storage device 122 may include random-access memory, read-only memory, flash memory, a hard disk, and/or other type of storage. The computer system 110 may include a communication interface (e.g., network adapter) for communicating with one or more other systems, and peripheral devices, such as cache, other memory, data storage and/or electronic display adapters. 
The components of the computer system 110 may communicate with one another through an internal communication bus, such as a motherboard. The storage device 122 may be a data storage unit (or data repository) for storing data. The computer system 110 may be operatively coupled to a computer network ("network") with the aid of the communication interface. The network may be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network in some cases is a telecommunication and/or data network. The network may include a local area network. The network may include one or more computer servers, which can enable distributed computing, such as cloud computing. The network, in some cases with the aid of the computer system 110, may implement a peer-to-peer network, which may enable devices coupled to the computer system 110 to behave as a client or a server.
[0215] The processor 120 may execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the storage device 122. The instructions can be directed to the processor 120, which can subsequently program or otherwise configure the processor 120 to implement methods of the present disclosure. Examples of operations performed by the processor 120 may include fetch, decode, execute, and writeback.
[0216] The processor 120 may be part of a circuit, such as an integrated circuit. One or more other components of the system 100 may be included in the circuit. In some cases, the circuit may include an application specific integrated circuit (ASIC).
[0217] The storage device 122 may store files, such as drivers, libraries and saved programs. The storage device 122 can store user data, e.g., user preferences and user programs.
The computer system 110 in some cases may include one or more additional data storage units that are external to the computer system 110, such as located on a remote server that is in communication with the computer system 110 through an intranet or the Internet.
[0218] The computer system 110 can communicate with one or more remote computer systems through the network. For instance, the computer system 110 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 110 via the network.
[0219] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 110, such as, for example, on the storage device 122. The machine executable or machine readable code can be provided in the form of software (e.g., computer readable media). During use, the code can be executed by the processor 120. In some cases, the code can be retrieved from the storage device 122 and stored on the storage device 122 for ready access by the processor 120.
[0220] The code may be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a precompiled or as-compiled fashion.
[0221] Aspects of the systems and methods provided herein, such as the computer system 110, can be embodied in programming.
Various aspects of the technology may be thought of as "products" or "articles of manufacture" typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. [0222] Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution. 
[0223] The computer system 110 can include or be in communication with an electronic display 935 that comprises a user interface (UI) for providing, for example, a report. Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
[0224] Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the processor 120. Further information can be found in PCT App. No. PCT/US2021/015837.
EXAMPLES
Example 1: Liquid biopsy wild type prediction of negative predictors
[0225] Using a matched liquid and tissue dataset, it was demonstrated that both biomarker-level and sample-level negative predictive values (NPVs) increase with tumor fraction. Sample NPV is 0/1 (e.g., 1: tissue does not have an actionable NCCN biomarker, and neither does blood; 0: tissue has an actionable NCCN biomarker but blood does not).
[0226] It was observed that 341 of 3145 samples had a high confidence of mutation (99.9%+), while for the remaining samples, the confidence in a negative call is correlated with tumor fraction. A tumor fraction of ~1% supports a 99% confidence of negative prediction across this set of hotspot variants. Positive percent agreement (PPA) similarly correlated with tumor fraction.
Example 2: Allele fraction of CRC hotspot variants in Infinity cohort
[0227] The scatter plots are colored by the molecule count for the variants. It seems likely that the 1-variant-count samples are spread across the mutation spectrum, indicating that there is likely noise. We also see a clear linear trend for variants that are likely in the major clone driving the tumor fraction, as well as a cluster of variants around 1% MAF at higher tumor fractions that are likely sub-clonal variants.
Colorectal cancer (CRC) hotspots were measured on an epigenomic detection platform, with The Cancer Genome Atlas (TCGA) frequencies applied as priors. Allele fraction of CRC hotspots in a cohort is shown.
[0228] All patent filings, websites, other publications, accession numbers and the like cited above or below are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a sequence are associated with an accession number at different times, the version associated with the accession number at the effective filing date of this application is meant. The effective filing date means the earlier of the actual filing date or filing date of a priority application referring to the accession number if applicable. Likewise, if different versions of a publication, website or the like are published at different times, the version most recently published at the effective filing date of the application is meant unless otherwise indicated. Any feature, step, element, embodiment, or aspect of the disclosure can be used in combination with any other unless specifically indicated otherwise. Although the present disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.

Claims

Attorney Docket No. GH0160WO CLAIMS WHAT IS CLAIMED IS: 1. A method of determining that a first variant of interest at a first locus is absent at a clonal level in a cell-free deoxyribonucleic acid (cfDNA) sample of a human subject, the method comprising: accessing a plurality of sequence reads of the cfDNA sample; determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; generating a first likelihood value based on a probability that the first variant is absent at the clonal level and/or a second likelihood value based on a probability that the first variant is not absent at the clonal level; optionally, determining a quantitative value based on the first likelihood value and/or the second likelihood value; comparing the quantitative value and/or the first likelihood value and/or the second likelihood value to a threshold; and determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison. 2. The method of claim 1, wherein generating the first likelihood value and the second likelihood value comprises: determining a tumor fraction estimate of the sample, wherein the first likelihood value and the second likelihood value is based on the tumor fraction estimate. 3. The method of claim 2, wherein determining the tumor fraction estimate comprises: determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the sample. 4. The method of claim 3, wherein determining the epi MAF comprises determining a molecule count associated with the tumor mutation based on the plurality of sequence reads. 5. The method of claim 3, wherein generating the first likelihood value and the second likelihood value comprises: determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF. 6. 
The method of claim 5, further comprising: comparing the allele frequency with a second threshold that is based on the epi MAF, wherein determining that the first variant of interest at the first locus is absent at the clonal level is based further on the comparison of the MAF with the second threshold.
7. The method of claim 5, wherein determining the allele frequency comprises: determining a first molecule count associated with the first variant based on the plurality of sequence reads.
8. The method of claim 5, wherein determining the quantitative value comprises: accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information.
9. The method of claim 8, further comprising: determining a prevalence of at least a second variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information.
10. The method of claim 1, wherein determining the quantitative value comprises: accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information.
11. The method of claim 10, further comprising: determining a prevalence of at least a second variant in the cfDNA sample, wherein the quantitative value is based further on the prevalence of the second variant.
12. The method of claim 1, wherein the quantitative value is based on the ratio of the first likelihood value to the second likelihood value.
13. The method of claim 1, further comprising determining a level of confidence that the first variant is absent at the clonal level in the cfDNA sample based on the quantitative value.
14.
The method of claim 1, further comprising determining a treatment plan to treat a disease in the human subject. 15. The method of claim 14, wherein the disease is cancer. 16. The method of claim 1, further comprising: determining a prevalence of at least a second variant in the cfDNA sample; and adjusting the quantitative value based on the prevalence of at least a second variant in the cfDNA sample. 17. A method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type at least partially using a computer, the method comprising: Attorney Docket No. GH0160WO determining that the first target nucleic acid variant at the first genetic locus is not detected in the cfNA sample; determining, by the computer, a coverage of the first genetic locus from sequence information generated from the cfNA sample; determining, by the computer, a tumor fraction from the sequence information generated from the cfNA sample; determining, by the computer, a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value. 18. 
A method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject at least partially using a computer, the method comprising: determining that the first target nucleic acid variant is not detected in the cfNA sample obtained from the subject to generate a first test result; determining that at least a second target nucleic acid variant is detected in the cfNA sample obtained from the subject to generate a second test result; determining, by the computer, a first probability that the first target nucleic acid variant is absent in the cfNA sample given the second test result and/or a second probability that the first target nucleic acid is not absent in the cfNA sample given the second test result; generating, by the computer, a quantitative value using the first probability, the second probability, and/or a ratio thereof; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value. 19. A method of determining that a first target nucleic acid variant is absent at a first genetic locus in a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type at least partially using a computer, the method comprising: Attorney Docket No. GH0160WO determining that the first target nucleic acid variant is not detected in the cfNA sample obtained from the subject; generating, by the computer, at least one tumor fraction based value; generating, by the computer, at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value. 20. The method of any one of the preceding claims, wherein the quantitative value is less than the threshold value. 21. 
The method of any one of the preceding claims, wherein the quantitative value is greater than the threshold value. 22. The method of any one of the preceding claims, wherein the first and second test results are dependent upon one another. 23. The method of any one of the preceding claims, comprising determining that a plurality of other selected target nucleic variants are absent at one or more other genetic loci. 24. The method of any one of the preceding claims, wherein the quantitative value comprises a log likelihood ratio (LLR) threshold value. 25. The method of any one of the preceding claims, comprising determining that the first target nucleic acid variant is absent at the first genetic locus in a plurality of reference cfNA samples to generate the threshold value. 26. The method of claim 25, wherein the threshold value comprises a clonality or a sub- clonality threshold value. 27. The method of any one of the preceding claims, wherein the first target nucleic acid variant comprises a driver mutation. 28. The method of any one of the preceding claims, further comprising administering one or more therapies to the subject based upon the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample. 29. The method of any one of the preceding claims, comprising estimating a probability of detecting the first target nucleic acid variant at the first genetic locus in the cfNA sample using the tumor fraction and a binomial model. Attorney Docket No. GH0160WO 30. The method of claim 29, wherein the binomial model comprises information about the given cancer type and/or the second target nucleic acid variant. 31. The method of any one of the preceding claims, wherein the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample indicates that the first genetic locus is wild type. 32. 
The method of any one of the preceding claims, wherein the given cancer type is colorectal cancer, wherein the first genetic locus is KRAS, BRAF, or NRAS, and wherein the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample indicates that the first genetic locus is wild type KRAS, BRAF, or NRAS.
33. The method of claim 32, further comprising administering cetuximab and/or panitumumab to the subject.
34. The method of any one of the preceding claims, wherein the cfNA comprises cfDNA.
35. The method of any one of the preceding claims, wherein the cfNA comprises cfRNA.
36. The method of any one of the preceding claims, further comprising repeating the method one or more times to monitor whether the first target nucleic acid variant is absent at the first genetic locus in different cfNA samples obtained from the subject at different time points.
37. The method of any one of the preceding claims, further comprising performing one or more additional tests to confirm or refute the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample.
38. The method of any one of the preceding claims, comprising determining a maximum mutant allele frequency (epi MAF) for the cfNA sample and using the epi MAF as an estimate of the tumor fraction.
39. The method of any one of the preceding claims, comprising determining that the first target nucleic acid variant at the first genetic locus is not detected in the cfNA sample based upon a plurality of sequencing reads obtained from the cfNA sample.
40. The method of any one of the preceding claims, comprising determining that the first target nucleic acid variant is absent at a clonal level in the cfNA sample.
41. The method of any one of the preceding claims, comprising generating a first likelihood value based on the first probability and a second likelihood value based on the second probability.
42. The method of any one of the preceding claims, comprising determining the quantitative value based on the first likelihood value and the second likelihood value.
43. The method of any one of the preceding claims, wherein generating the first likelihood value and the second likelihood value comprises determining the tumor fraction estimate of the cfNA sample, wherein the first likelihood value and the second likelihood value are based on the tumor fraction estimate.
44. The method of claim 43, wherein determining the tumor fraction estimate comprises determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the cfNA sample.
45. The method of claim 44, wherein determining the epi MAF comprises determining a molecule count associated with the tumor mutation based on the plurality of sequence reads.
46. The method of claim 45, wherein generating the first likelihood value and the second likelihood value comprises determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF.
47. The method of claim 46, further comprising comparing the allele frequency with a second threshold that is based on the epi MAF, wherein determining that the first target nucleic acid variant of interest at the first genetic locus is absent at the clonal level is based further on the comparison of the MAF with the second threshold.
48. The method of claim 46, wherein determining the allele frequency comprises determining a first molecule count associated with the first target nucleic acid variant based on the plurality of sequence reads.
49. The method of claim 46, wherein determining the quantitative value comprises accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information.
50. The method of claim 49, further comprising determining a prevalence of at least the second target nucleic acid variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information.
51. The method of claim 42, wherein determining the quantitative value comprises accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first target nucleic acid variant, wherein the quantitative value is based on the covariable information.
52. The method of claim 51, further comprising determining a prevalence of at least the second target nucleic acid variant in the cfNA sample, wherein the quantitative value is based further on the prevalence of the second target nucleic acid variant.
53. The method of claim 42, wherein the quantitative value is based on the ratio of the first likelihood value to the second likelihood value.
54. The method of claim 42, further comprising determining a level of confidence that the first target nucleic acid variant is absent at a clonal level in the cfNA sample based on the quantitative value.
55. The method of claim 42, further comprising determining a prevalence of at least the second target nucleic acid variant in the cfNA sample; and adjusting the quantitative value based on the prevalence of at least the second target nucleic acid variant in the cfNA sample.
56. The method of any one of the preceding claims, wherein the ratio comprises a log posterior probability ratio (LPPR) equal to a sum of a log likelihood tumor fraction value, a log likelihood mutual exclusivity value, and a log prior value.
57. The method of any one of the preceding claims, wherein the first genetic locus or a second genetic locus comprises the second target nucleic acid variant.
58. The method of any one of the preceding claims, wherein the quantitative value comprises a negative predictive value (NPV) score.
59. The method of any one of the preceding claims, wherein the given cancer type comprises lung cancer and the first target nucleic acid variant is a mutation in a gene selected from the group consisting of: EGFR, BRAF, ALK, ROS1, and MET.
60. The method of any one of the preceding claims, wherein the given cancer type comprises colorectal cancer and the first target nucleic acid variant is a mutation in a gene selected from the group consisting of: KRAS, BRAF, and NRAS.
61. A system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing a plurality of sequence reads of the cfDNA sample; determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; generating a first likelihood value based on a probability that the first variant is absent at the clonal level and a second likelihood value based on a probability that the first variant is not absent at the clonal level; determining a quantitative value based on the first likelihood value and the second likelihood value; comparing the quantitative value to a threshold; and determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison.
62. A system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type; determining that a first target nucleic acid variant at a first genetic locus is not detected in the cfNA sample from the sequence information; determining a coverage of the first genetic locus from the sequence information; determining a tumor fraction from the sequence information; determining a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value.
63. A system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information to generate a first test result; determining that at least a second target nucleic acid variant is detected in the cfNA sample from the sequence information to generate a second test result; determining a first probability that the first target nucleic acid variant is absent in the cfNA sample given the second test result and/or a second probability that the first target nucleic acid variant is not absent in the cfNA sample given the second test result; generating a quantitative value using the first probability, the second probability, and/or a ratio thereof; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value.
64. A system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information; generating at least one tumor fraction based value; generating at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value.
65. A computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing a plurality of sequence reads of the cfDNA sample; determining that the first variant has not been detected at the first locus in the sample based on the plurality of sequence reads; generating a first likelihood value based on a probability that the first variant is absent at the clonal level and a second likelihood value based on a probability that the first variant is not absent at the clonal level; determining a quantitative value based on the first likelihood value and the second likelihood value; comparing the quantitative value to a threshold; and determining that the first variant of interest at the first locus is absent at the clonal level based on the comparison.
66. A computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject having a given cancer type; determining that a first target nucleic acid variant at a first genetic locus is not detected in the cfNA sample from the sequence information; determining a coverage of the first genetic locus from the sequence information; determining a tumor fraction from the sequence information; determining a probability that the first target nucleic acid variant is not absent at the first genetic locus in the cfNA sample from the coverage and the tumor fraction to generate a quantitative value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value.
67. A computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information to generate a first test result; determining that at least a second target nucleic acid variant is detected in the cfNA sample from the sequence information to generate a second test result; determining a first probability that the first target nucleic acid variant is absent in the cfNA sample given the second test result and/or a second probability that the first target nucleic acid variant is not absent in the cfNA sample given the second test result; generating a quantitative value using the first probability, the second probability, and/or a ratio thereof; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample when the quantitative value differs from a threshold value.
68. A computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: accessing sequence information generated from a cell-free nucleic acid (cfNA) sample obtained from a subject; determining that the first target nucleic acid variant is not detected in the cfNA sample from the sequence information; generating at least one tumor fraction based value; generating at least one mutual exclusivity value; and, determining that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample using the tumor fraction based value and/or the mutual exclusivity value.
69. The system or computer readable media of any one of the preceding claims, wherein the quantitative value is less than the threshold value.
70. The system or computer readable media of any one of the preceding claims, wherein the quantitative value is greater than the threshold value.
71. The system or computer readable media of any one of the preceding claims, wherein the first and second test results are dependent upon one another.
72. The system or computer readable media of any one of the preceding claims, comprising determining that a plurality of other selected target nucleic acid variants are absent at one or more other genetic loci.
73. The system or computer readable media of any one of the preceding claims, wherein the quantitative value comprises a log likelihood ratio (LLR) threshold value.
74. The system or computer readable media of any one of the preceding claims, comprising determining that the first target nucleic acid variant is absent at the first genetic locus in a plurality of reference cfNA samples to generate the threshold value.
75. The system or computer readable media of claim 74, wherein the threshold value comprises a clonality or sub-clonality threshold value.
76. The system or computer readable media of any one of the preceding claims, wherein the first target nucleic acid variant comprises a driver mutation.
77. The system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: outputting one or more therapy recommendations for the subject based upon the determination that the first target nucleic acid variant is absent at the first genetic locus in the cfNA sample.
78. The system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: estimating a probability of detecting the first target nucleic acid variant at the first genetic locus in the cfNA sample using the tumor fraction and a binomial model.
79. The system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: determining a maximum mutant allele frequency (epi MAF) for the cfNA sample and using the epi MAF as an estimate of the tumor fraction.
80. The system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: determining that the first target nucleic acid variant is absent at a clonal level in the cfNA sample.
81. The system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: generating a first likelihood value based on the first probability and a second likelihood value based on the second probability.
82. The system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: determining the quantitative value based on the first likelihood value and the second likelihood value.
83. The system or computer readable media of any one of the preceding claims, wherein the instructions further perform at least: generating the first likelihood value and the second likelihood value by determining the tumor fraction estimate of the cfNA sample, wherein the first likelihood value and the second likelihood value are based on the tumor fraction estimate.
84. The system or computer readable media of claim 83, wherein the instructions further perform at least: determining the tumor fraction estimate by determining a maximum mutant allele frequency (epi MAF) of a tumor mutation in the cfNA sample.
85. The system or computer readable media of claim 84, wherein the instructions further perform at least: determining the epi MAF by determining a molecule count associated with the tumor mutation based on the plurality of sequence reads.
86. The system or computer readable media of claim 84, wherein the instructions further perform at least: generating the first likelihood value and the second likelihood value by determining an allele frequency of at least a second variant, wherein the first likelihood value and the second likelihood value are based further on the allele frequency and the epi MAF.
87. The system or computer readable media of claim 86, wherein the instructions further perform at least: comparing the allele frequency with a second threshold that is based on the epi MAF and determining that the first target nucleic acid variant of interest at the first genetic locus is absent at the clonal level based further on the comparison of the MAF with the second threshold.
88. The system or computer readable media of claim 86, wherein the instructions further perform at least: determining the allele frequency by determining a first molecule count associated with the first target nucleic acid variant based on the plurality of sequence reads.
89. The system or computer readable media of claim 86, wherein the instructions further perform at least: determining the quantitative value by accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first variant, wherein the quantitative value is based on the covariable information.
90. The system or computer readable media of claim 89, wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfDNA sample, wherein the quantitative value is based further on the covariable information.
91. The system or computer readable media of claim 83, wherein the instructions further perform at least: determining the quantitative value by accessing covariable information indicating a historical prevalence of one or more variants exhibiting co-occurrence and/or mutual exclusivity with the first target nucleic acid variant, wherein the quantitative value is based on the covariable information.
92. The system or computer readable media of claim 91, wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfNA sample, wherein the quantitative value is based further on the prevalence of the second target nucleic acid variant.
93. The system or computer readable media of claim 83, wherein the instructions further perform at least: determining a level of confidence that the first target nucleic acid variant is absent at a clonal level in the cfNA sample based on the quantitative value.
94. The system or computer readable media of claim 83, wherein the instructions further perform at least: determining a prevalence of at least the second target nucleic acid variant in the cfNA sample; and adjusting the quantitative value based on the prevalence of at least the second target nucleic acid variant in the cfNA sample.
95. The system or computer readable media of any one of the preceding claims, wherein the ratio comprises a log posterior probability ratio (LPPR) equal to a sum of a log likelihood tumor fraction value, a log likelihood mutual exclusivity value, and a log prior value.
96. The method or system of any one of the preceding claims, further comprising generating a report which optionally includes information on, and/or information derived from, the absence of the first target nucleic acid variant at the first genetic locus in the sample.
97. The method or system of claim 96, further comprising communicating the report to a third party, such as the subject from whom the sample derived or a health care practitioner.
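For illustration only, the log posterior probability ratio (LPPR) recited in the claims, a sum of a log likelihood tumor fraction value, a log likelihood mutual exclusivity value, and a log prior value, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function and parameter names are hypothetical, the tumor fraction term assumes a binomial model in which a clonal variant is expected at an allele frequency equal to the tumor fraction estimate (for example, the epi MAF), sequencing error is ignored, and the direction of the threshold comparison is assumed to be "greater than".

```python
import math


def log_lr_tumor_fraction(coverage: int, tumor_fraction: float) -> float:
    """Log likelihood ratio (absent vs. clonally present) given zero mutant molecules.

    Assumption: under a binomial model, P(0 mutant molecules | clonally present)
    is (1 - tumor_fraction) ** coverage, and P(0 mutant molecules | absent) is 1.
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    if not 0.0 <= tumor_fraction < 1.0:
        raise ValueError("tumor_fraction must be in [0, 1)")
    # log(1) - coverage * log(1 - tumor_fraction)
    return -coverage * math.log1p(-tumor_fraction)


def log_lr_mutual_exclusivity(p_second_given_absent: float,
                              p_second_given_present: float) -> float:
    """Log likelihood ratio of observing the second variant, absent vs. present.

    Both probabilities are hypothetical inputs that would come from historical
    co-occurrence / mutual exclusivity data (the "covariable information").
    """
    return math.log(p_second_given_absent) - math.log(p_second_given_present)


def log_prior_ratio(prevalence: float) -> float:
    """Log prior odds that the first variant is absent, given its prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be in (0, 1)")
    return math.log1p(-prevalence) - math.log(prevalence)


def lppr(coverage: int, tumor_fraction: float,
         p_second_given_absent: float, p_second_given_present: float,
         prevalence: float) -> float:
    """Sum of the three log terms named in the claims."""
    return (log_lr_tumor_fraction(coverage, tumor_fraction)
            + log_lr_mutual_exclusivity(p_second_given_absent, p_second_given_present)
            + log_prior_ratio(prevalence))


def absent_at_clonal_level(lppr_value: float, threshold: float) -> bool:
    """Assumed decision rule: call the variant absent when LPPR exceeds the threshold."""
    return lppr_value > threshold
```

In this sketch, higher coverage or a higher tumor fraction estimate makes a non-detection more informative, so the tumor fraction term grows; a detected second variant that historically tends to be mutually exclusive with the first variant pushes the LPPR further toward a call of absence.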
PCT/US2024/039252 2023-07-24 2024-07-24 Significance modeling of clonal-level target variants using methylation detection Pending WO2025024497A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363515227P 2023-07-24 2023-07-24
US63/515,227 2023-07-24

Publications (1)

Publication Number Publication Date
WO2025024497A1 (en) 2025-01-30

Family

ID=92296008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/039252 Pending WO2025024497A1 (en) 2023-07-24 2024-07-24 Significance modeling of clonal-level target variants using methylation detection

Country Status (1)

Country Link
WO (1) WO2025024497A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8486630B2 (en) 2008-11-07 2013-07-16 Industrial Technology Research Institute Methods for accurate sequence data and modified base position determination
WO2018119452A2 (en) 2016-12-22 2018-06-28 Guardant Health, Inc. Methods and systems for analyzing nucleic acid molecules
US20190316184A1 (en) * 2018-04-14 2019-10-17 Natera, Inc. Methods for cancer detection and monitoring
WO2020160414A1 (en) 2019-01-31 2020-08-06 Guardant Health, Inc. Compositions and methods for isolating cell-free dna
WO2021155241A1 (en) * 2020-01-31 2021-08-05 Guardant Health, Inc. Significance modeling of clonal-level absence of target variants
WO2021202752A1 (en) * 2020-03-31 2021-10-07 Guardant Health, Inc. Determining tumor fraction for a sample based on methyl binding domain calibration data

Non-Patent Citations (20)

* Cited by examiner, † Cited by third party
Title
BLAKELY COLLIN M ET AL: "Evolution and clinical impact of co-occurring genetic alterations in advanced-stage EGFR-mutant lung cancers", NATURE GENETICS, vol. 49, no. 12, 1 December 2017 (2017-12-01), New York, pages 1693 - 1704, XP055819218, ISSN: 1061-4036, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5709185/pdf/nihms912822.pdf> DOI: 10.1038/ng.3990 *
BOCK ET AL., NAT BIOTECH, vol. 28, 2010, pages 1106 - 1114
BROWN: "Genomes", 2002, JOHN WILEY & SONS, INC., article "Mutation, Repair, and Recombination"
CAMB, vol. 51, no. 15, 21 February 2015 (2015-02-21), pages 3266 - 3269
CORONEL: "Database Systems: Design, Implementation, & Management, Cengage Learning", 2014
EHRLICH, EPIGENOMICS, vol. 1, 2009, pages 239 - 259
ELMASRI: "Fundamentals of Database Systems, Addison Wesley", 2010
GREER ET AL., CELL, vol. 161, 2015, pages 868 - 878
HON ET AL., GENOME RES., vol. 22, 2012, pages 246 - 258
IURLARO ET AL., GENOME BIOL., vol. 14, 2013, pages R119
KANG ET AL., GENOME BIOL., vol. 18, 2017, pages 53
KANG ET AL.: "Cancer Genome Atlas", GENOME BIOLOGY, vol. 18, 2017, pages 53
KUMAR ET AL., FRONTIERS GENET., vol. 9, 2018, pages 640
KUROSE: "Computer Networking: A Top-Down Approach, Pearson", 2016
MOSS ET AL., NAT COMMUN., vol. 9, 2018, pages 5068
PETERSON: "Cloud Computing Architected: Solution Design Handbook", 2011, RECURSIVE PRESS
SONG ET AL., NAT BIOTECH, vol. 29, 2011, pages 68 - 72
SUN ET AL., BIOESSAYS, vol. 37, 2015, pages 1155 - 62
TUCKER: "Programming Languages", 2006, MCGRAW-HILL
VAISVILA R ET AL.: "EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA", BIORXIV, 2019, Retrieved from the Internet <URL:www.biorxiv.org/content/10.1101/2019.12.20.884692v1>

Similar Documents

Publication Publication Date Title
US12359245B2 (en) Methods and systems for analyzing nucleic acid molecules
JP7573536B2 (en) Compositions and methods for isolating cell-free DNA
JP7696975B2 (en) Tumor mutation burden normalization
EP4504971A1 (en) Detecting the presence of a tumor based on methylation status of cell-free nucleic acid molecules
EP4189111A1 (en) Methods for isolating cell-free dna
JP2024523401A (en) Methods and compositions for copy number information-based tissue origin analysis
WO2025007038A1 (en) Methods for early detection of cancer
WO2025024497A1 (en) Significance modeling of clonal-level target variants using methylation detection
EP4486911A1 (en) Methods for analyzing cytosine methylation and hydroxymethylation
US20250218587A1 (en) Methods and systems for identifying tumor origin
US20250201344A1 (en) Methods and systems for identifying an origin of a variant
US20250250648A1 (en) Probe design for detection of oncogenic viruses
JP2023524681A (en) Methods for sequencing using distributed nucleic acids
US20250243550A1 (en) Minimum residual disease (mrd) detection in early stage cancer using urine
US20250308636A1 (en) Inferring cnvs from the distribution of molecules in hyper partition
US20250101522A1 (en) Brca1 promoter methylation in sporadic breast cancer patients detected by liquid biopsy
US20250246310A1 (en) Genomic and methylation biomarkers for determining patient risk of heart disease and novel genomic and epigenomic drug targets to decrease risk of heart disease and/or improve patient outcome after myocardial infarction or cardiac injury
US20250308629A1 (en) Small variant calling with error-rate based model
US20250364077A1 (en) Generalized probabilistic generative modeling method for analysis of tumor methylated molecules in target capture regions
WO2025235602A1 (en) Predictive, prognostic signatures for immuno-oncology using liquid biopsy
WO2025106837A1 (en) Tumor fraction and outcome association in a real-world non-small cell lung cancer (nsclc) cohort using a methylation-based circulating tumor dna (ctdna) assay
WO2025085784A1 (en) Genomic and methylation biomarkers for determining patient risk of heart disease and novel genomic and epigenomic drug targets to decrease risk of heart disease and/or improve patient outcome after myocardial infarction or cardiac injury
WO2025019254A1 (en) Classification of breast tumors using dna methylation from liquid biopsy
WO2025106796A1 (en) Non-small cell lung cancer (nsclc) histology classification using dna methylation data captured from liquid biopsies
WO2025208044A1 (en) Methods for cancer detection using molecular patterns

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 24754527; Country of ref document: EP; Kind code of ref document: A1)